How to add a column with a number or string that changes every time it finds a number on other colu [Resolved]

I have a file with several columns (three in this simplified example). The rows contain data from different replicates (Replicate_A, Replicate_B, Replicate_C), but the number of rows per replicate varies (it can be 3, 4, or 5 rows, for example). The only thing I know is that each replicate starts with the number 1 in the first column. I would like to add an extra column with the replicate name. I could create an extra file with the list of names to use for this column. Any suggestion is helpful.

The file I have is tab-delimited:

1 x x  
2 x x  
3 x x  
4 x x  
1 x x  
2 x x  
3 x x  
1 x x  
2 x x  
3 x x

The file I want to have:

1 x x Replicate_A
2 x x Replicate_A
3 x x Replicate_A
4 x x Replicate_A
1 x x Replicate_B
2 x x Replicate_B
3 x x Replicate_B
1 x x Replicate_C
2 x x Replicate_C
3 x x Replicate_C

Question Credit: Andrea Cabrera
Asked September 21, 2019
Posted Under: Unix Linux

If you're OK with numbers instead of letters, this is very easy in awk. In the example below, file is your example data after running sed -i 's/  */\t/g' on it, which replaces each run of consecutive spaces with a tab (since you said your data are tab-separated):

$ awk -F"\t" -vOFS="\t" '{if($1==1){num++}{print $0,"Replicate_"num}}' file 
1   x   x       Replicate_1
2   x   x       Replicate_1
3   x   x       Replicate_1
4   x   x       Replicate_1
1   x   x       Replicate_2
2   x   x       Replicate_2
3   x   x       Replicate_2
1   x   x       Replicate_3
2   x   x       Replicate_3
3   x   x   Replicate_3
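
To try the counter approach without the original data, here is a minimal sketch that builds a tab-separated sample and labels it. The file names sample.tsv and labelled.tsv are placeholders chosen for illustration, and the awk program is the same logic written as pattern-action rules rather than an explicit if.

```shell
# Build a small tab-separated sample (sample.tsv is a hypothetical file name).
# printf reuses its format string for every argument, one row per number.
printf '%s\tx\tx\n' 1 2 3 4 1 2 3 1 2 3 > sample.tsv

# Bump the counter on every row whose first column is 1,
# then append the label as a new tab-separated column.
awk -F'\t' -v OFS='\t' '$1 == 1 { num++ } { print $0, "Replicate_" num }' sample.tsv > labelled.tsv

cat labelled.tsv
```

Because the sample rows have no trailing whitespace, every output row should have exactly four tab-separated fields.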

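If you need letters, it's a little more complex, but not too bad. The letters A to Z are read first as a lookup table (this relies on bash brace expansion and process substitution, and covers up to 26 replicates):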

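$ awk '{
        if(NR==FNR){
            # First input (the letters A..Z): store each letter by position
            a[++n]=$1
        }
        else{
            # Data file: a 1 in the first column starts a new replicate
            if($1==1){
                num++
            }
            # Append the letter that matches the current replicate count
            print $0,"Replicate_"a[num]
        }
       }' <(printf '%s\n' {A..Z}) file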
1   x   x    Replicate_A
2   x   x    Replicate_A
3   x   x    Replicate_A
4   x   x    Replicate_A
1   x   x    Replicate_B
2   x   x    Replicate_B
3   x   x    Replicate_B
1   x   x    Replicate_C
2   x   x    Replicate_C
3   x   x Replicate_C
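
The question mentions keeping the replicate names in a separate file, and the same two-file pattern can be adapted to that. The following is only a sketch under assumptions: names.txt, sample.tsv and named.tsv are hypothetical file names, and names.txt is assumed to hold one name per line in the order the replicates appear.

```shell
# names.txt is a hypothetical file with one replicate name per line, in order.
printf '%s\n' Replicate_A Replicate_B Replicate_C > names.txt

# Tab-separated sample data standing in for the real file.
printf '%s\tx\tx\n' 1 2 3 4 1 2 3 1 2 3 > sample.tsv

# First file: load the names into an array.
# Second file: move to the next name whenever column 1 is 1,
# and append the current name to every row.
awk -F'\t' -v OFS='\t' '
    NR == FNR { name[++n] = $1; next }
    $1 == 1   { num++ }
              { print $0, name[num] }
' names.txt sample.tsv > named.tsv

cat named.tsv
```

If the data contain more replicates than names.txt has lines, the extra rows get an empty last column, so it is worth checking that the two counts match.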

credit: terdon
Answered September 21, 2019