awk

Row Manipulation

Extract specific lines from a text file

Suppose we have a file

cat -n lorem_ipsum.txt
 1    Lorem Ipsum is simply dummy text of the printing and typesetting industry.
 2    Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
 3    It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
 4    It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum

We want to extract lines 2 and 3 from this file

awk 'NR==2,NR==3' lorem_ipsum.txt

This will print lines 2 and 3:

 2    Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
 3    It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.

Extract specific column/field from specific line

If you have the following data file

cat data.csv
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50

maybe you need to read the fourth column of the third line, this would be “24”

awk 'NR==3 { print $4 }' data.csv

gives

24

Modifying rows on-the-fly (e.g. to fix Windows line-endings)

If a file may contain Windows or Unix-like line endings (or even a mixture of both) then the intended text replacement may not work as expected.

Sample:

$ echo -e 'Entry 1\nEntry 2.1\tEntry 2.2\r\nEntry 3\r\n\r\n' \
> | awk -F'\t' '$1 != "" { print $1 }' \
> | hexdump -c
0000000   E   n   t   r   y       1  \n   E   n   t   r   y       2   .
0000010   1  \n   E   n   t   r   y       3  \r  \n  \r  \n            
000001d

This can be easily fixed by an additional rule which is inserted at the beginning of the awk script:

/\r$/ { $0 = substr($0, 1, length($0) - 1) }

Because the action does not end with next, the following rules are applied as before.

Sample (with fix of line-endings):

$ echo -e 'Entry 1\nEntry 2.1\tEntry 2.2\r\nEntry 3\r\n\r\n' \
> | awk -F'\t' '/\r$/ { $0 = substr($0, 1, length($0) - 1) } $1 != "" { print $1 }' \
> | hexdump -c
0000000   E   n   t   r   y       1  \n   E   n   t   r   y       2   .
0000010   1  \n   E   n   t   r   y       3  \n                        
000001a

This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0 This website is not affiliated with Stack Overflow