awk

Built-in Variables

FS - Field Separator

Used by awk to split each record into multiple fields:

echo "a-b-c
d-e-f" | awk 'BEGIN {FS="-"} {print $2}'

will result in:

b
e

The variable FS can also be set using the option -F:

echo "a-b-c
d-e-f" | awk -F '-' '{print $2}'

By default, the fields are separated by whitespace (spaces and tabs) and multiple spaces and tabs count as a single separator.
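The difference between default whitespace splitting and an explicit single-character separator can be sketched as follows. The sample strings and the variable names default_nf and dash_nf are made up for illustration. With a single-character FS other than a space, every occurrence separates fields, so adjacent separators produce an empty field.

```shell
# Default FS: runs of spaces and tabs count as one separator,
# and leading/trailing whitespace is ignored, so there are 2 fields.
default_nf=$(printf '  a \t b  \n' | awk '{print NF}')
echo "$default_nf"

# Explicit FS="-": each "-" separates, so "a--b" has an empty
# second field and 3 fields in total.
dash_nf=$(printf 'a--b\n' | awk -F '-' '{print NF}')
echo "$dash_nf"
```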

RS - Record Separator

Used by awk to split the input into multiple records. For example:

echo "a b c|d e f" | awk 'BEGIN {RS="|"} {print $0}'

produces:

a b c
d e f

By default, the record separator is the newline character.

Similarly:

echo "a b c|d e f" | awk 'BEGIN {RS="|"} {print $2}'

produces:

b
e
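To make the record boundaries visible, the record number NR can be printed next to each record. This is a small sketch rather than part of the original example; it uses printf instead of echo so that the input carries no trailing newline, and the variable name records is only illustrative.

```shell
# printf is used instead of echo so the input has no trailing newline;
# with echo, that newline would become part of the last record.
records=$(printf 'a b c|d e f' | awk 'BEGIN {RS="|"} {print NR ": " $0}')
echo "$records"
```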

OFS - Output Field Separator

Used by awk to separate fields output by the print statement. For example:

echo "a b c
d e f" | awk 'BEGIN {OFS="-"} {print $2, $3}'

produces:

b-c
e-f

The default value is " ", a string consisting of a single space.
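OFS is only inserted between values that are separated by a comma in the print statement. A short sketch of the difference, with illustrative variable names:

```shell
# With a comma, print inserts OFS between the values.
with_comma=$(echo "a b c" | awk 'BEGIN {OFS="-"} {print $2, $3}')
# Without a comma, the values are concatenated and OFS is not used.
without_comma=$(echo "a b c" | awk 'BEGIN {OFS="-"} {print $2 $3}')
echo "$with_comma"
echo "$without_comma"
```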

ORS - Output Record Separator

Used by awk to terminate output records: it is written at the end of every print statement. For example:

echo "a b c
d e f" | awk 'BEGIN {ORS="|"} {print $2, $3}'

produces:

b c|e f|

The default value is \n (newline character).
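Because ORS is written after every print, including the last one, the output of the example ends with the separator and has no final newline. A minimal sketch that captures the output so the trailing "|" is visible (the variable name joined is illustrative):

```shell
# Every print ends with ORS, so the result finishes with a trailing "|".
joined=$(printf 'a b c\nd e f\n' | awk 'BEGIN {ORS="|"} {print $2, $3}')
echo "$joined"
```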

ARGV, ARGC - Array of Command Line Arguments

Command line arguments passed to awk are stored in the internal array ARGV of ARGC elements. The first element of the array is the program name. For example:

awk 'BEGIN {
   for (i = 0; i < ARGC; ++i) {
      printf "ARGV[%d]=\"%s\"\n", i, ARGV[i]
   }
}' arg1 arg2 arg3

produces:

ARGV[0]="awk"
ARGV[1]="arg1"
ARGV[2]="arg2"
ARGV[3]="arg3"
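ARGC can also be used on its own, for instance to find the last argument. The following is a sketch under the same assumptions as the example above (three placeholder arguments); the variable name summary is illustrative.

```shell
# ARGC counts ARGV[0] (the program name) plus the three arguments.
# A BEGIN-only program never reads input, so the arguments need not be files.
summary=$(awk 'BEGIN { print ARGC; print ARGV[ARGC - 1] }' arg1 arg2 arg3)
echo "$summary"
```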

FS - Field Separator

The variable FS is used to set the input field separator. In awk, space and tab act as default field separators. The corresponding field value can be accessed through $1, $2, $3… and so on.

awk -F'=' '{print $1}' file
  • -F - command-line option for setting input field separator.

    awk 'BEGIN { FS="=" } { print $1 }' file

OFS - Output Field Separator

This variable is used to set the output field separator which is a space by default.

awk -F'=' 'BEGIN { OFS=":" } { print $1, $2 }' file

Example:

$ cat file.csv 
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5

$ awk -F',' 'BEGIN { OFS="|" } { $1=$1 } 1' file.csv
col1|col2|col3|col4
col1|col2|col3
col1|col2
col1
col1|col2|col3|col4|col5

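Assigning $1 to itself ($1=$1) counts as modifying a field, which makes awk rebuild the record $0. When $0 is rebuilt, the fields are joined with OFS instead of the original FS delimiters. The trailing 1 is a pattern that is always true, so its default action prints the rebuilt record.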
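The effect of the assignment can be checked by printing $0 with and without it. This is a small sketch using one made-up input line and illustrative variable names:

```shell
# No field is assigned, so $0 is printed exactly as it was read.
untouched=$(echo "col1,col2,col3" | awk -F',' 'BEGIN { OFS="|" } { print $0 }')
# Assigning to a field forces awk to rebuild $0 using OFS.
rebuilt=$(echo "col1,col2,col3" | awk -F',' 'BEGIN { OFS="|" } { $1=$1; print $0 }')
echo "$untouched"
echo "$rebuilt"
```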

NR - Total Number of Records

NR holds the total number of records read so far by the current awk instance, counted across all input files.

cat > file1
suicidesquad
harley quinn
joker
deadshot

cat > file2
avengers
ironman
captainamerica
hulk

awk '{print NR}' file1 file2
1
2
3
4
5
6
7
8

A total of 8 records were processed in the instance.
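If only the total is of interest, NR can be printed once in an END block. The sketch below recreates the two sample files in a temporary directory (the directory and the variable names are assumptions made for the example) so that nothing existing is overwritten.

```shell
# Work in a temporary directory so no existing files are touched.
tmpdir=$(mktemp -d)
printf 'suicidesquad\nharley quinn\njoker\ndeadshot\n' > "$tmpdir/file1"
printf 'avengers\nironman\ncaptainamerica\nhulk\n' > "$tmpdir/file2"

# In END, NR still holds the number of the last record read,
# which is the total across both files.
total=$(awk 'END {print NR}' "$tmpdir/file1" "$tmpdir/file2")
echo "$total"

rm -r "$tmpdir"
```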

FNR - Number of Records in File

FNR holds the number of records read so far from the current input file. Unlike NR, it starts over for every file awk processes.

cat > file1
suicidesquad
harley quinn
joker
deadshot

cat > file2
avengers
ironman
captainamerica
hulk

awk '{print FNR}' file1 file2
1
2
3
4
1
2
3
4

Each file has 4 lines. FNR is reset whenever awk starts reading a new file, so the first record of each file is numbered 1.
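Printing NR and FNR side by side shows the difference directly. This is a sketch with two shortened, made-up files created in a temporary directory; the names tmpdir and counts are illustrative.

```shell
tmpdir=$(mktemp -d)
printf 'joker\ndeadshot\n' > "$tmpdir/file1"
printf 'ironman\nhulk\n' > "$tmpdir/file2"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk '{print NR, FNR}' "$tmpdir/file1" "$tmpdir/file2")
echo "$counts"

rm -r "$tmpdir"
```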

NF - Number of Fields

NF holds the number of fields (columns) in the current record. Records are delimited by RS, which defaults to newline, so by default each line is one record.

cat > file1
Harley Quinn Loves Joker
Batman Loves Wonder Woman
Superman is not dead
Why is everything I type four fielded!?

awk '{print NF}' file1
4
4
4
7

FS (described above) defaults to whitespace, that is, spaces and tabs. So Harley, Quinn, Loves and Joker are each considered a field. The same holds for the next two lines, but the last line has 7 space-separated words, which means 7 fields.
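Since NF is the number of the last field, it can also be used as a field index. A brief sketch using the first sample line (variable names are illustrative):

```shell
# $NF is the last field; $(NF-1) is the one before it.
last=$(echo "Harley Quinn Loves Joker" | awk '{print $NF}')
second_last=$(echo "Harley Quinn Loves Joker" | awk '{print $(NF-1)}')
echo "$last"
echo "$second_last"
```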

FNR - The Current Record Number being processed

FNR contains the number of the current record within the input file being processed. In the two-file example below, the count starts again at 1 when awk begins processing the second file.

Example with one file

$ cat file1
AAAA
BBBB
CCCC
$ awk '{ print FNR }' file1
1
2
3

Example with two files

$ cat file1
AAAA
BBBB
CCCC
$ cat file2
WWWW
XXXX
YYYY
ZZZZ
$ awk '{ print FNR, FILENAME, $0 }' file1 file2
1 file1 AAAA
2 file1 BBBB
3 file1 CCCC
1 file2 WWWW
2 file2 XXXX
3 file2 YYYY
4 file2 ZZZZ

Extended example with two files

FNR can be used to detect if awk is processing the first file since NR==FNR is true only for the first file. For example, if we want to join records from files file1 and file2 on their FNR:

$ awk 'NR==FNR { a[FNR]=$0; next } (FNR in a) { print FNR, a[FNR], $1 }' file1 file2
1 AAAA WWWW
2 BBBB XXXX
3 CCCC YYYY

Record ZZZZ from file2 is missing because file1 has only three records: a[4] was never stored, so the condition (FNR in a) is false for the fourth record of file2 and nothing is printed for it.
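If the unmatched record should be kept, one possible variant prints a placeholder when file1 has no line with that number. This is only a sketch: the "-" placeholder, the temporary directory and the variable names are assumptions made for the example.

```shell
tmpdir=$(mktemp -d)
printf 'AAAA\nBBBB\nCCCC\n' > "$tmpdir/file1"
printf 'WWWW\nXXXX\nYYYY\nZZZZ\n' > "$tmpdir/file2"

# First file: store each record under its line number.
# Second file: print the stored record, or "-" when file1 has no such line.
joined=$(awk 'NR==FNR { a[FNR]=$0; next }
              { print FNR, ((FNR in a) ? a[FNR] : "-"), $1 }' "$tmpdir/file1" "$tmpdir/file2")
echo "$joined"

rm -r "$tmpdir"
```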


This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0. This website is not affiliated with Stack Overflow.