UNIX awk

In awk, a file is treated as a sequence of records. By default each line is a record, and each whitespace-separated word on that line is a field.
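
Here is a minimal sketch of how default splitting works. It assumes a POSIX shell with awk available, and the input string and variable names are made up for illustration. Runs of spaces and tabs count as a single separator, and leading or trailing blanks do not produce empty fields.

```shell
# Irregular spacing: leading blanks, a tab, several spaces, trailing blanks.
line=$(printf '  alpha \t beta   gamma  \n')

fields=$(printf '%s\n' "$line" | awk '{ print NF }')   # number of fields on the line
second=$(printf '%s\n' "$line" | awk '{ print $2 }')   # the second field
printf '%s\n' "$fields" "$second"
```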

$ cat text
this is the first line
second one
third one
who cares anymore?
....

$ awk '{print $1}' text
               ^
     print the first field of every record

this
second
third
who
....

$ awk '{print $1 $2}' text
               ^  ^
         print fields 1 and 2 with nothing between them
thisis
secondone
thirdone
whocares
....

$ awk '{print $1, $2}' text
                ^
         the comma inserts the output field separator (OFS, a space by default)
this is
second one
third one
who cares
....
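
A small sketch contrasting the two `print` forms above, plus a custom `OFS`. It assumes a POSIX shell, and the `sample` function is a hypothetical stand-in for the first two lines of the `text` file.

```shell
# Hypothetical helper: the first two lines of the 'text' sample.
sample() { printf '%s\n' 'this is the first line' 'second one'; }

joined=$(sample | awk '{ print $1 $2 }')                       # adjacent fields are glued together
spaced=$(sample | awk '{ print $1, $2 }')                      # the comma inserts OFS (" " by default)
dashed=$(sample | awk 'BEGIN { OFS = "-" } { print $1, $2 }')  # a custom OFS
printf '%s\n' "$joined" "$spaced" "$dashed"
```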

### Using patterns

$ awk '/who/' text
       ^   ^
   print every line that contains the string 'who'

who cares anymore?

$ awk '/e$/' text
         ^
     return all lines that end in an 'e'

this is the first line
second one
third one

$ awk '/[a-z]/' text
          ^
    return lines that contain at least one lowercase letter

this is the first line
second one
third one
who cares anymore?

$ awk '/d+ /' text
        ^ ^
  return all lines that contain one or more 'd' characters followed by a space

second one
third one

$ awk '/\.+|^t/' text
         ^   ^
   all lines that contain one or more dots or start with a 't'

this is the first line
third one
....
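
A pattern with no action prints the matching record, so `/who/` is shorthand for `$0 ~ /who/ { print $0 }`. The sketch below assumes a POSIX shell (the `sample` function is a hypothetical stand-in for the `text` file) and also shows a pattern that tests a single field instead of the whole line.

```shell
# Hypothetical helper reproducing the 'text' sample file.
sample() { printf '%s\n' 'this is the first line' 'second one' 'third one' 'who cares anymore?' '....'; }

short=$(sample | awk '/who/')                    # pattern only: the default action prints the line
long=$(sample | awk '$0 ~ /who/ { print $0 }')   # the same program written out in full
field=$(sample | awk '$1 ~ /^t/ { print $1 }')   # match against field 1 only
printf '%s\n' "$short" "$long" "$field"
```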

### awk variables

FS: input field separator value

OFS: output field separator value

NF: number of fields on the current line

NR: number of records read so far, across all input files (FNR is the record number within the current file)

RS: record separator value

ORS: output record separator

FILENAME: current file name being processed
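
The sketch below shows `NR` counting across files while `FNR` restarts, and how `FS` and `ORS` change splitting and output. It assumes a POSIX shell plus `mktemp`; the file names `one` and `two` and their contents are hypothetical.

```shell
dir=$(mktemp -d)                          # scratch directory for two small input files
printf '%s\n' 'a b c' 'd e' > "$dir/one"
printf '%s\n' 'f' > "$dir/two"

# FILENAME, FNR (per-file count), NR (running count), NF (fields on the line)
report=$(cd "$dir" && awk '{ print FILENAME, FNR, NR, NF }' one two)

colon=$(printf 'root:x:0:0\n' | awk 'BEGIN { FS = ":" } { print $1, NF }')  # split on ':'
csv=$(printf 'a\nb\n' | awk 'BEGIN { ORS = "," } { print }')               # records end with ','

printf '%s\n' "$report" "$colon" "$csv"
rm -r "$dir"
```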

$ awk '{print NR}' text
1
2
3
4
5

$ awk 'END{print NR}' text
        ^
  END marks an action that runs once, after awk has finished
  processing all of its input
5

$ awk 'END{print $1}' text
        ^         ^
    print the first word of the last line ($0 still holds the last record inside END)
....

$ awk 'BEGIN{print "STARTING..."};{print NR, $1};END{print "WORK COMPLETED..."}' text
        ^                         
    BEGIN marks an action that runs once, before any input
    is read
STARTING...
1 this
2 second
3 third
4 who
5 ....
WORK COMPLETED...
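
Two details about `BEGIN` and `END`, sketched under the assumption of a POSIX shell (the `sample` function and the `prefix` variable are hypothetical): a program made only of a `BEGIN` block never reads input, so no file is needed, and variables set in `BEGIN` stay visible in the main action and in `END`.

```shell
# Only a BEGIN block: awk exits without reading any input.
answer=$(awk 'BEGIN { print 6 * 7 }')

# Hypothetical helper with the first three lines of the 'text' sample.
sample() { printf '%s\n' 'this is the first line' 'second one' 'third one'; }

# 'prefix' is set once in BEGIN and reused for every line; END sees the final NR.
tagged=$(sample | awk 'BEGIN { prefix = ">> " } { print prefix $1 } END { print "lines:", NR }')
printf '%s\n' "$answer" "$tagged"
```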

$ awk '{if(NR~/^2/)print}' text
             ^
      ~ is the regular expression match operator; NR is matched as a
      string, so this returns lines 2, 20, 21, ..., 29, 200, ...
second one

$ awk '{if(NR!~/^2/)print}' text
           ^
   !~ negates the match (this returns lines 1, 3, 4, 5, ..., 19,
   30, 31, ...)
this is the first line
third one
who cares anymore?
....

$ awk '{if(NR%2==0)print}' text
             ^ ^
     return even-numbered lines (2, 4, 6, ...)
second one
who cares anymore?
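
The `if` inside an action can usually be written as a pattern instead, because a pattern without an action prints the record. A sketch, assuming a POSIX shell (`sample` is a hypothetical helper holding the `text` lines):

```shell
# Hypothetical helper reproducing the 'text' sample file.
sample() { printf '%s\n' 'this is the first line' 'second one' 'third one' 'who cares anymore?' '....'; }

even_if=$(sample | awk '{ if (NR % 2 == 0) print }')  # condition inside the action
even_pat=$(sample | awk 'NR % 2 == 0')                # same condition used as a pattern
not_two=$(sample | awk 'NR !~ /^2/')                  # negated regex match as a pattern
printf '%s\n' "$even_if" "$even_pat" "$not_two"
```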

### Using a pipe to pass input to awk

$ ls | awk '/\.sh$/'
              ^
      keep only file names that end in '.sh' (same result as ls *.sh)
pia_upscript.sh

$ ls | awk '/^pia_[a-z]+$/'
                ^ ^   ^   ^
    names that start with 'pia_' followed by one or more
    lowercase letters up to the end of the name
pia_psid
pia_route

$ ls | awk '/[.](err|out|sh)$/'
              ^   ^   ^   ^ ^
      names ending in '.err', '.out' or '.sh'

AlTest1.err
AlTest1.out
pia_upscript.sh

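$ ifconfig en0 | awk '{if(NR==4)print $2}'
                          ^           ^
      print the second field of the 4th line of ifconfig's output
      (the IPv4 address here; which line holds it varies by OS and interface)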
192.168.1.102

$ echo 'http://192.168.0.1' | awk 'BEGIN{FS="//"}{print $2}'
                                              ^
                                    set the field separator to // before reading input,
                                    then print the second field
192.168.0.1
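
The same split can be requested with awk's `-F` option instead of a `BEGIN` block. A sketch, assuming a POSIX shell; the second URL with an `/admin` path is hypothetical and shows a single-character separator, which leaves an empty field between the two slashes.

```shell
url='http://192.168.0.1'
host_begin=$(echo "$url" | awk 'BEGIN { FS = "//" } { print $2 }')  # FS set in BEGIN
host_opt=$(echo "$url" | awk -F '//' '{ print $2 }')                 # same thing via -F

# With FS="/", the fields are "http:", "", "192.168.0.1", "admin".
host_path=$(echo 'http://192.168.0.1/admin' | awk -F '/' '{ print $3 }')
printf '%s\n' "$host_begin" "$host_opt" "$host_path"
```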

$ man awk | awk 'BEGIN{var=0};{if(/print/){var++}};END{print var};'
                        ^           ^       ^                 ^
        count the lines of the awk man page that contain 'print'
        (a line with several matches is counted only once)
10
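
Because a pattern fires at most once per line, the command above counts matching lines rather than occurrences. One way to count every occurrence is to add up the return value of `gsub`, which reports how many replacements it made; replacing each match with `&` (the matched text) leaves the line unchanged. A sketch, assuming a POSIX shell, with made-up input lines:

```shell
# Made-up input: two lines mention "print", one of them twice and one inside "printf".
sample() { printf '%s\n' 'print print' 'no match here' 'printf and print'; }

lines=$(sample | awk '/print/ { n++ } END { print n + 0 }')             # matching lines
hits=$(sample | awk '{ n += gsub(/print/, "&") } END { print n + 0 }')  # every occurrence
printf '%s\n' "$lines" "$hits"
```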

$ cat numbers
Renzo 90 48 68 29 100
Marleen 29 49 87 90 77
Karl 83 84 10 40 90
Celine 100 89 20 100 78

$ cat numbers | awk 'BEGIN{print "Averages";print "----------";total=0};{var=0};{var=($2+$3+$4+$5+$6)/5};{total+=var};{print $1, var};END{print "----------";print "AVG:", total/NR}'
                      ^1                                                  ^2                ^3              ^4            ^5            ^6                                    ^7   
^1 before processing any line, print a header and set variable total to 0
LOOP
^2 for every line, set variable var to 0
^3 add up the line's five numbers, divide the sum by 5 and assign the result to var
^4 increase variable total by var's value
^5 print the first word of the line followed by var's value
END
^6 after processing all lines, print a separator
^7 print total divided by the number of lines processed (NR)

Averages
----------
Renzo 67
Marleen 66.4
Karl 61.4
Celine 77.4
----------
AVG: 68.05
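
The one-liner above hard-codes five scores per line. A hedged variant, assuming a POSIX shell, loops over fields 2 to `NF` so each line may carry any number of scores; the `numbers` shell function is a hypothetical stand-in for the file.

```shell
# Hypothetical helper reproducing the 'numbers' file.
numbers() {
  printf '%s\n' 'Renzo 90 48 68 29 100' 'Marleen 29 49 87 90 77' \
    'Karl 83 84 10 40 90' 'Celine 100 89 20 100 78'
}

report=$(numbers | awk '
  {
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i   # every field after the name
    avg = sum / (NF - 1)                  # NF - 1 scores follow the name
    total += avg
    print $1, avg
  }
  END { print "AVG:", total / NR }')
printf '%s\n' "$report"
```

Since `{var=0}` in the original is immediately overwritten by the next action, this version does the reset inside the single per-line block.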

### Executing shell commands

$ echo "js .sh" | awk '{if($2==".sh"){system("ls | grep " $2 "$")}}'
                                 ^                         ^
                 if the second word of the line equals '.sh', build the
                 command string ls | grep .sh$ and run it through the shell
pia_upscript.sh
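
To see exactly which string reaches the shell, here is a sketch assuming a POSIX shell with `mktemp`; the scratch directory and the file names `pia_upscript.sh` and `notes.txt` are hypothetical. In an awk string, `\047` is a single quote, so the command built is `ls | grep '.sh$'`.

```shell
dir=$(mktemp -d)                                # scratch directory
touch "$dir/pia_upscript.sh" "$dir/notes.txt"   # hypothetical files for the demo

# awk concatenates the pieces into one string and hands it to the shell via system().
out=$(cd "$dir" && echo "js .sh" | awk '{
  if ($2 == ".sh") {
    cmd = "ls | grep \047" $2 "$\047"   # builds: ls | grep '"'"'.sh$'"'"'
    system(cmd)
  }
}')
printf '%s\n' "$out"
rm -r "$dir"
```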

rcanepa/awk.md