Introduction to awk

awk is a popular text-processing utility on Unix-like operating systems. It is useful for analyzing text files, especially those organized into rows and columns.

The name awk comes from the surnames of its inventors, Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan, who wrote it in 1977 at Bell Laboratories.

Using awk

Let us use a file empl.txt for the first few examples. It holds the names of employees, their (dummy) ids and their departments.

$ cat empl.txt
Suresh 1 ADM
Ramesh 2 TECH
Sita 3 TECH
Maran 4 ADM
Shreya 5 SAL
Partha 6 SAL
Vandana 7 ADM
Nandini 8 ADM
Sudama 9 SAL

First, let us use awk to print only the employees from the ADM department.

$ awk '/ADM/' empl.txt
Suresh 1 ADM
Maran 4 ADM
Vandana 7 ADM
Nandini 8 ADM

Here the awk program is written in single quotes so that the shell passes it to awk unchanged. It is followed by the input file to which the program is applied. The output of the command is displayed on screen.

The pattern /string/ selects only those lines which contain the given string anywhere on the line. So /ADM/ gives us the lines containing ADM.

Similarly, if we want to display only Vandana's record, we can use the command

$ awk '/Vandana/' empl.txt

But what if we do not want to display all columns? For this we can use the predefined field variables.

Print command

The print command in awk can be used to print selected columns, which are referred to as follows

$1 - 1st column

$2 - 2nd column etc.

$0 - entire line

Let us print only the first and second columns of the ADM records from empl.txt

$ awk '/ADM/ {print $1,$2}' empl.txt
Suresh 1
Maran 4
Vandana 7
Nandini 8

Here /ADM/ is the condition to be applied, and print tells awk which columns are to be printed in the output lines. The comma between $1 and $2 makes awk separate the two values with a space.

The condition need not always be a string match. We can also compare fields using operators such as <, > or ==.
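As a small sketch of the field variables listed above, the following recreates the sample file, prints two columns in the opposite order and then prints the whole line using $0. The "Record:" label is only for illustration.

```shell
# Recreate the sample data used in this article.
cat > empl.txt <<'EOF'
Suresh 1 ADM
Ramesh 2 TECH
Sita 3 TECH
Maran 4 ADM
Shreya 5 SAL
Partha 6 SAL
Vandana 7 ADM
Nandini 8 ADM
Sudama 9 SAL
EOF

# Fields can be printed in any order: id first, then name.
awk '/TECH/ {print $2, $1}' empl.txt

# $0 holds the entire input line.
awk '/TECH/ {print "Record:", $0}' empl.txt
```

The first command should print "2 Ramesh" and "3 Sita", and the second should print the two complete TECH lines, each after the label.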

$ awk '$2>5 {print $1,$2}' empl.txt
Partha 6
Vandana 7
Nandini 8
Sudama 9

Here our condition is that the value of the second column must be greater than 5, so only the last 4 records are printed.

$ awk '$2==5 {print $1,$2}' empl.txt
Shreya 5

This will only print the record where employee id (second column) is equal to 5.

Similarly, a field can be compared with a string, which must be enclosed in double quotes.

$ awk '$1=="Shreya" {print $1,$2}' empl.txt
Shreya 5

This prints only Shreya's record.
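Here is a short sketch of a few more conditions on the same data. It uses the < operator and an exact match on the third column. The && (logical AND) operator that joins two conditions is an addition that the examples above do not use.

```shell
# Recreate the sample data used in this article.
cat > empl.txt <<'EOF'
Suresh 1 ADM
Ramesh 2 TECH
Sita 3 TECH
Maran 4 ADM
Shreya 5 SAL
Partha 6 SAL
Vandana 7 ADM
Nandini 8 ADM
Sudama 9 SAL
EOF

# Ids below 3, using the < operator.
awk '$2 < 3 {print $1, $2}' empl.txt

# Exact match on the department column AND an id greater than 5.
# (&& is not used elsewhere in this article.)
awk '$3 == "ADM" && $2 > 5 {print $1, $2}' empl.txt
```

Unlike /ADM/, the test $3 == "ADM" looks only at the third column, so a name that happens to contain ADM would not be selected.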

Now let us change the file a little bit. One of the lines has Sal instead of SAL as the department.

$ cat empl.txt
Suresh 1 ADM
Ramesh 2 TECH
Sita 3 TECH
Maran 4 ADM
Shreya 5 SAL
Partha 6 Sal
Vandana 7 ADM
Nandini 8 ADM
Sudama 9 SAL

To print all people in the Sales department, we can use a regular expression (regex) within slashes like this.

$ awk '$3~/S[aA][lL]/ {print $1,$2,$3}' empl.txt
Shreya 5 SAL
Partha 6 Sal
Sudama 9 SAL

[aA] is the regex way of saying either lower case a or upper case A. So we are searching for S followed by a or A and then l or L, which covers both "Sal" and "SAL". When a field has to be matched against a regex, we should use the tilde ~ instead of ==.
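As a hedged sketch of two variations on this match: the regex can be anchored with ^ and $ so that the whole field has to match, or the field can be lower-cased with the built-in tolower function and compared with ==. Both are alternatives, not the approach used above.

```shell
# Sample data with one department written as "Sal".
cat > empl.txt <<'EOF'
Suresh 1 ADM
Ramesh 2 TECH
Sita 3 TECH
Maran 4 ADM
Shreya 5 SAL
Partha 6 Sal
Vandana 7 ADM
Nandini 8 ADM
Sudama 9 SAL
EOF

# Anchored regex: the third field must be exactly S, then a/A, then l/L.
awk '$3 ~ /^S[aA][lL]$/ {print $1, $2, $3}' empl.txt

# Alternative: compare the lower-cased field with a plain string.
awk 'tolower($3) == "sal" {print $1, $2, $3}' empl.txt
```

On this file both commands should print the same three records (Shreya, Partha and Sudama). The anchored form matters only if some other department name contains these letters.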

awk commands in a file

For complex awk expressions, we can store awk commands in a file and use the file for processing.

Let us create a shell script called awk3.sh. Type the lines and press Ctrl-D to end the input.

$ cat > awk3.sh
#!/bin/bash
awk '/SAL/ {print $1, $2}' empl.txt

This file has our awk command. To execute it, we must first give execute permission to the file. Let us do that and run the file.

$ chmod u+x awk3.sh
$ ./awk3.sh
Shreya 5
Sudama 9

Do you want a title for these columns? That can be done with a BEGIN block in the awk script.

Here is our awk3.sh file now.

#!/bin/bash
awk 'BEGIN {print "Name", "Id"}
/SAL/ {print $1, $2}' empl.txt

Here the BEGIN block is executed only once, before any input line is processed. So the title is printed at the top.

If you want to format your output, you can experiment with printf and format specifiers like those of the C language, e.g. printf "%-7s %4d\n", $1, $2. Unlike print, printf does not add a newline on its own.

Here is the output with the title now.

$ ./awk3.sh
Name Id
Shreya 5
Sudama 9
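The printf suggestion above can be sketched as follows. The column widths (7 and 4) are arbitrary choices for this sample data.

```shell
# Sample data with one department written as "Sal".
cat > empl.txt <<'EOF'
Suresh 1 ADM
Ramesh 2 TECH
Sita 3 TECH
Maran 4 ADM
Shreya 5 SAL
Partha 6 Sal
Vandana 7 ADM
Nandini 8 ADM
Sudama 9 SAL
EOF

# %-7s : string, left-justified in 7 characters
# %4s / %4d : string / integer, right-justified in 4 characters
# printf needs an explicit \n at the end of each line.
awk 'BEGIN {printf "%-7s %4s\n", "Name", "Id"}
/SAL/ {printf "%-7s %4d\n", $1, $2}' empl.txt
```

With this, the names are padded to the same width and the ids line up on the right under the Id title.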

Just like the BEGIN block, you can also use an END block, which is executed only once, after the last input line has been processed.

Let us change our awk script file to

#!/bin/bash
awk 'BEGIN { print "================="
print "Name", "Id"
print "================="}
/SAL/ {print $1, $2}
END {print "=================="}' empl.txt

And execute this script. Output will be slightly better looking than earlier :)

$ ./awk3.sh
=================
Name Id
=================
Shreya 5
Sudama 9
==================

Note that here we have used multiple print statements in the BEGIN block, enclosed in a pair of braces, and one END block.
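An END block is also a handy place to print a summary. Here is a minimal sketch that counts the matching records in a variable and prints the total at the end. The variable name count and the label text are only illustrative.

```shell
# Sample data with one department written as "Sal".
cat > empl.txt <<'EOF'
Suresh 1 ADM
Ramesh 2 TECH
Sita 3 TECH
Maran 4 ADM
Shreya 5 SAL
Partha 6 Sal
Vandana 7 ADM
Nandini 8 ADM
Sudama 9 SAL
EOF

# count starts out empty (treated as 0) and is incremented
# for every line that contains SAL.
awk '/SAL/ {count++}
END {print "SAL employees:", count}' empl.txt
```

Since Partha's department is written as Sal in this file, the total should be 2.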

Using option -f to specify awk command file

So far we have hard-coded the awk command and the input file in our script file. This is not needed. The -f option of awk allows us to specify a file in which only the awk program is stored.

So if we say

$ awk -f awk3.sh empl.txt

we are telling awk: "please process the file empl.txt using the awk commands stored in awk3.sh".

We also need to remove the keyword "awk", the single quotes and the input file name from our script file. The #!/bin/bash line may stay, because awk treats any line starting with # as a comment.

$ cat awk3.sh
#!/bin/bash
BEGIN { print "================="
print "Name", "Id"
print "================="}
/SAL/ {print $1, $2}
END {print "=================="}

Now let us run awk with this file.

$ awk -f awk3.sh empl.txt
=================
Name Id
=================
Shreya 5
Sudama 9
==================

Same output!
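To try the -f option from scratch, here is a self-contained sketch. The file name names.awk is made up for this example; any file name will do.

```shell
# Sample data with one department written as "Sal".
cat > empl.txt <<'EOF'
Suresh 1 ADM
Ramesh 2 TECH
Sita 3 TECH
Maran 4 ADM
Shreya 5 SAL
Partha 6 Sal
Vandana 7 ADM
Nandini 8 ADM
Sudama 9 SAL
EOF

# The program file contains only awk code: no "awk" keyword,
# no quotes around the program and no input file name.
cat > names.awk <<'EOF'
# Lines starting with # are comments in awk.
BEGIN {print "Name", "Id"}
/SAL/ {print $1, $2}
EOF

# The input file is given on the command line,
# so the same program can be reused for other files.
awk -f names.awk empl.txt
```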

NR and NF

While a file is being processed, NR holds the number of the current record (row) and NF holds the number of fields (columns) in that record.

Let us write a script file which will print only records 1 to 4.

#!/bin/bash
BEGIN { print "================="
print "Name", "Id"
print "================="}
NR==1,NR==4 {print $1, $2}
END {print "=================="}

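We are saying: take the records starting from the one where NR is 1 up to the one where NR is 4. The comma between the two conditions makes this a range pattern. Now if we run this file we get only the first 4 records.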

$ awk -f awk3.sh empl.txt
=================
Name Id
=================
Suresh 1
Ramesh 2
Sita 3
Maran 4
==================
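Here is a small sketch that shows both variables at work: NR is the current record number and NF is the number of fields in that record. The condition NR <= 3 is just another way of picking the first few records.

```shell
# Sample data with one department written as "Sal".
cat > empl.txt <<'EOF'
Suresh 1 ADM
Ramesh 2 TECH
Sita 3 TECH
Maran 4 ADM
Shreya 5 SAL
Partha 6 Sal
Vandana 7 ADM
Nandini 8 ADM
Sudama 9 SAL
EOF

# Print record number, field count and name for the first 3 records.
awk 'NR <= 3 {print NR, NF, $1}' empl.txt

# In the END block NR still holds the number of the last record,
# that is, the total number of records read.
awk 'END {print NR, "records"}' empl.txt
```

Every line of empl.txt has three columns, so NF is 3 throughout, and the second command should report 9 records.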

The last point to note is how to specify a field separator other than the default, which is white space. For example, in the passwd file in the /etc directory, the fields are separated by a colon (:).

-Fc takes c as the field separator. In this example we have to use -F:

So to print all user names in the passwd file we can give the command

$ awk -F: '{print $1}' /etc/passwd
root
daemon
bin
sys
games
man
mail
news
uucp
proxy
www-data
backup

So with this I conclude the introduction to awk.
