Here is a quick guide to nawk. I prefer nawk over awk because it has more features. Most systems now have both programs installed. See also:
Quick guide to GNUPlot
Quick guide to Docker
Quick guide to CVS
Quick guide to Emacs
Quick guide to Git
To run nawk
- From the command line: nawk 'program' inputfile1 inputfile2 …
- From a file: nawk -f programfile inputfile1 inputfile2 …
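As a small sketch of the two invocation styles above: the file names data.txt and prog.awk are made up for illustration, and the AWK variable is my own assumption so that the example still runs on systems where nawk is not installed.

```shell
# Use nawk when it is installed, otherwise fall back to awk (assumption for portability).
AWK=$(command -v nawk || command -v awk)

# Hypothetical working files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$tmpdir/data.txt"

# 1. Program given directly on the command line.
inline_out=$("$AWK" '{ print $1 }' "$tmpdir/data.txt")

# 2. The same program stored in a file and loaded with -f.
printf '{ print $1 }\n' > "$tmpdir/prog.awk"
file_out=$("$AWK" -f "$tmpdir/prog.awk" "$tmpdir/data.txt")

echo "$inline_out"
echo "$file_out"
rm -rf "$tmpdir"
```

Both forms should print the first field of every line. The -f form is easier to maintain once the program grows beyond a line or two.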
Structure of nawk program
- A nawk program can consist of three sections: nawk 'BEGIN{…} {… body …} END{…}' inputfile
- Both the BEGIN and END blocks are optional. Each is executed only once: BEGIN before any input is read, END after the last line has been processed.
- The body is executed once for each line of input.
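The three sections can be seen in a minimal sketch that counts input lines. The variable name count and the sample input are mine, and AWK falls back to awk if nawk is missing.

```shell
# Use nawk when it is installed, otherwise fall back to awk (assumption for portability).
AWK=$(command -v nawk || command -v awk)

# BEGIN runs once before any input, the body once per line,
# and END once after the last line.
summary=$(printf 'a\nb\nc\n' | "$AWK" '
BEGIN { print "start" }
      { count++ }
END   { print "lines: " count }')

echo "$summary"
```

Note that count is never declared; nawk treats an unset variable as 0 (or the empty string) the first time it is used.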
Field Separators
- The following example makes '=' a field separator in addition to the blank space: nawk 'BEGIN{FS = "[ =]+"}{print $2}' inputfile
- For example, if the input file contains the line "Total = 500", then the output will be 500.
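Here is a quick sketch of one way to split on blanks and '=' together. The regular expression [ =]+ is my choice: it treats any run of spaces and equals signs as a single separator. The -F option shown second is the command-line equivalent of setting FS in BEGIN.

```shell
# Use nawk when it is installed, otherwise fall back to awk (assumption for portability).
AWK=$(command -v nawk || command -v awk)

# Separator set inside the program.
value=$(echo 'Total = 500' | "$AWK" 'BEGIN { FS = "[ =]+" } { print $2 }')

# Same separator passed with -F on the command line.
value_f=$(echo 'Total = 500' | "$AWK" -F'[ =]+' '{ print $2 }')

echo "$value"
echo "$value_f"
```

With this separator the line has two fields, "Total" and "500", so $2 is the number.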
Printing Environment Variables
- The following example prefixes the current directory path to each name listed by ls:
ls -alg | nawk '{print ENVIRON["PWD"] "/" $8}'
- ENVIRON is an array of environment variables indexed by the individual variable name (without the leading $).
- The variable FILENAME is a string that holds the name of the file nawk is currently reading.
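A small sketch of both variables in use. GUIDE_DEMO is a hypothetical environment variable exported only for this example, sample.txt is a made-up file, and AWK falls back to awk if nawk is missing.

```shell
# Use nawk when it is installed, otherwise fall back to awk (assumption for portability).
AWK=$(command -v nawk || command -v awk)

tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/sample.txt"

# ENVIRON is indexed by the bare variable name, with no leading $.
export GUIDE_DEMO=/tmp/demo
joined=$("$AWK" '{ print ENVIRON["GUIDE_DEMO"] "/" $1 }' "$tmpdir/sample.txt")

# FILENAME is set while a file is being read; print it once per file.
seen=$("$AWK" 'FNR == 1 { print FILENAME }' "$tmpdir/sample.txt")

echo "$joined"
echo "$seen"
rm -rf "$tmpdir"
```

The program is inside single quotes, so the shell never expands anything in it; the lookup happens inside nawk through ENVIRON.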
Examples of usage
- To kill all the jobs of the current user: kill -9 `ps -ef | grep $LOGNAME | nawk '{print $2}'`
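One thing to watch with the pipeline above is that grep matches the user name anywhere on the line, including inside other users' command lines. A hedged sketch of a stricter selection compares the first field inside nawk instead. The ps-style listing below is canned, with made-up users and PIDs, and the sketch only prints the PIDs rather than killing anything.

```shell
# Use nawk when it is installed, otherwise fall back to awk (assumption for portability).
AWK=$(command -v nawk || command -v awk)

# Canned `ps -ef`-style output; users, PIDs, and commands are invented.
sample='UID        PID  PPID  C STIME TTY          TIME CMD
alice     1201     1  0 09:00 ?        00:00:01 sleep 600
bob       1302     1  0 09:01 ?        00:00:00 vi alice.txt
alice     1403  1201  0 09:02 pts/0    00:00:00 grep alice'

# Select rows whose first field is exactly the user, then print the PID column.
pids=$(printf '%s\n' "$sample" | "$AWK" -v user=alice '$1 == user { print $2 }')

echo "$pids"
```

The row for bob is skipped even though its command line contains "alice". In real use the user name could be passed as -v user="$LOGNAME".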
Multi-dimensional array
- To use a 2D or multi-dimensional array, separate the array indices with commas: matrix[3, 5] = $(i+5)
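Under the hood the comma-separated indices are joined into a single string key using the built-in SUBSEP variable. The sketch below, with an arbitrary stored value of my choosing, shows a membership test and how to recover the individual indices.

```shell
# Use nawk when it is installed, otherwise fall back to awk (assumption for portability).
AWK=$(command -v nawk || command -v awk)

cell=$("$AWK" 'BEGIN {
    matrix[3, 5] = "x"

    # Membership test for a multi-dimensional index.
    if ((3, 5) in matrix) print "found " matrix[3, 5]

    # Each key is really "3" SUBSEP "5"; split it to get the parts back.
    for (key in matrix) {
        split(key, idx, SUBSEP)
        print idx[1] "," idx[2]
    }
}')

echo "$cell"
```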
Another example
- The example below calculates the averages for 16 items from 10 sets of readings.
- Example of an input line the program is trying to match: Total elapsed time is 560
BEGIN{
    printf("--------- Execution Time -----------\n");
    item=16;
    set=10;
}
{
    # all new variables are initialized to 0
    for(;j < set;j++)
        for(i=0;i < item; i++)
        {
            # skip input until the second word matches "elapsed";
            # stop if the input runs out first
            while($2 != "elapsed")
                if((getline) <= 0) exit;
            # notice the use of array without declaring its
            # dimension
            sum[i]+=$5;
            # move on to the next line; at end of input clear the
            # record so this reading is not counted twice
            if((getline) <= 0) $0="";
        }
    if(j==set){
        for(i=0;i < item;i++){
            # this and the next 2 lines are comments
            # you can use either print or printf for output
            # print sum[i]/set;
            printf("Item %d : %6.3f\n",i,sum[i]/set);
        }
        j++;
    }
}
END{
    printf("-------------- End --------------\n");
}
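For comparison, here is a sketch of the same averaging written with a pattern instead of explicit getline calls. This is an alternative of my own, not the program above. To keep it short it uses 2 items and 3 sets with invented readings, passed in through -v; the counter n is also my naming.

```shell
# Use nawk when it is installed, otherwise fall back to awk (assumption for portability).
AWK=$(command -v nawk || command -v awk)

# Hypothetical log: 3 sets of 2 readings each, plus one unrelated line.
log='Total elapsed time is 10
noise line
Total elapsed time is 20
Total elapsed time is 30
Total elapsed time is 40
Total elapsed time is 50
Total elapsed time is 60'

averages=$(printf '%s\n' "$log" | "$AWK" -v item=2 -v set=3 '
# Only lines whose second word is "elapsed" count as readings.
$2 == "elapsed" && n < item * set {
    sum[n % item] += $5   # readings cycle through the items within each set
    n++
}
END {
    for (i = 0; i < item; i++)
        printf("Item %d : %6.3f\n", i, sum[i] / set)
}')

echo "$averages"
```

Letting nawk drive the loop over input lines removes the need to handle end of input by hand, at the cost of keeping a running counter.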
Examples from the man page
- Write to the standard output all input lines for which field 3 is greater than 5:
$3 > 5
- Write every tenth line:
(NR % 10) == 0
- Write any line with a substring matching the regular expression:
/(G|D)(2[0-9][[:alpha:]]*)/
- Print any line with a substring containing a G or D, followed by a sequence of digits and characters:
/(G|D)([[:digit:][:alpha:]]*)/
- Write any line in which the second field contains a backslash:
$2 ~ /\\/
- Write any line in which the second field contains a backslash (alternate method). Note that backslash escapes are interpreted twice, once in lexical processing of the string and once in processing the regular expression:
$2 ~ "\\\\"
- Write the second to the last and the last field in each line, separating the fields by a colon:
{OFS=":";print $(NF-1), $NF}
- Write lines longer than 72 characters:
length($0) > 72
- Write the first two fields in opposite order separated by the OFS:
{ print $2, $1 }
- Same, with input fields separated by comma or space and tab characters, or both:
BEGIN { FS = ",[ \t]*|[ \t]+" }
{ print $2, $1 }
- Add up first column, print sum and average:
{s += $1 }
END{print "sum is ", s, " average is", s/NR}
- Write fields in reverse order, one per line (many lines out for each line in):
{ for (i = NF; i > 0; --i) print $i }
- Write all lines between occurrences of the strings "start" and "stop":
/start/, /stop/
- Write all lines whose first field is different from the previous one:
$1 != prev { print; prev = $1 }
- Simulate the echo command:
BEGIN { for (i = 1; i < ARGC; ++i) printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ") }
- Write the path prefixes contained in the PATH environment variable, one per line:
BEGIN{n = split (ENVIRON["PATH"], path, ":"); for (i = 1; i <= n; ++i) print path[i]}
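To try a few of these one-liners without preparing a file, input can be piped in. The sketch below runs three of them on small made-up inputs; AWK falls back to awk if nawk is missing.

```shell
# Use nawk when it is installed, otherwise fall back to awk (assumption for portability).
AWK=$(command -v nawk || command -v awk)

# Range pattern: everything from a "start" line through the next "stop" line.
between=$(printf 'a\nstart\nb\nstop\nc\n' | "$AWK" '/start/, /stop/')

# Fields in reverse order, one per line.
reversed=$(echo 'x y z' | "$AWK" '{ for (i = NF; i > 0; --i) print $i }')

# Last two fields joined by a colon via OFS.
pair=$(echo 'x y z' | "$AWK" '{ OFS = ":"; print $(NF-1), $NF }')

echo "$between"
echo "$reversed"
echo "$pair"
```

A pattern with no action, as in the range example, prints the matching lines unchanged.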