Awk as tool and scripting language


Interpreting Log Files

Many larger network printers and print servers log print jobs in a text file. Logs usually include the print job originator, the page size and page counts, and free-format data fields for such things as project cost centers. Each row is a print job record. Listing 2 shows the table header and some sample records from a printlog.txt file.

Listing 2

Sample printlog.txt File

Document       User    Device  Format  Medium  col     b/w     costctr
C2.sxw         LAGO    pr04    DIN_A4  Normal  1       10      P01
prop.pdf       LEHM    pr03    DIN_A4  Normal  0       10      P01
offer.doc      LOHN    pr01    DIN_A4  Normal  3       0       P02

The Awk scripts for processing these kinds of files take up about six to eight lines, depending on the format. They are included in the examples you can download from the Ubuntu User site. You can also copy Listing 3 into a text editor and save it as eval1.awk. Start by evaluating the total number of pages printed in black and white and in color. Each of the two sums is held in its own variable.

Listing 3

Print log evaluation 1

# Evaluating the number of printed pages
NR==1 {
    next;
}
{
    sum_color+=$6;
    sum_bw+=$7;
}
END {
    print sum_color " Printed in color";
    print sum_bw " Printed in B&W";
}

Invoke the script with awk -f eval1.awk printlog.txt, which loads the script file and then runs its commands against printlog.txt. The NR==1 rule skips the first row, so the table header is ignored. It would also be possible to store this line in a variable and output it at the end.
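As a minimal sketch of that last idea, the following variant keeps the header row in a variable and prints it ahead of the totals. The file name printlog_sample.txt and the variable name header are assumptions made for this illustration; the records are the ones from Listing 2.

```shell
# Hypothetical sample file containing the records from Listing 2.
cat > printlog_sample.txt <<'EOF'
Document       User    Device  Format  Medium  col     b/w     costctr
C2.sxw         LAGO    pr04    DIN_A4  Normal  1       10      P01
prop.pdf       LEHM    pr03    DIN_A4  Normal  0       10      P01
offer.doc      LOHN    pr01    DIN_A4  Normal  3       0       P02
EOF

# Keep the header row in a variable instead of discarding it,
# then print it ahead of the totals in the END block.
report=$(awk '
NR == 1 { header = $0; next }              # remember the header, do not sum it
        { sum_color += $6; sum_bw += $7 }  # same sums as in Listing 3
END {
    print header
    print sum_color " Printed in color"
    print sum_bw " Printed in B&W"
}' printlog_sample.txt)

printf '%s\n' "$report"
```

With the three sample records, the totals come to 4 color pages and 20 black-and-white pages.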

Awk adds the number of color pages ($6) and B&W pages ($7) in each record to the sum variables and prints both totals at the end. Evaluating the print jobs per cost center is a bit more complicated, but Awk can handle the task quite effectively (see Listing 4 and eval2.awk).

Listing 4

Print log evaluation 2

# Evaluating the number of B&W printed pages
# for a cost center
NR==1 {
    next;
}
{printer[$8]+=$7}
END {
    print "costctr. totals";
    for (F in printer) {print F " " printer[F]}
}

Here again, the next command skips the first row. The remaining rows are evaluated to count the number of B&W pages for each cost center. The printer[$8]+=$7 command adds the page count of each record to the printer[] array element for its cost center. Once all the records have been read, the loop in the END block iterates over the printer[] array and outputs the total for each cost center.

The printer[] array is indexed by the names of the cost centers. In this article's examples, you will also find the eval3.awk script, which sums up all the print jobs for each user by using two data fields.
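The per-user idea can be sketched in the same way. The following is not the eval3.awk script from the download; it is an illustration that assumes the two data fields are the color ($6) and B&W ($7) page counts, keyed by the user name in $2. The array names color and bw and the file name are made up for this example. Because the order of a for (key in array) loop is not defined in Awk, the output is piped through sort.

```shell
# Hypothetical sample file containing the records from Listing 2.
cat > printlog_sample.txt <<'EOF'
Document       User    Device  Format  Medium  col     b/w     costctr
C2.sxw         LAGO    pr04    DIN_A4  Normal  1       10      P01
prop.pdf       LEHM    pr03    DIN_A4  Normal  0       10      P01
offer.doc      LOHN    pr01    DIN_A4  Normal  3       0       P02
EOF

# Sum color and B&W pages per user in two arrays indexed by user name.
per_user=$(awk '
NR == 1 { next }                           # skip the header row
        { color[$2] += $6; bw[$2] += $7 }  # assumed fields: user, col, b/w
END {
    # Columns: user, color pages, B&W pages, total pages
    for (u in color) print u, color[u], bw[u], color[u] + bw[u]
}' printlog_sample.txt | sort)

printf '%s\n' "$per_user"
```

Two arrays keep the color and B&W counts separate, so the report can show both counts as well as their sum for each user.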

Evaluating Number Values

The following example processes a table with a time column and five columns of floating-point values (Listing 5). Because the values use a period as the decimal separator, be sure to set LC_ALL=C in the shell before you try this example; otherwise, a locale that expects a decimal comma can cause the numbers to be misread.

Listing 5

Sample Number Values

t               Val1            Val2            Val3            Val4            Val5
0.100000        0.194000        0.166000        0.162000        0.155000        0.194200
0.200000        0.440000        0.388000        0.359000        0.392000        0.400000

Listing 5 shows the table header and the first few rows of sample data from the measureddata1.txt file. The average.awk script in Listing 6 calculates the average of the five values in each row.

Listing 6


# Averaging the values
NR==1 {next;}
{
    sum = 0;
    for (i=2; i<=NF; i++) {
        sum+=$i;
    }
    average = sum/(NF-1)
    printf("%6s %8.2f\n",$1,average);
}
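To try the averaging logic without the downloaded files, the following sketch writes the two sample rows from Listing 5 to a file and runs the same calculation inline with LC_ALL=C set for the awk call only. The file name measured_sample.txt is an assumption for this illustration.

```shell
# Hypothetical sample file containing the rows from Listing 5.
cat > measured_sample.txt <<'EOF'
t               Val1            Val2            Val3            Val4            Val5
0.100000        0.194000        0.166000        0.162000        0.155000        0.194200
0.200000        0.440000        0.388000        0.359000        0.392000        0.400000
EOF

# Average fields 2..NF of every data row, as Listing 6 does.
# LC_ALL=C applies to this single command, so the period is the decimal point.
averages=$(LC_ALL=C awk '
NR == 1 { next }                           # skip the header row
{
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i
    printf("%6s %8.2f\n", $1, sum / (NF - 1))
}' measured_sample.txt)

printf '%s\n' "$averages"
```

For the two sample rows, the averages are 0.17 and 0.40 after rounding to two decimal places.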

You also often have to find the minimum and maximum values in a list. One way to do this is to sort the values in each row.

Listing 7 shows the numbervalues1.awk file, which evaluates the minimum and maximum values. The numbers in each row ($2 through $NF) are first stored in an array. After sorting, the first element holds the minimum and the last element holds the maximum.

Listing 7


# Evaluating number values
BEGIN { print("     t         MIN        MAX"); }
NR==1 { next; }
{
    t=0; n=NF;
    # Store the number values in an array:
    for (i=2; i<=n; i++) {
        y[i-2] = $i;
    }
    # Sort the values (insertion sort):
    for (i = 0; i <= n-2; i++) {
        for (j = i; j > 0 && y[j-1] > y[j]; j--) {
            t = y[j]; y[j] = y[j-1]; y[j-1] = t;
        }
    }
    # Output: t, MIN, MAX
    printf("%6s %8.6f %8.6f ",$1, y[0], y[n-2]);
    printf("\n");
}

The values are sorted with a simple insertion sort that is easy to copy and paste and is appropriate for a limited number of values. You can find more efficient algorithms in Jon Bentley's excellent book [5], for example. More complicated sorting algorithms require Awk functions.
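As a first step toward function-based sorting, the insertion sort from Listing 7 can be moved into a user-defined function. The sketch below is an illustration, not part of the article's downloads: the function name insertion_sort and the file name measured_sample.txt are assumptions. Awk passes arrays by reference, so the function sorts the caller's array in place, and the extra parameters i, j, and t act as local variables.

```shell
# Hypothetical sample file containing the rows from Listing 5.
cat > measured_sample.txt <<'EOF'
t               Val1            Val2            Val3            Val4            Val5
0.100000        0.194000        0.166000        0.162000        0.155000        0.194200
0.200000        0.440000        0.388000        0.359000        0.392000        0.400000
EOF

minmax=$(LC_ALL=C awk '
# Insertion sort on arr[0..n-1], in place (arrays are passed by reference).
# The extra parameters i, j, t serve as local variables.
function insertion_sort(arr, n,    i, j, t) {
    for (i = 1; i < n; i++)
        for (j = i; j > 0 && arr[j-1] > arr[j]; j--) {
            t = arr[j]; arr[j] = arr[j-1]; arr[j-1] = t
        }
}
NR == 1 { next }                           # skip the header row
{
    n = NF - 1                             # number of values in this row
    for (i = 2; i <= NF; i++) y[i-2] = $i + 0   # adding 0 forces a numeric value
    insertion_sort(y, n)
    printf("%6s %8.6f %8.6f\n", $1, y[0], y[n-1])
}' measured_sample.txt)

printf '%s\n' "$minmax"
```

Wrapping the sort in a function keeps the main rule short and lets the same code be reused for other arrays or replaced later by a faster algorithm.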
