Source: ggjucheng
Link: http://www.cnblogs.com/ggjucheng/archive/2013/01/13/2858470.html
Introduction
awk is a powerful text analysis tool. Where grep searches and sed edits, awk is strongest at analyzing data and generating reports. In simple terms, awk reads input line by line, splits each line into fields using whitespace (spaces and tabs) as the default delimiter, and then analyzes and processes those fields.
There are three common versions of awk: the original awk, nawk (new awk), and gawk. Unless otherwise specified, awk generally refers to gawk, the GNU implementation of AWK.
The name awk comes from the initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. In fact, AWK has its own language: the AWK programming language, which the three creators have officially defined as a “pattern scanning and processing language.” It allows you to create short programs that read input files, sort data, process data, perform calculations on input, and generate reports, among countless other functions.
Usage
awk 'pattern { action }' filename(s)
Although operations can be complex, the syntax is always like this: pattern describes what awk looks for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) group the commands that belong to a pattern. They can be omitted when a rule has a pattern but no action, in which case matching lines are simply printed. The pattern is usually a regular expression enclosed in slashes.
The most basic function of the awk language is to browse and extract information from files or strings based on specified rules. After extracting information, awk can perform other text operations. A complete awk script is usually used to format information in text files.
Typically, awk processes one line of a file at a time. awk receives one line of the file and then executes the corresponding command to process the text.
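As a minimal sketch of the pattern/action form described above, the following runs awk on a few made-up lines. The sample text and the variable names sample and matched are illustrative assumptions, not part of the original article.

```shell
# Hypothetical sample input: three log-style lines.
sample='info disk ok
error disk full
error net down'

# pattern: /error/    action: { print $2 }
# Only lines that match the pattern run the action.
matched=$(printf '%s\n' "$sample" | awk '/error/ { print $2 }')
echo "$matched"
```

The first line does not contain "error", so the action is skipped for it and only the second field of the other two lines is printed.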
Calling awk
There are three ways to call awk:
1. Command line method
awk [-F field-separator] 'commands' input-file(s)
Here, commands are the actual awk commands, and [-F field-separator] is optional. input-file(s) is the file or files to be processed.
In awk, each item in a line that is delimited by the field separator is called a field. Unless another separator is specified with -F, the default field separator is whitespace (spaces and tabs).
2. Shell script method
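Put all awk commands in a file, name the awk command interpreter on the first line of that file, and make the file executable. The script can then be run by typing its name.
This is equivalent to the first line of a shell script: #!/bin/sh
which is changed to: #!/bin/awk -f
(The -f is required so that awk reads the program from the script file. On many systems the interpreter is /usr/bin/awk rather than /bin/awk.)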
3. Insert all awk commands into a separate file and then call:
awk -f awk-script-file input-file(s)
Here, the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
This chapter focuses on the command line method.
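The three calling methods can be compared side by side. The sketch below is only an illustration: the scratch directory and the file names users.txt, names.awk, and names are hypothetical, and the interpreter path is looked up with command -v because it differs between systems.

```shell
# Work in a throwaway directory; all file names here are hypothetical.
dir=$(mktemp -d)
printf 'alice:x:1000\nbob:x:1001\n' > "$dir/users.txt"

# 1. Command-line method: the program is given inline.
inline=$(awk -F ':' '{ print $1 }' "$dir/users.txt")

# 3. Program-file method: the same program stored in a file, loaded with -f.
printf '%s\n' 'BEGIN { FS = ":" }' '{ print $1 }' > "$dir/names.awk"
from_file=$(awk -f "$dir/names.awk" "$dir/users.txt")

# 2. Script method: a "#!<path-to-awk> -f" first line makes the file runnable.
awk_path=$(command -v awk)
{ printf '#!%s -f\n' "$awk_path"; cat "$dir/names.awk"; } > "$dir/names"
chmod +x "$dir/names"
as_script=$("$dir/names" "$dir/users.txt")

echo "$inline"; echo "$from_file"; echo "$as_script"
rm -rf "$dir"
```

All three calls run the same program, so each prints the same two account names.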
Getting Started Examples
Assuming the output of last -n 5 is as follows:
# last -n 5    # show only the first five lines
root     pts/1   192.168.1.100  Tue Feb 10 11:21   still logged in
root     pts/1   192.168.1.100  Tue Feb 10 00:46 - 02:28  (01:41)
root     pts/1   192.168.1.100  Mon Feb  9 11:41 - 18:30  (06:48)
dmtsai   pts/1   192.168.1.100  Mon Feb  9 11:41 - 11:41  (00:00)
root     tty1                   Fri Sep  5 14:09 - 14:10  (00:01)
If you only want to display the accounts of the last five logins:
# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
The workflow of awk is as follows: it reads one record at a time, with records separated by newline characters, and then splits the record into fields according to the field separator. $0 holds the entire record, $1 holds the first field, and $n holds the nth field. The default field separator is whitespace (spaces or tabs), so $1 is the logged-in user, $3 is the user's IP address, and so on.
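To make the field numbering concrete, here is a small sketch that feeds awk one record in the same layout as the last output above. The variable names line and fields are illustrative, and NF (the number of fields, covered with the built-in variables later on) is used to reach the last field.

```shell
# One made-up record in the same layout as the `last` output.
line='dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)'

# $1 and $3 are single fields, NF is the number of fields in the record,
# and $NF is therefore the last field.
fields=$(printf '%s\n' "$line" |
  awk '{ print "user=" $1; print "ip=" $3; print "count=" NF; print "last=" $NF }')
echo "$fields"
```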
If you only want to display the accounts in /etc/passwd:
# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys
This is an example of awk + action, where the action {print $1} is executed for each line.
-F specifies ':' as the field separator.
If you only want to display the accounts in /etc/passwd and their corresponding shells, with accounts and shells separated by a tab character:
# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root /bin/bash
daemon /bin/sh
bin /bin/sh
sys /bin/sh
If you only want to display the accounts in /etc/passwd and their corresponding shells, separated by commas, with a header line name,shell before the first record and the line blue,/bin/nosh after the last record:
cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
….
blue,/bin/nosh
The workflow of awk is as follows: it first executes BEGIN, then reads the file one record at a time, with records separated by newline characters. Each record is split into fields according to the field separator ($0 holds the entire record, $1 the first field, $n the nth field), and then the actions whose patterns match are executed. This repeats for the second record and so on until all records have been read, and finally the END block is executed.
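The execution order can be observed directly. The following sketch, using two made-up passwd-style records and the illustrative variable names records and trace, prints a marker from each of the three blocks.

```shell
# Two made-up passwd-style records.
records='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

# BEGIN runs once before any input is read, the middle rule runs once per
# record, and END runs once after the last record.
trace=$(printf '%s\n' "$records" | awk -F ':' '
  BEGIN { print "begin" }
        { print "record " NR ": " $1 }
  END   { print "end after " NR " records" }')
echo "$trace"
```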
Search for all lines in /etc/passwd that contain the keyword root:
# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of using pattern, where only lines matching the pattern (in this case, root) will execute the action (default is to output each line’s content).
The search supports regular expressions. For example, to find lines starting with root: awk -F: '/^root/' /etc/passwd
Search for all lines in /etc/passwd that contain the keyword root and display the corresponding shell:
# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
Here, the action {print $7} is specified.
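The difference between an unanchored and an anchored pattern is easiest to see on data where they disagree. In this sketch the three records are made up, and the operator account is included only because it mentions root in its home directory.

```shell
# Made-up passwd-style records; "operator" mentions root only in field 6.
records='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
bin:x:2:2:bin:/bin:/bin/sh'

# /root/ matches anywhere in the line, so two records qualify.
anywhere=$(printf '%s\n' "$records" | awk -F ':' '/root/ { print $1 }')

# /^root/ is anchored to the start of the line, so only one record qualifies.
anchored=$(printf '%s\n' "$records" | awk -F ':' '/^root/ { print $1 }')

echo "$anywhere"; echo "$anchored"
```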
Built-in Variables in awk
awk has many built-in variables used to set environment information, which can be changed. Below are some of the most commonly used variables.
ARGC      Number of command-line arguments
ARGV      Array of command-line arguments
ENVIRON   Array of the system environment variables, indexed by name
FILENAME  Name of the file awk is currently reading
FNR       Number of records read so far from the current file
FS        Input field separator, equivalent to the command-line -F option
NF        Number of fields in the current record
NR        Number of records read so far
OFS       Output field separator
ORS       Output record separator
RS        Input record separator
Additionally, the $0 variable refers to the entire record. $1 represents the first field of the current line, $2 represents the second field of the current line, and so on.
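A short sketch can show how several of these variables behave when awk reads more than one file. The scratch files one.txt and two.txt are hypothetical and exist only for this illustration.

```shell
dir=$(mktemp -d)                      # hypothetical scratch files
printf 'a b\nc d e\n' > "$dir/one.txt"
printf 'f\n'          > "$dir/two.txt"

# NR keeps counting across files, FNR restarts for each file, and NF is the
# number of fields in the current record. OFS is placed between the
# comma-separated arguments of print.
report=$(cd "$dir" && awk 'BEGIN { OFS = "," } { print FILENAME, NR, FNR, NF }' one.txt two.txt)
echo "$report"
rm -rf "$dir"
```

On the third record NR is 3 while FNR is back to 1, because that record is the first one in the second file.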
Print the file name, line number, number of columns, and full line content for each line of /etc/passwd:
# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh
Using printf instead of print can make the code more concise and readable.
awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd
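Width specifiers are where printf helps most. The sketch below lines up three columns from two made-up passwd-style records. The column widths and the | separator are arbitrary choices for illustration.

```shell
# Made-up passwd-style records.
records='root:x:0:0:root:/root:/bin/bash
bin:x:2:2:bin:/bin:/bin/sh'

# %-8s left-justifies the name in an 8-character column, %3d right-justifies
# the UID in a 3-character column, and \n ends the line (printf adds none).
table=$(printf '%s\n' "$records" | awk -F ':' '{ printf("%-8s|%3d|%s\n", $1, $3, $7) }')
echo "$table"
```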
print and printf
awk provides both print and printf functions for output.
The print function can take variables, numbers, or strings as parameters. Strings must be enclosed in double quotes, and parameters are separated by commas. Without commas, the parameters are concatenated and can no longer be told apart. In the output, each comma is replaced by the output field separator (OFS), which is a space by default.
The printf function is similar to the printf in C language, allowing for formatted strings. When outputting complex data, printf is more useful and makes the code easier to understand.
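The effect of the comma in print can be checked with three one-liners. This is a minimal sketch, and the variable names are illustrative.

```shell
# With commas, print joins its arguments using OFS (a single space by default).
with_comma=$(awk 'BEGIN { print "a", "b" }')

# Without commas, the strings are concatenated with nothing in between.
no_comma=$(awk 'BEGIN { print "a" "b" }')

# Changing OFS changes what each comma turns into.
custom=$(awk 'BEGIN { OFS = "-"; print "a", "b" }')

echo "$with_comma"; echo "$no_comma"; echo "$custom"
```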
awk Programming
Variables and Assignments
In addition to built-in variables, awk also allows for user-defined variables.
Below is an example that counts the number of accounts in /etc/passwd:
awk '{count++; print $0;} END {print "user count is ", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
……
user count is  40
count is a user-defined variable. The earlier actions contained only a single print, but an action can hold multiple statements separated by semicolons.
Here, count is not initialized. Although it defaults to 0, it is better practice to initialize it explicitly:
awk 'BEGIN {count=0; print "[start]user count is ", count} {count=count+1; print $0;} END {print "[end]user count is ", count}' /etc/passwd
[start]user count is  0
root:x:0:0:root:/root:/bin/bash
…
[end]user count is  40
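The default value of an unset variable can be verified without any input file. In this sketch n, s, and count are arbitrary names that are deliberately never assigned before use.

```shell
# An uninitialized variable acts as 0 in arithmetic and as "" in string context.
defaults=$(awk 'BEGIN { print n + 0; print "[" s "]" }')

# count is never initialized, yet count++ counts the three input lines.
count=$(printf 'x\ny\nz\n' | awk '{ count++ } END { print count }')

echo "$defaults"; echo "$count"
```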
Count the number of bytes occupied by the files in a folder:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end]size is ", size}'
[end]size is  8657198
If you want to display the size in MB:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end]size is ", size/1024/1024, "M"}'
[end]size is  8.25889 M
Note that the total does not include the contents of subdirectories.
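Because real ls -l output differs from machine to machine, the summation is easier to follow on a fixed listing. The listing below is made up for illustration, and the variable names listing, sum, and kb are assumptions of this sketch.

```shell
# Made-up `ls -l` output; real listings vary by system and locale.
listing='total 12
-rw-r--r-- 1 root root 1024 Jan  1 10:00 a.txt
-rw-r--r-- 1 root root 2048 Jan  1 10:00 b.txt
drwxr-xr-x 2 root root 4096 Jan  1 10:00 sub'

# $5 is the size column. The "total" line has no fifth field, so it adds 0.
sum=$(printf '%s\n' "$listing" | awk '{ size += $5 } END { print size }')
kb=$(printf '%s\n' "$listing" | awk '{ size += $5 } END { print size / 1024, "KB" }')
echo "$sum"; echo "$kb"
```

The directory entry sub contributes only its own 4096 bytes, not the size of whatever it contains.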
Conditional Statements
Conditional statements in awk are borrowed from C and take the following forms:
if (expression) {
    statement;
    statement;
    ……
}
if (expression) {
    statement;
} else {
    statement2;
}
if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
Count the number of bytes occupied by the files in a folder, skipping entries of size 4096 (which are generally directories):
ls -l | awk 'BEGIN {size=0; print "[start]size is ", size} {if ($5 != 4096) {size=size+$5;}} END {print "[end]size is ", size/1024/1024, "M"}'
[start]size is  0
[end]size is  8.22339 M
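The three-way form of the conditional can be sketched as a small classifier. The sizes and the thresholds here are arbitrary values chosen only for illustration.

```shell
# Made-up sizes in bytes, one per line.
sizes='512
4096
1048576'

# Classify each size with if / else if / else.
classes=$(printf '%s\n' "$sizes" | awk '{
  if ($1 < 1024) {
    label = "small"
  } else if ($1 < 1024 * 1024) {
    label = "medium"
  } else {
    label = "large"
  }
  print $1, label
}')
echo "$classes"
```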
Loop Statements
Loop statements in awk are also borrowed from C language, supporting while, do/while, for, break, and continue. The semantics of these keywords are identical to those in C language.
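A brief sketch of each loop form follows. Everything runs inside a BEGIN block, so no input file is needed, and the variable names are illustrative.

```shell
# for: sum the integers 1..5.
for_sum=$(awk 'BEGIN { for (i = 1; i <= 5; i++) total += i; print total }')

# while with break and continue: collect odd numbers up to 7.
odds=$(awk 'BEGIN {
  i = 0
  while (1) {
    i++
    if (i > 7) break          # leave the loop entirely
    if (i % 2 == 0) continue  # skip even numbers
    out = out (out == "" ? "" : " ") i
  }
  print out
}')

# do/while: the body runs once even though the condition is false afterwards.
runs=$(awk 'BEGIN { n = 10; do { runs++; n++ } while (n < 10); print runs }')

echo "$for_sum"; echo "$odds"; echo "$runs"
```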
Arrays
In awk, array indices can be numbers or strings, and are usually referred to as keys. Keys and values are stored in an internal hash table. Because a hash table does not preserve order, array contents may not be displayed in the order you expect. Arrays, like variables, are created automatically on first use, and awk determines automatically whether they hold numbers or strings. Arrays are generally used to collect information from records, for example for summation, word counting, or tracking how many times a pattern is matched.
Display the accounts in /etc/passwd:
awk -F ':' 'BEGIN {count=0;} {name[count] = $1; count++;} END {for (i = 0; i < count; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
……
Here, a for loop is used to iterate through the array.
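Arrays keyed by strings are the more typical use. The sketch below counts accounts per login shell with a string-keyed array and the for (key in array) loop form. The records are made up, the names users and sh are arbitrary, and sort is added only because the iteration order is not guaranteed.

```shell
# Made-up passwd-style records.
records='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh'

# users[shell] counts how many accounts use each shell; the shell path is the
# key. The loop visits keys in no guaranteed order, so the result is piped
# through sort to make it stable.
per_shell=$(printf '%s\n' "$records" |
  awk -F ':' '{ users[$7]++ } END { for (sh in users) print sh, users[sh] }' |
  sort)
echo "$per_shell"
```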
There is a lot to awk programming; here are just some simple common usages. For more, please refer to http://www.gnu.org/software/gawk/manual/gawk.html