Source: ggjucheng
Link: http://www.cnblogs.com/ggjucheng/archive/2013/01/13/2858470.html
Introduction
Awk is a powerful text analysis tool. Where grep searches and sed edits, awk excels at data analysis and report generation. In simple terms, awk reads files line by line, splits each line into fields using whitespace as the default delimiter, and then analyzes and processes those fields.
There are three different versions of awk: awk, nawk, and gawk. Unless otherwise specified, awk in this article refers to gawk, the GNU version of AWK.
The name awk comes from the initials of its creators Alfred Aho, Peter Weinberger, and Brian Kernighan. In fact, AWK has its own language: the AWK programming language, which the three creators have officially defined as a “pattern scanning and processing language.” It allows you to create short programs that read input files, sort data, process data, perform calculations on input, and generate reports, among countless other functions.
Usage
awk '{pattern + action}' {filenames}
Although operations can be complex, the syntax is always the same: pattern describes what awk looks for in the data, and action is the series of commands executed when a match is found. Curly braces ({}) do not always need to appear in a program, but they are used to group the instructions that belong to a specific pattern. A pattern is typically a regular expression enclosed in slashes.
The most basic function of the awk language is to browse and extract information from files or strings based on specified rules. After extracting information, awk can perform other text operations. A complete awk script is usually used to format information in text files.
Typically, awk processes one line of a file at a time. Awk receives one line of the file and then executes the corresponding command to process the text.
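The pattern/action structure described above can be sketched as follows. The log lines and the variable names (log_sample, error_times) are made up for illustration and do not come from the original article.

```shell
# Hypothetical sample input: four whitespace-separated log lines.
log_sample='10:01 info service started
10:02 error disk full
10:03 info retrying
10:04 error timeout'

# pattern: /error/ selects lines containing "error";
# action:  {print $1} prints the first field of each selected line.
error_times=$(printf '%s\n' "$log_sample" | awk '/error/ {print $1}')
printf '%s\n' "$error_times"
```

Only the two lines that match the pattern reach the action; the other lines are read and silently skipped.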
Calling awk
There are three ways to call awk:
1. Command line method
awk [-F field-separator] 'commands' input-file(s)
Here commands are the actual awk commands, [-F field-separator] is optional, and input-file(s) are the files to be processed. In awk, each item delimited by the field separator within a line is called a field. Unless -F field-separator is specified, the default field separator is whitespace.
2. Shell script method
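Insert all awk commands into a file, put the awk command interpreter on the first line of the script, and make the file executable; it can then be called by typing the script name. This works like a shell script, except that the first line #!/bin/sh is replaced with #!/bin/awk -f (the -f is required so that awk reads its program from the script file; the path to awk may differ between systems).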
3. Insert all awk commands into a separate file and then call: awk -f awk-script-file input-file(s) where the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
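As a rough sketch, the command line method and the separate-file method give the same result for the same program. The scratch directory and the file names people.txt and names.awk are assumptions made for this example only.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
printf 'alice 30\nbob 25\n' > "$workdir/people.txt"

# Method 1: the awk program is given inline on the command line.
inline=$(awk '{print $1}' "$workdir/people.txt")

# Method 3: the same program is stored in a file and loaded with -f.
printf '{print $1}\n' > "$workdir/names.awk"
from_file=$(awk -f "$workdir/names.awk" "$workdir/people.txt")

printf '%s\n' "$inline" "$from_file"
rm -r "$workdir"
```

The script method (method 2) is left out of the sketch because the interpreter path on the first line depends on where awk is installed.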
This chapter focuses on the command line method.
Getting Started Examples
Assuming the output of last -n 5 is as follows:
# last -n 5    <== show only the first five lines
root     pts/1   192.168.1.100  Tue Feb 10 11:21   still logged in
root     pts/1   192.168.1.100  Tue Feb 10 00:46 - 02:28  (01:41)
root     pts/1   192.168.1.100  Mon Feb  9 11:41 - 18:30  (06:48)
dmtsai   pts/1   192.168.1.100  Mon Feb  9 11:41 - 11:41  (00:00)
root     tty1                   Fri Sep  5 14:09 - 14:10  (00:01)
If you only want to display the last 5 logged-in accounts:
# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
The workflow of awk is as follows: it reads one record, delimited by the newline character '\n', then splits the record into fields using the field separator and fills in the field variables. $0 holds the entire record, $1 the first field, and $n the nth field. The default field separator is whitespace (spaces or tabs), so $1 is the logged-in user, $3 is the user's IP address, and so on.
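A small sketch of default field splitting, using one made-up line in the style of the last output. The line deliberately mixes several spaces and a tab between fields to show that any run of whitespace counts as a single separator; the variable names are assumptions.

```shell
# One record with irregular whitespace: three spaces, a tab, two spaces.
line=$(printf 'root   pts/1\t192.168.1.100  Tue Feb 10 11:21')

# Print the first three fields joined with "|" to make the boundaries visible.
fields=$(printf '%s\n' "$line" | awk '{print $1 "|" $2 "|" $3}')
printf '%s\n' "$fields"
```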
If you only want to display the accounts in /etc/passwd:
# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys
This is an example of awk + action, where the action {print $1} is executed for each line.
-F specifies the field separator as ':'.
If you only want to display the accounts in /etc/passwd and their corresponding shells, with accounts and shells separated by a tab:
# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root /bin/bash
daemon /bin/sh
bin /bin/sh
sys /bin/sh
If you only want to display the accounts in /etc/passwd and their corresponding shells, separated by a comma, with a header line "name,shell" before the first line and a final line "blue,/bin/nosh" after the last line:
cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
….
blue,/bin/nosh
The workflow of awk is as follows: it first executes BEGIN, then opens the file and reads one record, delimited by the newline character '\n'. It splits the record into fields using the field separator and fills in the field variables ($0 holds the entire record, $1 the first field, $n the nth field), and then executes the action for each pattern that matches. It then reads the second record and repeats the process until all records have been read, and finally executes the END block.
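The order of execution can be traced with a tiny sketch. The two-line input and the variable name trace are assumptions for illustration; NR is awk's built-in count of records read so far.

```shell
# BEGIN runs once before any input, the middle block runs once per record,
# and END runs once after the last record has been processed.
trace=$(printf 'a\nb\n' | awk '
    BEGIN {print "BEGIN"}
          {print "record " NR ": " $0}
    END   {print "END after " NR " records"}')
printf '%s\n' "$trace"
```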
Search for all lines in /etc/passwd that contain the keyword root:
# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of using a pattern, where only lines matching the pattern (here root) will execute the action (no action specified, default is to output each line’s content).
Search supports regular expressions, for example, to find lines starting with root: awk -F: '/^root/' /etc/passwd
Search for all lines in /etc/passwd that contain the keyword root and display the corresponding shell:
# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
Here, the action {print $7} is specified.
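Note that /root/ matches the keyword anywhere in the line, not just in the account name. The sketch below contrasts it with a match restricted to the first field, written with awk's ~ (regex match) operator, which the examples above do not use. The passwd-style lines are made up for illustration.

```shell
# Hypothetical sample: "operator" is not named root, but its home is /root.
sample='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

# Pattern applied to the whole record ($0): matches both root and operator.
anywhere=$(printf '%s\n' "$sample" | awk -F: '/root/ {print $1}')

# Pattern applied to field 1 only: matches just the root account.
name_only=$(printf '%s\n' "$sample" | awk -F: '$1 ~ /^root$/ {print $1}')

printf 'anywhere: %s\n' $anywhere
printf 'name only: %s\n' "$name_only"
```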
Built-in Variables in awk
Awk has many built-in variables used to set environment information, which can be changed. Below are some of the most commonly used variables.
ARGC       Number of command-line arguments
ARGV       Array of command-line arguments
ENVIRON    Array of system environment variables, indexed by variable name
FILENAME   Name of the file awk is currently reading
FNR        Record number within the current file
FS         Input field separator, equivalent to the command-line -F option
NF         Number of fields in the current record
NR         Number of records read so far
OFS        Output field separator
ORS        Output record separator
RS         Input record separator
Additionally, the $0 variable refers to the entire record. $1 represents the first field of the current line, $2 represents the second field of the current line, and so on.
Print the file name, line number, number of columns, and complete content of each line in /etc/passwd:
# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh
Using printf instead of print can make the code more concise and readable.
awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd
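The separator variables from the list above can also be set inside the program rather than on the command line. This is a minimal sketch on made-up, three-field input: FS is set in BEGIN instead of using -F, and OFS changes what print places between comma-separated items.

```shell
# FS=":" splits input on colons; OFS="," joins the printed items with commas.
report=$(printf 'root:x:0\ndaemon:x:1\n' |
    awk 'BEGIN {FS=":"; OFS=","} {print NR, NF, $1}')
printf '%s\n' "$report"
```

Each output line shows the record number, the number of fields in that record, and the first field.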
Print and printf
Awk provides both print and printf for output.
The print statement accepts variables, numbers, or strings as arguments. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated and cannot be told apart. On output, each comma is replaced by the output field separator (OFS), which defaults to a space.
The printf function is similar to the printf in C language, allowing for formatted strings. When outputting complex data, printf is more useful and makes the code easier to understand.
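The difference between the three forms can be seen side by side in this sketch. The single input line and the variable name out are assumptions for illustration.

```shell
out=$(printf 'root:/bin/bash\n' | awk -F: '{
    print $1, $2                  # comma: items joined by OFS (a space by default)
    print $1 $2                   # no comma: plain string concatenation
    printf("%-8s|%s\n", $1, $2)   # printf: explicit format and explicit newline
}')
printf '%s\n' "$out"
```

Unlike print, printf does not append a newline by itself, so the format string has to end with \n.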
Awk Programming
Variables and Assignments
In addition to built-in variables, awk also allows for user-defined variables.
Below is a count of the number of accounts in /etc/passwd:
awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
……
user count is  40
count is a user-defined variable. The earlier action{} blocks contained only a single print, but print is just one statement; an action{} block can contain multiple statements separated by semicolons.
Here count is not initialized. Although it defaults to 0, it is better practice to initialize it explicitly:
awk 'BEGIN {count=0;print "[start]user count is ", count} {count=count+1;print $0;} END{print "[end]user count is ", count}' /etc/passwd
[start]user count is  0
root:x:0:0:root:/root:/bin/bash
…
[end]user count is  40
Count the byte size occupied by files in a specific folder:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size}'
[end]size is  8657198
If displayed in MB:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}'
[end]size is  8.25889 M
Note that the count does not include the subdirectories of the folder.
Conditional Statements
Conditional statements in awk are borrowed from the C language and take the following forms:
if (expression) {
    statement;
    statement;
    ……
}

if (expression) {
    statement;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
Count the byte size occupied by files in a specific folder, filtering out files of size 4096 (which are generally folders):
ls -l | awk 'BEGIN {size=0;print "[start]size is ", size} {if($5!=4096){size=size+$5;}} END{print "[end]size is ", size/1024/1024,"M"}'
[end]size is  8.22339 M
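The if / else if / else chain can be sketched on a few made-up sizes. The thresholds and labels below are arbitrary choices for illustration and are not part of the original example.

```shell
# Classify each input number; 4096 is treated as "directory-sized" to mirror
# the filter used above, and 1048576 bytes (1 MB) is an arbitrary cutoff.
sizes=$(printf '512\n4096\n2000000\n' | awk '{
    if ($1 == 4096) {
        label = "directory-sized"
    } else if ($1 >= 1048576) {
        label = "large"
    } else {
        label = "small"
    }
    print $1, label
}')
printf '%s\n' "$sizes"
```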
Loop Statements
Loop statements in awk are also borrowed from C language, supporting while, do/while, for, break, continue, and these keywords have the same semantics as in C language.
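As a brief sketch of the loop forms, the example below uses a for loop to sum every field of a record and a while loop to count how many times the sum can be halved before it reaches 1. The input numbers and variable names are made up for illustration.

```shell
loops=$(printf '3 5 8\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++) {   # visit every field of the record
        sum += $i
    }
    n = sum
    steps = 0
    while (n > 1) {               # halve n until it is no longer above 1
        n = n / 2
        steps++
    }
    print sum, steps
}')
printf '%s\n' "$loops"
```

Here $i with a variable i refers to the field whose number is stored in i, which is what makes looping over fields possible.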
Arrays
In awk, array indices can be numbers or strings, and they are usually referred to as keys. Keys and their values are stored in an internal hash table. Because a hash table does not preserve insertion order, array contents may not be displayed in the order you expect. Arrays, like variables, are created automatically on first use, and awk automatically determines whether they hold numbers or strings. Arrays in awk are generally used to collect information from records, for example for summation, word counting, or tracking how many times a pattern is matched.
Display the accounts in /etc/passwd:
awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < count; i++) print i, name[i]}' /etc/passwd
Here, a for loop is used to iterate through the array.
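Arrays with string keys are handy for counting. The sketch below counts how many accounts use each shell in a few made-up passwd-style lines, using the for (key in array) form of the loop. Because the traversal order of that form is not defined, the result is piped through sort to get stable output.

```shell
# Hypothetical sample data in /etc/passwd format (not real accounts).
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# shells[$7]++ uses the shell path as the key and counts occurrences.
counts=$(printf '%s\n' "$sample" |
    awk -F: '{shells[$7]++} END {for (s in shells) print s, shells[s]}' |
    sort)
printf '%s\n' "$counts"
```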
There is a lot of content in awk programming; here only simple and commonly used usages are listed. For more, please refer to http://www.gnu.org/software/gawk/manual/gawk.html