Awk is a scripting language for processing text data and generating reports. Awk programs do not need to be compiled, and the language provides variables, numeric functions, string functions, and logical operators.
Awk lets programmers write small but effective programs as a series of statements, each defining a text pattern to search for in every line of the input and the action to take when a line matches. It is primarily used for pattern scanning and processing: it reads one or more files, checks whether their lines match the specified patterns, and performs the associated actions.
Awk is named after its developers – Aho, Weinberger, and Kernighan.
Awk Syntax #
awk options 'pattern {action}' input > output
| Option | Description |
|---|---|
| -F | Custom delimiter |
| -f | Read awk program from file |
| '{}' | Action on matched content |
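As a rough illustration of the -F and -f options, the sketch below uses made-up colon-separated data (the names and numbers are hypothetical and not taken from this article) and a throwaway program file created with mktemp.

```shell
# Hypothetical colon-separated sample data (not part of info.txt).
data='alice:30
bob:25'

# -F sets the input field separator, so $1 is the text before the first colon.
names=$(printf '%s\n' "$data" | awk -F: '{ print $1 }')
printf '%s\n' "$names"

# -f reads the awk program from a file instead of from the command line.
prog=$(mktemp)
printf '%s\n' '{ print $2 }' > "$prog"
ages=$(printf '%s\n' "$data" | awk -F: -f "$prog")
printf '%s\n' "$ages"
rm -f "$prog"
```

The first command prints the two names and the second prints the two numbers, each on its own line.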
What Can Awk Do? #
- Awk operation flow
  - Scan files line by line
  - Split each input line into multiple fields
  - Compare input lines/fields with patterns
  - Perform actions on matched lines
- Uses of Awk
  - Transform data files
  - Generate formatted reports
- Awk programming structure
  - Set output line format
  - Arithmetic and string operations
  - Conditional statements and loops
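The flow above can be sketched as a tiny report. This is only an illustration: it copies three records from the info.txt file used later in this article into a temporary file, and the header text and the variable name total are choices made for the sketch.

```shell
# Three records copied from info.txt, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
朱八 皇帝 客户 45000
阿三 异族 客户 25000
赵大 皇帝 售出 50000
EOF

report=$(awk '
BEGIN { print "name strength" }          # runs once, before any input is read
      { total += $4; print $1, $4 }      # runs for every input line
END   { print "total", total }           # runs once, after the last line
' "$tmp")
printf '%s\n' "$report"
rm -f "$tmp"
```

The BEGIN block prints the header, the middle action accumulates the fourth field while printing one row per line, and the END block prints the sum, which is 120000 for these three records.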
Examples of Awk Commands #
The following is the content of the input file:
$ cat info.txt
朱八 皇帝 客户 45000
阿三 异族 客户 25000
赵大 皇帝 售出 50000
李二凤 皇帝 客户 47000
增阿牛 侠客 售出 15000
张三 盲流 售出 23000
三毛 狗 售出 13000
二狗 猫 购入 80000
Print All Lines (Default Action) #
By default, Awk prints every line of data in the specified file.
$ awk '{print}' info.txt
## Output
朱八 皇帝 客户 45000
阿三 异族 客户 25000
赵大 皇帝 售出 50000
李二凤 皇帝 客户 47000
增阿牛 侠客 售出 15000
张三 盲流 售出 23000
三毛 狗 售出 13000
二狗 猫 购入 80000
In the above example, no pattern is specified, so the print action applies to every line. With no arguments, print outputs the entire line, so the command prints the whole file.
Keyword Search Lines #
$ awk '/客户/ {print}' info.txt
## Output
朱八 皇帝 客户 45000
阿三 异族 客户 25000
李二凤 皇帝 客户 47000
In the above example, the awk command prints all lines that match ‘客户’.
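A bare /regex/ pattern is tested against the whole line. To restrict the match to one field, the ~ operator can be used instead. The sketch below assumes a temporary file holding two records from info.txt plus one hypothetical record (客户甲) added only to show the difference.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
朱八 皇帝 客户 45000
赵大 皇帝 售出 50000
客户甲 皇帝 售出 1000
EOF

# Matches the keyword anywhere on the line (first and third records).
whole=$(awk '/客户/' "$tmp")
printf '%s\n' "$whole"

# Matches only when the third field contains the keyword (first record only).
third=$(awk '$3 ~ /客户/' "$tmp")
printf '%s\n' "$third"
rm -f "$tmp"
```

A pattern with no action prints the matching line, so {print} can be omitted here.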
Print Specific Columns #
For each record (i.e., line), awk by default splits the text into fields separated by whitespace and stores them in the $n variables. If the line has 4 words, they are stored as $1, $2, $3, and $4. Additionally, $0 represents the entire line.
$ awk '{print $1,$4}' info.txt
## Output
朱八 45000
阿三 25000
赵大 50000
李二凤 47000
增阿牛 15000
张三 23000
三毛 13000
二狗 80000
In the above example, $1 and $4 represent the name and strength fields, respectively.
Built-in Variables of Awk #
Awk's built-in variables include the field variables $1, $2, $3, and so on ($0 is the entire line), which hold the individual words or segments, called fields, that each line of text is split into.
- NR
  NR represents the current count of input records. Remember, records are usually lines. The awk command executes the pattern/action statement once for each record in the file.
- NF
  NF represents the number of fields in the current input record (line).
- FS
  FS represents the field separator used to divide fields on the input line. The default is "white space", which means spaces and tab characters. FS can be reassigned to another character (usually in BEGIN) to change the field separator.
- RS
  RS represents the current record separator. Since records are lines by default, the default record separator is newline.
- OFS
  OFS represents the output field separator, which separates fields when Awk prints them. The default is a space. Whenever print has multiple parameters separated by commas, it prints the value of OFS between each parameter.
- ORS
  ORS represents the output record separator, which separates output lines when Awk prints them. The default is a newline. print automatically outputs the content of ORS at the end of anything it prints.
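The separator variables are easiest to see on a small input. The following sketch uses hypothetical comma-separated and space-separated lines (not taken from info.txt) and sets the variables in a BEGIN block.

```shell
# Hypothetical comma-separated input; FS and OFS are set in BEGIN,
# before the first line is read.
fields=$(printf 'a,b,c\nd,e,f\n' |
  awk 'BEGIN { FS = ","; OFS = " | " } { print $1, $3, NF }')
printf '%s\n' "$fields"

# ORS replaces the newline that print normally appends to each record.
joined=$(printf 'x 1\ny 2\n' | awk 'BEGIN { ORS = ";" } { print $1 }')
printf '%s\n' "$joined"
```

The first command prints `a | c | 3` and `d | f | 3`, because OFS is inserted wherever print has a comma. The second prints `x;y;` on a single line, because ORS now ends each record instead of a newline.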
Example of Using NR (Display Line Numbers) #
$ awk '{print NR,$0}' info.txt
## Output
1 朱八 皇帝 客户 45000
2 阿三 异族 客户 25000
3 赵大 皇帝 售出 50000
4 李二凤 皇帝 客户 47000
5 增阿牛 侠客 售出 15000
6 张三 盲流 售出 23000
7 三毛 狗 售出 13000
8 二狗 猫 购入 80000
In the above example, the awk command with NR prints all lines along with their line numbers.
Example of Using NF (Display Last Field) #
$ awk '{print $1,$NF}' info.txt
## Output
朱八 45000
阿三 25000
赵大 50000
李二凤 47000
增阿牛 15000
张三 23000
三毛 13000
二狗 80000
In the above example, $1 represents the name and $NF represents the strength. Because NF holds the number of fields in the line, $NF always refers to the last field. Every line of info.txt has four fields, so the command has the same effect as awk '{print $1,$4}' info.txt.
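Since NF is an ordinary number, it can also be used in an expression to count fields from the end of the line. A small sketch with hypothetical input whose lines have different numbers of fields:

```shell
# Hypothetical input: the first line has three fields, the second has two.
# $NF is the last field and $(NF-1) is the one before it.
last_two=$(printf 'a b c\nd e\n' | awk '{ print $NF, $(NF-1) }')
printf '%s\n' "$last_two"
```

This prints `c b` for the first line and `e d` for the second, whatever the number of fields on each line.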
Another Example of NR (Display Lines 3 to 6) #
$ awk 'NR==3, NR==6 {print NR,$0}' info.txt
## Output
3 赵大 皇帝 售出 50000
4 李二凤 皇帝 客户 47000
5 增阿牛 侠客 售出 15000
6 张三 盲流 售出 23000
NR==3, NR==6 is a range pattern that selects records from line 3 (NR==3) through line 6 (NR==6), and {print NR,$0} is the action performed on the matching lines, where print NR prints the current line number and $0 represents the entire line.
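The same selection can also be written as an ordinary condition on NR. The sketch below compares the two forms on eight hypothetical one-letter lines; the variable names are made up for the illustration.

```shell
# Eight hypothetical input lines, one letter each.
lines=$(printf '%s\n' a b c d e f g h)

# Range pattern: starts at the record where NR==3, ends at the one where NR==6.
range=$(printf '%s\n' "$lines" | awk 'NR==3, NR==6 { print NR, $0 }')

# Equivalent condition on the record number.
compare=$(printf '%s\n' "$lines" | awk 'NR>=3 && NR<=6 { print NR, $0 }')

printf '%s\n' "$range"
[ "$range" = "$compare" ] && echo "both forms select the same lines"
```

Both commands print lines 3 through 6 with their line numbers.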
More Awk Usage Examples #
Print Line Numbers and First Item Content Connected by - #
$ awk '{print NR "-" $1}' info.txt
1-朱八
2-阿三
3-赵大
4-李二凤
5-增阿牛
6-张三
7-三毛
8-二狗
Output Second Item (Column) Content #
$ awk '{print $2}' info.txt
皇帝
异族
皇帝
皇帝
侠客
盲流
狗
猫
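Print the Line Numbers of Any Empty Lines (If They Exist) #
$ awk 'NF == 0 {print NR}' info.txt
$ awk 'NF <= 0 {print NR}' info.txt
NF is 0 on an empty line, so both commands print the line numbers of empty lines. NF is never negative, so a test such as NF < 0 matches nothing. info.txt contains no empty lines, so these commands print nothing for it.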
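Since info.txt has no blank lines, the NF tests are easier to see on input that does contain some. A minimal sketch with hypothetical input (lines 2 and 4 are blank, and line 4 holds only spaces):

```shell
tmp=$(mktemp)
printf 'first\n\nthird\n  \n' > "$tmp"

# NF is 0 on lines with no fields, so this prints their line numbers.
empty=$(awk 'NF == 0 { print NR }' "$tmp")
printf '%s\n' "$empty"

# The opposite test keeps only the lines that contain at least one field.
nonempty=$(awk 'NF > 0' "$tmp")
printf '%s\n' "$nonempty"
rm -f "$tmp"
```

The first command prints 2 and 4, and the second prints `first` and `third`. With the default field separator, a line containing only spaces or tabs also has NF equal to 0.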
Find the Length of the Longest Line in the File #
$ awk '{ if (length($0) > max) max = length($0) } END { print max }' info.txt
15
length($0) uses the built-in function length to determine the length of the current line.
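A small variation also remembers the longest line itself. The sketch uses hypothetical ASCII input, and the variable names max and line are chosen for the illustration.

```shell
# Hypothetical ASCII input; keep the longest line seen so far and its length.
longest=$(printf '%s\n' 'ab' 'abcdef' 'abc' |
  awk 'length($0) > max { max = length($0); line = $0 } END { print max, line }')
printf '%s\n' "$longest"
```

This prints `6 abcdef`. For multibyte text such as the contents of info.txt, the result depends on the awk implementation and the locale: some count characters, others count bytes.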
Count the Number of Lines in the File #
$ awk 'END { print NR }' info.txt
8
Print Lines with More Than 14 Characters #
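$ awk 'length($0) > 14' info.txt
李二凤 皇帝 客户 47000
增阿牛 侠客 售出 15000
With the file as shown and an awk that counts characters rather than bytes, only these two 15-character lines are longer than 14 characters.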
Filter Specific Data by Condition #
$ awk '{ if($2 == "侠客") print $0;}' info.txt
增阿牛 侠客 售出 15000
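The same filter can be written as a pattern, without an explicit if, and fields that look like numbers can be compared numerically. A sketch, assuming a temporary file that holds three of the records from info.txt:

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
朱八 皇帝 客户 45000
增阿牛 侠客 售出 15000
二狗 猫 购入 80000
EOF

# Pattern form of the if statement: print lines whose second field is 侠客.
by_title=$(awk '$2 == "侠客"' "$tmp")
printf '%s\n' "$by_title"

# Numeric comparison on the fourth field.
big=$(awk '$4 > 40000 { print $1, $4 }' "$tmp")
printf '%s\n' "$big"
rm -f "$tmp"
```

The first command prints the 增阿牛 record, and the second prints the name and value for 朱八 and 二狗.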
Print Squares of Numbers from 1 to 9 #
$ awk 'BEGIN { for(i=1;i<=9;i++) print i,"的平方是",i*i; }'
1 的平方是 1
2 的平方是 4
3 的平方是 9
4 的平方是 16
5 的平方是 25
6 的平方是 36
7 的平方是 49
8 的平方是 64
9 的平方是 81
Conclusion #
Awk is a simple yet extremely useful utility for working with text files, logs, and command-line data. Whether you are a beginner or an experienced system administrator, Awk can help you search, filter, and format data quickly and effectively.
With Awk, you do not have to write lengthy scripts. A single line of code can summarize employee payroll data, filter logs, or produce a quick report. It matches patterns, splits lines into fields, and lets you print, count, calculate, and format the results.
Awk can save time, reduce human error, and increase productivity on the Linux platform.
Original article link https://awkgrepsed.com/docs/awk/awk_usage_linux/