1. Overview of AWK Basics
# Basic Structure
awk 'BEGIN{preprocessing} {line processing} END{postprocessing}' filename
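A minimal run shows all three stages in order — BEGIN fires once before any input, the middle block fires per line, END fires once after the last line:

```shell
# BEGIN runs once up front, the middle block once per line, END once at the end
printf '3\n4\n' | awk 'BEGIN{print "start"} {sum+=$1} END{print "sum:", sum}'
# Output:
# start
# sum: 7
```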
# Common Variables
NR: current record (line) number | NF: number of fields in the current line | $0: the entire line | $1: the first field (column)
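A quick way to see these variables side by side, feeding two sample lines through awk:

```shell
# Print the line number, field count, and first field of each input line
printf 'alpha beta\ngamma delta epsilon\n' | awk '{print NR, NF, $1}'
# Output:
# 1 2 alpha
# 2 3 gamma
```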
2. High-Frequency Practical Scenarios
1. Data Deduplication
Example: Retaining Unique Lines
# Deduplicate entire lines (keep the first occurrence)
awk '!seen[$0]++' data.log
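Piping a small repetitive sample through the filter shows the effect — only the first copy of each line survives:

```shell
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++ prints it once
printf 'a\nb\na\nb\n' | awk '!seen[$0]++'
# Output:
# a
# b
```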
# Deduplicate by a specific column (keep the last occurrence)
awk -F',' '{a[$1]=$0} END{for(i in a) print a[i]}' data.csv
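Because the stored row is overwritten on every match, the last occurrence for each key wins (the output is piped through sort here, since `for (i in a)` iterates in unspecified order):

```shell
# Key "1" appears twice; only its last row is kept
printf '1,old\n2,x\n1,new\n' | awk -F',' '{a[$1]=$0} END{for(i in a) print a[i]}' | sort
# Output:
# 1,new
# 2,x
```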
2. Data Statistics
Example: Sales Data Statistics
date,product,sales
2023-08-01,phone,15
2023-08-01,headphones,30
2023-08-02,phone,20
# Total sales by product
awk -F',' 'NR>1 {sum[$2]+=$3} END{for(k in sum) print k,sum[k]}' sales.csv
# Calculate average daily sales (count each distinct date once)
awk -F',' 'NR>1 {total+=$3; if(!seen[$1]++) days++} END{print "Average daily:", total/days}' sales.csv
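Feeding the sample rows above to the per-product total confirms the sums (again sorted, because `for (k in sum)` has no guaranteed order):

```shell
# phone: 15 + 20 = 35, headphones: 30
printf 'date,product,sales\n2023-08-01,phone,15\n2023-08-01,headphones,30\n2023-08-02,phone,20\n' |
awk -F',' 'NR>1 {sum[$2]+=$3} END{for(k in sum) print k,sum[k]}' | sort
# Output:
# headphones 30
# phone 35
```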
3. Merging Two Files
Example: Merging Orders and User Information
# orders.txt
1001 2023-08-01 300
1002 2023-08-02 150
# users.txt
1001 Alice
1002 Bob
# Associate by user ID
awk 'NR==FNR {user[$1]=$2; next} $1 in user {print $0, user[$1]}' users.txt orders.txt
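Recreating the two sample files and running the join shows each order line extended with the matching user name (`NR==FNR` is true only while the first file is being read, so the user table is loaded before any order is printed):

```shell
# Build the sample files in a temp directory, then join on column 1
tmp=$(mktemp -d)
printf '1001 Alice\n1002 Bob\n' > "$tmp/users.txt"
printf '1001 2023-08-01 300\n1002 2023-08-02 150\n' > "$tmp/orders.txt"
awk 'NR==FNR {user[$1]=$2; next} $1 in user {print $0, user[$1]}' "$tmp/users.txt" "$tmp/orders.txt"
# Output:
# 1001 2023-08-01 300 Alice
# 1002 2023-08-02 150 Bob
rm -r "$tmp"
```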
4. Generating Outlines
Example: Log Level Statistics
[ERROR] 2023-08-01 DB Connection Failed
[INFO] 2023-08-01 User login
[ERROR] 2023-08-02 API Timeout
# Generate an error-type statistics outline (arrays of arrays require gawk)
awk '
BEGIN {FS="] "; print "## Error Log Statistics\n"}
$1 ~ /ERROR/ {
split($2, parts, " ");
date=parts[1];
message=substr($2, length(date)+2);
errors[date][message]++
}
END {
for (d in errors) {
print "### " d;
for (msg in errors[d]) {
print "- " msg " (Count: " errors[d][msg] ")"
}
}
}' app.log
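The `errors[date][message]` arrays-of-arrays syntax is a gawk extension; mawk and BSD awk reject it. A portable sketch of the same counting uses a single flat array with a composite key joined by the built-in SUBSEP separator:

```shell
# Portable variant: one flat array keyed by date SUBSEP message
printf '[ERROR] 2023-08-01 DB Connection Failed\n[INFO] 2023-08-01 User login\n[ERROR] 2023-08-02 API Timeout\n' |
awk '
BEGIN {FS="] "}
$1 ~ /ERROR/ {
    split($2, parts, " ")
    count[parts[1] SUBSEP substr($2, length(parts[1])+2)]++
}
END {
    for (k in count) {
        split(k, kp, SUBSEP)   # split the composite key back into date and message
        print kp[1] ": " kp[2] " (Count: " count[k] ")"
    }
}' | sort
# Output:
# 2023-08-01: DB Connection Failed (Count: 1)
# 2023-08-02: API Timeout (Count: 1)
```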
3. Advanced Techniques
- Handling Multiple Delimiters: -F '[,:]' treats both the comma and the colon as delimiters
- Field Rearrangement: awk '{print $3,$1}' adjusts the column order
- Regular Expression Filtering: awk '/ERROR/ && $2 > 100' combines a pattern match with a conditional query
- Calling External Commands: system("echo " $1) executes a shell command from within AWK
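For instance, the multi-delimiter pattern lets a colon-and-comma key/value line be split in a single pass:

```shell
# Both ',' and ':' act as field separators, yielding four fields
printf 'name:Alice,age:30\n' | awk -F'[,:]' '{print $2, $4}'
# Output:
# Alice 30
```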
4. Performance Optimization Suggestions
- Prefer mawk (a faster AWK implementation) for processing large files
- Reduce pipeline operations; try to complete data processing within AWK itself
- Pre-split fields that are used repeatedly instead of re-splitting them on each reference
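Putting the last point into practice: when a sub-field is referenced more than once, call split() a single time into an array and reuse the result, rather than repeating the split per reference:

```shell
# Split the colon-separated first field once per line, then reuse the array
printf 'a:b x\nc:d y\n' | awk '{n=split($1, p, ":"); print p[1] "-" p[n], $2}'
# Output:
# a-b x
# c-d y
```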
Practical Advice: Save the examples in this article, and adjust the -F parameter and the $n field numbers to match the delimiters and column positions of your own data. Mastering these techniques will dramatically increase your text-processing efficiency.
Further Reading:
- [SED Magic: Advanced Text Replacement Techniques]
- [18 Weapons of GREP: Precision Search Guide]