Mastering the Linux Triad: AWK – The Swiss Army Knife of Data Processing

1. Overview of AWK Basics

# Basic Structure
awk 'BEGIN{preprocessing} {line processing} END{postprocessing}' filename
# Common Variables
NR: current record (line) number | NF: number of fields in the current line | $0: entire line content | $1: first field
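These variables are easiest to see on a tiny inline sample (the names and numbers below are illustrative, not from a real file):

```shell
# Print the record number, the field count, and the first field of each line
printf 'alice 30\nbob 25\n' | awk '{print NR, NF, $1}'
# line 1 → "1 2 alice", line 2 → "2 2 bob"
```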

2. High-Frequency Practical Scenarios

1. Data Deduplication

Example: Retaining Unique Lines

# Deduplicate entire lines (keep the first occurrence)
awk '!seen[$0]++' data.log
# Deduplicate by specific column (keep the last occurrence per key)
awk -F',' '{a[$1]=$0} END{for(i in a) print a[i]}' data.csv
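The whole-line variant is easy to verify against an inline sample (the letters are illustrative data):

```shell
# seen[$0]++ evaluates to 0 (false) only the first time a line appears,
# so !seen[$0]++ selects each distinct line once, preserving input order
printf 'a\nb\na\nc\nb\n' | awk '!seen[$0]++'
# prints: a, b, c (one per line)
```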

2. Data Statistics

Example: Sales Data Statistics

date,product,sales
2023-08-01,phone,15
2023-08-01,headphones,30
2023-08-02,phone,20
# Total sales by product
awk -F',' 'NR>1 {sum[$2]+=$3} END{for(k in sum) print k,sum[k]}' sales.csv

# Calculate average daily sales (count distinct dates, not data rows)
awk -F',' 'NR>1 {total+=$3; if (!($1 in days)) {days[$1]=1; n++}} END{print "Average daily:", total/n}' sales.csv
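The per-product totals can be checked against the sample rows above; piping through sort pins down the otherwise unspecified `for (k in sum)` output order:

```shell
# Feed the article's sample rows inline and sum sales per product
printf 'date,product,sales\n2023-08-01,phone,15\n2023-08-01,headphones,30\n2023-08-02,phone,20\n' |
awk -F',' 'NR>1 {sum[$2]+=$3} END{for (k in sum) print k, sum[k]}' | sort
# prints: headphones 30, then phone 35
```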

3. Merging Two Files

Example: Merging Orders and User Information

# orders.txt
1001 2023-08-01 300
1002 2023-08-02 150

# users.txt
1001 Alice
1002 Bob

# Associate by user ID
awk 'NR==FNR {user[$1]=$2; next} $1 in user {print $0, user[$1]}' users.txt orders.txt
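The version above silently drops orders whose user ID has no match. If every order should be printed regardless, a left-join variant works (a sketch; the UNKNOWN placeholder and the extra 1003 row are assumptions for illustration):

```shell
# Recreate the sample files, plus one order with no matching user
printf '1001 Alice\n1002 Bob\n' > users.txt
printf '1001 2023-08-01 300\n1002 2023-08-02 150\n1003 2023-08-03 75\n' > orders.txt

# First pass (NR==FNR) loads users; second pass prints every order,
# falling back to a placeholder when the ID is unknown
awk 'NR==FNR {user[$1]=$2; next}
     {print $0, ($1 in user ? user[$1] : "UNKNOWN")}' users.txt orders.txt
```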

4. Generating Outlines

Example: Log Level Statistics

[ERROR] 2023-08-01 DB Connection Failed
[INFO] 2023-08-01 User login
[ERROR] 2023-08-02 API Timeout
# Generate error type statistics outline
awk '
BEGIN {FS="] "; print "## Error Log Statistics\n"}
$1 ~ /ERROR/ {
    split($2, parts, " ");
    date=parts[1];
    message=substr($2, length(date)+2);
    errors[date][message]++    # true multidimensional arrays: GNU awk (gawk) 4.0+
}
END {
    for (d in errors) {
        print "### " d;
        for (msg in errors[d]) {
            print "- " msg " (Count: " errors[d][msg] ")"
        }
    }
}' app.log
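Because `errors[date][message]` relies on gawk-only multidimensional arrays, a POSIX awk needs a different approach. The same grouping can be approximated with SUBSEP-joined keys (a sketch over inline sample lines; the log messages are illustrative):

```shell
printf '[ERROR] 2023-08-01 DB down\n[INFO] 2023-08-01 login\n[ERROR] 2023-08-01 DB down\n' |
awk -F'] ' '$1 ~ /ERROR/ {
    split($2, p, " ")
    # join date and message into one composite key via SUBSEP
    key = p[1] SUBSEP substr($2, length(p[1]) + 2)
    errors[key]++
}
END {
    for (k in errors) {
        split(k, kk, SUBSEP)       # unpack date and message again
        print kk[1] ": " kk[2] " (" errors[k] ")"
    }
}'
# prints: 2023-08-01: DB down (2)
```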

3. Advanced Techniques

  1. Handling Multiple Delimiters: -F '[,:]' splits on either comma or colon
  2. Field Rearrangement: awk '{print $3,$1}' adjusts column order
  3. Regular Expression Filtering: awk '/ERROR/ && $2 > 100' combines a pattern with a conditional test
  4. Calling External Commands: system("echo " $1) executes shell commands (beware passing untrusted field data unquoted)
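Technique 1 in action: with -F '[,:]' a key:value,key:value line splits into alternating keys and values (the host/cpu layout is illustrative):

```shell
# Fields after splitting on , or : are: host | web01 | cpu | 85
printf 'host:web01,cpu:85\nhost:db02,cpu:42\n' | awk -F'[,:]' '{print $2, $4}'
# prints: web01 85, then db02 42
```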

4. Performance Optimization Suggestions

  1. Prefer using mawk for processing large files (a faster AWK implementation)
  2. Reduce pipeline operations, try to complete data processing within AWK
  3. Pre-split fields that are used repeatedly
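Point 3, sketched: split a compound field once into an array and reuse the pieces, instead of calling split() or substr() repeatedly on every reference (the key=value layout is illustrative):

```shell
printf 'id=7;name=ann\nid=9;name=bo\n' | awk -F';' '{
    split($1, a, "=")   # split each field once...
    split($2, b, "=")
    print a[2], b[2]    # ...then reuse the cached parts
}'
# prints: 7 ann, then 9 bo
```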

Practical Advice: Save the examples in this article, and adjust the -F delimiter and the $n field numbers to match your data's field positions when using them. Mastering these techniques can increase your text processing efficiency by more than 10 times!

Further Reading:

  • [SED Magic: Advanced Text Replacement Techniques]
  • [18 Weapons of GREP: Precision Search Guide]

