Daily Linux: Efficient Text Processing with the awk Command

1. Command Introduction and Principles

1.1 Introduction

awk is a text-processing programming language named after the surname initials of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. It is more than a filtering tool: it is a complete programming language, particularly well suited to handling structured (record- and field-oriented) data and generating reports.

1.2 Working Principles

  • Record-Field Model: Treats the input text as a sequence of records (lines, by default), each of which consists of fields.

  • Pattern-Action Pairs: The program consists of a series of pattern { action } pairs.

  • Automatic Field Splitting: Automatically splits records into fields based on field separators.

  • Built-in Variable Management: Maintains built-in variables such as NR, NF, FS, OFS, etc.

  • Stream Processing: Processes input one record at a time, reading either from files or from a pipeline (standard input).
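
The principles above can be seen together in one short run. This is only a sketch: the three colon-separated records are made-up sample data, and the variable name fields_out is an assumption chosen for illustration.

```shell
# Hypothetical sample input: three colon-separated records.
sample='alice:30:admin
bob:25:dev
carol:41:ops'

# FS controls how each record is split into fields; OFS is the separator
# that print places between its comma-separated arguments.
# NR is the current record number, NF the number of fields in that record.
fields_out=$(printf '%s\n' "$sample" |
    awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $3 }')

printf '%s\n' "$fields_out"
```

Each output line shows the record number, the field count, and the first and third fields, joined by the OFS string.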

2. Basic Syntax

awk [options] 'pattern { action }' input_file
awk [options] -f script_file input_file
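
As a minimal sketch of the second form, the snippet below writes a tiny program to a script file and runs it with -f. The file name sum.awk, the temporary directory, and the input numbers are all assumptions made for this example.

```shell
# Create a throwaway directory for the hypothetical script file.
tmpdir=$(mktemp -d)

# sum.awk: add up the first field of every record, print the total at the end.
cat > "$tmpdir/sum.awk" <<'EOF'
{ total += $1 }
END { print "total:", total + 0 }
EOF

# Run the program from the file instead of passing it on the command line.
script_out=$(printf '%s\n' 5 7 | awk -f "$tmpdir/sum.awk")
printf '%s\n' "$script_out"

rm -rf "$tmpdir"
```

Keeping the program in a file avoids shell quoting problems once it grows beyond a line or two.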

2.1 Common Options

-F fs, --field-separator=fs   # Specify field separator
-v var=value                  # Define variable and assign value
-f program-file, --file=program-file  # Read awk program from file
-W option                     # POSIX-style way to pass implementation-specific options
--posix                       # Enable strict POSIX compatibility mode (gawk)
--dump-variables[=file]       # Write global variables and their final values to a file (gawk)
--profile[=file]              # Write profiling output to a file (gawk)
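
The two options used most often, -F and -v, combine naturally. In this sketch the passwd-style lines are invented sample data and min_uid is a hypothetical variable name, not something awk defines.

```shell
# -F: sets the field separator to a colon.
# -v min_uid=1000 defines an awk variable before the program starts.
opts_out=$(printf '%s\n' 'root:x:0:0' 'daemon:x:1:1' 'alice:x:1000:1000' |
    awk -F: -v min_uid=1000 '$3 >= min_uid { print $1 }')

printf '%s\n' "$opts_out"
```

Passing values with -v is generally safer than splicing shell variables into the program text, because the value never becomes part of the awk source.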

2.2 Basic Program Structure

# Complete structure
awk 'BEGIN { initialization action }
     pattern1 { action1 }
     pattern2 { action2 }
     END { ending action }' file
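
Here is a small, runnable instance of that structure. The fruit data and the matched counter are assumptions made for the example.

```shell
structure_out=$(printf '%s\n' 'apple 3' 'pear 12' 'plum 7' | awk '
    # BEGIN runs once, before any input is read.
    BEGIN  { print "name qty" }
    # This rule runs for every record in which field 2 is greater than 5.
    $2 > 5 { print $1, $2; matched++ }
    # END runs once, after the last record has been processed.
    END    { print "matched:", matched + 0 }
')

printf '%s\n' "$structure_out"
```

Writing matched + 0 makes the END block print 0 rather than an empty string when no record matches.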

3. Classic Use Cases

3.1 Field Extraction and Display

# Print specific fields
awk '{print $1, $3}' file.txt           # Print the 1st and 3rd fields
awk '{print $NF}' file.txt              # Print the last field
awk '{print $(NF-1)}' file.txt          # Print the second to last field
# Rearrange fields
awk '{print $3, $1, $2}' file.txt       # Rearrange field order
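
To try these forms without a file at hand, the sketch below feeds one invented log-style record through awk. The variable names are assumptions for illustration.

```shell
# Made-up sample record with five whitespace-separated fields.
row='2024-05-01 web01 GET /index.html 200'

# $NF is the last field, $(NF-1) the one before it.
last_field=$(printf '%s\n' "$row" | awk '{ print $NF }')
prev_field=$(printf '%s\n' "$row" | awk '{ print $(NF-1) }')

# Assigning to any field makes awk rebuild $0 with OFS between fields.
csv_row=$(printf '%s\n' "$row" | awk 'BEGIN { OFS = "," } { $1 = $1; print }')

printf '%s\n' "$last_field" "$prev_field" "$csv_row"
```

The $1 = $1 assignment changes nothing in the field itself; it only triggers the rebuild of the record, which is a common way to convert separators.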

3.2 Conditional Filtering

# Numeric conditions
awk '$3+0 > 100' data.txt               # Lines where the 3rd field, read as a number, is greater than 100
awk '$1 + $2 > 50' data.txt             # Sum of the first two fields greater than 50
awk 'NR % 2 == 0' file.txt              # Even lines
# String matching
awk '/error/' logfile.txt               # Lines containing "error"
awk '$2 == "success"' status.txt        # 2nd field equals "success"
awk 'tolower($2) == "success"' status.txt  # Case insensitive
awk '$1 ~ /^[0-9]+$/' file.txt          # 1st field consists only of digits
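
Adding 0 to a field matters when it does not look purely numeric. The following sketch uses invented values with a unit suffix to contrast the two comparisons. The variable names are assumptions.

```shell
# Made-up durations with an "ms" suffix.
durations='90ms
250ms
1000ms'

# "90ms" does not look like a number, so $1 > 100 is a string comparison.
as_string=$(printf '%s\n' "$durations" | awk '$1 > 100')

# Adding 0 converts the leading digits to a number before comparing.
as_number=$(printf '%s\n' "$durations" | awk '$1 + 0 > 100')

printf 'string comparison:\n%s\n' "$as_string"
printf 'numeric comparison:\n%s\n' "$as_number"
```

Under the string comparison even 90ms passes the filter, because the character 9 sorts after 1. The numeric form keeps only 250ms and 1000ms.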

3.3 Data Statistics and Summary

# Sum and count
awk '{sum += $1} END {print sum}' data.txt          # Sum of the 1st field
awk '{count++} END {print count}' file.txt          # Count of lines
awk '$3 > 50 {count++} END {print count + 0}' data.txt  # Conditional count (prints 0 if nothing matches)
# Average calculation (guarded against division by zero on empty input)
awk '{sum += $1; count++} END {if (count > 0) print sum/count}' data.txt
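
The same accumulate-then-report pattern extends to minimum and maximum. This is a sketch rather than a canonical recipe: the function name summarize and the input numbers are assumptions.

```shell
# summarize: hypothetical helper that reads one number per line from stdin.
summarize() {
    awk '
        { v = $1 + 0; sum += v }            # force numeric value, accumulate
        NR == 1 || v < min { min = v }      # first record seeds min
        NR == 1 || v > max { max = v }      # first record seeds max
        END {
            if (NR > 0)
                printf "n=%d sum=%d avg=%.2f min=%d max=%d\n", NR, sum, sum / NR, min, max
            else
                print "no input"
        }'
}

stats_out=$(printf '%s\n' 4 10 1 | summarize)
empty_out=$(printf '' | summarize)

printf '%s\n' "$stats_out" "$empty_out"
```

Checking NR in the END block avoids a division by zero when the input is empty.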

4. Combining with Other Tools and Commands

4.1 Combining with grep

# First filter with grep, then process with awk
grep "ERROR" logfile | awk '{print $1, $5}'
# Implement grep functionality in awk
awk '/ERROR/ {print $0}' logfile
# Combine filtering and processing (the [p] keeps grep from matching its own process)
ps aux | grep '[p]ython' | awk '{sum += $4} END {print "Memory usage:", (sum + 0) "%"}'
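
When the filter applies to a single field, awk can often replace the grep stage entirely. The sketch below compares both approaches on invented log lines. The field positions are an assumption about this sample format only.

```shell
# Made-up log lines: date, time, host, level, message.
log_sample='2024-05-01 10:00:01 web01 ERROR disk full
2024-05-01 10:00:02 web01 INFO ok
2024-05-01 10:00:03 web02 ERROR timeout'

# Two processes: grep filters, awk extracts fields.
two_step=$(printf '%s\n' "$log_sample" | grep "ERROR" | awk '{ print $1, $5 }')

# One process: awk tests the level field itself, then extracts.
one_step=$(printf '%s\n' "$log_sample" | awk '$4 == "ERROR" { print $1, $5 }')

printf '%s\n' "$one_step"
```

Testing $4 directly is also stricter than grep, which would match ERROR anywhere on the line, including inside the message text.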

4.2 Combining with sed

# Preprocess with sed, analyze with awk
sed 's/:/ /g' data.txt | awk '{print $1, $2}'
# Process sed output with awk
sed -n '1,100p' largefile.txt | awk '{print NR, $0}'
# Multi-stage pipeline: squeeze repeated spaces, then print the first and last fields
sed 's/  */ /g' logfile | awk '{print $1, $NF}'
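
Replacing a delimiter with sed before calling awk is not always equivalent to setting the separator with -F. The sketch below uses one invented record whose first field contains a space to show the difference.

```shell
# Made-up record: the name field itself contains a space.
line='john smith:42:admin'

# After sed, the colon boundaries are gone and awk splits on every space.
via_sed=$(printf '%s\n' "$line" | sed 's/:/ /g' | awk '{ print $1, $2 }')

# With -F: the original field boundaries are preserved.
via_fs=$(printf '%s\n' "$line" | awk -F: '{ print $1, $2 }')

printf 'sed + awk: %s\n' "$via_sed"
printf 'awk -F:    %s\n' "$via_fs"
```

The sed version yields john and smith as two separate fields and loses the number, while -F: keeps the full name as field 1.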

4.3 Combining with sort/uniq

# Count frequency distribution
awk '{print $1}' access.log | sort | uniq -c | sort -nr
# Count in awk, then sort by count in descending order
awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' access.log | sort -k2,2nr
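
The counting idiom can be tried on a few invented access-log lines. The addresses and paths below are sample data, not taken from a real log.

```shell
freq_out=$(printf '%s\n' \
    '10.0.0.1 GET /' \
    '10.0.0.2 GET /a' \
    '10.0.0.1 GET /b' \
    '10.0.0.3 GET /' \
    '10.0.0.1 GET /c' \
    '10.0.0.2 GET /' |
    # count[] is an associative array keyed by the first field.
    awk '{ count[$1]++ } END { for (ip in count) print ip, count[ip] }' |
    # for-in order is unspecified, so sort numerically on column 2, descending.
    sort -k2,2nr)

printf '%s\n' "$freq_out"
```

Because awk does not guarantee any iteration order for arrays, the external sort is what makes the output order predictable.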

4.4 Application in Shell Scripts

#!/bin/bash
# System monitoring script
monitor_system() {
    # Add error handling
    if ! command -v free > /dev/null; then
        echo "Error: free command does not exist"
        return 1
    fi
    # Memory monitoring - more precise
    free -m | awk 'NR == 2 {
        if ($2 > 0)
            printf "Memory usage: %.2f%%\n", $3 * 100 / $2
        else
            print "Error: Unable to retrieve memory information"
    }'
    # Disk monitoring - exclude special file systems
    df -h | awk '$1 !~ /(tmpfs|devtmpfs)/ && $5+0 > 80 {
        print "Warning: " $1 " usage " $5
    }'
}
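
The disk check can be exercised without touching the real system by feeding awk canned text. In this sketch the df-shaped sample, the limit variable, and the threshold of 80 are all assumptions.

```shell
# Canned text shaped like `df -h` output; every value is made up.
df_sample='Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 45G 5G 90% /
tmpfs 2G 2G 0 100% /run
/dev/sdb1 100G 20G 80G 20% /data'

# limit is passed in with -v so the threshold is not hard-coded in the program.
# $5 + 0 turns "90%" into the number 90 before comparing.
disk_warnings=$(printf '%s\n' "$df_sample" |
    awk -v limit=80 'NR > 1 && $1 !~ /^(tmpfs|devtmpfs)$/ && $5 + 0 > limit {
        print "Warning: " $1 " usage " $5
    }')

printf '%s\n' "$disk_warnings"
```

Only /dev/sda1 is reported: the tmpfs line is excluded by name and /dev/sdb1 is below the limit.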

5. Advanced Use Cases

5.1 Report Generation

#!/bin/bash
# System report generator
generate_system_report() {
    local report_file="/tmp/system_report_$(date +%Y%m%d).txt"
    cat > "$report_file" << EOF
=== System Report ===
Generated Time: $(date)
EOF
    # Add user statistics
    echo "=== User Statistics ===" >> "$report_file"
    awk -F: '
        $3 >= 1000 { regular_users++ }
        $3 < 1000  { system_users++ }
        END {
            print "Regular Users:", regular_users
            print "System Users:", system_users
            print "Total Users:", regular_users + system_users
        }' /etc/passwd >> "$report_file"
    # Add process statistics
    echo -e "\n=== Process Statistics ===" >> "$report_file"
    ps aux | awk '
        NR > 1 {
            cpu_sum += $3
            mem_sum += $4
            user_count[$1]++
        }
        END {
            print "Total CPU Usage:", cpu_sum "%"
            print "Total Memory Usage:", mem_sum "%"
            print "Active User Count:", length(user_count)
        }' >> "$report_file"
    echo "Report generated: $report_file"
}
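
For reports, printf gives aligned columns where print would not. The sketch below reuses the user-count idea on invented passwd-style lines. The column widths are an arbitrary choice.

```shell
# Made-up passwd-style records (name:password:UID:GID:comment:home:shell).
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/zsh'

user_report=$(printf '%s\n' "$passwd_sample" | awk -F: '
    $3 >= 1000 { regular_users++ }
    $3 <  1000 { system_users++ }
    END {
        # %-14s left-aligns the label in 14 columns; %3d right-aligns the count.
        printf "%-14s %3d\n", "Regular Users:", regular_users
        printf "%-14s %3d\n", "System Users:", system_users
    }')

printf '%s\n' "$user_report"
```

A side benefit of %d is that a counter that was never incremented prints as 0 instead of an empty string.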

5.2 Comparison with Other Commands

# awk vs cut
awk '{print $1}' file.txt        # More flexible: splits on runs of whitespace, supports conditions
cut -d' ' -f1 file.txt           # Simpler and typically faster, but treats every single space as a delimiter
# awk vs sed
awk '/pattern/ {print $1}' file.txt   # Strong field processing capability
sed -n '/pattern/p' file.txt          # Simple line filtering
# awk vs Perl/Python
awk '{sum+=$1} END{print sum}' file.txt  # Simple data summary
# More complex data processing is better suited for Perl or Python
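
The awk and cut commands differ most visibly when columns are padded with several spaces. A small sketch with an invented line:

```shell
# Made-up line with three spaces between the first two words.
line='alpha   beta gamma'

# cut treats each single space as a delimiter, so field 2 is empty here.
cut_field=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk's default splitting collapses runs of spaces and tabs.
awk_field=$(printf '%s\n' "$line" | awk '{ print $2 }')

printf 'cut: [%s]\n' "$cut_field"
printf 'awk: [%s]\n' "$awk_field"
```

For strictly single-delimiter data such as /etc/passwd, cut is a reasonable choice. For column-aligned output from tools like ps or df, awk's default splitting is usually what you want.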

6. Conclusion

By mastering the awk command in depth, you can build powerful text-processing pipelines. Whether the task is simple field extraction or more complex data analysis, awk provides an efficient and flexible solution. Because awk is a complete programming language, the time invested in learning it pays off quickly in daily work.
