Source: Linux Technology Enthusiast
A deep reflection triggered by a production incident, a heartfelt summary from a veteran with 5 years of operations and maintenance experience.
Introduction: That Friday Night That Kept Me Awake
I still remember that Friday night. Just as I was about to leave work, a monitoring alert came in: CPU usage in the production environment had soared to 95%, and response time had jumped from the usual 100 ms to 3 seconds. User complaints were pouring in, and my manager's WeChat messages kept flashing.
This was not the first performance issue I had faced, but this time was different: it made me realize that restarting services and adding machines was far from enough. A real operations engineer needs system-level performance tuning skills.
After three sleepless nights of analysis and optimization, I not only resolved the crisis but also distilled a complete methodology for Linux performance tuning. Today, I will share it with you without reservation.
Part One: The “Way” and “Technique” of Performance Tuning
Tuning Philosophy: Problem-Driven vs. Proactive Prevention
Many operations colleagues fall into a misconception: they only think about tuning when problems arise. But true experts establish a comprehensive monitoring system and tuning mechanism before issues occur.
Golden Rules:
- Monitor first, tune later
- Analyze first, act later
- Back up first, modify later
- Validate first, go live later
The Four Levels of Performance Tuning
1. Hardware Level: CPU, memory, disk, network
2. Kernel Level: Scheduler, memory management, I/O subsystem
3. Application Level: Processes, threads, caching strategies
4. Business Level: Algorithm optimization, architectural adjustments
Part Two: Practical CPU Performance Tuning
2.1 “Observation, Inquiry, Diagnosis, and Treatment” of CPU Performance Issues
Symptom Identification
# View real-time CPU usage
top -p $(pgrep -d',' java)
# Check CPU context switches
vmstat 1 5
# Analyze CPU usage details
sar -u 1 10
# Check interrupt status
cat /proc/interrupts
Deep Diagnosis Script
#!/bin/bash
# cpu_analysis.sh - Deep analysis script for CPU performance
echo "=== CPU Basic Information ==="
lscpu | grep -E "(Architecture|CPU op-mode|Thread|Core|Socket)"
echo -e "\n=== Top 10 CPU Usage Processes ==="
ps aux --sort=-%cpu | head -11
echo -e "\n=== CPU Context Switch Analysis ==="
vmstat 1 3 | tail -2
echo -e "\n=== Interrupt Distribution ==="
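# Only the header line of /proc/interrupts contains "CPU0", so print the header plus the first 10 IRQ rows instead of grepping for it
head -11 /proc/interrupts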
echo -e "\n=== Load Average Trend ==="
uptime && cat /proc/loadavg
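The load average printed by the script only becomes meaningful once it is compared with the number of CPUs. The following is a minimal sketch of that comparison; `load_per_core` is a hypothetical helper name introduced here for illustration, not part of the original script.

```shell
#!/bin/bash
# load_per_core is a hypothetical helper (an assumption, not from the original script).
# Usage: load_per_core <load_average> <cpu_count>
# Prints the load average divided by the CPU count, with two decimals.
load_per_core() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus <= 0) { print "n/a"; exit }
        printf "%.2f\n", load / cpus
    }'
}
```

On a live system it could be called as `load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`. A value that stays above 1.00 suggests there are more runnable or I/O-blocked tasks than cores.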
2.2 The Three Axes of CPU Tuning
First Axe: Process Priority Adjustment
# Check process priority
ps -eo pid,ni,pri,pcpu,comm --sort=-%cpu | head -10
# Adjust the priority of critical processes (lower nice value = higher priority; negative values require root)
renice -n -10 -p $(pgrep nginx)
renice -n -5 -p $(pgrep mysql)
# Specify priority at startup
nice -n -10 ./critical_app
Second Axe: CPU Affinity Binding
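# Check process CPU affinity (taskset -p accepts one PID at a time, and nginx usually runs several processes)
pgrep nginx | xargs -rn1 taskset -cp
# Bind nginx to CPUs 0-3
pgrep nginx | xargs -rn1 taskset -cp 0-3
# Bind the database to CPUs 4-7 (-a applies the setting to every thread, not only the main one)
pgrep mysql | xargs -rn1 taskset -acp 4-7
# Pin an interrupt to a CPU (IRQ 24 is only an example; a running irqbalance may overwrite this)
echo 2 > /proc/irq/24/smp_affinity # Hex bitmask 0x2 = CPU1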
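The value written to `smp_affinity` is a hexadecimal bitmask in which bit N stands for CPU N, which is why `2` selects CPU1. The sketch below builds such a mask from a list of CPU numbers; `cpus_to_mask` is a hypothetical helper name, and the sketch assumes CPU numbers below 32 (larger systems write the mask as comma-separated 32-bit groups).

```shell
#!/bin/bash
# cpus_to_mask is a hypothetical helper (an assumption, not from the original article).
# Usage: cpus_to_mask <cpu> [<cpu> ...]   (CPU numbers 0-31)
# Prints the hexadecimal bitmask with one bit set per listed CPU.
cpus_to_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}
```

For example, `cpus_to_mask 4 5 6 7` prints `f0`. Where the mask arithmetic is not needed, `/proc/irq/<N>/smp_affinity_list` accepts a plain CPU list such as `4-7` instead.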
Third Axe: Kernel Parameter Optimization
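# CPU scheduler optimization
# Note: kernels 5.13 and later no longer expose sched_migration_cost_ns through sysctl
# (it moved to /sys/kernel/debug/sched/), so confirm /proc/sys/kernel/sched_migration_cost_ns exists first
echo 'kernel.sched_migration_cost_ns = 5000000' >> /etc/sysctl.conf
echo 'kernel.sched_autogroup_enabled = 0' >> /etc/sysctl.conf
# Disable unnecessary kernel features (this turns off the NMI hard-lockup detector)
echo 'kernel.nmi_watchdog = 0' >> /etc/sysctl.conf
# Apply configuration
sysctl -p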
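Not every kernel exposes every sysctl key, and `sysctl -p` reports an error for keys it cannot find. A minimal sketch of a pre-check follows; `sysctl_path` and `sysctl_supported` are hypothetical helper names, and the sketch assumes keys without dots inside a single component (some per-interface network keys break that assumption).

```shell
#!/bin/bash
# Hypothetical helpers (assumptions, not from the original article).
# sysctl_path: map a sysctl key such as vm.swappiness to its /proc/sys file.
sysctl_path() {
    printf '/proc/sys/%s\n' "${1//.//}"
}

# sysctl_supported: succeed only if the running kernel exposes the key.
sysctl_supported() {
    [[ -e "$(sysctl_path "$1")" ]]
}
```

A tuning step could then be guarded with something like `sysctl_supported kernel.sched_autogroup_enabled && echo 'kernel.sched_autogroup_enabled = 0' >> /etc/sysctl.conf`, so that unsupported keys are skipped rather than written.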
2.3 Practical Case: Resolving Abnormal CPU Usage Spikes
Symptom: A web application's CPU usage suddenly spiked from 20% to 90%.
Investigation Process:
# 1. Locate high CPU process
top -b -n 1 -c | head -20   # batch mode (-b) is required when piping top's output
# 2. Analyze CPU usage of threads within the process
top -H -p [PID]
# 3. Check system calls
strace -cp [PID]
# 4. Analyze call stack
perf top -p [PID]
Root Cause Analysis: A scheduled task had entered an infinite loop.
Solution:
1. Temporarily lower the process priority
2. Fix the code logic
3. Add monitoring alerts
4. Optimize task scheduling
Part Three: Practical Memory Performance Tuning
3.1 The “Seven Weapons” of Memory Performance
Weapon One: Memory Usage Analysis
# Memory overview
free -h
# Detailed memory information
grep -E "(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree)" /proc/meminfo
# Top 10 processes by memory usage
ps aux --sort=-%mem | head -11
# Memory fragmentation analysis
cat /proc/buddyinfo
Weapon Two: Swap Optimization Strategies
# Check Swap usage
swapon -s
# Adjust swappiness (lower values make the kernel less willing to swap out anonymous memory and more willing to reclaim page cache instead)
echo 'vm.swappiness = 10' >> /etc/sysctl.conf
# Memory reclamation strategy
echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf
# Dirty page writeback optimization
echo 'vm.dirty_ratio = 10' >> /etc/sysctl.conf
echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.conf
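Before lowering `vm.swappiness`, it helps to know which processes are actually sitting in swap. The sketch below reads the `VmSwap` field from `/proc/<pid>/status`; `proc_swap_kb` is a hypothetical helper name, and it prints 0 when the process or the field is missing (kernel threads, for example, have no `VmSwap` line).

```shell
#!/bin/bash
# proc_swap_kb is a hypothetical helper (an assumption, not from the original article).
# Usage: proc_swap_kb <pid>
# Prints the amount of swap used by the process in kB, or 0 if unknown.
proc_swap_kb() {
    local status=/proc/$1/status
    if [[ -r $status ]]; then
        awk '/^VmSwap:/ {print $2; found = 1} END {if (!found) print 0}' "$status"
    else
        echo 0
    fi
}
```

A quick per-process overview could look like `for p in $(pgrep java); do echo "$p $(proc_swap_kb "$p") kB"; done`.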
Weapon Three: Memory Allocation Optimization
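# Overcommit control
# Mode 2 is strict accounting: allocations fail once committed memory reaches swap + 80% of RAM.
# JVMs and fork-heavy programs reserve far more virtual memory than they touch, so test before enabling this.
echo 'vm.overcommit_memory = 2' >> /etc/sysctl.conf
echo 'vm.overcommit_ratio = 80' >> /etc/sysctl.conf
# Huge page memory configuration (reserves 1024 x 2 MiB = 2 GiB; not persistent across reboots)
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# Transparent huge page optimization (also not persistent across reboots)
echo never > /sys/kernel/mm/transparent_hugepage/enabled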
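With `vm.overcommit_memory = 2`, the kernel refuses allocations beyond a commit limit computed as (RAM minus huge pages) × overcommit_ratio / 100 + swap. A minimal sketch of that arithmetic follows; `commit_limit_kb` is a hypothetical helper name, and all inputs are in kB.

```shell
#!/bin/bash
# commit_limit_kb is a hypothetical helper (an assumption, not from the original article).
# Usage: commit_limit_kb <ram_kb> <swap_kb> <overcommit_ratio> [<hugetlb_kb>]
# Prints the commit limit in kB using integer arithmetic.
commit_limit_kb() {
    local ram_kb=$1 swap_kb=$2 ratio=$3 hugetlb_kb=${4:-0}
    echo $(( (ram_kb - hugetlb_kb) * ratio / 100 + swap_kb ))
}
```

For a 16 GiB machine with 2 GiB of swap and a ratio of 80, `commit_limit_kb 16777216 2097152 80` prints `15518924` (about 14.8 GiB). The live values can be compared with `grep Commit /proc/meminfo`, which shows `CommitLimit` and `Committed_AS`.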
3.2 Memory Leak Detection and Handling
Detection Script
#!/bin/bash
# memory_leak_detector.sh - Memory leak detection script
PID=$1
if [[ -z "$PID" ]]; then
    echo "Usage: $0 <pid>"
    exit 1
fi
if ! ps -p "$PID" > /dev/null; then
    echo "Process $PID not found"
    exit 1
fi
echo "=== Memory Leak Detection Report ==="
echo "Process PID: $PID"
echo "Process Name: $(ps -p "$PID" -o comm=)"
echo "Start Time: $(date)"
# Record initial memory usage (RSS, in KB)
INITIAL_MEM=$(ps -p "$PID" -o rss= | tr -d ' ')
echo "Initial Memory Usage: ${INITIAL_MEM}KB"
# Monitor memory changes: one sample every 10 seconds for 10 minutes
for i in {1..60}; do
    sleep 10
    CURRENT_MEM=$(ps -p "$PID" -o rss= | tr -d ' ')
    if [[ -z "$CURRENT_MEM" ]]; then
        echo "Process $PID has exited; stopping."
        break
    fi
    DIFF=$((CURRENT_MEM - INITIAL_MEM))
    echo "$(date +'%H:%M:%S') - Current Memory: ${CURRENT_MEM}KB, Growth: ${DIFF}KB"
    if [[ $DIFF -gt 100000 ]]; then # Growth exceeds roughly 100MB
        echo "Warning: Possible memory leak detected!"
        # Generate memory mapping report
        pmap -d "$PID" > "memory_map_$(date +%s).log"
    fi
done
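A single growth threshold cannot tell a leak from a normal warm-up followed by a plateau. One possible refinement, sketched below under that assumption, is to flag a process only when its RSS samples never decrease and the total growth exceeds the threshold. `is_steady_growth` is a hypothetical helper name, not part of the detection script.

```shell
#!/bin/bash
# is_steady_growth is a hypothetical helper (an assumption, not from the original script).
# Usage: is_steady_growth <threshold_kb> <sample_kb> [<sample_kb> ...]
# Succeeds only if the samples never decrease and last - first exceeds the threshold.
is_steady_growth() {
    local threshold=$1
    shift
    local first=$1 prev=$1 sample
    for sample in "$@"; do
        if (( sample < prev )); then
            return 1    # memory was released at some point, so this is not steady growth
        fi
        prev=$sample
    done
    (( prev - first > threshold ))
}
```

For example, `is_steady_growth 100000 500000 560000 620000` succeeds, while a series that dips back down, such as `500000 620000 510000`, does not.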
3.3 Practical Case: Resolving Java Application OutOfMemoryError
Symptom: The Java application frequently throws OutOfMemoryError.
Tuning Steps:
1. JVM Parameter Optimization
# Before optimization
java -Xms512m -Xmx2g -jar app.jar
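# After optimization
# Note: with G1, fixing the young generation size via -XX:NewRatio overrides the pause-time goal
# set by -XX:MaxGCPauseMillis; drop that flag unless testing shows a benefit.
# A larger heap only delays a real leak; the heap dump flags are there to find the root cause.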
java -Xms2g -Xmx4g \
-XX:NewRatio=3 \
-XX:SurvivorRatio=8 \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=200 \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/tmp/heapdump.hprof \
-jar app.jar
2. System Memory Parameter Adjustment
# Memory optimization for Java applications
echo 'vm.max_map_count = 262144' >> /etc/sysctl.conf
echo 'vm.min_free_kbytes = 131072' >> /etc/sysctl.conf
Part Four: Comprehensive Performance Tuning Strategies
4.1 I/O Performance Tuning
Disk I/O Optimization
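# I/O scheduler optimization
# List the schedulers this kernel offers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler
# Kernels 5.0+ (blk-mq only) offer none/mq-deadline/kyber/bfq; older kernels offer noop/deadline/cfq
echo none > /sys/block/sda/queue/scheduler # SSD: use none (noop on legacy kernels)
echo mq-deadline > /sys/block/sdb/queue/scheduler # HDD: use mq-deadline (deadline on legacy kernels)
# Adjust I/O queue depth
echo 32 > /sys/block/sda/queue/nr_requests
# File system optimization (noatime already implies nodiratime)
mount -o remount,noatime /dev/sda1 /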
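The contents of `/sys/block/<dev>/queue/scheduler` list every scheduler the kernel offers, with the active one in square brackets, for example `[mq-deadline] kyber bfq none`. Checking that list before writing avoids "Invalid argument" errors on kernels that lack a given scheduler. The sketch below parses such a string; `active_scheduler` and `scheduler_available` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers (assumptions, not from the original article).
# Both take the contents of /sys/block/<dev>/queue/scheduler as their first argument.

# active_scheduler: print the scheduler shown in square brackets.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' <<< "$1"
}

# scheduler_available: succeed if the exact scheduler name ($2) is offered.
scheduler_available() {
    local list=${1//\[/} name
    list=${list//\]/}
    for name in $list; do
        [[ $name == "$2" ]] && return 0
    done
    return 1
}
```

A guarded write could then look like `s=$(cat /sys/block/sda/queue/scheduler); scheduler_available "$s" none && echo none > /sys/block/sda/queue/scheduler`. The exact-name comparison matters because `deadline` must not be mistaken for `mq-deadline`.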
Network I/O Optimization
# TCP buffer optimization
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_rmem = 4096 65536 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_wmem = 4096 65536 16777216' >> /etc/sysctl.conf
# Connection count optimization
echo 'net.core.somaxconn = 32768' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_syn_backlog = 32768' >> /etc/sysctl.conf
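A common way to judge whether a 16 MiB maximum TCP buffer is enough is the bandwidth-delay product: a single connection needs roughly bandwidth × round-trip time of buffer to keep the link full. This reasoning is an addition for illustration, not something the settings above state; `bdp_bytes` is a hypothetical helper name.

```shell
#!/bin/bash
# bdp_bytes is a hypothetical helper (an assumption, not from the original article).
# Usage: bdp_bytes <bandwidth_mbit_per_s> <rtt_ms>
# Mbit/s * 1,000,000 / 8 gives bytes per second; multiplying by rtt_ms / 1000
# gives bytes in flight, which simplifies to mbit * rtt_ms * 125.
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}
```

For a 1 Gbit/s link with a 100 ms round trip, `bdp_bytes 1000 100` prints `12500000`, which fits under the 16777216-byte maximum. A 10 Gbit/s link at 50 ms (`bdp_bytes 10000 50`, 62500000 bytes) would not.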
4.2 One-Click Tuning Script
#!/bin/bash
# linux_performance_tuning.sh - One-click performance tuning script for Linux systems
set -e
if [[ $EUID -ne 0 ]]; then
    echo "This script must be run as root." >&2
    exit 1
fi
echo "Starting Linux system performance tuning..."
# Backup original configuration (each run appends the blocks below again, so restore the backup before re-running)
cp /etc/sysctl.conf /etc/sysctl.conf.backup.$(date +%Y%m%d_%H%M%S)
# CPU tuning (kernels 5.13+ no longer expose sched_migration_cost_ns through sysctl)
cat >> /etc/sysctl.conf << 'EOF'
# CPU performance tuning
kernel.sched_migration_cost_ns = 5000000
kernel.sched_autogroup_enabled = 0
kernel.nmi_watchdog = 0
EOF
# Memory tuning
cat >> /etc/sysctl.conf << 'EOF'
# Memory performance tuning
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
vm.max_map_count = 262144
vm.min_free_kbytes = 131072
EOF
# Network tuning
cat >> /etc/sysctl.conf << 'EOF'
# Network performance tuning
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 32768
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3
EOF
# Apply configuration; keys this kernel does not know are reported without aborting the script
sysctl -p || echo "Warning: some sysctl keys were rejected; review the messages above." >&2
# I/O scheduler optimization: use the first candidate that this kernel actually offers
set_scheduler() {
    local disk=$1 file=/sys/block/$1/queue/scheduler candidate
    shift
    for candidate in "$@"; do
        if grep -qw -- "$candidate" "$file" && echo "$candidate" > "$file"; then
            echo "$disk: Set scheduler to $candidate"
            return 0
        fi
    done
    echo "$disk: none of ($*) is available; scheduler left unchanged"
}
for disk in $(lsblk -d -n -o NAME); do
    [[ -w /sys/block/$disk/queue/scheduler ]] || continue
    if [[ $(cat /sys/block/$disk/queue/rotational) -eq 0 ]]; then
        set_scheduler "$disk" none noop              # SSD (noop on legacy kernels)
    else
        set_scheduler "$disk" mq-deadline deadline   # HDD (deadline on legacy kernels)
    fi
done
echo "Performance tuning completed! The sysctl settings persist across reboots; the I/O scheduler changes do not and need a udev rule or boot script to persist."
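Because the script appends to `/etc/sysctl.conf`, running it more than once leaves duplicate keys in the file, and the last occurrence of a key is the one that takes effect. The sketch below prints the effective value for each key in a sysctl-style file; `effective_sysctl` is a hypothetical helper name.

```shell
#!/bin/bash
# effective_sysctl is a hypothetical helper (an assumption, not from the original script).
# Usage: effective_sysctl <file>
# Prints "key = value" once per key, keeping the last value seen and the
# order in which keys first appeared. Comment lines and lines without "=" are skipped.
effective_sysctl() {
    awk '
        /^[ \t]*(#|;)/ || !/=/ { next }
        {
            eq = index($0, "=")
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (!(key in last)) order[++n] = key
            last[key] = val
        }
        END { for (i = 1; i <= n; i++) print order[i] " = " last[order[i]] }
    ' "$1"
}
```

Running `effective_sysctl /etc/sysctl.conf` after a tuning pass gives a compact view that can be diffed against the same output from the backup file.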
Part Five: Monitoring and Continuous Optimization
5.1 Establishing a Performance Monitoring System
Key Metric Monitoring
#!/bin/bash
# performance_monitor.sh - Performance monitoring script
# Requires vmstat (procps), iostat (sysstat), and ss (iproute2)
while true; do
    timestamp=$(date +'%Y-%m-%d %H:%M:%S')
    # CPU usage: 100 minus the idle column (15th field) of a 1-second vmstat sample
    cpu_usage=$(vmstat 1 2 | tail -1 | awk '{print 100 - $15}')
    # Memory usage: used / total, taken from free
    mem_usage=$(free | awk '/^Mem:/ {printf "%.2f", $3 * 100 / $2}')
    # Load average (1 minute)
    load_avg=$(cut -d' ' -f1 /proc/loadavg)
    # Disk I/O: highest %util (last column) across devices in the second iostat report
    disk_io=$(iostat -dx 1 2 | awk '/^Device/ {n++; next} n == 2 && NF {if ($NF + 0 > max) max = $NF + 0} END {print max + 0}')
    # Network connection count (-H suppresses the header line)
    conn_count=$(ss -Hant | wc -l)
    echo "$timestamp,CPU:$cpu_usage%,MEM:$mem_usage%,LOAD:$load_avg,IO:$disk_io%,CONN:$conn_count" >> /var/log/performance.log
    sleep 60
done
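CPU utilization can also be derived without external tools, from two readings of the first line of `/proc/stat`: the busy share is the change in all time counters minus the change in idle and iowait, divided by the total change. A minimal sketch follows; `cpu_busy_pct` is a hypothetical helper name, and it sums the user through steal fields (guest time is already included in user time).

```shell
#!/bin/bash
# cpu_busy_pct is a hypothetical helper (an assumption, not from the original script).
# Usage: cpu_busy_pct "<first cpu line>" "<second cpu line>"
# Each argument is the aggregate "cpu ..." line of /proc/stat, taken at two points in time.
# Fields after the label: user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, " ")
        split(b, y, " ")
        for (i = 2; i <= 9; i++) total += y[i] - x[i]
        idle = (y[5] - x[5]) + (y[6] - x[6])    # idle + iowait
        if (total <= 0) { print "0.0"; exit }
        printf "%.1f\n", (total - idle) * 100 / total
    }'
}
```

On a live system the two samples could be taken with `a=$(head -1 /proc/stat); sleep 1; b=$(head -1 /proc/stat); cpu_busy_pct "$a" "$b"`.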
5.2 Best Practices for Performance Tuning
Tuning Checklist
- Establish performance baselines
- Implement incremental tuning
- Ensure configuration backups
- Monitor tuning effects
- Document the tuning process
- Regularly review and optimize
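A baseline is only useful if later measurements are compared against it in a consistent way. The sketch below expresses a metric's change relative to its baseline as a signed percentage; `pct_change` is a hypothetical helper name introduced for illustration.

```shell
#!/bin/bash
# pct_change is a hypothetical helper (an assumption, not from the original article).
# Usage: pct_change <baseline> <current>
# Prints the relative change as a signed percentage with one decimal.
pct_change() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        if (before == 0) { print "n/a"; exit }
        printf "%+.1f\n", (after - before) * 100 / before
    }'
}
```

Applied to the incident from the introduction, `pct_change 100 3000` prints `+2900.0` for the jump from 100 ms to 3 seconds. Recording the same figure after each tuning step shows whether a change actually helped.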
Avoiding Common Pitfalls
1. Over-Tuning: Do not tune for the sake of tuning
2. Parameter Stacking: Avoid modifying multiple parameters at once, or you cannot tell which change had which effect
3. Ignoring Monitoring: Always observe the effects after tuning
4. Lack of Testing: Test changes thoroughly before they reach production
Conclusion: The Path from “Firefighter” to “Architect”
Through these years of practice, I have come to realize that a true operations engineer is not just a problem solver, but a problem preventer.
Performance tuning is not an overnight task; it requires:
- Solid Theoretical Foundation: Understanding operating system principles
- Rich Practical Experience: Accumulating experience in practice
- Continuous Learning Attitude: Keeping up with technological advancements
- Rigorous Work Methodology: Establishing standardized processes
Remember this formula: Performance Tuning = Theoretical Knowledge × Practical Experience × System Thinking