Practical Linux System Performance Tuning: Comprehensive Optimization Strategies from CPU to Memory

Source: Linux Technology Enthusiast

A deep reflection prompted by a production incident, and a candid summary from an engineer with five years of operations experience.

Introduction: That Friday Night That Kept Me Awake

Do you remember that Friday night? Just as I was about to leave work, a monitoring alert came in: CPU usage in production had soared to 95%, and response time had jumped from the usual 100ms to 3 seconds. User complaints were pouring in, and my manager’s WeChat messages were flashing non-stop.

This was not the first time I had run into performance issues, but this time was different: it made me realize that merely restarting services and adding machines was far from enough. A true operations engineer must be capable of system-level performance tuning.

After three sleepless nights of analysis and optimization, I not only resolved this crisis but also summarized a complete methodology for Linux performance tuning. Today, I will share it with you without reservation.

Part One: The “Way” and “Technique” of Performance Tuning

Tuning Philosophy: Problem-Driven vs. Proactive Prevention

Many operations colleagues fall into a misconception: they only think about tuning when problems arise. But true experts establish a comprehensive monitoring system and tuning mechanism before issues occur.

Golden Rules:

  • Monitor first, tune later
  • Analyze first, act later
  • Backup first, modify later
  • Validate first, go live later

The Four Levels of Performance Tuning

  1. Hardware Level: CPU, memory, disk, network
  2. Kernel Level: Scheduler, memory management, I/O subsystem
  3. Application Level: Processes, threads, caching strategies
  4. Business Level: Algorithm optimization, architectural adjustments

Part Two: Practical CPU Performance Tuning

2.1 “Observation, Inquiry, Diagnosis, and Treatment” of CPU Performance Issues

Symptom Identification

# View real-time CPU usage
top -p $(pgrep -d',' java)

# Check CPU context switches
vmstat 1 5

# Analyze CPU usage details
sar -u 1 10

# Check interrupt status
cat /proc/interrupts

Deep Diagnosis Script

#!/bin/bash
# cpu_analysis.sh - Deep analysis script for CPU performance

echo "=== CPU Basic Information ==="
lscpu | grep -E "(Architecture|CPU op-mode|Thread|Core|Socket)"

echo -e "\n=== Top 10 CPU Usage Processes ==="
ps aux --sort=-%cpu | head -11

echo -e "\n=== CPU Context Switch Analysis ==="
vmstat 1 3 | tail -2

echo -e "\n=== Interrupt Distribution ==="
# The header row plus the first interrupt sources, one column per CPU
head -10 /proc/interrupts

echo -e "\n=== Load Average Trend ==="
uptime && cat /proc/loadavg
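
A load average only means something relative to the number of cores. The sketch below divides a load figure by a core count; the function name load_per_core is made up for illustration. A result that stays above 1.0 suggests tasks are queuing for CPU (or, on Linux, blocked in uninterruptible I/O).

```shell
#!/bin/bash
# load_per_core is a hypothetical helper, not part of any standard tool.
# Usage: load_per_core [load] [cores]
# Without arguments it falls back to /proc/loadavg (1-minute value) and nproc.
load_per_core() {
    local load=${1:-$(cut -d' ' -f1 /proc/loadavg)}
    local cores=${2:-$(nproc)}
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

load_per_core 8.00 4   # prints 2.00: twice the demand that four cores can serve
```

Calling load_per_core with no arguments reports the figure for the current machine.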

2.2 The Three Axes of CPU Tuning

First Axe: Process Priority Adjustment

# Check process priority
ps -eo pid,ni,pri,pcpu,comm --sort=-%cpu | head -10

# Adjust priority of critical processes (the lower the nice value, the higher the priority;
# negative values require root)
renice -n -10 -p $(pgrep nginx)
renice -n -5 -p $(pgrep mysql)

# Specify priority at startup
nice -n -10 ./critical_app

Second Axe: CPU Affinity Binding

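# Check process CPU affinity (taskset -p takes one PID at a time, and nginx runs several processes)
for pid in $(pgrep nginx); do taskset -cp "$pid"; done

# Bind nginx to CPU 0-3
for pid in $(pgrep nginx); do taskset -cp 0-3 "$pid"; done

# Bind database to CPU 4-7
for pid in $(pgrep mysql); do taskset -cp 4-7 "$pid"; done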

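# Interrupt affinity: IRQ 24 is an example, look up the real number in /proc/interrupts.
# A running irqbalance daemon may overwrite this setting.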
echo 2 > /proc/irq/24/smp_affinity  # Bind to CPU1
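
The value written to smp_affinity is a hexadecimal bitmask in which bit n stands for CPU n, which is why 2 (binary 10) means CPU1. A minimal sketch of the conversion follows; cpus_to_mask is a hypothetical helper name, and this simple form assumes CPU numbers below 32.

```shell
#!/bin/bash
# cpus_to_mask is a hypothetical helper: it turns CPU numbers into the
# hexadecimal bitmask format used by /proc/irq/<N>/smp_affinity.
# Assumption: CPU numbers below 32 (larger systems use comma-separated groups).
cpus_to_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))   # set bit number <cpu>
    done
    printf '%x\n' "$mask"
}

cpus_to_mask 1         # prints 2  (CPU1 only, as in the command above)
cpus_to_mask 0 1 2 3   # prints f  (CPU0 through CPU3)
```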

Third Axe: Kernel Parameter Optimization

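# CPU scheduler optimization
# Note: on kernels 5.13 and later, sched_migration_cost_ns is no longer a sysctl; it moved to
# debugfs (/sys/kernel/debug/sched/migration_cost_ns), so check that the key exists first.
echo 'kernel.sched_migration_cost_ns = 5000000' >> /etc/sysctl.conf
echo 'kernel.sched_autogroup_enabled = 0' >> /etc/sysctl.conf

# Disable unnecessary kernel features (this also turns off hard-lockup detection)
echo 'kernel.nmi_watchdog = 0' >> /etc/sysctl.conf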

# Apply configuration
sysctl -p
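
Appending with >> adds a duplicate line every time a command is repeated, and the last occurrence silently wins. One way to avoid this is to replace any earlier assignment of the same key before writing the new one. The sketch below shows the idea; set_sysctl_conf is a hypothetical helper name, and the demo writes to a scratch file rather than /etc/sysctl.conf.

```shell
#!/bin/bash
# set_sysctl_conf is a hypothetical helper, not a standard command.
# Usage: set_sysctl_conf <key> <value> <file>
set_sysctl_conf() {
    local key=$1 value=$2 file=$3 tmp
    tmp=$(mktemp)
    # Copy every line except earlier assignments of this key
    if [[ -f $file ]]; then
        awk -F'=' -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' "$file" > "$tmp"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}

# Demo against a scratch file; point it at /etc/sysctl.conf only after a backup
demo=$(mktemp)
set_sysctl_conf kernel.nmi_watchdog 1 "$demo"
set_sysctl_conf kernel.nmi_watchdog 0 "$demo"
cat "$demo"   # one line: kernel.nmi_watchdog = 0
rm -f "$demo"
```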

2.3 Practical Case: Resolving Abnormal CPU Usage Spikes

Symptom: A web application’s CPU usage suddenly spiked from 20% to 90%.

Investigation Process:

# 1. Locate high CPU process (batch mode, so the output can be piped)
top -b -n 1 -c | head -20

# 2. Analyze CPU usage of threads within the process
top -H -p [PID]

# 3. Check system calls
strace -cp [PID]

# 4. Analyze call stack
perf top -p [PID]

Root Cause Analysis: A scheduled task had entered an infinite loop.

Solution:

  1. Temporarily lower the process priority
  2. Fix the code logic
  3. Add monitoring alerts
  4. Optimize task scheduling

Part Three: Practical Memory Performance Tuning

3.1 The “Seven Weapons” of Memory Performance

Weapon One: Memory Usage Analysis

# Memory overview
free -h

# Detailed memory information
grep -E "(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree)" /proc/meminfo

# Top 10 processes by memory usage
ps aux --sort=-%mem | head -11

# Memory fragmentation analysis
cat /proc/buddyinfo
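
Each row of /proc/buddyinfo lists, per zone, how many free blocks exist at order 0, 1, 2 and so on, where a block of order n spans 2^n contiguous pages. If most free pages sit in the low orders, large contiguous allocations (huge pages, for example) may fail even though free memory looks plentiful. The sketch below sums the pages per zone; buddy_summary is a hypothetical helper, and the order-4 cutoff is an arbitrary choice for illustration.

```shell
#!/bin/bash
# buddy_summary is a hypothetical helper.
# Usage: buddy_summary [file]   (defaults to /proc/buddyinfo)
# Row format: Node <n>, zone <name> <count order 0> <count order 1> ...
buddy_summary() {
    awk '{
        total = 0; large = 0
        for (i = 5; i <= NF; i++) {
            order = i - 5
            pages = $i * 2 ^ order           # blocks of this order, counted in pages
            total += pages
            if (order >= 4) large += pages   # blocks of 16 pages or more
        }
        printf "%s node %d: %d free pages, %d in blocks of order >= 4\n", $4, $2 + 0, total, large
    }' "${1:-/proc/buddyinfo}"
}

# Demo with a made-up sample row
sample=$(mktemp)
echo "Node 0, zone   Normal   4 2 1 0 1 0 0 0 0 0 0" > "$sample"
buddy_summary "$sample"   # Normal node 0: 28 free pages, 16 in blocks of order >= 4
rm -f "$sample"
```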

Weapon Two: Swap Optimization Strategies

# Check Swap usage
swapon --show

# Adjust Swap usage tendency (the lower the value, the more the kernel prefers reclaiming page cache over swapping out process memory)
echo 'vm.swappiness = 10' >> /etc/sysctl.conf

# Memory reclamation strategy
echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf

# Dirty page writeback optimization
echo 'vm.dirty_ratio = 10' >> /etc/sysctl.conf
echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.conf

Weapon Three: Memory Allocation Optimization

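# Over-allocation control (mode 2 is strict accounting: once the commit limit is reached,
# allocations fail with ENOMEM, so size the ratio for your workload)
echo 'vm.overcommit_memory = 2' >> /etc/sysctl.conf
echo 'vm.overcommit_ratio = 80' >> /etc/sysctl.conf

# Huge page memory configuration (1024 pages x 2 MB = 2 GB reserved)
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# Transparent huge page optimization (runtime only; the setting is lost on reboot)
echo never > /sys/kernel/mm/transparent_hugepage/enabled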
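
Before applying these values it helps to work out what they mean on a given machine. With vm.overcommit_memory = 2 the kernel documents the commit limit as (RAM minus huge pages) × overcommit_ratio / 100 + swap, and reserved huge pages are taken out of normal use entirely. The two functions below are hypothetical helpers that only do this arithmetic; the machine sizes in the demo are made up.

```shell
#!/bin/bash
# Both helpers are hypothetical and only do arithmetic; sizes are in kB.

# Commit limit under vm.overcommit_memory = 2:
# (RAM - huge pages) * overcommit_ratio / 100 + swap
commit_limit_kb() {
    local ram_kb=$1 swap_kb=$2 ratio=$3 hugetlb_kb=${4:-0}
    echo $(( (ram_kb - hugetlb_kb) * ratio / 100 + swap_kb ))
}

# Memory set aside by nr_hugepages, in MB
hugepage_reserve_mb() {
    local count=$1 page_kb=$2
    echo $(( count * page_kb / 1024 ))
}

hugepage_reserve_mb 1024 2048                 # prints 2048 (2 GB reserved)
# Assumed machine: 16 GB RAM, 2 GB swap, ratio 80, 2 GB of huge pages
commit_limit_kb 16777216 2097152 80 2097152   # prints 13841203 (about 13.2 GB)
```

The live figure appears as CommitLimit in /proc/meminfo, next to Committed_AS, which shows how much is already committed.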

3.2 Memory Leak Detection and Handling

Detection Script

#!/bin/bash
# memory_leak_detector.sh - Memory leak detection script

PID=$1
if [[ -z "$PID" ]]; then
    echo "Usage: $0 <pid>"
    exit 1
fi
if [[ ! -d /proc/$PID ]]; then
    echo "No such process: $PID"
    exit 1
fi

echo "=== Memory Leak Detection Report ==="
echo "Process PID: $PID"
echo "Process Name: $(ps -p "$PID" -o comm=)"
echo "Start Time: $(date)"

# Record initial memory usage (RSS, in KB)
INITIAL_MEM=$(ps -p "$PID" -o rss= | tr -d ' ')
echo "Initial Memory Usage: ${INITIAL_MEM}KB"

# Monitor memory changes: 60 samples, 10 seconds apart
for i in {1..60}; do
    sleep 10
    CURRENT_MEM=$(ps -p "$PID" -o rss= | tr -d ' ')
    # Stop if the process exited while we were watching
    if [[ -z "$CURRENT_MEM" ]]; then
        echo "Process $PID has exited"
        break
    fi
    DIFF=$((CURRENT_MEM - INITIAL_MEM))
    echo "$(date +'%H:%M:%S') - Current Memory: ${CURRENT_MEM}KB, Growth: ${DIFF}KB"
    
    if [[ $DIFF -gt 100000 ]]; then # Growth exceeds 100,000 KB (about 100MB)
        echo "Warning: Possible memory leak detected!"
        # Generate memory mapping report
        pmap -d "$PID" > "memory_map_$(date +%s).log"
    fi
done
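
The script above compares every sample with the first one, so a single step up (a cache warming, for instance) looks the same as a leak. A leak more typically shows up as growth sample after sample. The sketch below counts the longest run of consecutive increases in a series of RSS samples; longest_growth_streak is a hypothetical helper, and the sample values are made up.

```shell
#!/bin/bash
# longest_growth_streak is a hypothetical helper: given RSS samples in KB
# (oldest first), it prints the longest run of consecutive increases.
longest_growth_streak() {
    local prev=$1 cur streak=0 best=0
    shift
    for cur in "$@"; do
        if (( cur > prev )); then
            streak=$(( streak + 1 ))
            if (( streak > best )); then
                best=$streak
            fi
        else
            streak=0
        fi
        prev=$cur
    done
    echo "$best"
}

longest_growth_streak 1000 1500 1500 1490 1500   # prints 1: one jump, then flat
longest_growth_streak 1000 1100 1210 1300 1420   # prints 4: grows at every sample
```

A long streak is a hint rather than proof; a heap dump or allocator profile is still needed to confirm a leak.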

3.3 Practical Case: Resolving Java Application OutOfMemoryError

Symptom: The Java application frequently throws OutOfMemoryError.

Tuning Steps:

  1. JVM Parameter Optimization
# Before optimization
java -Xms512m -Xmx2g -jar app.jar

# After optimization
# Note: with G1, fixing the young generation size via -XX:NewRatio overrides the
# pause-time goal set by -XX:MaxGCPauseMillis; drop it unless measurements justify it
java -Xms2g -Xmx4g \
     -XX:NewRatio=3 \
     -XX:SurvivorRatio=8 \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/heapdump.hprof \
     -jar app.jar
  2. System Memory Parameter Adjustment
# Memory optimization for Java applications
echo 'vm.max_map_count = 262144' >> /etc/sysctl.conf
echo 'vm.min_free_kbytes = 131072' >> /etc/sysctl.conf

Part Four: Comprehensive Performance Tuning Strategies

4.1 I/O Performance Tuning

Disk I/O Optimization

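# I/O scheduler optimization
# List the schedulers a device supports (the active one is shown in brackets)
cat /sys/block/sda/queue/scheduler
# Kernels 5.0 and later (multi-queue only) name these "none" and "mq-deadline";
# older kernels use "noop" and "deadline"
echo none > /sys/block/sda/queue/scheduler  # Use none (formerly noop) for SSD
echo mq-deadline > /sys/block/sdb/queue/scheduler  # Use mq-deadline (formerly deadline) for HDD

# Adjust I/O queue depth
echo 32 > /sys/block/sda/queue/nr_requests

# File system optimization (noatime already implies nodiratime)
mount -o remount,noatime /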
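
Scheduler names differ between kernel generations (noop and deadline on older kernels, none and mq-deadline on multi-queue kernels), and writing a name the device does not offer fails with an error. A cautious approach is to read what the device offers and pick from that. The sketch below works on the two sysfs strings passed in as arguments; pick_scheduler is a hypothetical helper, and the preference order simply mirrors the SSD/HDD choice made above.

```shell
#!/bin/bash
# pick_scheduler is a hypothetical helper.
# $1: contents of queue/rotational (0 = SSD, 1 = HDD)
# $2: contents of queue/scheduler, e.g. "[mq-deadline] kyber bfq none"
# Prints the preferred scheduler the device offers, or nothing if none matches.
pick_scheduler() {
    local rotational=$1 offered=$2 wanted name
    # Strip the brackets that mark the active scheduler
    offered=${offered//\[/}
    offered=${offered//\]/}
    if [[ $rotational -eq 0 ]]; then
        wanted="none noop"
    else
        wanted="mq-deadline deadline"
    fi
    for name in $wanted; do
        if [[ " $offered " == *" $name "* ]]; then
            echo "$name"
            return 0
        fi
    done
}

pick_scheduler 0 "[mq-deadline] kyber bfq none"   # prints none
pick_scheduler 1 "noop deadline [cfq]"            # prints deadline
```

On a live system the two arguments would come from /sys/block/<disk>/queue/rotational and /sys/block/<disk>/queue/scheduler.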

Network I/O Optimization

# TCP buffer optimization
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_rmem = 4096 65536 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_wmem = 4096 65536 16777216' >> /etc/sysctl.conf

# Connection count optimization
echo 'net.core.somaxconn = 32768' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_syn_backlog = 32768' >> /etc/sysctl.conf
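
Whether a 16 MB maximum buffer is enough depends on the bandwidth-delay product (BDP) of the path: a single TCP connection needs roughly bandwidth × round-trip time of buffer to keep the link full. The sketch below does that arithmetic; bdp_bytes is a hypothetical helper, and the link speeds and latencies are example figures.

```shell
#!/bin/bash
# bdp_bytes is a hypothetical helper: bandwidth-delay product in bytes.
# $1: link bandwidth in Mbit/s, $2: round-trip time in ms
bdp_bytes() {
    local mbit=$1 rtt_ms=$2
    # Mbit/s * 1,000,000 / 8 bits per byte * (ms / 1000) = mbit * rtt_ms * 125
    echo $(( mbit * rtt_ms * 125 ))
}

bdp_bytes 1000 100    # prints 12500000: 1 Gbit/s at 100 ms fits within 16777216
bdp_bytes 10000 100   # prints 125000000: 10 Gbit/s at 100 ms would need a larger maximum
```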

4.2 One-Click Tuning Script

#!/bin/bash
# linux_performance_tuning.sh - One-click performance tuning script for Linux systems

set -e

echo "Starting Linux system performance tuning..."

# Backup original configuration
cp /etc/sysctl.conf /etc/sysctl.conf.backup.$(date +%Y%m%d_%H%M%S)

# CPU tuning
cat >> /etc/sysctl.conf << 'EOF'
# CPU performance tuning
kernel.sched_migration_cost_ns = 5000000
kernel.sched_autogroup_enabled = 0
kernel.nmi_watchdog = 0
EOF

# Memory tuning
cat >> /etc/sysctl.conf << 'EOF'
# Memory performance tuning
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
vm.max_map_count = 262144
vm.min_free_kbytes = 131072
EOF

# Network tuning
cat >> /etc/sysctl.conf << 'EOF'
# Network performance tuning
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 32768
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3
EOF

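# Apply configuration (a key missing on this kernel, such as kernel.sched_migration_cost_ns
# on 5.13+, makes sysctl exit non-zero; do not let set -e abort the script because of it)
sysctl -p || echo "Warning: some sysctl keys could not be applied, see the messages above"

# I/O scheduler optimization
# Kernels 5.0 and later call these schedulers none/mq-deadline; older kernels use noop/deadline
for disk in $(lsblk -dn -o NAME); do
    sched_file=/sys/block/$disk/queue/scheduler
    [[ -w $sched_file && -e /sys/block/$disk/queue/rotational ]] || continue
    if [[ $(cat /sys/block/$disk/queue/rotational) -eq 0 ]]; then
        kind=SSD; candidates="none noop"
    else
        kind=HDD; candidates="mq-deadline deadline"
    fi
    for sched in $candidates; do
        # Only write a scheduler name that this device actually offers
        if grep -qw -- "$sched" "$sched_file"; then
            echo "$sched" > "$sched_file"
            echo "$kind $disk: Set scheduler to $sched"
            break
        fi
    done
done

echo "Performance tuning completed! The sysctl settings are already active and persist in /etc/sysctl.conf; the I/O scheduler change lasts only until the next reboot."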
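
After a bulk change like this, it is worth confirming that the running kernel actually holds the intended values, in the spirit of "validate first, go live later". The sketch below compares each key = value line in a sysctl-style file with the corresponding file under /proc/sys. verify_sysctl is a hypothetical helper; the optional second argument (an alternative root directory) exists only so the function can be tried against a scratch directory.

```shell
#!/bin/bash
# verify_sysctl is a hypothetical helper, not a standard command.
# Usage: verify_sysctl <conf-file> [root]   (root defaults to /proc/sys)
# Returns 0 only if every key exists and matches.
verify_sysctl() {
    local conf=$1 root=${2:-/proc/sys} key value path actual status=0
    while IFS='=' read -r key value; do
        key=$(echo "$key" | tr -d '[:space:]')
        if [[ -z $key || $key == \#* ]]; then
            continue                                      # skip blanks and comments
        fi
        value=$(echo "$value" | awk '{$1=$1; print}')     # trim and collapse whitespace
        path="$root/${key//.//}"                          # vm.swappiness -> vm/swappiness
        if [[ ! -r $path ]]; then
            echo "MISSING  $key"
            status=1
            continue
        fi
        actual=$(awk '{$1=$1; print}' "$path")
        if [[ $actual == "$value" ]]; then
            echo "OK       $key = $actual"
        else
            echo "MISMATCH $key: want '$value', have '$actual'"
            status=1
        fi
    done < "$conf"
    return $status
}

# Typical use after tuning (commented out so the sketch has no side effects):
# verify_sysctl /etc/sysctl.conf
```

A MISSING line usually means the key does not exist on this kernel version, which is exactly the case that would otherwise go unnoticed.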

Part Five: Monitoring and Continuous Optimization

5.1 Establishing a Performance Monitoring System

Key Metric Monitoring

#!/bin/bash
# performance_monitor.sh - Performance monitoring script

while true; do
    timestamp=$(date +'%Y-%m-%d %H:%M:%S')
    
    # CPU usage: 100 minus the idle column of a one-second vmstat sample
    cpu_usage=$(vmstat 1 2 | tail -1 | awk '{print 100 - $15}')
    
    # Memory usage (used / total, computed from a single free call)
    mem_usage=$(free | awk '/^Mem:/ {printf "%.2f", $3 * 100 / $2}')
    
    # Load average (1-minute)
    load_avg=$(cut -d' ' -f1 /proc/loadavg)
    
    # Disk I/O: highest %util (last column) across devices in the second iostat report
    disk_io=$(iostat -dx 1 2 | awk '/^Device/ {n++; next} n == 2 && NF {if ($NF + 0 > max) max = $NF + 0} END {print max + 0}')
    
    # Network connection count (-H drops the header line)
    conn_count=$(ss -Htan | wc -l)
    
    echo "$timestamp,CPU:$cpu_usage%,MEM:$mem_usage%,LOAD:$load_avg,IO:$disk_io%,CONN:$conn_count" >> /var/log/performance.log
    
    sleep 60
done
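
A log is only useful if something reads it. As a minimal sketch of the alerting step, the function below scans lines in the format written above and prints those whose CPU or memory percentage exceeds a limit. over_threshold is a hypothetical helper, and the sample log lines are made up.

```shell
#!/bin/bash
# over_threshold is a hypothetical helper.
# Usage: over_threshold <logfile> <cpu_limit> <mem_limit>
# Expects lines like: timestamp,CPU:..%,MEM:..%,LOAD:..,IO:..%,CONN:..
over_threshold() {
    awk -F',' -v cpu_max="$2" -v mem_max="$3" '{
        cpu = $2; mem = $3
        gsub(/[^0-9.]/, "", cpu)    # "CPU:95%" -> "95"
        gsub(/[^0-9.]/, "", mem)    # "MEM:42.00%" -> "42.00"
        if (cpu + 0 > cpu_max + 0 || mem + 0 > mem_max + 0) print
    }' "$1"
}

# Demo with made-up sample data
log=$(mktemp)
echo "2024-01-05 18:00:00,CPU:20%,MEM:41.50%,LOAD:0.80,IO:3.10%,CONN:120" >> "$log"
echo "2024-01-05 18:01:00,CPU:95%,MEM:42.00%,LOAD:7.90,IO:5.00%,CONN:950" >> "$log"
over_threshold "$log" 90 90   # prints only the 18:01:00 line
rm -f "$log"
```

In practice the output would be fed to whatever notification channel the team already uses.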

5.2 Best Practices for Performance Tuning

Tuning Checklist

  • Establish performance baselines
  • Implement incremental tuning
  • Ensure configuration backups
  • Monitor tuning effects
  • Document the tuning process
  • Regularly review and optimize

Avoiding Common Pitfalls

  1. Over-Tuning: Do not tune for the sake of tuning
  2. Parameter Stacking: Avoid modifying multiple parameters simultaneously
  3. Ignoring Monitoring: Always observe the effects after tuning
  4. Lack of Testing: Test every change thoroughly before applying it in production

Conclusion: The Path from “Firefighter” to “Architect”

Having summarized these years of hands-on work, I have come to realize that a true operations engineer is not just a problem solver, but a problem preventer.

Performance tuning is not an overnight task; it requires:

  • Solid Theoretical Foundation: Understanding operating system principles
  • Rich Practical Experience: Accumulating experience in practice
  • Continuous Learning Attitude: Keeping up with technological advancements
  • Rigorous Work Methodology: Establishing standardized processes

Remember this formula: Performance Tuning = Theoretical Knowledge × Practical Experience × System Thinking

