Golden Rules for Linux Performance Tuning: Addressing CPU, Memory, and I/O Bottlenecks

Introduction

Let's start with a painful downtime incident.

The response time of the online business system skyrocketed from 200ms to 8 seconds, yet the CPU usage (65%), memory (30% free), and disk I/O appeared “normal”.

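The core issue: judging system health from one metric at a time misses how Linux performance actually behaves. CPU, memory, and I/O interact like the sections of a symphony, and a bottleneck often shows up only when they are read together.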

Background Explanation

Why is performance tuning so important?

The Real Cost of Performance Issues

Commonly cited industry figures put the cost in concrete terms: each additional second of page load time is associated with a conversion rate drop of about 7%; 53% of mobile users abandon a page that takes more than 3 seconds to load; and a severe performance failure can cost a business millions.

Typical Performance Bottleneck Scenarios

1. Traffic surges during major e-commerce promotions (Double 11/618, traffic increases 10-20 times compared to normal)
2. Database slow queries causing avalanches (an unoptimized SQL query can bring down the system)
3. Memory leaks as a chronic poison (Full GC and memory overflow in Java applications)
4. I/O bottlenecks as invisible killers (performance drops sharply during log writing and data backups)

Core Methodology

Three-step positioning method.

Step One: Global Scan

Quick diagnosis in 10 minutes.

The Golden Three Commands (can be encapsulated as an alias health):

uptime: check load trends
dmesg | tail: check the most recent kernel messages (OOM kills, hardware and driver errors)
vmstat 1: check overall resource usage
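
The three commands can be wrapped in a small shell function instead of an alias, so the same check also works inside scripts. This is a minimal sketch: the name health comes from the text above, while the choice of 10 dmesg lines and 5 vmstat samples is an assumption made for illustration.

```shell
# health: one-shot overview built from the three commands above.
# Assumptions: 10 dmesg lines and 5 one-second vmstat samples are arbitrary defaults.
health() {
    echo "== uptime =="
    uptime
    echo "== dmesg (last 10 lines) =="
    # dmesg may require root when kernel.dmesg_restrict is enabled
    dmesg 2>/dev/null | tail -n 10
    echo "== vmstat 1 5 =="
    vmstat 1 5
}
```

After sourcing the function from a shell profile, running health prints all three views in one pass.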

Load Judgment Techniques:

1-minute load > 5-minute > 15-minute: problem worsening
15-minute load > 5-minute > 1-minute: problem alleviating
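
The two rules above can be sketched as a small helper. The function name load_trend and the label "mixed" for any other ordering are assumptions, not part of the original method.

```shell
# load_trend ONE FIVE FIFTEEN
# Prints "worsening" when 1-min > 5-min > 15-min, "easing" when the order
# is reversed, and "mixed" for anything else.
load_trend() {
    awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN {
        a += 0; b += 0; c += 0          # force numeric comparison
        if (a > b && b > c)      print "worsening"
        else if (c > b && b > a) print "easing"
        else                     print "mixed"
    }'
}
```

On a live system the three values can be taken from /proc/loadavg, for example: load_trend $(cut -d' ' -f1-3 /proc/loadavg)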

Step Two: Layered Deep Dive

Precisely locate bottlenecks.

CPU Bottleneck Positioning

Three Analysis Tools:

top: check overall CPU usage
mpstat -P ALL 1: check usage of each CPU core
pidstat -u 1: check CPU usage by process

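Real case: on an 8-core server, total CPU usage was only about 12.5% yet responses were slow → mpstat revealed a single core at 100% utilization (one saturated core out of eight is 100% / 8 = 12.5% of the total), which pointed to a single-threaded program bottleneck.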

Solution: Use taskset to pin the process to a dedicated CPU core so that it is not migrated between cores; the lasting fix is to refactor the program to be multi-threaded.
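
Spotting the saturated core in mpstat output can be automated. The sketch below assumes the "Average:" summary lines are present and that %idle is the last column, which should be checked against the local sysstat version. The function name hot_cores and the 90% threshold in the usage line are illustrative choices.

```shell
# hot_cores THRESHOLD
# Reads `mpstat -P ALL` output on stdin and prints the IDs of cores whose
# busy time (100 - %idle) is at or above THRESHOLD percent.
# Assumption: "Average:" lines exist and %idle is the last column.
hot_cores() {
    awk -v t="$1" '
        $1 == "Average:" && $2 ~ /^[0-9]+$/ {
            busy = 100 - $NF
            if (busy >= t + 0) print $2
        }'
}
```

A possible invocation is: LC_ALL=C mpstat -P ALL 1 5 | hot_cores 90 (LC_ALL=C keeps the decimal separator a dot so awk parses the numbers correctly).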

Memory Bottleneck Positioning

Combination Analysis:

free -h: check memory overview
cat /proc/meminfo: detailed memory information
slabtop: kernel memory usage

Pitfalls to Avoid: A small "free" value in the output of free ≠ insufficient memory (Linux uses otherwise idle memory for buffers and page cache, and releases it when applications need it).

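Correct Judgment: Available memory ≈ free + buffers + cache (recent kernels report a more accurate estimate as MemAvailable in /proc/meminfo, shown in the "available" column of free); sar -r 1 to check memory trends; sar -W 1 to check swap-in/swap-out rates (sustained swapping indicates insufficient memory). Note that lowercase sar -w reports task creation and context switches, not swapping.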
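
The available-memory calculation can be sketched as follows. The function prefers the kernel's own MemAvailable field when it exists and otherwise falls back to the free + buffers + cache approximation. The function name mem_available_kb is an assumption.

```shell
# mem_available_kb
# Reads /proc/meminfo-style text on stdin and prints available memory in kB.
# Uses MemAvailable when present; otherwise MemFree + Buffers + Cached.
mem_available_kb() {
    awk '
        $1 == "MemAvailable:" { avail = $2; have = 1 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "Buffers:"      { buffers = $2 }
        $1 == "Cached:"       { cached = $2 }
        END { print (have ? avail : free + buffers + cached) }'
}
```

On a live system: mem_available_kb < /proc/meminfo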

Optimization Techniques:

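Adjust swappiness (a value of 10 or lower is recommended here): echo 10 > /proc/sys/vm/swappiness
Clear cache (use with caution, since the cache has to be rebuilt from disk afterwards): sync && echo 3 > /proc/sys/vm/drop_caches
Huge page optimization (reserves 2048 huge pages, which is 4 GiB at the common 2 MiB page size): echo 2048 > /proc/sys/vm/nr_hugepages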
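
Values written with echo into /proc/sys are lost at reboot. One way to keep them is a sysctl.d fragment. The sketch below only prints such a fragment; the function name vm_tuning_conf and the file name in the usage line are hypothetical, and the values should be validated for the workload.

```shell
# vm_tuning_conf SWAPPINESS HUGEPAGES
# Prints a sysctl.d-style fragment on stdout; it changes nothing by itself.
vm_tuning_conf() {
    printf 'vm.swappiness = %s\n' "$1"
    printf 'vm.nr_hugepages = %s\n' "$2"
}
```

A possible way to apply it: vm_tuning_conf 10 2048 | sudo tee /etc/sysctl.d/99-vm-tuning.conf, followed by sudo sysctl --system to load the file.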

I/O Bottleneck Positioning

Analysis Tools:

iostat -x 1: disk I/O statistics
blktrace: I/O tracing tool

Key Metric Interpretation:

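%util: sustained values near 100% → disk saturation (on SSDs and RAID arrays, which serve requests in parallel, %util alone does not reflect the device's limit)
await: an average I/O wait time above 10 ms needs attention
r_await/w_await: read and write latency, to determine whether the problem is on the read side or the write side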

Real case: MySQL server with %util at 50% but await at 200 ms → a large number of small random I/Os was the cause → adjusted innodb_flush_method and added an SSD cache.
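
The metric rules above can be turned into a filter over iostat -x output. This is a sketch under assumptions: columns are looked up by header name because their positions differ between sysstat versions, and the function name slow_disks and its two thresholds are illustrative.

```shell
# slow_disks AWAIT_MS UTIL_PCT
# Reads `iostat -x` output on stdin and prints every device whose worst
# latency (await, r_await or w_await) exceeds AWAIT_MS, or whose %util
# is at or above UTIL_PCT.
slow_disks() {
    awk -v amax="$1" -v umax="$2" '
        NF == 0 { inblock = 0; next }            # blank line ends a device table
        $1 ~ /^Device/ {                         # header row: map names to columns
            split("", col)
            for (i = 1; i <= NF; i++) col[$i] = i
            inblock = ("%util" in col)
            next
        }
        inblock {
            worst = 0
            n = split("await r_await w_await", names, " ")
            for (k = 1; k <= n; k++) {
                if (names[k] in col) {
                    v = $(col[names[k]]) + 0
                    if (v > worst) worst = v
                }
            }
            util = $(col["%util"]) + 0
            if (worst > amax + 0 || util >= umax + 0)
                printf "%s await=%.1f util=%.1f\n", $1, worst, util
        }'
}
```

A possible invocation is: LC_ALL=C iostat -x 1 3 | slow_disks 10 95. A device like the MySQL disk in the case above would be reported because of its latency even though its %util is far from 100%.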

Step Three: Comprehensive Tuning

Systematic solutions.

Core Parameter Optimization Checklist:

Network optimization (high concurrency scenarios): adjust net.ipv4.tcp_max_syn_backlog and other kernel parameters
File system optimization: set soft nofile / hard nofile (the per-process limit on open file descriptors), typically in /etc/security/limits.conf
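
Before changing either item it helps to record the current values. The read-only sketch below prints them; the function name show_limits is an assumption, and appropriate target values depend on the workload.

```shell
# show_limits: print the current values behind the two checklist items.
# Read-only: it does not change any setting.
show_limits() {
    echo "soft nofile: $(ulimit -Sn)"
    echo "hard nofile: $(ulimit -Hn)"
    if [ -r /proc/sys/net/ipv4/tcp_max_syn_backlog ]; then
        echo "tcp_max_syn_backlog: $(cat /proc/sys/net/ipv4/tcp_max_syn_backlog)"
    fi
}
```

Running show_limits before and after a change confirms that the new limits actually apply to the current session.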

Experience Sharing

My tuning toolbox.

1. Establishing Performance Baselines: Use sar to build a 24×7 performance baseline (/usr/lib64/sa/sa1 1 1 records one sample per run, so scheduling it from cron every minute yields per-minute data; sa2 -A generates daily reports)
2. Automated Alert Scripts: Monitor load average, automatically collect top / iostat data when thresholds are exceeded
3. Stress Testing and Validation: Use stress (CPU/memory stress testing), fio (I/O stress testing) to validate tuning effects
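
The alert script from the second item might look like the sketch below. The function names, the threshold, and the snapshot directory in the usage line are all hypothetical; a real deployment would also need log rotation and a notification step.

```shell
# load_exceeds THRESHOLD [LOADAVG_FILE]
# Succeeds (exit status 0) when the 1-minute load average is above THRESHOLD.
load_exceeds() {
    awk -v t="$1" '
        BEGIN { rc = 1 }                          # no data counts as not exceeded
        { rc = ($1 + 0 > t + 0) ? 0 : 1; exit }
        END { exit rc }' "${2:-/proc/loadavg}"
}

# collect_snapshot DIR
# Saves one batch-mode top listing and three iostat samples under DIR.
collect_snapshot() {
    ts=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$1" || return 1
    top -b -n 1   > "$1/top-$ts.txt"
    iostat -x 1 3 > "$1/iostat-$ts.txt"
}
```

Run from cron every minute, a line such as load_exceeds 8 && collect_snapshot /var/log/perf-snapshots captures evidence at the moment the load crosses the threshold, which is usually when it is most useful.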

Trends and Extensions

The future of performance tuning.

1. eBPF, a revolution in performance analysis: achieving finer-grained performance monitoring without modifying code (e.g., using bpftrace to trace system call latency)
2. Intelligent Operations: Combining machine learning for performance prediction, automatic tuning, and root cause analysis of anomalies
3. Challenges in Cloud-Native Environments: New dimensions such as container resource limits, container network performance, and K8s scheduling optimization

Conclusion

Continuous optimization, never-ending.

Performance tuning is not a one-time task, but a cycle of “establishing monitoring → setting baselines → continuous optimization → validating effects”.