The 24 Most Common Performance Tools for Linux Developers (In-Depth Guide)
Mastering everything from system bottlenecks to DPDK tuning in one article
In the world of Linux, performance analysis is a “core skill” for developers, operations engineers, and architects. Whether you are tuning large-scale servers, optimizing network forwarding performance (such as DPDK), or analyzing system stuttering issues, various performance tools are your most reliable partners.
This article walks through 24 of the most commonly used and valuable Linux performance tools, covering CPU, memory, disk, network, kernel, and containers. It also includes best practices for DPDK system tuning.
Table of Contents
- Basic Understanding of Linux Performance Analysis
- Introduction to 24 Core Performance Tools (Categorized by Domain)
- Tool Comparison Table (CPU/Memory/Network/IO/Kernel)
- Performance Analysis Thinking: How to Identify Bottlenecks?
- Dedicated Chapter on DPDK Tuning: NUMA, Hugepage, isolcpus, RPS/XPS
- Summary: Build Your Own Performance Analysis Methodology
1. Basic Understanding of Linux Performance Analysis
Although Linux has many performance tools, they fall into five main areas:
| Category | Focus | Core Issues |
|---|---|---|
| CPU | Scheduling, context switching, hot functions | What is the CPU busy with? Who is consuming the most? |
| Memory | Memory usage, page faults, NUMA | Why is memory insufficient/unbalanced? |
| IO (Disk) | IOPS, latency, queue depth | Why is the disk slow? |
| Network | Bandwidth, packet loss, queues, IRQ | Where is the network going? Where is the bottleneck? |
| Kernel/System | Scheduling, locks, preemption, soft interrupts | Why is the system stuttering? |
Next, we turn to the tools themselves.
2. The 24 Most Common Performance Tools for Linux (In-Depth Analysis)
1. CPU Performance Tools (8 Tools)
① top / htop: Overall CPU Usage
Purpose: Is the system really “busy”?
The most commonly used resource monitors, suitable for a quick assessment of system status:
- load average
- CPU usage (user/system/idle)
- Per-process usage
- Context switching (htop can display)
Applicable Scenarios
- Preliminary check for system bottlenecks
- Checking whether the CPU is occupied by IRQ handling or kernel code
② mpstat: Multi-Core CPU Utilization Analysis
Purpose: Is the load balanced across CPU cores?
mpstat -P ALL 1
It shows:
- The user/sys/irq/softirq share for each core
- Whether a few cores are much busier than the rest
- The state of isolated cores, which is especially important in DPDK deployments
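The per-core figures that mpstat prints are derived from the counters in /proc/stat, so the same imbalance check can be sketched by hand. The function below is a minimal illustration, not how mpstat itself is implemented: `per_core_usage` and the snapshot file names are made up for this example, and it assumes the standard /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# per_core_usage BEFORE AFTER
# Compare two saved copies of /proc/stat and print, for every cpuN line,
# the busy percentage and the softirq percentage over that interval.
per_core_usage() {
  awk '
    # Fields after the cpuN label: user nice system idle iowait irq softirq steal
    /^cpu[0-9]+ / {
      total = 0
      for (i = 2; i <= 9; i++) total += $i   # user..steal (guest time is already inside user)
      idle = $5 + $6                         # idle + iowait
      if (FNR == NR) {                       # first file: remember the baseline
        t0[$1] = total; i0[$1] = idle; s0[$1] = $8
      } else if ($1 in t0) {                 # second file: print the delta
        dt = total - t0[$1]
        if (dt > 0)
          printf "%s busy=%.1f%% softirq=%.1f%%\n", $1,
                 100 * (dt - (idle - i0[$1])) / dt, 100 * ($8 - s0[$1]) / dt
      }
    }
  ' "$1" "$2"
}

# Usage (hypothetical file names):
#   cat /proc/stat > /tmp/stat.a; sleep 1; cat /proc/stat > /tmp/stat.b
#   per_core_usage /tmp/stat.a /tmp/stat.b
```

A core that shows a high softirq share while its neighbours sit idle usually points at interrupt or queue placement rather than application load.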
③ pidstat: CPU and Context Switching for a Specific Process
pidstat -w -p <pid>
It can reveal:
- Whether the scheduling overhead for a specific thread is large (add -t for per-thread figures)
- Whether the process's context-switch rate is abnormally high (a common bottleneck)
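When pidstat is not installed, the cumulative context-switch counters can be read straight from /proc. This is a small sketch under that assumption; the function name `ctxt_switches` is invented here, and the two field names are the ones the kernel writes into the status file of each process.

```shell
# ctxt_switches STATUS_FILE
# Print the voluntary and involuntary context-switch counters found in a
# /proc/PID/status file. The values are cumulative since the process started.
ctxt_switches() {
  awk -F':[ \t]*' '
    BEGIN { v = 0; n = 0 }
    $1 == "voluntary_ctxt_switches"    { v = $2 }
    $1 == "nonvoluntary_ctxt_switches" { n = $2 }
    END { printf "voluntary=%s nonvoluntary=%s\n", v, n }
  ' "$1"
}

# Usage: ctxt_switches /proc/1234/status   (1234 is a placeholder PID)
```

Sampling the function twice and subtracting gives a rate. A fast-growing nonvoluntary count suggests the process is being preempted, while a fast-growing voluntary count suggests it blocks often (locks, IO, sleeps).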
④ mpstat / sar (CPU Subset): Historical CPU Analysis
sar -u 1 samples CPU utilization once per second. When sysstat's periodic data collection is enabled, sar -u without an interval reads back the current day's recorded history.
⑤ perf: Function-Level CPU Flame Graphs and Hotspot Analysis (The Strongest Tool)
The King of Linux Performance Analysis
It covers a lot of ground:
- Sampling for flame graphs
- Kernel/user-space hot functions
- Cache misses and branch mispredictions
- Soft interrupt overhead
Example:
perf top -C 1
perf record -F 99 -a -g -- sleep 10
perf report
The -a flag samples all CPUs for the 10 seconds; without it, perf record profiles only the sleep command itself.
⑥ ftrace: Kernel Tracing
It can trace:
- Scheduling delays
- Kernel function execution flow
- ISR and interrupt handling flow
Suitable for analyzing:
- System stuttering
- Delays caused by kernel locks
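ftrace is driven by writing to files under tracefs, which is easy to forget between uses. The sketch below shows one possible short capture of scheduler switches. The function name `trace_sched_switch` is invented for this example, and it assumes tracefs is mounted at /sys/kernel/tracing (older setups may only expose it at /sys/kernel/debug/tracing) and that you run it as root.

```shell
# trace_sched_switch [SECONDS] [TRACEFS]
# Enable the sched_switch tracepoint for a short window, then dump the
# trace buffer. TRACEFS can be overridden if tracefs is mounted elsewhere.
trace_sched_switch() {
  local secs="${1:-1}" tracefs="${2:-/sys/kernel/tracing}"
  echo 1 > "$tracefs/events/sched/sched_switch/enable" || return 1
  echo 1 > "$tracefs/tracing_on"
  sleep "$secs"
  echo 0 > "$tracefs/tracing_on"                        # stop recording
  echo 0 > "$tracefs/events/sched/sched_switch/enable"  # leave the system as we found it
  cat "$tracefs/trace"
}

# Usage: trace_sched_switch 2 > /tmp/sched.txt
```

Each line of the dump names the task being switched out and the one switched in, which is usually enough to see what keeps displacing a latency-sensitive thread.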
⑦ trace-cmd
A more user-friendly front end to ftrace.
⑧ turbostat (x86 only)
Reports CPU C-state and P-state behavior; suitable for analyzing performance issues caused by CPU power saving.
2. Memory Analysis Tools (5 Tools)
⑨ free / /proc/meminfo
Basic memory viewing tools.
Important metrics:
- available memory
- buffers/cached
- slab
- hugepages usage (a must-check for DPDK)
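For the hugepages check in particular, the relevant lines of /proc/meminfo can be condensed into one line. This is a minimal sketch; `hugepage_usage` is a name made up for this example, and it reads only the default hugepage size that /proc/meminfo reports.

```shell
# hugepage_usage [MEMINFO_FILE]
# Summarize the hugepage pool from /proc/meminfo: total pages, pages in use,
# free pages, and the default hugepage size in kB.
hugepage_usage() {
  awk '
    $1 == "HugePages_Total:" { total = $2 }
    $1 == "HugePages_Free:"  { free = $2 }
    $1 == "Hugepagesize:"    { size_kb = $2 }
    END {
      printf "total=%d used=%d free=%d page_kB=%d\n", total, total - free, free, size_kb
    }
  ' "${1:-/proc/meminfo}"
}

# Usage: hugepage_usage
```

If total is 0 before a DPDK application starts, it has no hugepage memory to map; if free drops to 0 while it runs, the pool is too small for its mempools.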
⑩ vmstat: Memory + IO + Scheduling Combined
vmstat 1
Important fields:
- si/so: swap-in/swap-out; sustained non-zero values mean the system is short of memory
- cs: context switches per second
- bi/bo: blocks read from / written to block devices
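Watching these fields by eye gets tedious on a long capture, so a small filter can flag the interesting samples. The sketch below is only an illustration: `flag_vmstat` is an invented name, the default limit of 100000 context switches per second is an arbitrary placeholder, and the column positions assume the default procps vmstat layout (si and so in columns 7 and 8, cs in column 12).

```shell
# flag_vmstat [CS_LIMIT]
# Read vmstat output on stdin and print a line for every sample that shows
# swap traffic or a context-switch rate above CS_LIMIT.
flag_vmstat() {
  awk -v limit="${1:-100000}" '
    $1 ~ /^[0-9]+$/ {                # data rows only; both header rows start with text
      if ($7 + $8 > 0) printf "swap activity: si=%d so=%d\n", $7, $8
      if ($12 > limit) printf "high context switching: cs=%d\n", $12
    }
  '
}

# Usage: vmstat 1 10 | flag_vmstat 50000
```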
⑪ slabtop: Kernel Object Memory Statistics
Shows whether kernel memory is being exhausted by one type of object (e.g., socket buffers in skbuff_head_cache).
⑫ numastat: NUMA Imbalance Analysis
A key tool for DPDK tuning.
It can check:
- Whether memory accesses are crossing NUMA nodes
- Whether numa_miss / numa_foreign is too high
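The same counters are available per node under sysfs, which makes it easy to turn “too high” into a ratio. The sketch below assumes the per-node numastat files under /sys/devices/system/node; the function name `numa_miss_ratio` and the choice of miss / (hit + miss) as the ratio are this example's own, not something numastat prints.

```shell
# numa_miss_ratio [NODE_ROOT]
# For each NUMA node, print numa_miss, numa_foreign and the share of
# allocations on that node that were misses.
numa_miss_ratio() {
  local root="${1:-/sys/devices/system/node}" dir
  for dir in "$root"/node[0-9]*; do
    [ -r "$dir/numastat" ] || continue
    awk -v node="${dir##*/}" '
      BEGIN { hit = 0; miss = 0; foreign = 0 }
      $1 == "numa_hit"     { hit = $2 }
      $1 == "numa_miss"    { miss = $2 }
      $1 == "numa_foreign" { foreign = $2 }
      END {
        total = hit + miss
        pct = (total > 0) ? 100 * miss / total : 0
        printf "%s numa_miss=%s numa_foreign=%s miss_pct=%.2f\n", node, miss, foreign, pct
      }
    ' "$dir/numastat"
  done
}

# Usage: numa_miss_ratio
```

The counters are cumulative since boot, so compare two readings taken around the workload rather than judging the absolute numbers.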
⑬ smem / pmap: Process Memory Analysis
3. Disk/IO Tools (4 Tools)
⑭ iostat: IO Latency/IOPS View
iostat -x 1
Fields to focus on:
- r/s, w/s
- await (latency; recent sysstat versions split it into r_await and w_await)
- %util (device utilization)
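Because the column layout of iostat -x differs between sysstat versions, any script that filters on utilization is safer locating the column by its header name. The following is a rough sketch along those lines; `busy_devices` and the default threshold of 80 are assumptions made for this example.

```shell
# busy_devices [UTIL_LIMIT]
# Read iostat -x output on stdin and print the devices whose %util exceeds
# UTIL_LIMIT. The %util column is found by name in the Device header row.
busy_devices() {
  awk -v limit="${1:-80}" '
    $1 ~ /^Device/ {                 # header row: remember where %util sits
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && $col + 0 > limit { printf "%s util=%s\n", $1, $col }
  '
}

# Usage: iostat -x 1 5 | busy_devices 90
```

Keep in mind that on devices which serve many requests in parallel (NVMe, RAID arrays), a high %util does not by itself mean the device is saturated, so read it together with the latency columns.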
⑮ pidstat -d: Per-Process IO
Identifies “which process is doing excessive IO”.
⑯ blktrace: Low-Level IO Events
Used for in-depth analysis of IO queue behavior.
⑰ fio: Load Testing Tool
Commonly used for SSD/HDD performance testing.
4. Network Performance Tools (5 Tools)
⑱ ss: A Modern Replacement for netstat
ss -s
ss -tulnp
Shows sockets, queue depths, TCP states, and more.
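A quick way to put the TCP state information to work is to count sockets per state. This is a small sketch that assumes the output of ss -tan, where the state is the first column after a single header line; the function name `tcp_state_counts` is made up for the example.

```shell
# tcp_state_counts
# Read ss -tan output on stdin and print how many sockets are in each TCP state.
tcp_state_counts() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage: ss -tan | tcp_state_counts
```

A large pile of TIME-WAIT or CLOSE-WAIT entries is often the first visible sign of a connection-handling problem in the application.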
⑲ ethtool: Network Card Capabilities, Queues, Interrupt Information
Commonly checked before disabling RPS/XPS/TSO for DPDK:
ethtool -S eth0
ethtool -l eth0
ethtool -k eth0
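The output of ethtool -S can run to hundreds of counters, most of them zero. The filter below is a heuristic sketch for surfacing the ones that hint at loss. Counter names are driver-specific, so the name pattern (drop, miss, err, discard) and the function name `nonzero_drop_counters` are assumptions of this example and will not catch every driver's naming.

```shell
# nonzero_drop_counters
# Read ethtool -S output on stdin and print counters whose name suggests
# packet loss and whose value is greater than zero.
nonzero_drop_counters() {
  awk -F': *' '
    { name = $1; sub(/^[ \t]+/, "", name) }     # strip the indentation ethtool adds
    tolower(name) ~ /drop|miss|err|discard/ && $2 + 0 > 0 { print name "=" $2 }
  '
}

# Usage: ethtool -S eth0 | nonzero_drop_counters
```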
⑳ tcpdump: Packet Capture Troubleshooting
㉑ nstat / dropwatch: Packet Loss Analysis
They can check:
- Packet loss at the network card level
- Packet loss at the kernel level (skb failure)
- Where packets are lost before they reach the DPDK driver
㉒ nicstat / iperf
nicstat reports per-interface throughput and utilization; iperf generates traffic to measure achievable bandwidth.
5. System and Kernel Tools (5 Tools)
㉓ dstat: Comprehensive Monitoring
All-in-one view of CPU + Memory + IO + Network.
㉔ ps / pstree: Process and Thread Structure
Used in conjunction with perf and pidstat to troubleshoot issues.
㉕ strace: System Call Tracing
Answers questions such as:
- Why is the program stuck?
- Why is a certain line of code executing slowly?
- Which syscall is blocking the program?
㉖ lsof: File Handle Tracking
It can check:
- Whether a program is leaking file handles
- Which process is occupying a certain port
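For the leak check, counting the entries in the process's fd directory under /proc is cheaper than a full lsof run and easy to repeat over time. A minimal sketch, with `fd_count` as an invented helper name:

```shell
# fd_count PID [PROC_ROOT]
# Print the number of file descriptors a process currently holds by counting
# the entries in its fd directory under /proc.
fd_count() {
  local dir="${2:-/proc}/$1/fd"
  [ -d "$dir" ] || { echo "no such process: $1" >&2; return 1; }
  ls -A "$dir" | wc -l | tr -d ' '
}

# Usage (1234 is a placeholder PID):
#   while sleep 5; do echo "$(date +%T) $(fd_count 1234)"; done
```

A count that only ever climbs under steady load is a strong hint of a leak; lsof -p on the same PID then shows which files or sockets are piling up.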
㉗ sysctl: Modify System Kernel Parameters
Used for performance optimization, such as:
- Network queue length
- Scheduler parameters
- Kernel memory management parameters
3. Tool Comparison Chart
Classified by Business Scenario (Recommended to Save)
| Scenario | Most Common Tools |
|---|---|
| CPU Hotspot Analysis | perf, ftrace, mpstat |
| Memory Leak/Anomalies | smem, vmstat, slabtop |
| Network Packet Loss | ethtool, nstat, dropwatch |
| System Stuttering | perf, trace-cmd, vmstat |
| Kernel Analysis | ftrace, trace-cmd |
| Process Debugging | strace, lsof, pidstat |
Domain Distribution Chart
CPU: top, mpstat, pidstat, perf, turbostat
Memory: free, vmstat, numastat, smem, slabtop
IO: iostat, pidstat -d, blktrace, fio
Network: ethtool, ss, nstat, dropwatch, iperf
Kernel: ftrace, trace-cmd, ps, strace, sysctl
4. Performance Analysis Thinking Model (Very Important)
Most faults can be located with the following five steps:
① First determine “which type of resource” is busy
- CPU?
- Memory?
- Network?
- Disk?
Tools: top, vmstat, iostat, sar
② Find out “which process” is consuming resources
Tools: ps, pidstat, top
③ Find out “which function/module” is the most time-consuming
Tools: perf, ftrace, flame graph
④ Determine whether it is a kernel bottleneck
Tools: trace-cmd, ftrace
⑤ Identify what is slowing the business down
- TCP queuing?
- Memory accesses crossing NUMA nodes?
- Disk latency?
- CPU overwhelmed by interrupts?
- Program logic bug?
5. DPDK Dedicated Tuning Guide (Including NUMA/IRQ Optimization)
① CPU Isolation: isolcpus/nohz_full
Kernel boot parameters (set on the kernel command line):
isolcpus=1-10 nohz_full=1-10 rcu_nocbs=1-10
- Avoid kernel scheduling interference
- Reduce scheduling overhead
- Bind DPDK lcores with taskset
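Because these are boot parameters, it is worth confirming after a reboot that the kernel actually applied them. The sketch below reads the isolated and nohz_full files under /sys/devices/system/cpu and echoes the matching entries from /proc/cmdline. The function name `check_isolation` is made up for this example, and the nohz_full file is only present on kernels built with full dynticks support.

```shell
# check_isolation [SYS_CPU_DIR] [CMDLINE_FILE]
# Report which CPUs the running kernel treats as isolated / nohz_full and
# show the related parameters it was booted with.
check_isolation() {
  local cpu_dir="${1:-/sys/devices/system/cpu}" cmdline="${2:-/proc/cmdline}" name
  for name in isolated nohz_full; do
    if [ -r "$cpu_dir/$name" ]; then
      printf '%s: %s\n' "$name" "$(cat "$cpu_dir/$name")"
    else
      printf '%s: (not exposed by this kernel)\n' "$name"
    fi
  done
  # Show the matching boot parameters, if any were passed to the kernel.
  tr ' ' '\n' < "$cmdline" | grep -E '^(isolcpus|nohz_full|rcu_nocbs)=' || true
}

# Usage: check_isolation
```

An empty value after "isolated:" means no CPU is isolated, typically because the boot loader configuration was edited but not regenerated.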
② Hugepages (Required)
DPDK relies on hugepages to reduce TLB misses:
echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
③ Disable irqbalance
systemctl stop irqbalance
④ Disable RPS/XPS
echo 0 > /sys/class/net/eth0/queues/rx-0/rps_cpus
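The command above clears RPS on the first receive queue only, while a multi-queue NIC has one rps_cpus file per RX queue and one xps_cpus file per TX queue. A minimal sketch that walks all of them follows; `disable_rps_xps` is a name invented for this example, and it needs root on a real system.

```shell
# disable_rps_xps DEV [NET_ROOT]
# Write 0 to rps_cpus on every RX queue and to xps_cpus on every TX queue of
# DEV, which turns off software packet steering for that interface.
disable_rps_xps() {
  local queues="${2:-/sys/class/net}/$1/queues" f
  [ -d "$queues" ] || { echo "no such interface: $1" >&2; return 1; }
  for f in "$queues"/rx-*/rps_cpus "$queues"/tx-*/xps_cpus; do
    [ -e "$f" ] || continue          # unmatched glob, or attribute not present
    echo 0 > "$f" && echo "cleared $f"
  done
}

# Usage: disable_rps_xps eth0
```

These settings only matter for interfaces that stay under kernel control; a port bound to a DPDK userspace driver no longer appears under /sys/class/net at all.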
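⑤ NUMA Binding
Map each lcore to a CPU on the same NUMA node as the NIC; the entries take the form lcore@cpu and are separated by commas:
--lcores "1@(0),2@(1),3@(1)"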
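Choosing the CPUs for that mapping starts with knowing which NUMA node the NIC is attached to. The sketch below looks this up from the PCI device's numa_node file in sysfs and prints the CPUs local to that node. The function name `nic_numa_cpus` and the PCI address in the usage line are placeholders for this example.

```shell
# nic_numa_cpus PCI_ADDR [SYS_ROOT]
# Print the NUMA node a PCI device is attached to and the CPUs on that node.
nic_numa_cpus() {
  local sys="${2:-/sys}" node
  node=$(cat "$sys/bus/pci/devices/$1/numa_node") || return 1
  # -1 means the platform reported no node (single-socket machine or virtual NIC).
  if [ "$node" -lt 0 ]; then
    echo "$1: no NUMA affinity reported"
    return 0
  fi
  echo "$1: node=$node cpus=$(cat "$sys/devices/system/node/node$node/cpulist")"
}

# Usage: nic_numa_cpus 0000:3b:00.0   (placeholder PCI address)
```

Picking lcore CPUs from the printed list, and allocating hugepages on the same node, keeps packet buffers and the threads that touch them on one side of the interconnect.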
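⑥ BIOS Tuning
- Disable C-State
- Disable HT
- Disable energy-saving policies
- Enable large cache mode
⑦ Adjust UIO / VFIO Drivers
In DPDK scenarios:
- vfio-pci (secure; uses the IOMMU to isolate device DMA)
- igb_uio (no IOMMU protection; often described as higher performance, which is worth verifying on your own hardware)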
6. Summary: Build Your Own Performance Analysis System
Linux performance analysis is not about mastering “the skills of a single tool”; it is a complete system.