The 24 Most Common Performance Tools for Linux Developers (In-Depth Guide)

Mastering everything from system bottlenecks to DPDK tuning in one article

In the world of Linux, performance analysis is a “core skill” for developers, operations engineers, and architects. Whether you are tuning large-scale servers, optimizing network forwarding performance (such as DPDK), or analyzing system stuttering issues, various performance tools are your most reliable partners.

This article walks through the 24 most commonly used and valuable Linux performance tools, covering CPU, memory, disk, network, kernel, and containers. It also includes best practices for DPDK system tuning.

🧭 Table of Contents

  1. ⭐ Basic Understanding of Linux Performance Analysis

  2. ⭐ Introduction to 24 Core Performance Tools (Categorized by Domain)

  3. ⭐ Tool Comparison Table (CPU/Memory/Network/IO/Kernel)

  4. ⭐ Performance Analysis Thinking: How to Identify Bottlenecks?

  5. ⭐ Dedicated Chapter on DPDK Tuning: NUMA, Hugepage, isolcpus, RPS/XPS

  6. ⭐ Summary: Build Your Own Performance Analysis Methodology

🥇 1. Basic Understanding of Linux Performance Analysis

Linux has many performance tools, but they fall into five main categories:

| Category | Focus | Core Issues |
|---|---|---|
| CPU | Scheduling, context switching, hot functions | What is the CPU busy with? Who is consuming the most? |
| Memory | Memory usage, page faults, NUMA | Why is memory insufficient/unbalanced? |
| IO (Disk) | IOPS, latency, queue depth | Why is the disk slow? |
| Network | Bandwidth, packet loss, queues, IRQ | Where is the network traffic going? Where is the bottleneck? |
| Kernel/System | Scheduling, locks, preemption, soft interrupts | Why is the system stuttering? |

👇 Next, we go through the tools one by one.

๐Ÿ† 2. The 24 Most Common Performance Tools for Linux (In-Depth Analysis)

๐Ÿ”ฅ 1. CPU Performance Tools (8 Tools)

โ‘  top / htop โ€” Overall CPU Usage

๐Ÿงญ Location: Is the system really “busy”?

The most commonly used resource monitoring tool, suitable for quickly assessing system status:

  • load average

  • CPU usage (user/system/idle)

  • Process usage

  • Context switching (htop can display)

👉 Applicable Scenarios

  • Preliminary check for system bottlenecks

  • Checking whether CPU time is going to interrupts (IRQ) or kernel code
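A load average only means something relative to the number of CPUs: a value near or above 1.0 per CPU suggests tasks are queuing (on Linux this also counts tasks blocked in uninterruptible I/O). A minimal sketch, assuming a Linux /proc filesystem; the helper name load_per_cpu is made up for this example and is not part of top or htop:

```shell
# load_per_cpu LOAD NCPU: print LOAD divided by NCPU with two decimals.
# Hypothetical helper for illustration only.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu < 1) ncpu = 1          # guard against a missing CPU count
        printf "%.2f\n", load / ncpu
    }'
}

# On a live Linux system: 1-minute load average per CPU listed in /proc/cpuinfo.
if [ -r /proc/loadavg ] && [ -r /proc/cpuinfo ]; then
    read -r load1 _rest < /proc/loadavg
    load_per_cpu "$load1" "$(grep -c '^processor' /proc/cpuinfo)"
fi
```

For example, load_per_cpu 8 4 reports that an 8.0 load on a 4-CPU machine is twice what the CPUs can run at once.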

② mpstat - Multi-Core CPU Utilization Analysis

🧭 Purpose: Is the load balanced across CPU cores?

mpstat -P ALL 1

It can show:

  • Proportion of user/sys/irq/softirq for each core

  • Whether there is a problem of “some cores being particularly busy”

  • Especially important for observing isolated cores in DPDK
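Spotting the “particularly busy” cores by eye gets tedious on machines with many cores. The sketch below filters the summary rows of mpstat output; it assumes the C locale (rows starting with "Average:" and %idle as the last column), and busy_cores is a hypothetical helper name:

```shell
# busy_cores THRESHOLD: read `mpstat -P ALL` output on stdin and print the
# CPUs whose busy share (100 - %idle) exceeds THRESHOLD percent.
# Assumes C-locale output: summary rows start with "Average:" and the
# last column is %idle.
busy_cores() {
    awk -v limit="$1" '
        $1 == "Average:" && $2 ~ /^[0-9]+$/ {
            busy = 100 - $NF
            if (busy > limit) printf "cpu%s %.1f%%\n", $2, busy
        }'
}
```

A typical invocation would be: LC_ALL=C mpstat -P ALL 1 5 | busy_cores 80. On a DPDK box the polling lcores are expected to show up here at close to 100%, so the interesting result is any other core on the list.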

③ pidstat - View CPU/Context Switching for a Specific Process

pidstat -w -p <pid>

Can locate:

  • Whether the scheduling overhead for a specific thread is large

  • Whether the process context switching is abnormally high (common bottleneck!!)
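The rates that pidstat -w prints come from per-process counters the kernel exposes in /proc/PID/status. When sysstat is not installed, those counters can be read directly; this is a rough sketch with a hypothetical helper name (ctxt_switches), and it prints cumulative totals rather than per-second rates:

```shell
# ctxt_switches: read a /proc/PID/status file on stdin and print the
# cumulative voluntary and involuntary context-switch counters.
ctxt_switches() {
    awk '$1 == "voluntary_ctxt_switches:"    { v = $2 }
         $1 == "nonvoluntary_ctxt_switches:" { n = $2 }
         END { printf "voluntary=%d nonvoluntary=%d\n", v, n }'
}

# Counters for the current shell, if /proc is available.
if [ -r "/proc/$$/status" ]; then
    ctxt_switches < "/proc/$$/status"
fi
```

Sampling twice and subtracting gives a rate. A high involuntary count usually points at CPU contention, while a high voluntary count points at blocking on I/O, locks, or sleeps.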

④ mpstat / sar (CPU Subset) - Historical CPU Analysis

sar -u 1 samples CPU utilization once per second; sar -u without an interval reports the history that the sysstat collector has recorded for the current day.

⑤ perf - Function-Level CPU Flame Graph, Hotspot Analysis (The Strongest Tool)

🔥 The King of Linux Performance Analysis

It can do a lot of things:

  • Sampling flame graphs

  • Kernel/user-space hot functions

  • Analyze cache misses and branch mispredictions

  • Analyze soft interrupt overhead

Example:

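# Live view of the hottest functions on CPU 1
perf top -C 1
# Sample all CPUs at 99 Hz with call graphs for 10 seconds
perf record -F 99 -a -g -- sleep 10
# Browse the recorded profile
perf report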

⑥ ftrace - Kernel Tracing

Can trace:

  • Scheduling delays

  • Kernel function execution flow

  • ISR, interrupt handling flow

Suitable for analyzing:

  • System stuttering

  • Delays caused by kernel locks

⑦ trace-cmd

A more user-friendly command-line front end for ftrace.

⑧ turbostat (x86 only)

Shows CPU C-state residency and P-state (frequency) behavior; suitable for analyzing performance issues caused by CPU power saving.

🔥 2. Memory Analysis Tools (5 Tools)

⑨ free / /proc/meminfo

Basic memory viewing tool.

Important metrics:

  • available memory

  • buffers/cached

  • slab

  • hugepages usage (must check for DPDK)
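The metrics listed above can be pulled straight out of /proc/meminfo, which is handy in scripts and health checks. A small sketch; meminfo_value is a hypothetical helper, and the field names are the ones the Linux kernel prints in that file:

```shell
# meminfo_value FIELD [FILE]: print the numeric value of one field.
# FILE defaults to /proc/meminfo; the helper name is hypothetical.
meminfo_value() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print the values most relevant before starting a DPDK application.
if [ -r /proc/meminfo ]; then
    echo "MemAvailable:    $(meminfo_value MemAvailable) kB"
    echo "HugePages_Total: $(meminfo_value HugePages_Total)"
    echo "HugePages_Free:  $(meminfo_value HugePages_Free)"
fi
```

If HugePages_Free is already 0 before the DPDK process starts, its memory initialization is likely to fail, so this is worth checking first.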

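⑩ vmstat - Memory + IO + Scheduling Combined

vmstat 1

Important fields:

  • si/so → memory swapped in/out per second (sustained non-zero values mean the system is under memory pressure)

  • cs → context switches per second

  • bi/bo → blocks read from/written to block devices per second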
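To catch swapping without watching the screen, the vmstat output can be filtered for samples where si or so is non-zero. A minimal sketch; swap_activity is a made-up helper name, and the column positions are looked up from the header row instead of being hard-coded:

```shell
# swap_activity: read `vmstat` output on stdin and print the samples in
# which swap-in (si) or swap-out (so) is non-zero.
swap_activity() {
    awk '
        # Header row: remember the position of every column name.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows: report only those with swap traffic.
        ("si" in col) && $1 ~ /^[0-9]+$/ && ($col["si"] > 0 || $col["so"] > 0) {
            printf "si=%s so=%s cs=%s\n", $col["si"], $col["so"], $col["cs"]
        }'
}
```

Usage would look like: vmstat 1 30 | swap_activity. Keep in mind that the first data row vmstat prints is an average since boot, so a hit on that row alone is not evidence of current swapping.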

⑪ slabtop - Kernel Object Memory Statistics

Shows whether a specific kernel object cache (e.g., the skbuff caches used for network buffers) is consuming a large share of kernel memory.

⑫ numastat - NUMA Imbalance Analysis

Key tool for DPDK tuning.

Can check:

  • Whether memory allocations are crossing NUMA nodes

  • Whether numa_miss / numa_foreign is too high
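What counts as “too high” is easier to judge as a percentage. The counters numastat prints are also available per node under /sys/devices/system/node/nodeN/numastat; the sketch below turns them into a miss ratio. The helper name numa_miss_pct is hypothetical, and no particular threshold is implied:

```shell
# numa_miss_pct: read one node's numastat file on stdin and print
# numa_miss as a percentage of (numa_hit + numa_miss).
numa_miss_pct() {
    awk '$1 == "numa_hit"  { hit = $2 }
         $1 == "numa_miss" { miss = $2 }
         END {
             total = hit + miss
             if (total > 0) printf "%.2f\n", 100 * miss / total
             else print "0.00"
         }'
}

# Report every node on a live system (skipped where sysfs has no NUMA info).
for f in /sys/devices/system/node/node*/numastat; do
    [ -r "$f" ] || continue
    printf '%s: %s%% numa_miss\n' "${f%/numastat}" "$(numa_miss_pct < "$f")"
done
```

The counters are cumulative since boot, so for a running workload it is more telling to compare two readings taken some time apart.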

⑬ smem / pmap - Process Memory Analysis

🔥 3. Disk/IO Tools (4 Tools)

⑭ iostat - IO Latency/IOPS View

iostat -x 1

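Key fields:

  • r/s, w/s (read/write requests per second)

  • await (average request latency in ms; recent sysstat versions report it as r_await/w_await)

  • %util (device utilization)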
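Because the column layout of iostat -x differs between sysstat versions, scripts are safer locating the %util column from the header row. A small sketch under that assumption; busy_disks is a hypothetical helper name:

```shell
# busy_disks THRESHOLD: read `iostat -x` output on stdin and print devices
# whose %util exceeds THRESHOLD. The %util column is located from the
# "Device" header row, so the exact column count does not matter.
busy_disks() {
    awk -v limit="$1" '
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") u = i; next }
        u && NF >= u && $u + 0 > limit { print $1, $u }'
}
```

A possible invocation: LC_ALL=C iostat -x 1 5 | busy_disks 90. On SSDs and NVMe devices that serve requests in parallel, a high %util does not by itself mean the device is saturated, so read it together with the latency columns.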

⑮ pidstat -d - Single Process IO

Can locate “which process is doing excessive IO”.

⑯ blktrace - Low-Level IO Events

Used for in-depth analysis of IO queue behavior.

⑰ fio - Load Testing Tool

Commonly used for SSD/HDD performance testing.

🔥 4. Network Performance Tools (5 Tools)

⑱ ss - A Modern Replacement for netstat

ss -s
ss -tulnp

Shows sockets, send/receive queues, TCP state, etc.

⑲ ethtool - Network Card Capabilities, Queues, Interrupt Information

Commonly checked before disabling RPS/XPS/TSO for DPDK:

ethtool -S eth0
ethtool -l eth0
ethtool -k eth0
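The output of ethtool -S can run to hundreds of counters, and only the non-zero drop-style ones matter when chasing packet loss. A rough filter, assuming the usual "name: value" layout; counter names are driver-specific, so the pattern below is only a starting point, and nonzero_drops is a made-up helper name:

```shell
# nonzero_drops: read `ethtool -S DEV` output on stdin and print counters
# whose name contains drop, discard, or miss and whose value is non-zero.
# The name pattern is a guess; counter names vary between NIC drivers.
nonzero_drops() {
    awk -F': *' 'tolower($1) ~ /drop|discard|miss/ && $2 + 0 > 0 {
        gsub(/^[ \t]+/, "", $1)       # strip the leading indentation
        print $1 "=" $2
    }'
}
```

Typical use: ethtool -S eth0 | nonzero_drops, run twice a few seconds apart to see which counters are still growing.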

⑳ tcpdump - Packet Capture Troubleshooting

㉑ nstat / dropwatch - Packet Loss Analysis

Can check:

  • Packet loss at the network card level

  • Packet loss at the kernel level (skb failure)

  • Where packets are lost before they reach the DPDK driver

㉒ nicstat / iperf

Network bandwidth tools: nicstat monitors per-interface throughput and utilization, while iperf generates traffic to measure achievable bandwidth.

🔥 5. System and Kernel Tools (5 Tools)

㉓ dstat - Comprehensive Monitoring

All-in-one for CPU + Memory + IO + Network.

㉔ ps / pstree - Process and Thread Structure

Used in conjunction with perf, pidstat to troubleshoot issues.

㉕ strace - System Call Tracing

Applicable for:

  • Why is the program stuck?

  • Why is a certain line of code executing slowly?

  • Which syscall is blocking the program?

㉖ lsof - File Handle Tracking

Can check:

  • Whether a program is leaking file handles

  • Which process is occupying a certain port

㉗ sysctl - Modify System Kernel Parameters

Used for performance optimization, such as:

  • Network queue length

  • Scheduler parameters

  • Kernel memory management parameters

🧩 3. Tool Comparison Chart

🔶 Classified by Business Scenario (Recommended to Save)

| Scenario | Most Common Tools |
|---|---|
| CPU Hotspot Analysis | perf, ftrace, mpstat |
| Memory Leak/Anomalies | smem, vmstat, slabtop |
| Network Packet Loss | ethtool, nstat, dropwatch |
| System Stuttering | perf, trace-cmd, vmstat |
| Kernel Analysis | ftrace, trace-cmd |
| Process Debugging | strace, lsof, pidstat |

🔶 Domain Distribution Chart (Suitable for Public Account Illustration)

🟦 CPU  🟥 Memory  🟨 IO  🟩 Network  🟪 Kernel

CPU: top, mpstat, pidstat, perf, turbostat
Memory: free, vmstat, numastat, smem, slabtop
IO: iostat, pidstat -d, blktrace, fio
Network: ethtool, ss, nstat, dropwatch, iperf
Kernel: ftrace, trace-cmd, ps, strace, sysctl

🚀 4. Performance Analysis Thinking Model (Very Important)

Most performance faults can be located with the following five steps:

① First determine “which type of resource” is busy

  • CPU?

  • Memory?

  • Network?

  • Disk?

Tools: top, vmstat, iostat, sar

② Find out “which process” is consuming resources

Tools: ps, pidstat, top

③ Find out “which function/module” is the most time-consuming

Tools: perf, ftrace, flame graph

④ Determine if it is a kernel bottleneck

Tools: trace-cmd, ftrace

⑤ Identify what is causing the business to slow down

  • TCP queuing?

  • Memory NUMA crossing nodes?

  • Disk latency?

  • CPU overwhelmed by interrupts?

  • Program logic bug?

🚀 5. DPDK Dedicated Tuning Guide (Including NUMA/IRQ Optimization)

① CPU Isolation isolcpus/nohz_full

Add to the kernel boot parameters:

isolcpus=1-10 nohz_full=1-10 rcu_nocbs=1-10

✔ Avoid kernel scheduling interference

✔ Reduce scheduling overhead

✔ Bind DPDK lcore with taskset
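After a reboot it is worth confirming that the isolation actually took effect. The kernel reports isolated CPUs as a list such as "1-10" in /sys/devices/system/cpu/isolated; the sketch below expands such a list into individual CPU numbers, for example to loop over them with taskset. The helper name expand_cpulist is made up for illustration:

```shell
# expand_cpulist LIST: expand a kernel CPU list such as "1-3,8" into
# one CPU number per line.
expand_cpulist() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue          # skip empty entries
        seq "$first" "${last:-$first}"       # single CPUs have no range end
    done
}

# Show the CPUs the running kernel treats as isolated (empty if none).
if [ -r /sys/devices/system/cpu/isolated ]; then
    expand_cpulist "$(cat /sys/devices/system/cpu/isolated)"
fi
```

If the output does not match the range given on the kernel command line, check /proc/cmdline to see whether the parameters were really passed at boot.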

② Hugepages (Required)

DPDK relies on huge pages to reduce TLB misses:

echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
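Writing to the global nr_hugepages file lets the kernel decide how the pages are spread across NUMA nodes, and the request can be only partly satisfied when memory is fragmented. A small sketch for checking the per-node result, assuming 2 MB pages as in the command above; node_hugepages is a hypothetical helper name, and the optional argument exists only so the function can be pointed at a different directory:

```shell
# node_hugepages [SYSFS_NODE_DIR]: print the number of 2 MB huge pages
# reserved on each NUMA node and the resulting pool size in MiB.
node_hugepages() {
    sysfs_node="${1:-/sys/devices/system/node}"
    for f in "$sysfs_node"/node*/hugepages/hugepages-2048kB/nr_hugepages; do
        [ -r "$f" ] || continue
        node="${f#"$sysfs_node"/}"           # drop the directory prefix
        node="${node%%/*}"                   # keep only "nodeN"
        pages=$(cat "$f")
        echo "$node: $pages pages, $(( pages * 2 )) MiB"
    done
}

node_hugepages
```

With 4096 pages of 2 MB each, the total pool is 8192 MiB. If the node that hosts the NIC shows fewer pages than the application needs, the same nr_hugepages file under that node's directory can be written instead of the global one.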

③ Disable IRQ Balance

systemctl stop irqbalance

④ Disable RPS/XPS

echo 0 > /sys/class/net/eth0/queues/rx-0/rps_cpus
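The command above covers a single receive queue; multi-queue NICs have one rps_cpus file per rx queue and one xps_cpus file per tx queue. A sketch that clears all of them for one interface; disable_rps_xps is a hypothetical helper, and the optional second argument exists only so it can be tried against a scratch directory:

```shell
# disable_rps_xps DEV [SYSFS_NET_DIR]: write 0 to every rps_cpus and
# xps_cpus mask of DEV. Needs root on a real system; files that are not
# writable are skipped silently.
disable_rps_xps() {
    sysfs_net="${2:-/sys/class/net}"
    for f in "$sysfs_net/$1"/queues/rx-*/rps_cpus \
             "$sysfs_net/$1"/queues/tx-*/xps_cpus; do
        [ -w "$f" ] || continue
        echo 0 > "$f"
    done
}
```

On a real machine this would be run as root, for example disable_rps_xps eth0. The setting does not persist across reboots or driver reloads.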

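⑤ NUMA Binding

Pin each lcore to a CPU on the same NUMA node as the NIC (the CPU numbers below are examples; entries are separated by commas):

--lcores "1@(0),2@(1),3@(1)"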
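Choosing the CPUs for the lcore map starts with knowing which NUMA node the NIC is attached to. sysfs exposes this per PCI device, and each node directory lists its CPUs. A minimal sketch; both helper names and the PCI address in the usage note are hypothetical:

```shell
# pci_numa_node BDF [SYSFS_PCI_DIR]: print the NUMA node of a PCI device
# (-1 means the platform reports no NUMA affinity for it).
pci_numa_node() {
    cat "${2:-/sys/bus/pci/devices}/$1/numa_node"
}

# node_cpulist NODE [SYSFS_NODE_DIR]: print the CPUs that belong to NODE.
node_cpulist() {
    cat "${2:-/sys/devices/system/node}/node$1/cpulist"
}
```

For a NIC at the made-up address 0000:3b:00.0, node_cpulist "$(pci_numa_node 0000:3b:00.0)" prints the CPU range from which the lcores should be picked. The lookup by PCI address still works after the NIC has been bound to a DPDK driver and no longer appears under /sys/class/net.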

⑥ BIOS Tuning

  • Disable C-State

  • Disable HT

  • Disable energy-saving policies

  • Enable large cache mode

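⑦ Adjust UIO / VFIO Drivers

In DPDK scenarios:

  • vfio-pci (secure: uses the IOMMU to isolate device DMA; the in-kernel option)

  • igb_uio (out-of-tree module without IOMMU protection; sometimes cited as higher performance)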

🔚 6. Summary: Build Your Own Performance Analysis System

Linux performance analysis is not about “the skills of a single tool”, but a complete system.