The Ultimate Guide to Disk IO Monitoring in Linux: From iostat to iotop
When the system lags, database queries slow down, or file transfers take too long, the issue often lies with disk IO. Master these IO monitoring tools to uncover performance bottlenecks!
Disk IO: A Key Dimension of Performance Analysis
In the world of performance optimization, there is a classic saying: “CPU waits for IO, IO waits for disk.” When applications run into performance problems, disk IO is often the most overlooked yet most critical bottleneck.
Imagine these scenarios:
- Database queries suddenly become abnormally slow
- File server response times keep increasing
- Virtual machines start up at a snail's pace
- Big data processing tasks stall
The root cause of these issues often points in the same direction: a disk IO bottleneck. Today we will dig into the Linux IO monitoring tools, from the basic iostat to modern visualization tools, to help you build a complete IO performance monitoring system.
First Generation: iostat – The Pioneer of IO Statistics
Background
iostat (Input/Output Statistics) is a core component of the sysstat toolkit. Since its first release in 1999, it has been the preferred tool for Linux system administrators for IO performance analysis. It provides system-level disk IO statistics and is the cornerstone for understanding system IO behavior.
Core Functions and Output Interpretation
# Basic usage
iostat
# Display extended statistics
iostat -x
# Update every 2 seconds, showing 5 times
iostat -x 2 5
Detailed Output Analysis
avg-cpu: %user %nice %system %iowait %steal %idle
2.50 0.00 1.25 15.75 0.00 80.50
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 45.20 28.60 1840.5 1145.2 2.40 5.80 5.05 16.88 8.50 12.30 0.85 40.73 40.04 4.50 33.20
sda 0.20 2.40 8.0 96.0 0.00 0.20 0.00 7.69 15.00 25.50 0.06 40.00 40.00 12.50 3.00
Key Metrics Explained:
CPU Section:
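- %iowait: Percentage of time the CPU was idle while at least one disk IO request was outstanding (critical metric)
- %system: Percentage of CPU time spent executing in the kernel
- %idle: Percentage of time the CPU was idle with no outstanding disk IO
Disk Section:
- r/s, w/s: Read/write requests completed per second
- rkB/s, wkB/s: Kilobytes read/written per second
- r_await, w_await: Average time (ms) for a read/write request to complete, including time spent waiting in the queue
- %util: Percentage of time the device had at least one request in progress; a reliable saturation signal for single-queue hard drives, but SSDs and NVMe devices serve many requests in parallel and can show 100% while still having headroom
- aqu-sz: Average number of requests queued or in service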
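Because the column order of `iostat -x` changes between sysstat versions (newer releases, for example, list all read columns before the write columns and no longer print `svctm`), scripts that read fixed positions such as `$14` can silently pick the wrong metric. Below is a minimal sketch, assuming Bash and a POSIX awk, that locates each requested column by its header name instead; `iostat_cols` is a hypothetical helper name, not part of sysstat.

```bash
#!/bin/bash
# iostat_cols COLUMN...: read `iostat -x` output on stdin and print the device
# name plus the requested columns for every device row.
# Hypothetical helper: columns are found by header name, so the same call works
# across sysstat versions; a column missing from this version prints "NA".
iostat_cols() {
    awk -v want="$*" '
    BEGIN { n = split(want, names, " ") }
    /^Device/ {                            # header row: map name -> field index
        for (i = 1; i <= NF; i++) col[$i] = i
        dev = 1; next
    }
    NF == 0 { dev = 0; next }              # a blank line ends the device block
    dev {
        line = $1
        for (k = 1; k <= n; k++)
            line = line " " ((names[k] in col) ? $(col[names[k]]) : "NA")
        print line
    }'
}
```

Usage: `iostat -x 1 2 | iostat_cols %util r_await w_await`. Fed the sample output above, `iostat_cols %util r_await` prints `nvme0n1 33.20 8.50` and `sda 3.00 15.00`.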
iostat Best Practices
# 1. System IO overview monitoring
iostat -x 1
# 2. Specific device monitoring
iostat -x nvme0n1 2
# 3. Show only disk statistics (no CPU)
iostat -d 2
# 4. Display NFS statistics (newer sysstat releases removed iostat -n; nfsiostat from nfs-utils replaces it)
nfsiostat 2
# 5. Generate IO performance report
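#!/bin/bash
echo "=== IO Performance Report $(date) ==="
# Look columns up by header name: their positions differ between sysstat versions
iostat -x 1 10 | awk '
/avg-cpu/ { getline; cpu_iowait = $4; next }      # 4th CPU value is %iowait
/^Device/ {
    for (i = 1; i <= NF; i++) col[$i] = i         # header name -> field number
    dev = 1; next
}
NF == 0 { dev = 0; next }                         # blank line ends the device block
dev {
    if (("%util" in col) && $(col["%util"]) > 80) print "Warning: " $1 " utilization too high: " $(col["%util"]) "%"
    if (("w_await" in col) && $(col["w_await"]) > 100) print "Warning: " $1 " write latency too high: " $(col["w_await"]) "ms"
    if (("r_await" in col) && $(col["r_await"]) > 50) print "Warning: " $1 " read latency too high: " $(col["r_await"]) "ms"
}
END {
    if (cpu_iowait > 20) print "Warning: CPU IO wait too high: " cpu_iowait "%"
}
'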
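# 6. Historical data collection script
#!/bin/bash
# Long-term IO performance monitoring: one log file per day
while true; do
    # Recomputed on every pass so the log rotates at midnight
    LOG_FILE="/var/log/iostat_$(date +%Y%m%d).log"
    echo "=== $(date) ===" >> "$LOG_FILE"
    # -y skips the since-boot report, so one real 1-second sample is recorded
    iostat -x -y 1 1 >> "$LOG_FILE"
    sleep 300 # Record every 5 minutes
done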
Performance Analysis Guide
Health Status Indicators:
- %util < 80%: Device utilization is normal
- %iowait < 10%: IO wait time is reasonable
- r_await/w_await < 20 ms: Response time is good
- aqu-sz < 2: Queue length is moderate
Identifying Performance Bottlenecks:
- %util > 90%: Disk is nearing saturation (on SSD/NVMe, confirm with rising await and aqu-sz)
- %iowait > 20%: CPU spends a lot of time waiting for IO
- r_await/w_await > 100 ms: IO response time is too long
- Very low rrqm/s + wrqm/s: Mostly random IO, so few adjacent requests can be merged
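The rule-of-thumb thresholds above can be folded into a single check. A minimal sketch, assuming the three values come from one `iostat -x` sample; `io_health` is a hypothetical helper, and its cut-offs are exactly the ones listed above, which are starting points rather than universal limits.

```bash
#!/bin/bash
# io_health UTIL AWAIT_MS AQU_SZ
# Hypothetical helper: prints "bottleneck", "healthy" or "watch" using the
# health and bottleneck thresholds listed above.
io_health() {
    awk -v util="$1" -v await_ms="$2" -v aqu="$3" 'BEGIN {
        util += 0; await_ms += 0; aqu += 0          # force numeric comparison
        if (util > 90 || await_ms > 100)
            print "bottleneck"                      # saturation signals
        else if (util < 80 && await_ms < 20 && aqu < 2)
            print "healthy"                         # every metric in the normal range
        else
            print "watch"                           # in between: keep an eye on it
    }'
}
```

For the `nvme0n1` row of the sample output, `io_health 33.20 12.30 0.85` (using `w_await`, the larger of the two latencies) prints `healthy`.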
Second Generation: iotop – The Revolution of Process-Level IO Monitoring
Technical Breakthrough
The emergence of iotop filled the gap in process-level IO monitoring on Linux. It borrows its interface design from the top command but focuses on IO, so administrators can quickly identify which processes are consuming IO resources.
Installation and Basic Usage
# Install on Ubuntu/Debian
sudo apt install iotop
# Install on CentOS/RHEL
sudo yum install iotop
# or sudo dnf install iotop
# Requires root privileges
sudo iotop
Interface Interpretation and Features
Total DISK READ : 12.45 M/s | Total DISK WRITE : 8.67 M/s
Actual DISK READ: 12.45 M/s | Actual DISK WRITE: 8.67 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO% COMMAND
1234 be/4 mysql 8.45 M/s 2.34 M/s 0.00 % 85.20 % mysqld --defaults-file=/etc/mysql/my.cnf
5678 be/4 root 2.34 M/s 4.56 M/s 0.00 % 45.67 % python3 /opt/backup/backup_script.py
9012 be/4 www-data 1.23 M/s 1.45 M/s 0.00 % 23.45 % nginx: worker process
3456 be/4 user 0.43 M/s 0.32 M/s 0.00 % 12.34 % rsync -av /home/user/docs/ /backup/
Interface Elements Explained:
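- Total DISK READ/WRITE: Bandwidth between processes and the kernel block layer, summed over all threads
- Actual DISK READ/WRITE: Bandwidth between the block layer and the physical devices; it differs from the total because of caching and delayed write-back
- TID: Thread ID (PID when -P is used)
- PRIO: IO scheduling class and priority level, e.g. be/4 (be=best effort, rt=real time, idle=idle)
- IO%: Percentage of time the thread spent waiting on IO (requires kernel delay accounting; on recent kernels enable it with sysctl kernel.task_delayacct=1)
- COMMAND: Process command line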
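iotop prints rates with human-readable units (`B/s`, `K/s`, `M/s`, ...), so raw numbers from different rows cannot simply be added in a script. A minimal sketch that normalizes a value/unit pair to KB/s, assuming iotop's 1024-based units; `to_kbs` is a hypothetical helper. Alternatively, iotop's `-k` option makes it report every rate in kilobytes.

```bash
#!/bin/bash
# to_kbs VALUE UNIT: convert an iotop rate such as "8.45 M/s" to KB/s.
# Hypothetical helper; assumes 1024-based units and fails on unknown units.
to_kbs() {
    awk -v v="$1" -v u="$2" 'BEGIN {
        if      (u == "B/s") f = 1 / 1024
        else if (u == "K/s") f = 1
        else if (u == "M/s") f = 1024
        else if (u == "G/s") f = 1024 * 1024
        else { print "to_kbs: unknown unit " u > "/dev/stderr"; exit 1 }
        printf "%.2f\n", v * f
    }'
}
```

For example, the mysqld row above reads `8.45 M/s`, and `to_kbs 8.45 M/s` prints `8652.80`.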
iotop Advanced Usage and Practical Applications
# 1. Show only processes with IO activity
sudo iotop -o
# 2. Show processes instead of threads
sudo iotop -P
# 3. Show cumulative IO statistics
sudo iotop -a
# 4. Set update interval (seconds)
sudo iotop -d 0.5
# 5. Batch mode (suitable for scripts)
sudo iotop -b -n 5
# 6. Show only processes of a specific user
sudo iotop -u mysql
# 7. Interactive mode shortcuts
# o - Toggle to show only processes with IO
# p - Toggle process/thread display mode
# a - Toggle between cumulative/current IO
# r - Reverse sort
# q - Exit
# 8. IO hotspot process monitoring script
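#!/bin/bash
echo "=== IO Hotspot Process Monitoring $(date) ==="
# Batch rows: TID PRIO USER <read> <unit> <write> <unit> <swapin> % <io> % COMMAND...
# (assumes delay accounting is enabled, otherwise SWAPIN/IO show ?unavailable?)
sudo iotop -b -n 1 -o | head -20 | awk '
NR > 3 && ($4 != "0.00" || $6 != "0.00") {
    cmd = $12; for (i = 13; i <= NF; i++) cmd = cmd " " $i   # rebuild the full command line
    printf "Process: %-20s Read: %10s Write: %10s IO%%: %6s\n", cmd, $4 " " $5, $6 " " $7, $10
}
'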
# 9. Database IO analysis script
#!/bin/bash
# Specifically monitor IO activities related to databases
sudo iotop -b -n 1 | grep -E "(mysql|postgres|mongodb|redis)" |
while IFS= read -r line; do
echo "Database IO: $line"
# Can add alert logic
done
# 10. Establishing IO performance baseline
#!/bin/bash
# Establish system IO performance baseline
BASELINE_FILE="/tmp/io_baseline_$(date +%Y%m%d_%H%M%S).log"
echo "Establishing IO baseline: 360 samples 5 seconds apart (about 30 minutes)..."
for i in {1..360}; do
echo "=== Sample #${i} $(date) ===" >> $BASELINE_FILE
sudo iotop -b -n 1 -o >> $BASELINE_FILE
sleep 5
done
echo "Baseline data saved to: $BASELINE_FILE"
Fault Diagnosis Practical Cases
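# Case 1: Database Performance Issue Diagnosis
echo "=== Database IO Performance Analysis ==="
# -P: one row per process, -k: every rate in K/s so the values can be summed
sudo iotop -b -P -k -n 5 -u mysql | awk '
/^Total DISK READ/ { iterations++ }               # one summary line per sample
/mysqld/ {
    total_read  += $4                             # DISK READ (K/s); $5 is the unit
    total_write += $6                             # DISK WRITE (K/s); $7 is the unit
}
END {
    if (iterations > 0) {
        avg_read  = total_read  / iterations / 1024   # MB/s
        avg_write = total_write / iterations / 1024
        printf "MySQL Average Read Speed: %.2f MB/s\n", avg_read
        printf "MySQL Average Write Speed: %.2f MB/s\n", avg_write
        if (avg_read > 100) print "Warning: MySQL read IO too high"
        if (avg_write > 50) print "Warning: MySQL write IO too high"
    }
}
'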
# Case 2: Backup Task IO Impact Analysis
#!/bin/bash
# Print the "Total DISK READ ... | Total DISK WRITE ..." summary line of one iotop sample
io_summary() { sudo iotop -b -n 1 | awk '/^Total DISK READ/ {print; exit}'; }
echo "Analyzing the impact of backup tasks on system IO..."
echo "System IO before backup: $(io_summary)"
# Start backup task
sudo rsync -av /data/ /backup/ &
BACKUP_PID=$!
sleep 10
echo "System IO during backup: $(io_summary)"
wait "$BACKUP_PID"
echo "System IO after backup: $(io_summary)"
Third Generation: Modern IO Monitoring Ecosystem
iftop – Visual Monitoring of Network IO
iftop measures only network traffic, but it still plays an important role in IO analysis when storage is reached over the network, as with NFS, iSCSI and distributed file systems.
# Install iftop
sudo apt install iftop # Ubuntu/Debian
sudo yum install iftop # CentOS/RHEL (from the EPEL repository)
# Monitor network interface
sudo iftop -i eth0
# Display port information
sudo iftop -P
# Do not perform DNS resolution (improves performance)
sudo iftop -n
iftop Interface Interpretation (the three rate columns are averages over the last 2, 10 and 40 seconds):
                12.5Kb        25.0Kb        37.5Kb        50.0Kb        62.5Kb
+---------------+-------------+-------------+-------------+-------------+
server1.example.com => client1.example.com     1.25Mb  2.34Mb  1.89Mb
                    <=                          890Kb  1.45Mb  1.23Mb
server1.example.com => client2.example.com      567Kb   890Kb   678Kb
                    <=                          234Kb   456Kb   345Kb
-------------------------------------------------------------------------
TX:      cum:  15.6MB   peak rate:  3.45Mb
RX:             8.9MB               2.34Mb
TOTAL:         24.5MB               5.79Mb
nmon – Comprehensive Performance Monitoring Tool
# Install nmon
sudo apt install nmon # Ubuntu/Debian
# Start nmon
nmon
# Shortcuts in nmon
# d - Disk IO statistics
# n - Network statistics
# m - Memory statistics
# c - CPU statistics
# t - Top processes
# q - Exit
# Data collection mode
nmon -f -s 30 -c 120 # Sample every 30 seconds for 1 hour
dstat – A Modern System Statistics Tool
# Install dstat
sudo apt install dstat
# Basic usage
dstat
# Detailed IO monitoring
dstat -d -D sda,sdb
# Comprehensive monitoring
dstat -cdngy 5
# Also write the samples to a CSV file
dstat --output system_stats.csv 5
dstat Output Example:
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
2 1 96 1 0| 45k 128k| 0 0 | 0 0 |1.2k 2.5k
3 2 94 1 0| 67k 156k| 234B 456B| 0 0 |1.4k 2.8k
1 1 97 1 0| 23k 89k| 123B 234B| 0 0 |1.1k 2.2k
Tool Selection and Usage Strategies
Performance Monitoring Tool Comparison
| Tool | Applicable Scenarios | Advantages | Disadvantages |
|---|---|---|---|
| iostat | System-level IO monitoring | Lightweight, comprehensive, long history | No process-level details |
| iotop | Process-level IO analysis | Intuitive, real-time, easy to use | Requires root privileges |
| iftop | Network IO monitoring | Strong network visualization | Limited to network |
| nmon | Comprehensive performance monitoring | Comprehensive features, text-mode graphs | High learning cost |
| dstat | Multi-dimensional statistics | Flexible, customizable | Complex output |
Monitoring Strategy Recommendations
Daily Operations Monitoring:
# System health check script
#!/bin/bash
echo "=== System IO Health Check $(date) ==="
# 1. Check overall system IO status
echo "--- System IO Overview ---"
iostat -x 1 3 | tail -10
# 2. Check IO hotspot processes
echo "--- IO Hotspot Processes ---"
sudo iotop -b -n 1 -o | head -10
# 3. Check disk utilization
echo "--- Disk Utilization Alerts ---"
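# -d: devices only, -y: skip the since-boot report; %util is the last column
iostat -dxy 1 1 | awk '/^Device/ {dev = 1; next} dev && NF && $NF > 80 {print "Warning: " $1 " utilization " $NF "%"}'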
# 4. Check IO wait time
echo "--- CPU IO Wait Check ---"
iostat -cy 1 1 | awk '/avg-cpu/ {getline; if ($4 > 10) print "Warning: IO wait time too high " $4 "%"}'
Performance Bottleneck Diagnosis:
# Comprehensive IO performance diagnosis script
#!/bin/bash
echo "=== Deep IO Performance Diagnosis ==="
echo "1. System IO Load Assessment"
iostat -x 1 5
echo -e "\n2. Process IO Ranking"
sudo iotop -b -n 1 | head -20
echo -e "\n3. Block-Layer Queue Size Limits (nr_requests per device)"
grep -H . /sys/block/*/queue/nr_requests
echo -e "\n4. IO Scheduler Status"
for disk in /sys/block/sd* /sys/block/nvme*; do
if [ -d "$disk" ]; then
echo "$(basename $disk): $(cat $disk/queue/scheduler)"
fi
done
echo -e "\n5. Block Device IO Counters (device, reads completed, writes completed)"
awk '{print $3, $4, $8}' /proc/diskstats
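The counters in `/proc/diskstats` are cumulative since boot; iostat turns them into rates by taking two snapshots and dividing the differences by the interval. A minimal sketch of that calculation, assuming the kernel's documented field layout (3 device name, 4 reads completed, 6 sectors read, 7 ms spent reading, 8 writes completed, 10 sectors written, 11 ms spent writing, 13 ms spent doing IO) and 512-byte sectors; `diskstats_rates` is a hypothetical helper, and real iostat handles more cases (counter wrap, discard and flush fields).

```bash
#!/bin/bash
# diskstats_rates BEFORE_LINE AFTER_LINE INTERVAL_SECONDS
# Hypothetical helper: given two /proc/diskstats lines for the same device taken
# INTERVAL_SECONDS apart (must be > 0), print
#   device r/s w/s rkB/s wkB/s r_await(ms) w_await(ms) %util
diskstats_rates() {
    printf '%s\n%s\n' "$1" "$2" | awk -v t="$3" '
    NR == 1 { for (i = 4; i <= 13; i++) b[i] = $i; next }   # remember the first snapshot
    {
        dr = $4 - b[4]; dw = $8 - b[8]                 # reads / writes completed
        rkb = ($6 - b[6]) * 512 / 1024                 # diskstats sectors are 512 bytes
        wkb = ($10 - b[10]) * 512 / 1024
        r_await = (dr > 0) ? ($7 - b[7]) / dr : 0      # ms spent reading per read
        w_await = (dw > 0) ? ($11 - b[11]) / dw : 0    # ms spent writing per write
        util = ($13 - b[13]) / (t * 1000) * 100        # busy ms / elapsed ms
        printf "%s %.2f %.2f %.2f %.2f %.2f %.2f %.2f\n", $3, dr / t, dw / t, rkb / t, wkb / t, r_await, w_await, util
    }'
}
```

Usage: `B=$(grep -w nvme0n1 /proc/diskstats); sleep 10; A=$(grep -w nvme0n1 /proc/diskstats); diskstats_rates "$B" "$A" 10`. Two snapshots whose counters grew by 452 reads, 36810 sectors read and 3320 busy ms (plus the matching write deltas) reproduce the `nvme0n1` row of the iostat sample: `nvme0n1 45.20 28.60 1840.50 1145.20 8.50 12.30 33.20`.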
Advanced IO Optimization Techniques
IO Scheduler Optimization
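# View current IO scheduler (the active one is shown in [brackets])
cat /sys/block/sda/queue/scheduler
# Temporarily change IO scheduler (needs root; reverts at reboot)
# Kernels 5.0+ (blk-mq) offer none, mq-deadline, bfq and kyber;
# older single-queue kernels offer noop, deadline and cfq
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
# Permanently change with a udev rule (/etc/fstab cannot set the scheduler)
echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"' | sudo tee /etc/udev/rules.d/60-io-scheduler.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
# Scheduler selection for different scenarios (blk-mq name, legacy name in parentheses)
# SSD/NVMe: none (noop) or mq-deadline (deadline)
# Traditional hard drives: bfq or mq-deadline (cfq)
# Database servers: mq-deadline (deadline)
# Desktop systems: bfq (cfq)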
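The `scheduler` file lists every available scheduler and marks the active one with square brackets, for example `none [mq-deadline] kyber bfq`. A minimal sketch that extracts just the active name, which is handy when checking many devices from a script; `active_scheduler` and `show_schedulers` are hypothetical helpers.

```bash
#!/bin/bash
# active_scheduler: read a queue/scheduler line on stdin and print the name
# inside [brackets], i.e. the scheduler currently in use (hypothetical helper).
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# show_schedulers: print "device: scheduler" for every block device that
# exposes a readable queue/scheduler file (hypothetical helper).
show_schedulers() {
    local f
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$(dirname "$f")")")" "$(active_scheduler < "$f")"
    done
}
```

`echo 'none [mq-deadline] kyber bfq' | active_scheduler` prints `mq-deadline`; a line without brackets prints nothing.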
Filesystem Tuning
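# ext4 filesystem optimization
# Caution: data=writeback drops the ordering of data writes before metadata commits,
# so after a crash files may contain stale data
mount -o noatime,data=writeback /dev/sda1 /data
# XFS filesystem optimization
mount -o noatime,logbufs=8,logbsize=256k /dev/sda1 /data
# Adjust page-cache writeback parameters (as root)
sysctl -w vm.dirty_background_ratio=5          # start background writeback at 5% dirty memory (default 10)
sysctl -w vm.dirty_ratio=10                    # throttle writing processes at 10% dirty memory (default 20)
sysctl -w vm.dirty_writeback_centisecs=3000    # wake the flusher threads every 30 s (default 500, i.e. 5 s)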
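The two ratios are percentages of memory, so it helps to translate them into sizes. A rough worked sketch: the kernel applies the ratios to available memory made up of free and reclaimable pages, not to total RAM, so using `MemAvailable` from `/proc/meminfo` as the base is an approximation; `dirty_thresholds` and `avail_mb_now` are hypothetical helpers.

```bash
#!/bin/bash
# dirty_thresholds AVAILABLE_MB BACKGROUND_RATIO DIRTY_RATIO
# Hypothetical helper: approximate amount of dirty page cache (MB, rounded down)
# at which background writeback starts and at which writers get throttled.
dirty_thresholds() {
    local avail_mb=$1 bg=$2 ratio=$3
    echo "background=$(( avail_mb * bg / 100 ))MB throttle=$(( avail_mb * ratio / 100 ))MB"
}

# avail_mb_now: MemAvailable of the running system in MB (approximate base)
avail_mb_now() {
    awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo
}
```

Usage: `dirty_thresholds "$(avail_mb_now)" 5 10`. With 16384 MB available and the settings above, it prints `background=819MB throttle=1638MB`.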
Application Layer Optimization Recommendations
# Database IO optimization checklist
echo "=== Database IO Optimization Recommendations ==="
echo "1. Check innodb_buffer_pool_size setting"
echo "2. Enable innodb_flush_method=O_DIRECT"
echo "3. Adjust innodb_log_file_size (innodb_redo_log_capacity on MySQL 8.0.30+)"
echo "4. Consider using SSD storage for redo log"
echo "5. Monitor slow query logs"
# Web application IO optimization
echo "=== Web Application IO Optimization ==="
echo "1. Enable static file caching"
echo "2. Use CDN to distribute static resources"
echo "3. Optimize image and resource sizes"
echo "4. Implement application-level caching"
echo "5. Consider using in-memory databases"
Performance Benchmarking and Capacity Planning
IO Performance Benchmarking
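# Use dd for basic IO testing
# Put the test file on the disk under test: /tmp is often RAM-backed tmpfs
TESTFILE=/data/dd_testfile
echo "=== Disk Write Performance Test ==="
dd if=/dev/zero of="$TESTFILE" bs=1M count=1024 oflag=direct
echo "=== Disk Read Performance Test ==="
dd if="$TESTFILE" of=/dev/null bs=1M iflag=direct
rm -f "$TESTFILE"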
# Use fio for professional IO testing
sudo apt install fio
# Random read test
fio --name=random-read --rw=randread --size=1G --bs=4k --ioengine=libaio --iodepth=32 --direct=1
# Random write test
fio --name=random-write --rw=randwrite --size=1G --bs=4k --ioengine=libaio --iodepth=32 --direct=1
# Mixed read/write test
fio --name=mixed-rw --rw=randrw --rwmixread=70 --size=1G --bs=4k --ioengine=libaio --iodepth=32 --direct=1
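fio reports both IOPS and bandwidth, and the two are linked by the block size (throughput = IOPS × block size). That is why a 4k random test shows modest MB/s even when IOPS are high. A small worked sketch of that arithmetic; `iops_to_mibs` and `mibs_to_iops` are hypothetical helpers that take the block size in KiB, as in `--bs=4k`.

```bash
#!/bin/bash
# iops_to_mibs IOPS BS_KIB: throughput in MiB/s for a given IOPS and block size.
iops_to_mibs() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.2f\n", iops * bs / 1024 }'
}

# mibs_to_iops MIBS BS_KIB: the inverse, IOPS needed to reach a throughput.
mibs_to_iops() {
    awk -v mibs="$1" -v bs="$2" 'BEGIN { printf "%.0f\n", mibs * 1024 / bs }'
}
```

For instance, 25600 IOPS of 4 KiB random reads amount to `iops_to_mibs 25600 4` = `100.00` MiB/s, while sequential 1 MiB blocks reach the same 100 MiB/s with only `mibs_to_iops 100 1024` = `100` IOPS.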
Capacity Planning Script
#!/bin/bash
# IO capacity planning analysis
echo "=== IO Capacity Planning Analysis ==="
# Historical IO peak analysis
echo "--- IO Peak of the Past 7 Days ---"
for i in {1..7}; do
DATE=$(date -d "-$i day" +%Y%m%d)
if [ -f "/var/log/iostat_$DATE.log" ]; then
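# %util is the last column of each device row in iostat -x output
MAX_UTIL=$(awk '/^Device/ {dev = 1; next} NF == 0 {dev = 0} dev && $NF + 0 > max {max = $NF + 0} END {print max + 0}' "/var/log/iostat_$DATE.log")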
echo "$(date -d "-$i day" +%m/%d): Highest Utilization ${MAX_UTIL}%"
fi
done
# Growth trend prediction
echo -e "\n--- IO Growth Trend Prediction ---"
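# Skip the first (since-boot) report; adjust /^nvme/ to match your devices
CURRENT_AVG=$(iostat -x 1 10 | awk '/avg-cpu/ {count++} count > 1 && /^nvme/ {sum += $NF; samples++} END {if (samples > 0) print sum / samples; else print 0}')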
echo "Current average utilization: ${CURRENT_AVG}%"
# Capacity planning recommendations
if (( $(echo "$CURRENT_AVG > 60" | bc -l) )); then
echo "Recommendation: Consider upgrading storage system or optimizing IO"
elif (( $(echo "$CURRENT_AVG > 40" | bc -l) )); then
echo "Recommendation: Monitor closely, prepare expansion plan"
else
echo "Recommendation: IO capacity is sufficient, continue monitoring"
fi
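The script above labels its second step a growth trend prediction but only measures the current average. One way to turn the daily peaks into an actual trend is an ordinary least-squares line through them. A minimal sketch, assuming one utilization value per day in chronological order (for example the `MAX_UTIL` figures, oldest first); `util_trend` is a hypothetical helper, and a straight-line fit is only a rough guide when load is seasonal or bursty.

```bash
#!/bin/bash
# util_trend [THRESHOLD]: read one utilization value (%) per line on stdin,
# oldest first, fit y = a + b*x by least squares (x = 1, 2, ...), and print the
# slope (% per day) plus the estimated days after the last sample until the
# fitted line reaches THRESHOLD (default 80; negative = already past it).
# Hypothetical helper for illustration.
util_trend() {
    awk -v th="${1:-80}" '
    NF { n++; sx += n; sy += $1; sxx += n * n; sxy += n * $1 }
    END {
        if (n < 2) { print "util_trend: need at least 2 samples" > "/dev/stderr"; exit 1 }
        b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
        a = (sy - b * sx) / n                            # intercept
        if (b <= 0) { printf "slope=%.2f days_to_%s=never\n", b, th; exit 0 }
        printf "slope=%.2f days_to_%s=%.1f\n", b, th, (th - a) / b - n
    }'
}
```

For five daily peaks of 40, 45, 50, 55 and 60%, `printf '40\n45\n50\n55\n60\n' | util_trend 80` prints `slope=5.00 days_to_80=4.0`.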
Fault Diagnosis Practical Cases
Case One: Database Slow Query Issue
# Problem Phenomenon: MySQL query suddenly slows down
echo "=== Database Slow Query IO Diagnosis ==="
# 1. Check overall system IO status
echo "--- System IO Status ---"
iostat -x 1 5
# 2. Locate MySQL process IO usage
echo "--- MySQL IO Usage Analysis ---"
sudo iotop -b -n 5 -u mysql
# 3. Check disk utilization related to MySQL
echo "--- Data Directory Disk Status ---"
df -h /var/lib/mysql
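# Map the data directory to its device name (e.g. /dev/sda2 -> sda2);
# -p ALL also lists partitions, -N shows LVM/device-mapper names
MYSQL_DEV=$(df --output=source /var/lib/mysql | tail -1 | sed 's|.*/||')
iostat -x -N -p ALL 1 3 | grep -E "^(Device|${MYSQL_DEV}[[:space:]])"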
# 4. Analyze the correlation between slow queries and IO
echo "--- Slow Query and IO Correlation Analysis ---"
sudo tail -100 /var/log/mysql/mysql-slow.log | grep -A 5 -B 5 "Query_time"
Case Two: Slow System Startup
# System Startup IO Bottleneck Analysis
echo "=== Startup Process IO Analysis ==="
# 1. List the units that took longest to start (run soon after boot)
systemd-analyze blame | head -10
# 2. Check system service IO usage
sudo iotop -b -n 10 | grep systemd
# 3. Check kernel messages from storage drivers during startup
echo "--- Storage Driver Messages ---"
dmesg | grep -i "ata\|scsi\|nvme" | tail -20
# 4. Optimization Recommendations
echo "=== Optimization Recommendations ==="
echo "1. Check startup services"
echo "2. Consider using SSD"
echo "3. Optimize filesystem mount options"
echo "4. Adjust IO scheduler"
Case Three: Backup Task Impact on Performance
# Backup Task IO Impact Assessment
echo "=== Backup Task IO Impact Analysis ==="
# Monitor IO status before, during, and after backup
backup_io_monitor() {
local phase=$1
echo "--- $phase IO Status ---"
iostat -x 1 3 | tail -10
echo "--- $phase Hotspot Processes ---"
sudo iotop -b -n 1 -o | head -10
echo "--- $phase System Load ---"
uptime
echo ""
}
echo "Starting to monitor the impact of backup tasks on IO..."
backup_io_monitor "Before Backup"
# Start backup task (example)
echo "Starting backup task..."
# rsync -av --progress /data/ /backup/ &
# BACKUP_PID=$!
sleep 30
backup_io_monitor "During Backup"
# wait $BACKUP_PID
sleep 30
backup_io_monitor "After Backup"
echo "=== Optimization Recommendations ==="
echo "1. Use ionice to lower backup process IO priority"
echo "2. Perform backups during off-peak hours"
echo "3. Use incremental backups to reduce IO"
echo "4. Consider using a dedicated backup network"
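The first recommendation can be applied by starting the backup under `ionice`'s idle class, combined here with `nice` for CPU. A minimal sketch, assuming util-linux `ionice`; `run_low_io` is a hypothetical wrapper. IO priority classes are fully honored by BFQ (and CFQ on older kernels); with other schedulers the effect may be limited or absent.

```bash
#!/bin/bash
# run_low_io CMD [ARGS...]: hypothetical wrapper that runs a command with idle
# IO priority and the lowest CPU priority when ionice is available.
run_low_io() {
    if command -v ionice >/dev/null 2>&1; then
        # -c 3: idle class, the command gets disk time only when nobody else needs it
        # -t:   still run the command if the priority cannot be set
        ionice -c 3 -t nice -n 19 "$@"
    else
        echo "run_low_io: ionice not found, running at normal priority" >&2
        "$@"
    fi
}
```

Usage: `run_low_io rsync -av /data/ /backup/ &`. An already running backup can be demoted with `sudo ionice -c 3 -p "$BACKUP_PID"`.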
Future Outlook: IO Monitoring in the Cloud-Native Era
IO Monitoring in Container Environments
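# Docker container IO monitoring (BLOCK I/O = bytes read/written since the container started)
docker stats --format "table {{.Container}}\t{{.BlockIO}}\t{{.MemUsage}}" --no-stream
# K8s Pod monitoring: kubectl top reports only CPU and memory, not block IO
kubectl top pods --sort-by=cpu
kubectl describe node | grep -A 5 "Allocated resources"
# cAdvisor integrated monitoring: per-container disk IO stats via its REST API
# (its /metrics endpoint exposes Prometheus counters such as container_fs_reads_bytes_total)
curl http://localhost:8080/api/v1.3/containers/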
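Under cgroup v2, a container's cumulative block IO can also be read directly from its cgroup's `io.stat` file, one line per device in the form `MAJ:MIN rbytes=... wbytes=... rios=... wios=...`. A minimal sketch that totals those counters across devices; `io_stat_totals` is a hypothetical helper, and the cgroup path in the usage note is an assumption that depends on the container runtime and cgroup driver.

```bash
#!/bin/bash
# io_stat_totals: sum rbytes/wbytes/rios/wios over all device lines of a
# cgroup v2 io.stat file read from stdin (hypothetical helper).
io_stat_totals() {
    awk '
    BEGIN { sum["rbytes"] = sum["wbytes"] = sum["rios"] = sum["wios"] = 0 }
    {
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")                   # "rbytes=1459200" -> kv[1], kv[2]
            if (kv[1] in sum) sum[kv[1]] += kv[2]
        }
    }
    END {
        # %.0f rather than %d avoids 32-bit overflow in some awk implementations
        printf "rbytes=%.0f wbytes=%.0f rios=%.0f wios=%.0f\n",
            sum["rbytes"], sum["wbytes"], sum["rios"], sum["wios"]
    }'
}
```

Usage, assuming Docker with the systemd cgroup driver: `io_stat_totals < /sys/fs/cgroup/system.slice/docker-<container-id>.scope/io.stat`.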
Cloud Storage IO Monitoring
- AWS CloudWatch: EBS volume IO monitoring
- Azure Monitor: Disk performance metrics
- Google Cloud Monitoring: Persistent disk monitoring
- Prometheus + Grafana: Self-built monitoring stack
Emerging Technology Trends
- NVMe over Fabrics: Accessing NVMe storage over networks such as RDMA, Fibre Channel or TCP
- Storage Class Memory: Persistent memory whose latency sits between DRAM and NAND flash
- AI-driven Performance Optimization: Intelligent IO scheduling
- Edge Computing Storage: Distributed storage architecture
Summary and Best Practices
Linux disk IO monitoring is a key aspect of system performance optimization. From traditional iostat to modern visualization tools, each tool has its unique value and application scenarios.
Monitoring Strategy Recommendations
- 1. Layered Monitoring: System-level → Process-level → Application-level
- 2. Key Metrics: Utilization, wait time, queue depth, IOPS
- 3. Alert Mechanism: Set reasonable thresholds and notification methods
- 4. Historical Analysis: Establish performance baselines, trend analysis
- 5. Continuous Optimization: Regularly evaluate and adjust monitoring strategies
Recommended Tool Combinations
- Daily Monitoring: iostat + iotop
- In-depth Analysis: nmon + dstat
- Automation: Script integration + alerts
- Visualization: Grafana + Prometheus
- Fault Diagnosis: Comprehensive use of all tools
Remember, optimizing IO performance is not a one-time process; it requires continuous monitoring, analysis, and tuning. Master these monitoring tools to handle IO performance issues with ease and become a true Linux performance optimization expert!
What IO performance issues have you encountered in your daily work? Share your experiences and solutions in the comments!
#Linux #PerformanceOptimization #DiskIO #Operations #SystemMonitoring #DatabaseOptimization