- Introduction
- Detailed Explanation of General Methodologies for Diagnosing Linux System I/O Performance Issues
- Check Server Load
- View I/O Utilization
- Clearly Identify I/O Processes
- Supplement: Other Useful I/O Analysis Tools
- Conclusion
- References
Introduction
Suppose some I/O-heavy processes in your production environment show degraded read/write performance, for example:
- Increased query time in MySQL
- Slower file reads and writes
In such cases, it is worth checking whether the Linux system has an I/O performance bottleneck. Below is a general methodology for diagnosing I/O bottlenecks that I have put together. To reproduce the problem, I also wrote a multi-threaded Java program that continuously writes data to disk. The code and its comments show the logic:
/**
 * Start disk I/O operations to simulate high I/O load
 * Simulate high disk I/O load by creating multiple I/O task threads
 */
private static void startDiskIOOperations() {
    log.info("Starting high I/O disk operations...");
    log.info("Run 'iostat -x 1' in another terminal to monitor disk utilization.");
    // Create a fixed thread pool
    executor = Executors.newFixedThreadPool(NUM_THREADS);
    // Submit multiple tasks to continuously write to disk
    for (int i = 0; i < NUM_THREADS; i++) {
        executor.submit(new IOTask(i));
    }
    log.info("Disk I/O operations have started, using {} threads", NUM_THREADS);
}
/**
 * Execute continuous write operations to simulate high I/O tasks
 * This class is responsible for performing disk I/O operations by continuously writing and clearing files to simulate high I/O load
 */
static class IOTask implements Runnable {
    private final int taskId;

    public IOTask(int taskId) {
        this.taskId = taskId;
    }

    @Override
    public void run() {
        // Each thread writes to its own temporary file
        String filename = "/tmp/disk_io_test_" + taskId + ".tmp";
        try (FileOutputStream fos = new FileOutputStream(filename)) {
            log.info("Thread-{} is writing to {}", taskId, filename);
            // Continuously write data to the file and clear it after each write
            while (!Thread.currentThread().isInterrupted()) {
                performDiskIOOperation(fos, taskId);
                ThreadUtil.sleep(500);
            }
        } catch (IOException e) {
            log.error("Thread-{} encountered an error: {}", taskId, e.getMessage());
        }
    }
}
/**
 * Perform disk I/O operations: write a specified amount of data and then clear the file
 * This method continuously writes data to the file and then clears the file content to simulate high I/O load
 * @param fos File output stream
 * @param taskId Task ID
 * @throws IOException I/O exception
 */
private static void performDiskIOOperation(FileOutputStream fos, int taskId) throws IOException {
    long startTime = System.currentTimeMillis();
    // Write data (chunked writing)
    long bytesWritten = 0;
    while (bytesWritten < WRITE_SIZE) {
        fos.write(DATA);
        bytesWritten += DATA.length;
    }
    // FileOutputStream is unbuffered, so flush() alone does not force data to disk;
    // sync() asks the OS to write the data through to the storage device
    fos.flush();
    fos.getFD().sync();
    // Clear file content (this also resets the write position to 0)
    fos.getChannel().truncate(0);
    long endTime = System.currentTimeMillis();
    // Print the time taken for this operation
    log.info("Thread-{} completed a write and clear operation, time taken: {} ms", taskId, (endTime - startTime));
}
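The fragment above depends on fields that are not shown (log, executor, NUM_THREADS, WRITE_SIZE, DATA). As a rough, self-contained sketch of the same write, force, and truncate cycle, the class below can be run on its own. The class name WriteAndTruncateSketch, the method writeAndTruncate, and the 1 MiB chunk and 16 MiB total sizes are assumptions chosen for illustration, not values from the original program.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-alone sketch of one write/force/truncate cycle.
final class WriteAndTruncateSketch {

    // Writes at least totalBytes to the file in chunks, forces the data to the
    // storage device, truncates the file to zero length, and returns the
    // elapsed time in milliseconds.
    static long writeAndTruncate(Path file, int chunkSize, long totalBytes) throws IOException {
        ByteBuffer chunk = ByteBuffer.allocate(chunkSize); // zero-filled payload
        long start = System.nanoTime();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long written = 0;
            while (written < totalBytes) {
                chunk.clear(); // reset position/limit so the whole chunk is written again
                while (chunk.hasRemaining()) {
                    written += channel.write(chunk);
                }
            }
            channel.force(true);  // ask the OS to flush data and metadata to the device
            channel.truncate(0);  // clear the file content
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("disk_io_sketch_", ".tmp");
        try {
            long elapsedMs = writeAndTruncate(file, 1024 * 1024, 16L * 1024 * 1024);
            System.out.println("write + force + truncate took " + elapsedMs
                    + " ms, file size is now " + Files.size(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Calling force(true) is what makes each cycle reach the disk rather than stay in the page cache, which is why the load becomes visible in iostat.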
Hi, I am sharkChili, a hardcore technology enthusiast and a Java coder. I am also a CSDN blog expert and one of the maintainers of the open-source project Java Guide. I am familiar with Java and have some knowledge of Go, and occasionally dabble in C source code. I have written many interesting technical blogs and am still researching and sharing technology. I hope my articles are helpful to you, and I warmly welcome you to follow my public account: SharkChili Coding.
Recently, I have received many private messages from readers, so I have created a group chat. Interested readers can obtain my contact information through the public account above to add me as a friend. Just click the note “Join Group” to engage in in-depth discussions with me and my friends.
Detailed Explanation of General Methodologies for Diagnosing Linux System I/O Performance Issues
Check Server Load
When we suspect an I/O bottleneck, the first step is to run the top command and look at the wa value, the percentage of CPU time spent idle while waiting for I/O to complete (press 1 inside top to show each CPU separately). As a rule of thumb, values below 20% are usually acceptable, while sustained values above 30%-40% suggest a serious I/O bottleneck. On my server, wa is between 72% and 98% on every core, far above the normal range, which means the CPUs spend most of their time waiting for I/O tasks to complete:
Tasks: 34 total, 1 running, 33 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.5 us, 2.6 sy, 0.0 ni, 5.3 id, 90.5 wa, 0.0 hi, 1.1 si, 0.0 st
%Cpu1 : 0.0 us, 2.2 sy, 0.0 ni, 24.9 id, 72.4 wa, 0.0 hi, 0.5 si, 0.0 st
%Cpu2 : 1.1 us, 0.6 sy, 0.0 ni, 0.6 id, 97.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.5 us, 2.7 sy, 0.0 ni, 16.8 id, 80.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.6 us, 1.7 sy, 0.0 ni, 0.0 id, 97.8 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 3.9 sy, 0.0 ni, 18.8 id, 77.3 wa, 0.0 hi, 0.0 si, 0.0 st
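The wa figure that top reports is derived from the counters in /proc/stat. As a minimal sketch, assuming the standard field order of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal), the same percentage can be computed from two samples. The class and method names here are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch: compute the iowait percentage from two /proc/stat samples.
final class IoWaitSketch {

    // Parses a "cpu ..." line of /proc/stat into its numeric counters.
    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        long[] values = new long[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            values[i - 1] = Long.parseLong(parts[i]);
        }
        return values;
    }

    // Share of CPU time spent in iowait between two samples, in percent.
    static double ioWaitPercent(long[] before, long[] after) {
        // Sum only the first eight fields; guest time is already counted in user/nice.
        int n = Math.min(8, Math.min(before.length, after.length));
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += after[i] - before[i];
        }
        long ioWait = after[4] - before[4]; // index 4 is the iowait counter
        return total == 0 ? 0.0 : 100.0 * ioWait / total;
    }

    // Reads the aggregate counters (first line of /proc/stat).
    static long[] readCpuCounters() throws IOException {
        return parseCpuLine(Files.readAllLines(Paths.get("/proc/stat")).get(0));
    }

    public static void main(String[] args) throws Exception {
        long[] before = readCpuCounters();
        Thread.sleep(1000); // sampling interval, an arbitrary choice
        long[] after = readCpuCounters();
        System.out.printf("iowait: %.1f%%%n", ioWaitPercent(before, after));
    }
}
```

A check like this could feed an alert when the value stays above the 30%-40% range mentioned above, although the threshold itself remains a rule of thumb.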
View I/O Utilization
Once top points to an I/O bottleneck, the next step is to find out which device is affected. I usually run the iostat command with the following arguments:
- -x: Display extended statistics (including device utilization, wait time, etc.)
- 1: Print a report every second until interrupted
iostat -x 1
From the output, we can see that %util for the sdd disk has reached 100% and iowait has reached 78.2%. The load is almost entirely writes (171 writes per second, with an average w_await of about 3.9 seconds), which indicates that some abnormal write-heavy tasks are saturating this disk:
avg-cpu: %user %nice %system %iowait %steal %idle
0.0% 0.0% 1.0% 78.2% 0.0% 20.8%
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f_await aqu-sz %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 4.00 0.04 0.00 0.00 122.25 10.24 171.00 190.81 1.00 0.58 3884.17 1.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 664.68 100.00
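Key Indicator Interpretation:
- %util: Device utilization, the percentage of time the device was busy serving requests. A value close to 100% means the device is busy, which may suggest an I/O bottleneck.
- r_await and w_await: Average time in milliseconds for read and write requests to complete, including time spent in the queue. Higher values indicate slower I/O response.
- aqu-sz: Average request queue length. Larger values indicate a serious backlog of I/O requests.
- await: Average time for all I/O requests to complete, including queue time. Older iostat versions print this combined column, while the output above reports it separately as r_await and w_await.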
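These indicators are related to each other. By Little's law, the average queue length is roughly the request rate multiplied by the average time per request, so aqu-sz can be cross-checked against the r/s, w/s, r_await, and w_await columns. The sketch below applies this to the sdd row; the class and method names are hypothetical, and the relation is an approximation rather than the exact formula iostat uses (discard and flush requests are ignored, which is harmless here because they are zero).

```java
// Hypothetical sketch: cross-check aqu-sz using Little's law.
final class QueueSizeCheck {

    // Estimates the average queue length from request rates (per second)
    // and average completion times (milliseconds).
    static double estimateAquSz(double readsPerSec, double rAwaitMs,
                                double writesPerSec, double wAwaitMs) {
        return (readsPerSec * rAwaitMs + writesPerSec * wAwaitMs) / 1000.0;
    }

    public static void main(String[] args) {
        // Values copied from the sdd row of the iostat output
        double estimate = estimateAquSz(4.00, 122.25, 171.00, 3884.17);
        System.out.println("estimated aqu-sz: " + estimate);
    }
}
```

The estimate comes out at about 664.68, which agrees with the aqu-sz column for sdd and confirms that the backlog consists almost entirely of slow writes.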
Clearly Identify I/O Processes
Based on the above process, we have confirmed that the sdd disk has I/O anomalies. We can now use iotop to view the specific processes. It is important to note that iotop is not installed by default, and readers can refer to online tutorials for installation. For my Ubuntu system, the corresponding installation command is:
sudo apt install iotop
Finally, run sudo iotop -o to show only the processes and threads that are actually performing I/O. At this point, we can clearly see the threads of my Java process that are executing abnormal I/O operations, along with their read/write rates:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 142.99 M/s
Current DISK READ: 11.92 K/s | Current DISK WRITE: 336.21 M/s
TID PRIO USER DISK READ DISK WRITE> COMMAND
3253712 be/4 sharkchi 0.00 B/s 18.26 M/s java -jar web-cache-1.0.jar --app.startup.method=1 [pool-2-thread-3]
3253713 be/4 sharkchi 0.00 B/s 18.26 M/s java -jar web-cache-1.0.jar --app.startup.method=1 [pool-2-thread-4]
3253711 be/4 sharkchi 0.00 B/s 18.25 M/s java -jar web-cache-1.0.jar --app.startup.method=1 [pool-2-thread-2]
3253715 be/4 sharkchi 0.00 B/s 18.25 M/s java -jar web-cache-1.0.jar --app.startup.method=1 [pool-2-thread-6]
3253714 be/4 sharkchi 0.00 B/s 18.24 M/s java -jar web-cache-1.0.jar --app.startup.method=1 [pool-2-thread-5]
3253710 be/4 sharkchi 0.00 B/s 17.50 M/s java -jar web-cache-1.0.jar --app.startup.method=1 [pool-2-thread-1]
3253717 be/4 sharkchi 0.00 B/s 17.50 M/s java -jar web-cache-1.0.jar --app.startup.method=1 [pool-2-thread-8]
3253716 be/4 sharkchi 0.00 B/s 16.74 M/s java -jar web-cache-1.0.jar --app.startup.method=1 [pool-2-thread-7]
Supplement: Other Useful I/O Analysis Tools
In actual troubleshooting, in addition to the tools mentioned above, there are other useful tools that can assist in analysis:
- pidstat -d 1: Displays I/O statistics for each process, once per second
- iotop -a: Shows accumulated I/O since iotop started instead of the current bandwidth
- vmstat 1: Displays virtual memory statistics, including I/O-related metrics such as blocks in and out (bi, bo) and iowait (wa)
- lsof +D /path/to/directory: Lists processes that have files open under the specified directory
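Per-process I/O counters are also exposed by the kernel in /proc/<pid>/io, whose write_bytes field counts the bytes a process has caused to be sent to the storage layer. As a rough sketch, a write rate can be derived by sampling that file twice. The class and method names are assumptions for illustration, and reading the file of another user's process normally requires root.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive a per-process write rate from /proc/<pid>/io.
final class ProcIoSketch {

    // Parses "key: value" lines such as "write_bytes: 4096".
    static Map<String, Long> parseProcIo(List<String> lines) {
        Map<String, Long> counters = new HashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // skip anything that is not a counter line
            }
            String key = line.substring(0, colon).trim();
            long value = Long.parseLong(line.substring(colon + 1).trim());
            counters.put(key, value);
        }
        return counters;
    }

    // Converts a counter delta over an interval into bytes per second.
    static double bytesPerSecond(long before, long after, long intervalMillis) {
        return (after - before) * 1000.0 / intervalMillis;
    }

    public static void main(String[] args) throws Exception {
        // Pass a PID as the first argument; "self" inspects this JVM.
        String pid = args.length > 0 ? args[0] : "self";
        String path = "/proc/" + pid + "/io";
        long intervalMillis = 1000; // sampling interval, an arbitrary choice
        long before = parseProcIo(Files.readAllLines(Paths.get(path))).get("write_bytes");
        Thread.sleep(intervalMillis);
        long after = parseProcIo(Files.readAllLines(Paths.get(path))).get("write_bytes");
        System.out.println("write rate (bytes/s): " + bytesPerSecond(before, after, intervalMillis));
    }
}
```

This only reports whole-process totals; for the per-thread breakdown shown earlier, iotop remains the more convenient tool.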
Conclusion
Let’s briefly summarize the troubleshooting routine for I/O performance bottlenecks:
- Check the %wa (iowait) value in top to determine whether the CPU is waiting abnormally long for I/O
- Use iostat -x 1 to view detailed metrics such as I/O utilization and response times, pinpointing the specific disk device
- Use iotop to display the processes and threads currently performing I/O, identifying the problematic program
- Combine with other tools such as pidstat and vmstat for in-depth analysis
References
[Bilingual Perspective] Troubleshooting I/O Performance Issues on Linux: https://www.bilibili.com/video/BV14d35zhEjs/