Linux Performance Optimization: In-Depth Analysis of Increased Disk Latency

The previous chapters covered the fundamentals of file systems and disks: the full read and write path from application to disk, the role of the virtual file system and the block device layer in connecting the two, and the performance metrics used to observe them. With that background, analyzing disk performance becomes much easier. If any of it is unfamiliar, please reread the previous articles:

Linux Performance Optimization: In-Depth Understanding of File Systems and Disks

Linux Performance Optimization: Detailed Explanation of Disk Workflow and Performance Metrics

1. System I/O Stack

When a business request writes data to disk, the application first calls the interfaces of the virtual file system, the block device layer then queues, sorts, and merges the I/O requests, and finally the data is written to the disk. Together these layers make up the system’s storage I/O stack. Storage I/O is the slowest part of the whole system, so various caching mechanisms have been introduced to improve its performance.

For the file system, the page cache, inode cache, and directory entry (dentry) cache are used. Because memory is far faster than disk, writing to the cache first and flushing to disk asynchronously significantly improves file system I/O efficiency. For those unfamiliar with memory caching, please refer to the series of articles on memory.

For block devices, the buffer cache holds block device data in memory, which also improves efficiency.

With this stack in mind, let’s outline the factors that can affect disk read and write latency.

2. Increased Disk Latency

Increased disk latency is most noticeable when the system stutters during operation. In metric terms, this is the disk I/O response time mentioned earlier. A general formula for response time is:

Response Time = Transfer Time (Round Trip) + Request Queue Time + Disk Processing Time
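As a rough illustration of this decomposition, the sketch below splits the await value reported by iostat into an approximate queue wait and a service time. It assumes a locally attached disk (so transfer time is negligible) and relies on the svctm column, which the sysstat documentation itself warns is unreliable and which newer versions no longer print. The function name split_await is hypothetical, and the result should be treated as an estimate only.

```python
def split_await(await_ms: float, svctm_ms: float) -> dict:
    """Approximate split of iostat's await into queue wait and service time.

    await covers time in the queue plus time being serviced, so the
    queue wait is estimated as await - svctm (clamped at zero).
    """
    queue_ms = max(await_ms - svctm_ms, 0.0)
    return {"queue_ms": round(queue_ms, 2), "service_ms": round(svctm_ms, 2)}


# await and svctm for sda, taken from the iostat sample in this article.
print(split_await(3.95, 1.58))
```

If most of await is queue wait, the request queue section is the place to look; if service time dominates, the disk itself is the more likely suspect.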

We will analyze the factors behind each term in turn, which also gives a troubleshooting approach for similar issues.

1. Transfer Time. The main limiting factor here is the network: transmission medium, bandwidth, network card, traffic scheduling, network topology, protocol configuration, and so on. The details will be covered in the upcoming network section.

2. Request Queue Time. Under low load and with no hardware failures, the request queue length should be very low. When does it increase? Let’s hypothesize a few scenarios and work backwards. Run iostat -d -x 1 and check the avgqu-sz column (named aqu-sz in newer sysstat versions).

root@node:~# iostat -d -x 1
Linux 4.15.0-58-generic (cs1anr01n02)   11/14/2025      _x86_64_        (48 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.05    0.00    1.04     0.00     9.60    18.40     0.00    3.95    0.29    3.95   1.58   0.16
sdb               0.00     0.04    0.26  203.44    10.18  1663.53    16.43     0.03    0.15    1.81    0.14   0.06   1.17
sdc               0.00     0.04    0.33  154.60    14.10  1250.00    16.32     0.02    0.15    2.00    0.15   0.07   1.08
nvme0n1          20.88    70.25  367.48  691.60  3127.79  6237.64    17.69     0.02    0.01    0.05    0.03   0.01   0.77
intelcas1-1       0.00     0.00    1.15  131.82   147.45  1763.82    28.75     0.02    0.13    0.79    0.12   0.07   0.86
intelcas1-2       0.00     0.00    1.56  116.54   147.79  1252.25    23.71     0.01    0.12    0.75    0.11   0.06   0.72
  1. Sudden high-concurrency I/O: A large number of processes initiate read and write requests at the same time (batch operations in databases, big data tasks, etc.). If the load exceeds the storage IOPS or throughput limit, the queue builds up. Use fio testing to determine where those limits are.

  2. High density of small I/O requests: Frequent reads and writes of small files (log writing, random database queries, etc.), where each I/O operation is very fast but the number of operations is extremely large, can easily lead to queue buildup. Use iostat to check r/s (read requests per second), w/s (write requests per second), rkB/s (data read per second), and wkB/s (data written per second), then estimate the average I/O request size as data volume divided by request count, for example wkB/s divided by w/s for writes.

  3. Application I/O model is inefficient: Too much synchronous I/O, not using asynchronous I/O, or frequently opening/closing files increases the overhead of I/O requests. This requires checking application information, and can also be diagnosed using the strace tool.

  4. Internal resource contention in the system: Insufficient CPU or memory (e.g., memory pressure forcing the cache to be reclaimed), or other processes consuming a large share of I/O resources at the same time, can also cause the target I/O requests to queue. This requires continuous monitoring of resource usage.
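The request-size estimate from scenario 2 can be sketched in a few lines of Python. The example below parses one row of the older iostat -x layout shown above (the column names differ in newer sysstat versions, so the header is passed in explicitly) and computes the average I/O size. It also applies Little's law (queue length ≈ request rate × latency) as a sanity check against avgqu-sz. The function names are hypothetical and this is only a sketch under those assumptions.

```python
HEADER = ("Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s "
          "avgrq-sz avgqu-sz   await r_await w_await  svctm  %util")
SDB = ("sdb               0.00     0.04    0.26  203.44    10.18  1663.53    "
       "16.43     0.03    0.15    1.81    0.14   0.06   1.17")


def parse_iostat_row(header: str, row: str) -> dict:
    """Map the column names of an iostat -x header onto one device row."""
    names = header.split()[1:]          # skip the "Device:" label
    fields = row.split()
    stats = dict(zip(names, map(float, fields[1:])))
    stats["device"] = fields[0]
    return stats


def avg_io_size_kb(stats: dict) -> float:
    """Average request size in kB: data volume divided by request count."""
    iops = stats["r/s"] + stats["w/s"]
    return (stats["rkB/s"] + stats["wkB/s"]) / iops if iops else 0.0


def estimated_queue_length(stats: dict) -> float:
    """Little's law: requests per second times average latency in seconds."""
    return (stats["r/s"] + stats["w/s"]) * stats["await"] / 1000.0


stats = parse_iostat_row(HEADER, SDB)
print(stats["device"], round(avg_io_size_kb(stats), 2), "kB per request")
print("estimated queue length:", round(estimated_queue_length(stats), 2))
```

For sdb this gives roughly 8.2 kB per request, which matches the avgrq-sz column (16.43 sectors of 512 bytes), and an estimated queue length that rounds to the reported avgqu-sz of 0.03.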

3. Disk Processing Time. Factors that keep the disk from performing at its best include the following:

The first is a fault in the disk itself (hardware or firmware). Bad sectors, read/write errors, and similar problems lead to retries and delays. In addition, disk performance may decline as the device ages. Environmental factors (e.g., temperature) and cabling can also have an impact. For issues related to the disk itself, tools like disktool or smartctl give a quick check.

For example, the command disktool show -l all gives a quick view of whether any errors are reported and whether the disk temperature is too high.

root@node:~# disktool show -l all
-- Controller Information --
+---------------+--------+------------+----------------+-------+--------------+
| controller_id | locate | enclosures | virtual_drives | disks |   pci_path   |
+---------------+--------+------------+----------------+-------+--------------+
|       0       |   c0   |     1      |       1        |   2   | 0000:5e:00.0 |
|       1       |   c1   |     2      |       2        |   12  | 0000:af:00.0 |
+---------------+--------+------------+----------------+-------+--------------+
-- Enclosure Information --
+--------------+--------+--------+---------------------------+-----------------+
| enclosure_id | locate | status | number_of_physical_drives | number_of_slots |
+--------------+--------+--------+---------------------------+-----------------+
|      8       |  c1e8  |   OK   |             12            |        16       |
|     252      | c1e252 |   OK   |             0             |        8        |
+--------------+--------+--------+---------------------------+-----------------+
-- Virtual Drive Information --
+-------+--------+------------+----------+---------+------------------+
| vd_id | locate | raid_level |   path   |  state  | number_of_drives |
+-------+--------+------------+----------+---------+------------------+
|   0   |  c1v0  |   raid5    | /dev/sdb | optimal |        3         |
|   1   |  c1v1  |   raid5    | /dev/sdc | optimal |        3         |
+-------+--------+------------+----------+---------+------------------+
-- Physical Disk Information In Raid Model --
+-------+----------+----------+-----------+--------+--------+--------------------+------+--------------+---------+
| pd_id | p_locate | v_locate | interface | medium | state  |         sn         | slot | error(m|o|p) |  speed  |
+-------+----------+----------+-----------+--------+--------+--------------------+------+--------------+---------+
|   8   | c0e252s0 |  c0v0d8  |    sata   |  ssd   | online | BTYF01830HFC480CGN |  0   |    0|0|0     | 6.0Gb/s |
|   9   |  c1e8s2  |  c1v0d9  |    sata   |  hdd   | online |      WKD20E4X      |  2   |    0|0|0     | 6.0Gb/s |
|   10  |  c1e8s4  | c1v0d10  |    sata   |  hdd   | online |      WKD27F1X      |  4   |    0|0|0     | 6.0Gb/s |
|   11  |  c1e8s10 | c1v0d11  |    sata   |  hdd   | online |      WKD28DDX      |  10  |    0|0|0     | 6.0Gb/s |
|   12  |  c1e8s6  | c1v0d12  |    sata   |  hdd   | online |      WKD26G4X      |  6   |    0|0|0     | 6.0Gb/s |
|   20  |  c1e8s8  | c1v1d20  |    sata   |  hdd   | online |      WKD26J5X      |  8   |    0|0|0     | 6.0Gb/s |
+-------+----------+----------+-----------+--------+--------+--------------------+------+--------------+---------+
-- Nvme device Information --
+---------+--------------+--------------------+--------------+-------------+------------------+--------------+---------------------+
|   name  |     path     |         sn         | percent_used | temperature | critical_warning | media_errors | num_err_log_entries |
+---------+--------------+--------------------+--------------+-------------+------------------+--------------+---------------------+
| nvme0n1 | /dev/nvme0n1 | BTYF01830HFC480CGN |      5       |      43     |        0         |      0       |          0          |
+---------+--------------+--------------------+--------------+-------------+------------------+--------------+---------------------+
-- Disk Smart Error --
-- Disk sn: BTYF01830HFC480CGN, Assessment: PASS --
+-----+-----------------------+---------+-------------+-----+
| num |          name         |   type  | when_failed | raw |
+-----+-----------------------+---------+-------------+-----+
|  5  | reallocated_sector_ct | Old_age |      -      |     |
+-----+-----------------------+---------+-------------+-----+
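Where disktool is not available, the overall SMART verdict from smartctl can be checked in a similar way. The sketch below interprets the health line printed by smartctl -H. It assumes the two common text formats (the "overall-health self-assessment" line for ATA devices and the "SMART Health Status" line for SCSI/SAS devices). The helper names and the SAMPLE string are illustrative and are not output captured from the machine in this article.

```python
import re
import subprocess


def smart_health(report: str):
    """Return True for pass, False for fail, None if no verdict line is found."""
    m = re.search(r"SMART overall-health self-assessment test result:\s*(\S+)", report)
    if m:
        return m.group(1) == "PASSED"
    m = re.search(r"SMART Health Status:\s*(\S+)", report)
    if m:
        return m.group(1) == "OK"
    return None


def check_device(dev: str):
    """Run `smartctl -H` on a device (usually needs root) and interpret it."""
    proc = subprocess.run(["smartctl", "-H", dev], capture_output=True, text=True)
    return smart_health(proc.stdout)


# Illustrative line in the format smartctl prints for ATA devices.
SAMPLE = "SMART overall-health self-assessment test result: PASSED\n"
print(smart_health(SAMPLE))
```

A call such as check_device("/dev/sda") could be run periodically, but a passing verdict does not rule out a degrading disk, so attributes such as reallocated sectors are still worth watching.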

The second is an unsuitable disk I/O scheduler configuration. Previous articles introduced several scheduling algorithms (such as CFQ, NOOP, and Deadline), and each behaves differently under different types of load.

none: No I/O scheduler is used, so requests bypass scheduling in the block layer. This is mostly applied to fast, directly attached SSDs.

noop: A first-in-first-out queue for I/O requests that performs only the most basic request merging.

CFQ: Maintains an I/O scheduling queue for each process and distributes I/O requests fairly among them.

deadline: Creates separate I/O queues for read and write requests and prioritizes requests that are about to reach their deadline, which improves throughput while bounding latency. Note that in newer versions of the Linux kernel this may be called mq-deadline.
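To see which scheduler a disk is actually using, the kernel exposes the list in /sys/block/<device>/queue/scheduler, with the active one in square brackets. The following is a minimal sketch of reading and parsing that file. The function names are hypothetical, and the device name passed to read_scheduler is whatever disk is being investigated.

```python
import re
from pathlib import Path


def active_scheduler(text: str):
    """Extract the active scheduler from sysfs text such as 'noop deadline [cfq]'."""
    m = re.search(r"\[([^\]]+)\]", text)
    if m:
        return m.group(1)
    # Some kernels print a bare 'none' without brackets.
    return text.strip() or None


def read_scheduler(dev: str):
    """Read the scheduler line for a block device, e.g. read_scheduler('sda')."""
    path = Path("/sys/block") / dev / "queue" / "scheduler"
    return active_scheduler(path.read_text())


print(active_scheduler("noop deadline [cfq]"))
print(active_scheduler("[mq-deadline] kyber bfq none"))
```

Comparing the result with the workload type (for example, none or mq-deadline on SSDs) is a quick way to rule out a scheduler mismatch.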

The third is storage selection and configuration. The disk’s write caching strategy (write-through or write-back) affects performance: write-back acknowledges a write once it reaches the cache, while write-through waits until the data reaches the media. If RAID is used, the choice of RAID level also has an impact.
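On reasonably recent kernels, the write cache mode that the kernel sees for a block device can be read from /sys/block/<device>/queue/write_cache, which reports "write back" or "write through". The sketch below reads that file. The function name and the sysfs_root parameter are hypothetical conveniences. Note that a RAID controller's own cache policy is configured in the controller and may not be reflected here.

```python
from pathlib import Path


def write_cache_mode(dev: str, sysfs_root: str = "/sys/block"):
    """Return 'write back' or 'write through' for a device, or None if unknown."""
    path = Path(sysfs_root) / dev / "queue" / "write_cache"
    try:
        return path.read_text().strip()
    except OSError:
        # File missing (older kernel or unknown device) or not readable.
        return None


for name in ("sda", "nvme0n1"):
    print(name, write_cache_mode(name))
```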

With the above information in mind, when encountering system stuttering or unresponsive behavior, we can follow this approach to troubleshoot. Stay tuned as we continue to explore this topic!

Linux Performance Optimization Series

– Memory Section

Linux Performance Optimization: About Memory

Linux Performance Optimization: Why Swap Usage Increased

Linux Performance Optimization: Understanding Caches in Memory

Linux Performance Optimization: How to Quickly Locate Memory Leaks?

Linux Performance Optimization: Detailed Usage of Memory Analysis Tools memleak-bpfcc and valgrind

Linux Performance Optimization: A Comprehensive Guide to Troubleshooting Memory Issues

– Disk Section

Linux Performance Optimization: In-Depth Understanding of File Systems and Disks

Linux Performance Optimization: Detailed Explanation of Disk Workflow and Performance Metrics

Linux Performance Optimization: Detailed Explanation of Common Scenarios in FIO Performance Testing
