In the previous chapters, we introduced file systems and disks, walking through the entire path an I/O request takes to reach the disk, along with potential performance bottlenecks and troubleshooting tools. If you haven't read them yet, those articles are listed at the end of this one.
Today we focus on strace, which we will use together with tools such as lsof, vmstat, iostat, and pidstat to analyze file system performance issues.
Why use this tool? First, it is easy to install and use: on Debian or Ubuntu, apt install strace is enough, and the strace command works immediately without any extra configuration. Second, it observes file operations directly, at the system-call level. Along the path of a disk I/O request, the most common problems arise not in the underlying block device layer but in the virtual file system (VFS) layer, because that is where applications interact with the file system and where most I/O requests originate.
1. Core Operations of VFS
The core operations of the Virtual File System (VFS) are mainly open, read, write, close, stat, and fsync. Because these operations form a small, uniform set, they are convenient to analyze: by measuring how many times each system call is made and how long it takes, we can quickly locate performance bottlenecks in the file system.
- open/close: open and close files. Performance problems can come from opening files too often or from slow opens.
- read/write: read and write file data. The key metrics are IOPS, throughput, and buffer (request) size.
- stat/lstat/fstat: retrieve a file's status (metadata such as size, permissions, and timestamps).
- fsync: flushes a file's modified data and metadata from memory to the storage device. Watch for fsync calls that take a long time.
Beyond these common operations, mmap (memory mapping) and access (permission checks) can also be traced alongside them when necessary.
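To see these operations in isolation, here is a minimal Python sketch that performs each core operation once on a scratch file and times it. The file name vfs_demo.dat is hypothetical. Python's os functions are thin wrappers around the corresponding system calls, so running the script under strace -T should show matching lines, although the exact names can differ by C library and kernel version (open usually appears as openat, and stat may appear as newfstatat or statx).

```python
import os
import tempfile
import time


def timed(label, func, *args):
    """Call func(*args), print how long it took in microseconds, and return its result."""
    start = time.perf_counter()
    result = func(*args)
    elapsed_us = (time.perf_counter() - start) * 1e6
    print(f"{label:<6} {elapsed_us:10.1f} us")
    return result


def exercise_vfs_ops(directory):
    """Run open, write, fsync, stat, read, and close once each on a scratch file.

    Returns the bytes read back so the caller can check the round trip.
    """
    path = os.path.join(directory, "vfs_demo.dat")  # hypothetical scratch file name
    payload = b"hello, vfs\n" * 1024

    fd = timed("open", os.open, path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        timed("write", os.write, fd, payload)
        timed("fsync", os.fsync, fd)        # force the written data down to the device
        st = timed("stat", os.stat, path)    # path-based metadata lookup
        if st.st_size != len(payload):
            raise RuntimeError("unexpected file size after write")
        os.lseek(fd, 0, os.SEEK_SET)         # rewind before reading back
        data = timed("read", os.read, fd, len(payload))
    finally:
        timed("close", os.close, fd)
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exercise_vfs_ops(tmp)
```

On most systems fsync is the call whose time varies most from run to run, because it is the only one here that has to wait for the storage device.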
2. Four Steps of Performance Analysis
Before using strace to analyze file system calls, we first use a few other tools to assess the system as a whole and confirm that strace is the right next step.
Step 1: When a problem appears, the first tools to reach for are usually free, top, and the like, to check the overall state of the system. For example, run top -c (-c shows each process's full command line instead of just the program name):
top - 11:19:22 up 790 days, 18:52, 4 users, load average: 0.00, 0.00, 0.00
Tasks: 136 total, 1 running, 87 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.8 us, 3.3 sy, 0.0 ni, 89.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 2041048 total, 428584 free, 246764 used, 1365700 buff/cache
KiB Swap: 14401532 total, 14029564 free, 371968 used. 1523972 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
45479 root 20 0 1246604 17568 8288 S 4.7 0.9 5362:53 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
27426 root 20 0 41828 3548 2972 R 0.7 0.2 0:00.04 top -c
16 root 20 0 0 0 0 S 0.3 0.0 146:37.59 [ksoftirqd/1]
3248 redis 20 0 53376 2088 1652 S 0.3 0.1 813:41.59 /usr/bin/redis-server 127.0.0.1:6379
1 root 20 0 119712 4544 3000 S 0.0 0.2 190:35.95 /sbin/init
From this, we can determine several key pieces of information:
- System load: load average: 0.00, 0.00, 0.00 (averages over the last 1, 5, and 15 minutes).
- Kernel-mode CPU usage, shown as 3.3 sy; it reflects time spent in system calls, process scheduling, and other kernel work.
- Time the CPU spends waiting for I/O to complete (iowait), shown as 0.0 wa.
- CPU idle percentage (the higher, the more idle the CPU), shown as 89.8 id.

If there is an I/O problem, it usually shows up in iowait: a wa value that stays above 10% indicates significant I/O pressure. If sy stays above 20%, the kernel is doing frequent scheduling or handling excessive system calls, which calls for further investigation.
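The sy and wa percentages that top reports can be derived from the cumulative CPU counters in /proc/stat. Below is a minimal sketch, assuming the standard field order of the aggregate cpu line. It samples once per second and raises an alert only when one of the thresholds quoted above is exceeded in every sample, which is one simple way to interpret "consistently".

```python
import time

# Rules of thumb quoted in the text: sustained iowait > 10% or sy > 20%.
IOWAIT_WARN_PCT = 10.0
SYS_WARN_PCT = 20.0

# Order of the first eight counters on the "cpu" line of /proc/stat.
# guest/guest_nice are already included in user/nice, so they are not summed again.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Share of CPU time per state between two snapshots, like top's %Cpu(s) row."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


def sustained_alerts(samples):
    """Return alerts only for conditions that held in every sample."""
    alerts = []
    if samples and all(s["iowait"] > IOWAIT_WARN_PCT for s in samples):
        alerts.append("iowait consistently above 10%: check disks with iostat")
    if samples and all(s["system"] > SYS_WARN_PCT for s in samples):
        alerts.append("sy consistently above 20%: heavy kernel work or system calls")
    return alerts


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()  # the first line is the aggregate "cpu" line


if __name__ == "__main__":
    samples = []
    prev = parse_cpu_line(read_cpu_line())
    for _ in range(5):  # five 1-second intervals
        time.sleep(1)
        cur = parse_cpu_line(read_cpu_line())
        pct = cpu_percentages(prev, cur)
        samples.append(pct)
        print(f"sy={pct['system']:5.1f}%  wa={pct['iowait']:5.1f}%  id={pct['idle']:5.1f}%")
        prev = cur
    for alert in sustained_alerts(samples):
        print("ALERT:", alert)
```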
Step 2: If the first step shows that iowait is indeed high, there is likely an I/O bottleneck. Next, use iostat to see which disk has high utilization (%util) and whether it is saturated (a long request queue, avgqu-sz).
root@node:~# iostat -x -d 1
Linux 4.15.0-58-generic (cs1ahyper01n07) 11/17/2025 _x86_64_ (64 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 7.94 0.03 14.19 0.88 98.64 14.00 0.00 0.07 0.37 0.07 0.01 0.01
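In the command above, -x prints extended statistics, -d reports devices only, and 1 refreshes the output every second. The first report shows averages since boot; the following reports cover each 1-second interval. If a disk's utilization stays high, the next step is to identify the processes driving it.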
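iostat derives %util and avgqu-sz from the kernel's /proc/diskstats counters. The sketch below reproduces that calculation, assuming the documented field layout: after the major number, minor number, and device name, the tenth statistic is the milliseconds spent doing I/O and the eleventh is the weighted milliseconds. %util is the fraction of the interval the device was busy, and avgqu-sz is the weighted time divided by the interval. It is an approximation for illustration; sysstat handles more edge cases (counter wraparound, partitions, and so on).

```python
import time

# Field positions in a /proc/diskstats line after split():
# 0 major, 1 minor, 2 device name, ... 12 ms spent doing I/O (io_ticks),
# 13 weighted ms spent doing I/O (time in queue).
IO_TICKS = 12
WEIGHTED_MS = 13


def parse_diskstats(text):
    """Map device name -> (io_ticks_ms, weighted_ms) from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) > WEIGHTED_MS:
            stats[parts[2]] = (int(parts[IO_TICKS]), int(parts[WEIGHTED_MS]))
    return stats


def util_and_queue(before, after, interval_ms):
    """Approximate iostat's %util and avgqu-sz for each device over one interval."""
    result = {}
    for dev, (ticks1, weighted1) in after.items():
        if dev not in before:
            continue  # device appeared between the two snapshots
        ticks0, weighted0 = before[dev]
        util = 100.0 * (ticks1 - ticks0) / interval_ms
        queue = (weighted1 - weighted0) / interval_ms
        result[dev] = (min(util, 100.0), queue)
    return result


def read_diskstats(path="/proc/diskstats"):
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    first = parse_diskstats(read_diskstats())
    start = time.monotonic()
    time.sleep(1)
    second = parse_diskstats(read_diskstats())
    elapsed_ms = (time.monotonic() - start) * 1000.0
    for dev, (util, queue) in sorted(util_and_queue(first, second, elapsed_ms).items()):
        print(f"{dev:<12} %util={util:6.2f}  avgqu-sz={queue:6.2f}")
```

Keep in mind that %util only says the device was busy at some point during each millisecond. On SSDs and RAID arrays that serve many requests in parallel, a value near 100% does not necessarily mean the device is saturated, which is why the queue length is worth checking alongside it.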
Step 3: Use pidstat or iotop to examine I/O per process. pidstat -d 1 prints each process's disk read and write rates once per second:
root@node:~# pidstat -d 1
Linux 4.15.0-58-generic (cs1ahyper01n07) 11/17/2025 _x86_64_ (64 CPU)
11:41:10 AM UID PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
11:41:11 AM 0 4855 0.00 29.63 0.00 0 bkunifylogbeat
11:41:11 AM 0 4893 0.00 3.70 0.00 0 exceptionbeat
11:41:11 AM 0 24244 0.00 7.41 0.00 0 qemu-system-x86
11:41:11 AM 0 38781 0.00 3.70 0.00 0 qemu-system-x86
11:41:11 AM 0 48974 0.00 459.26 0.00 0 qemu-system-x86
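When many processes are doing I/O, ranking pidstat -d output by throughput can be scripted. A minimal sketch, assuming the column layout shown above (with or without an AM/PM field after the time, depending on locale); it skips the banner, header, and Average lines. The helper names are hypothetical.

```python
def parse_pidstat_io(lines):
    """Parse 'pidstat -d' data rows into dicts, skipping banner, header and summary lines."""
    rows = []
    for line in lines:
        parts = line.split()
        if not parts or parts[0] == "Average:":
            continue  # blank line or the summary block printed at the end
        # Skip the time field, plus the AM/PM marker when the locale prints one.
        offset = 2 if len(parts) > 1 and parts[1] in ("AM", "PM") else 1
        fields = parts[offset:]
        # fields: UID, PID, kB_rd/s, kB_wr/s, kB_ccwr/s, iodelay, Command
        if len(fields) < 7 or not fields[1].isdigit():
            continue  # banner or header line
        rows.append({
            "pid": int(fields[1]),
            "kb_rd_s": float(fields[2]),
            "kb_wr_s": float(fields[3]),
            "command": " ".join(fields[6:]),
        })
    return rows


def top_io_processes(rows, n=3):
    """Return the n processes with the highest combined read + write throughput."""
    return sorted(rows, key=lambda r: r["kb_rd_s"] + r["kb_wr_s"], reverse=True)[:n]


if __name__ == "__main__":
    import subprocess
    # One 1-second sample; requires the sysstat package that provides pidstat.
    out = subprocess.run(["pidstat", "-d", "1", "1"], capture_output=True, text=True).stdout
    for row in top_io_processes(parse_pidstat_io(out.splitlines())):
        print(row["pid"], row["command"], row["kb_rd_s"], row["kb_wr_s"])
```

Applied to the sample above, this would put PID 48974 (qemu-system-x86, 459.26 kB_wr/s) first.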
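From pidstat's output we can quickly spot the processes with high kB_rd/s (kilobytes read per second) and kB_wr/s (kilobytes written per second), along with their PIDs. Given a PID, we can also list all of the process's threads with ps -efT, where the third column (SPID) is the thread ID. For example, for a telegraf process with PID 3565: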
root@node:~# ps -efT |grep 3565
root 3565 3565 14596 0 Mar04 ? 00:00:00 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root 3565 3566 14596 0 Mar04 ? 02:01:44 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root 3565 3567 14596 0 Mar04 ? 08:45:51 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root 3565 3568 14596 0 Mar04 ? 08:25:29 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root 3565 3569 14596 0 Mar04 ? 07:48:13 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root 3565 3570 14596 0 Mar04 ? 00:00:00 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
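ps -efT reads thread information from /proc, where each thread of a process has its own directory under /proc/<PID>/task. A minimal sketch that lists thread IDs and thread names directly from there; the proc_root parameter is an addition for illustration so the function can be pointed at a test directory.

```python
import os


def list_threads(pid, proc_root="/proc"):
    """Return (tid, name) pairs for every thread of pid, like the SPID column of ps -efT."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    threads = []
    for entry in sorted(os.listdir(task_dir), key=int):
        comm_path = os.path.join(task_dir, entry, "comm")
        try:
            with open(comm_path) as f:
                name = f.read().strip()
        except OSError:
            name = "?"  # thread exited between listdir() and open()
        threads.append((int(entry), name))
    return threads


if __name__ == "__main__":
    import sys
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid, name in list_threads(pid):
        print(pid, tid, name)
```

A thread ID found this way can be passed to strace -p on its own, which is useful when only one thread of a busy process is doing the I/O.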
Step 4: For the processes identified in the previous three steps, use strace to examine their file-related system calls in detail. For example, to trace the file operations of a running process:
strace -e trace=file -tt -T -p <PID>
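Where:
- -e trace=file traces only system calls that take a file name as an argument (open, stat, and so on). Calls that work on file descriptors, such as read, write, and fsync, are not included; use -e trace=desc or list them explicitly, for example -e trace=read,write.
- -tt prints a timestamp with microseconds on each line.
- -T shows the time spent in each system call, so the slow calls stand out.
- -p specifies the ID of the process to trace.

Other useful options:
- -c counts the calls, errors, and time for each system call and prints a summary.
- -o writes the output to a file.
- -f also traces child processes; combined with -p, it attaches to all threads of a multithreaded process instead of only the given thread.
- -s 1024 sets the maximum string length printed to 1024 characters.
For example, to trace only the stat calls of process 3442: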
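With -tt -T, every completed call ends with its duration in angle brackets, so slow calls can be picked out mechanically. A minimal sketch, assuming output saved with strace -tt -T -o <file> in the line format shown in the comments; the "unfinished" and "resumed" line pairs that appear in multithreaded traces are simply skipped. The script and file names are hypothetical.

```python
import re
from collections import defaultdict

# Example line: 11:41:11.000100 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3 <0.000031>
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?"             # optional "[pid N]" prefix added by -f
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+"  # -tt timestamp with microseconds
    r"(?P<call>\w+)\("                    # system call name
    r".*\)\s+=\s+(?P<ret>.+?)\s+"         # arguments, then the return value
    r"<(?P<dur>\d+\.\d+)>$"               # -T: seconds spent in the call
)


def parse_strace_line(line):
    """Return a dict for a completed call, or None for signals and unfinished/resumed lines."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    return {"ts": m["ts"], "call": m["call"], "ret": m["ret"], "seconds": float(m["dur"])}


def slowest_calls(lines, n=5):
    """The n calls with the largest -T duration."""
    calls = [c for c in map(parse_strace_line, lines) if c]
    return sorted(calls, key=lambda c: c["seconds"], reverse=True)[:n]


def summarize(lines):
    """Per-syscall (count, total seconds): a rough analogue of strace -c."""
    summary = defaultdict(lambda: [0, 0.0])
    for c in filter(None, map(parse_strace_line, lines)):
        entry = summary[c["call"]]
        entry[0] += 1
        entry[1] += c["seconds"]
    return {call: (count, total) for call, (count, total) in summary.items()}


if __name__ == "__main__":
    import sys
    # Usage (hypothetical names): python3 strace_slow.py trace.log
    with open(sys.argv[1]) as f:
        lines = f.readlines()
    for c in slowest_calls(lines):
        print(f'{c["seconds"]:.6f}s  {c["ts"]}  {c["call"]} = {c["ret"]}')
    for call, (count, total) in sorted(summarize(lines).items(), key=lambda kv: -kv[1][1]):
        print(f"{call:<12} calls={count:<6} total={total:.6f}s")
```

For a quick overview strace -c already gives per-call totals; parsing the -tt -T log is mainly useful when you also need to know when the slow calls happened.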
root@node1:~# strace -e trace=stat -p 3442
strace: Process 3442 attached
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3447, si_uid=114} ---
stat("promote", 0x7ffd312ccf10) = -1 ENOENT (No such file or directory)
stat("fallback_promote", 0x7ffd312ccf10) = -1 ENOENT (No such file or directory)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=42508, si_uid=114, si_status=0, si_utime=0, si_stime=0} ---
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3447, si_uid=114} ---
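Failed lookups such as the ENOENT stat calls above are worth counting, since a process that polls for missing files repeats them on every cycle. A minimal sketch that tallies failed calls by (call, path, errno) from saved strace output. It assumes the path is the first quoted argument and ignores escaped quotes, so it is best used on output captured with -e trace=file.

```python
import re
from collections import Counter

# Example: stat("promote", 0x7ffd312ccf10) = -1 ENOENT (No such file or directory)
FAILED_RE = re.compile(
    r'(?P<call>\w+)\([^"]*"(?P<path>[^"]*)"'  # call name and first quoted argument
    r'.*\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)'  # failure: "= -1 ERRNO"
)


def failed_path_lookups(lines):
    """Count (call, path, errno) triples for failed calls that take a path argument."""
    counts = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            counts[(m["call"], m["path"], m["errno"])] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Reads strace output from stdin, e.g. a file written with strace -o.
    for (call, path, errno), n in failed_path_lookups(sys.stdin).most_common(10):
        print(f"{n:>6}  {call}({path!r}) -> {errno}")
```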
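Lines beginning with --- are signals delivered to the process (here SIGUSR1 and SIGCHLD), not system calls, and the two stat calls fail with ENOENT because those files do not exist. Next, take a process ID identified in step three (here 60274, which the lsof output below shows is a qemu-system-x86_64 process) and use strace -p to watch its read and write calls: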
root@node:~# strace -e trace=read,write -p 60274
strace: Process 60274 attached
read(5, "\1\0\0\0\0\0\0\0", 512) = 8
write(20, "\1\0\0\0\0\0\0\0", 8) = 8
read(5, "\1\0\0\0\0\0\0\0", 512) = 8
write(20, "\1\0\0\0\0\0\0\0", 8) = 8
write(9, "\1\0\0\0\0\0\0\0", 8) = 8
write(9, "\1\0\0\0\0\0\0\0", 8) = 8
write(9, "\1\0\0\0\0\0\0\0", 8) = 8
read(9, "\3\0\0\0\0\0\0\0", 16) = 8
read(9, 0x7ffe9b2b8060, 16) = -1 EAGAIN (Resource temporarily unavailable)
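In a busy trace it helps to know which descriptor carries the most traffic. A minimal sketch that totals calls and bytes per (call, fd) from saved strace output, assuming read/write lines in the format above; a successful read or write returns the number of bytes transferred, while failed calls (such as the EAGAIN above) are counted as calls with zero bytes.

```python
import re
from collections import defaultdict

# Example: write(20, "\1\0\0\0\0\0\0\0", 8) = 8
RW_RE = re.compile(
    r"(?:^|\s)(?P<call>read|write|pread64|pwrite64)\((?P<fd>\d+),"  # call and fd argument
    r".*\)\s+=\s+(?P<ret>-?\d+)"                                     # return value
)


def bytes_per_fd(lines):
    """Sum calls and bytes per (call, fd); negative returns add a call but no bytes."""
    totals = defaultdict(lambda: {"calls": 0, "bytes": 0})
    for line in lines:
        m = RW_RE.search(line.strip())
        if not m:
            continue
        entry = totals[(m["call"], int(m["fd"]))]
        entry["calls"] += 1
        entry["bytes"] += max(int(m["ret"]), 0)
    return dict(totals)


if __name__ == "__main__":
    import sys
    # Reads strace output from stdin and prints the busiest descriptors first.
    ranked = sorted(bytes_per_fd(sys.stdin).items(), key=lambda kv: -kv[1]["bytes"])
    for (call, fd), t in ranked:
        print(f"{call:<8} fd={fd:<4} calls={t['calls']:<6} bytes={t['bytes']}")
```

For the nine lines above, this gives 24 bytes written to fd 9, 16 bytes each for reads on fd 5 and writes to fd 20, and 8 bytes read from fd 9 over two calls.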
In read() and write() calls, strace shows only the file descriptor number (the first argument), not the file name or path. To find out which files those descriptors refer to, we use lsof, which lists the files a process has open; strace's -y option, which prints the path next to each descriptor argument, is another way. Note that “files” here include not only regular files but also directories, block devices, shared libraries, network sockets, and so on.
root@node:~# lsof -p 60274
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
qemu-syst 60274 root cwd DIR 8,1 4096 2 /
qemu-syst 60274 root rtd DIR 8,1 4096 2 /
qemu-syst 60274 root txt REG 8,1 22519968 1587004 /usr/bin/qemu-system-x86_64
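Here, FD is the file descriptor. Numeric descriptors are shown with an access mode suffix (for example, 5u for read and write), while special entries such as cwd (current working directory), rtd (root directory), and txt (program text, that is, the executable) describe other references the process holds. TYPE is the file type (DIR for a directory, REG for a regular file), and NAME is the file path.
With the files and the application identified, we can then dig into the specific problem.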
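lsof gets much of this information from /proc. On Linux, /proc/<PID>/fd contains one symbolic link per open descriptor, so descriptor numbers seen in strace output (such as 5, 9, and 20 above) can also be resolved directly. A minimal sketch; the proc_root parameter is an addition for illustration so the function can be tested against a fake directory, and reading another user's process usually requires root.

```python
import os


def fd_targets(pid, proc_root="/proc"):
    """Map each open fd number of pid to its target, similar to lsof's FD and NAME columns.

    Targets look like '/var/log/syslog', 'socket:[12345]', 'pipe:[6789]'
    or 'anon_inode:[eventfd]'.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = {}
    for entry in os.listdir(fd_dir):
        try:
            targets[int(entry)] = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
    return dict(sorted(targets.items()))


if __name__ == "__main__":
    import sys
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for fd, target in fd_targets(pid).items():
        print(f"{fd:>4}  {target}")
```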
Stay tuned for more on the journey of Linux performance optimization~
Linux Performance Tuning Series
– Memory Section
Linux Performance Tuning: About Memory
Linux Performance Tuning: Why Has Swap Increased?
Linux Performance Tuning: Understanding Caches in Memory
Linux Performance Tuning: How to Quickly Locate Memory Leaks?
Linux Performance Tuning: Detailed Usage of Memory Analysis Tools memleak-bpfcc and valgrind
Linux Performance Tuning: A Comprehensive Guide to Troubleshooting Memory Issues
– Disk Section
Linux Performance Tuning: In-Depth Understanding of File Systems and Disks
Linux Performance Tuning: Detailed Explanation of Disk Workflow and Performance Metrics
Linux Performance Tuning: Further Discussion on Disk Performance Metrics and Process-Level I/O
Linux Performance Tuning: Detailed Explanation of Common Scenarios for FIO Performance Testing
Linux System Tuning: In-Depth Analysis of Increasing Disk Latency Issues