Linux Performance Tuning: Using strace to Analyze File System Performance Issues

In the previous chapters, we covered file systems and disks, tracing in detail the path an I/O request takes to reach the disk, along with the potential performance bottlenecks and the tools for troubleshooting them. If you haven’t read them yet, the earlier articles are listed at the end of this article.

Today, we will mainly introduce the tool strace, which will be used in conjunction with tools like lsof, vmstat, iostat, and pidstat to analyze performance issues in the file system.

Why use strace? First, it is easy to install and use: on Debian or Ubuntu you can install it with apt install strace, and after installation the strace command works without additional configuration. Second, strace and lsof can both be focused on file operations. Along the path of a disk I/O request, the most common problems do not arise in the underlying block device layer but at the virtual file system layer, because that is where applications interact with the kernel and where the majority of I/O requests originate.

1. Core Operations of VFS

The core operations of the Virtual File System (VFS) mainly include open, read, write, close, stat, and fsync. These operations are relatively concentrated and unified, which is beneficial for our analysis. By analyzing the time taken and the number of calls for these system calls, we can quickly locate performance bottlenecks in the file system.

open/close: Open and close files. Potential performance issues include opening files too frequently and slow opens.
read/write: Read and write file data. The performance metrics to watch are IOPS, throughput, buffer size, etc.
stat/lstat/fstat: Retrieve the status (metadata) of a file.
fsync: Flushes a file's modified data to disk. Watch for delays in flushing.

In addition to these commonly used operations, there are also mmap for memory mapping and access for permission checks, which can be used in conjunction when necessary.
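To make these operations concrete, here is a minimal Python sketch that issues each of them against a scratch file and times it. The file name vfs_demo.dat and the helper names are hypothetical, chosen only for illustration. The system call names that strace reports may also differ from the Python function names (for example, openat instead of open).

```python
import os
import tempfile
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def exercise_vfs_calls(directory, size=4096):
    """Run open, write, fsync, fstat, read, and close on a scratch file.

    Returns (timings, bytes_read), where timings maps each operation
    name to its wall-clock duration in seconds.
    """
    # Hypothetical scratch file name, used only for this demonstration.
    path = os.path.join(directory, "vfs_demo.dat")
    timings = {}
    fd, timings["open"] = timed(os.open, path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        _, timings["write"] = timed(os.write, fd, b"x" * size)
        _, timings["fsync"] = timed(os.fsync, fd)
        _, timings["fstat"] = timed(os.fstat, fd)
        os.lseek(fd, 0, os.SEEK_SET)  # rewind before reading the data back
        data, timings["read"] = timed(os.read, fd, size)
    finally:
        _, timings["close"] = timed(os.close, fd)
    os.unlink(path)
    return timings, len(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        timings, _ = exercise_vfs_calls(scratch)
    for name, seconds in timings.items():
        print(f"{name:<6} {seconds * 1e6:10.1f} us")
```

Running the script under strace -T is one way to compare these user-space timings with the per-call times the kernel reports.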

2. Four Steps of Performance Analysis

Before using strace to analyze file system calls, we first use some general-purpose tools to check the system and decide whether strace is the right tool for the problem.

Step 1: When an issue arises, the first tools we usually reach for are free, top, etc., to check the overall state of the system. For example, run top -c:

top - 11:19:22 up 790 days, 18:52,  4 users,  load average: 0.00, 0.00, 0.00
Tasks: 136 total,   1 running,  87 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.8 us,  3.3 sy,  0.0 ni, 89.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2041048 total,   428584 free,   246764 used,  1365700 buff/cache
KiB Swap: 14401532 total, 14029564 free,   371968 used.  1523972 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                 
45479 root      20   0 1246604  17568   8288 S   4.7  0.9   5362:53 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf                 
27426 root      20   0   41828   3548   2972 R   0.7  0.2   0:00.04 top -c                                                                 
   16 root      20   0       0      0      0 S   0.3  0.0 146:37.59 [ksoftirqd/1]                                                           
 3248 redis     20   0   53376   2088   1652 S   0.3  0.1 813:41.59 /usr/bin/redis-server 127.0.0.1:6379                                    
    1 root      20   0  119712   4544   3000 S   0.0  0.2 190:35.95 /sbin/init

From this, we can determine several key pieces of information:

Basic load information: load average: 0.00, 0.00, 0.00.
Kernel-mode CPU usage, which reflects system calls, process scheduling, etc.: 3.3 sy.
CPU time spent waiting for I/O to complete (iowait): 0.0 wa.
CPU idle rate (the higher, the more idle the CPU): 89.8 id.

If there is an I/O problem, it will generally show up in iowait: a value that consistently exceeds 10% indicates significant I/O pressure. If sy consistently exceeds 20%, the kernel is scheduling frequently or handling an excessive number of system calls, and further investigation is needed.
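The same percentages can be derived from two samples of the aggregate cpu line in /proc/stat. The sketch below applies the 10% iowait and 20% sy rules of thumb from this step. The function names are hypothetical, and a single pair of samples cannot show that a value is consistently high, so in practice the check should be repeated over several intervals.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # pad if the kernel reports fewer
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Return the share of CPU time per state between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def assess(pct, iowait_limit=10.0, sys_limit=20.0):
    """Apply the rules of thumb from the text to one set of percentages."""
    findings = []
    if pct["iowait"] > iowait_limit:
        findings.append("iowait above limit: possible I/O pressure")
    if pct["system"] > sys_limit:
        findings.append("sy above limit: heavy scheduling or many system calls")
    return findings


def sample_cpu(interval=1.0):
    """Sample /proc/stat twice, interval seconds apart (Linux only)."""
    def read():
        with open("/proc/stat") as stat_file:
            return parse_cpu_line(stat_file.readline())
    before = read()
    time.sleep(interval)
    return cpu_percentages(before, read())
```

On a Linux host, assess(sample_cpu()) returns an empty list when neither threshold is exceeded during the sampling interval.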

Step 2: If the first step shows that iowait is indeed high, there may be an I/O performance bottleneck. We then use iostat to see which disk has high I/O utilization (%util) and whether it is saturated (queue length, avgqu-sz; newer sysstat versions label this column aqu-sz).

root@node:~# iostat -x -d 1
Linux 4.15.0-58-generic (cs1ahyper01n07)        11/17/2025      _x86_64_        (64 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     7.94    0.03   14.19     0.88    98.64    14.00     0.00    0.07    0.37    0.07   0.01   0.01

With this basic information, we can investigate further. If disk utilization remains high, we need to identify the processes responsible.
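When many devices are listed, the iostat check can be scripted. The following sketch parses iostat -x output by header name and lists the devices whose %util or queue length crosses a threshold. The thresholds util_limit=80.0 and queue_limit=1.0 are assumptions for illustration, not values from this article, and the function names are hypothetical.

```python
# Sample copied from the iostat output shown in this step.
IOSTAT_SAMPLE = """\
Linux 4.15.0-58-generic (cs1ahyper01n07)        11/17/2025      _x86_64_        (64 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     7.94    0.03   14.19     0.88    98.64    14.00     0.00    0.07    0.37    0.07   0.01   0.01
"""


def parse_iostat(text):
    """Parse `iostat -x` device rows into dicts keyed by the header names."""
    rows = []
    header = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0].rstrip(":") == "Device":
            header = [name.rstrip(":") for name in parts]
            continue
        if header is None or len(parts) != len(header):
            continue  # banner lines and anything that does not fit the header
        row = {"Device": parts[0]}
        for name, value in zip(header[1:], parts[1:]):
            row[name] = float(value)
        rows.append(row)
    return rows


def busy_devices(rows, util_limit=80.0, queue_limit=1.0):
    """Return device names whose utilization or queue length looks high.

    Both limits are illustrative assumptions; tune them for the hardware.
    """
    busy = []
    for row in rows:
        # Older sysstat prints avgqu-sz, newer versions print aqu-sz.
        queue = row.get("avgqu-sz", row.get("aqu-sz", 0.0))
        if row.get("%util", 0.0) >= util_limit or queue >= queue_limit:
            busy.append(row["Device"])
    return busy
```

For the sample above, busy_devices(parse_iostat(IOSTAT_SAMPLE)) returns an empty list, which matches the idle sda shown in the output.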

Step 3: At this point, we can use pidstat or iotop to further observe the I/O situation of the processes.

root@node:~# pidstat -d 1
Linux 4.15.0-58-generic (cs1ahyper01n07)        11/17/2025      _x86_64_        (64 CPU)

11:41:10 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
11:41:11 AM     0      4855      0.00     29.63      0.00       0  bkunifylogbeat
11:41:11 AM     0      4893      0.00      3.70      0.00       0  exceptionbeat
11:41:11 AM     0     24244      0.00      7.41      0.00       0  qemu-system-x86
11:41:11 AM     0     38781      0.00      3.70      0.00       0  qemu-system-x86
11:41:11 AM     0     48974      0.00    459.26      0.00       0  qemu-system-x86

Based on the output, we can quickly identify the processes with high kB_rd/s and kB_wr/s values, along with their process IDs. Given a process ID, we can then list all of its threads: in the ps -efT output, the third column is the thread ID. For example, for PID 3565:

root@node:~# ps -efT |grep 3565
root      3565  3565 14596  0 Mar04 ?        00:00:00 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root      3565  3566 14596  0 Mar04 ?        02:01:44 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root      3565  3567 14596  0 Mar04 ?        08:45:51 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root      3565  3568 14596  0 Mar04 ?        08:25:29 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root      3565  3569 14596  0 Mar04 ?        07:48:13 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root      3565  3570 14596  0 Mar04 ?        00:00:00 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
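Going back to the pidstat output in Step 3, ranking the rows by hand becomes tedious on a busy host. A minimal sketch of the ranking, assuming the column layout shown there (the function names are hypothetical):

```python
# Sample copied from the pidstat -d output shown in Step 3.
PIDSTAT_SAMPLE = """\
11:41:10 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
11:41:11 AM     0      4855      0.00     29.63      0.00       0  bkunifylogbeat
11:41:11 AM     0      4893      0.00      3.70      0.00       0  exceptionbeat
11:41:11 AM     0     24244      0.00      7.41      0.00       0  qemu-system-x86
11:41:11 AM     0     38781      0.00      3.70      0.00       0  qemu-system-x86
11:41:11 AM     0     48974      0.00    459.26      0.00       0  qemu-system-x86
"""


def parse_pidstat(text):
    """Parse `pidstat -d` rows, locating columns via the header line."""
    rows = []
    columns = None
    offset = 0
    for line in text.splitlines():
        parts = line.split()
        if "UID" in parts and "PID" in parts:
            # The timestamp takes one or two fields depending on the locale.
            offset = parts.index("UID")
            columns = parts[offset:]
            continue
        if columns is None or len(parts) < offset + len(columns):
            continue
        fields = parts[offset:]
        index = {name: i for i, name in enumerate(columns)}
        rows.append({
            "pid": int(fields[index["PID"]]),
            "kB_rd/s": float(fields[index["kB_rd/s"]]),
            "kB_wr/s": float(fields[index["kB_wr/s"]]),
            "command": " ".join(fields[index["Command"]:]),
        })
    return rows


def top_by(rows, key="kB_wr/s", n=3):
    """Return the n rows with the highest value in the given column."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]


if __name__ == "__main__":
    for row in top_by(parse_pidstat(PIDSTAT_SAMPLE)):
        print(row["pid"], row["kB_wr/s"], row["command"])
```

For the sample, the heaviest writer is PID 48974 (qemu-system-x86) at 459.26 kB/s, which makes it the natural candidate for the next step.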

Step 4: With the specific process identified in the previous three steps, we can use strace to analyze its file-related system calls in detail. For example, to trace all file operations of a process:

strace -e trace=file -tt -T -p <PID>

Where:
-e trace=file traces only file-related system calls, meaning those that take a file path as a parameter (read and write take a file descriptor instead, so they are not included)
-tt prints a timestamp with microsecond precision on each line
-T shows the time spent in each system call, which makes the slow calls easy to spot
-p specifies the ID of the process to trace

Other options:
-c counts the calls, errors, and time taken for each system call
-o writes the output to a file
-f also traces child processes
-s 1024 sets the maximum length of strings displayed to 1024 (the default is 32)
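Once a trace has been saved with -tt and -T (for example, using -o), the slow calls can be picked out with a short script. The sketch below assumes the usual line layout of timestamp, call, return value, and duration in angle brackets. The sample lines and paths are made up for illustration and are not output from a real system, and lines that do not fit the pattern (signals, unfinished calls) are simply skipped.

```python
import re
from collections import defaultdict

# Illustrative lines in the `strace -tt -T` layout; the paths are hypothetical.
SAMPLE = r"""
strace: Process 1234 attached
11:41:10.100001 openat(AT_FDCWD, "/var/log/app.log", O_WRONLY|O_APPEND) = 3 <0.000021>
11:41:10.100050 stat("/etc/app.conf", {st_mode=S_IFREG|0644, st_size=120, ...}) = 0 <0.000009>
11:41:10.100090 openat(AT_FDCWD, "/data/missing", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000015>
11:41:10.200000 openat(AT_FDCWD, "/data/big.db", O_RDONLY) = 4 <0.120345>
"""

LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>0x[0-9a-f]+|-?\d+|\?)"
    r"(?P<note>.*?)\s*<(?P<secs>\d+\.\d+)>$"
)


def parse_trace(text):
    """Return one dict per completed system call found in the trace text."""
    calls = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            call = match.groupdict()
            call["secs"] = float(call["secs"])
            calls.append(call)
    return calls


def slowest(calls, n=3):
    """Return the n calls that spent the most time in the kernel."""
    return sorted(calls, key=lambda call: call["secs"], reverse=True)[:n]


def totals_by_syscall(calls):
    """Aggregate call count and total time per system call name."""
    totals = defaultdict(lambda: {"calls": 0, "secs": 0.0})
    for call in calls:
        totals[call["name"]]["calls"] += 1
        totals[call["name"]]["secs"] += call["secs"]
    return dict(totals)


if __name__ == "__main__":
    for call in slowest(parse_trace(SAMPLE), 2):
        print(f'{call["secs"]:.6f}s {call["name"]}({call["args"]})')
```

The per-name totals resemble what strace -c reports, while the slowest list keeps the arguments, so the file behind a slow call stays visible.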

For example, to trace only the stat calls of a process:

root@node1:~# strace -e trace=stat -p 3442
strace: Process 3442 attached
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3447, si_uid=114} ---
stat("promote", 0x7ffd312ccf10)         = -1 ENOENT (No such file or directory)
stat("fallback_promote", 0x7ffd312ccf10) = -1 ENOENT (No such file or directory)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=42508, si_uid=114, si_status=0, si_utime=0, si_stime=0} ---
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3447, si_uid=114} ---
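In this trace, the process keeps probing for files that do not exist. When such lookups repeat, counting them per path shows which ones dominate. A minimal sketch, assuming the path is the first quoted argument of the call (the function name is hypothetical):

```python
import re
from collections import Counter

# Sample copied from the strace output shown above.
STAT_SAMPLE = r"""
strace: Process 3442 attached
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3447, si_uid=114} ---
stat("promote", 0x7ffd312ccf10)         = -1 ENOENT (No such file or directory)
stat("fallback_promote", 0x7ffd312ccf10) = -1 ENOENT (No such file or directory)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3447, si_uid=114} ---
"""

# Matches a failed call and captures its name, first quoted argument, and errno.
FAILED_RE = re.compile(
    r'^(?P<name>\w+)\([^"]*"(?P<path>[^"]*)".*\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)'
)


def count_failures(text):
    """Count failed calls per (syscall, path, errno) in strace output."""
    failures = Counter()
    for line in text.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            failures[(match["name"], match["path"], match["errno"])] += 1
    return failures


if __name__ == "__main__":
    for (name, path, errno), count in count_failures(STAT_SAMPLE).most_common():
        print(f"{count:5d} {name} {path} {errno}")
```

Signal lines and successful calls do not match the pattern, so only the failing lookups are counted.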

Using the process ID found in Step 3, we can also attach with strace -p and trace only the read and write calls of the process:

root@node:~# strace -e trace=read,write -p 60274
strace: Process 60274 attached
read(5, "\1\0\0\0\0\0\0\0", 512)        = 8
write(20, "\1\0\0\0\0\0\0\0", 8)        = 8
read(5, "\1\0\0\0\0\0\0\0", 512)        = 8
write(20, "\1\0\0\0\0\0\0\0", 8)        = 8
write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
read(9, "\3\0\0\0\0\0\0\0", 16)         = 8
read(9, 0x7ffe9b2b8060, 16)             = -1 EAGAIN (Resource temporarily unavailable)
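To see which descriptors carry the most traffic, the return values of the read and write calls can be summed per descriptor. The following sketch does this for the output above. The function name is hypothetical, and the pattern assumes plain numeric descriptors as printed here.

```python
import re
from collections import defaultdict

# Sample copied from the strace output shown above.
RW_SAMPLE = r"""
strace: Process 60274 attached
read(5, "\1\0\0\0\0\0\0\0", 512)        = 8
write(20, "\1\0\0\0\0\0\0\0", 8)        = 8
read(5, "\1\0\0\0\0\0\0\0", 512)        = 8
write(20, "\1\0\0\0\0\0\0\0", 8)        = 8
write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
read(9, "\3\0\0\0\0\0\0\0", 16)         = 8
read(9, 0x7ffe9b2b8060, 16)             = -1 EAGAIN (Resource temporarily unavailable)
"""

RW_RE = re.compile(
    r"^(?P<name>read|write)\((?P<fd>\d+),.*\)\s+=\s+(?P<ret>-?\d+)"
    r"(?:\s+(?P<errno>E[A-Z0-9]+))?"
)


def summarize_io(text):
    """Sum calls, bytes transferred, and errors per (syscall, fd)."""
    stats = defaultdict(lambda: {"calls": 0, "bytes": 0, "errors": 0})
    for line in text.splitlines():
        match = RW_RE.match(line.strip())
        if not match:
            continue
        entry = stats[(match["name"], int(match["fd"]))]
        entry["calls"] += 1
        ret = int(match["ret"])
        if ret < 0:
            entry["errors"] += 1  # a negative return value means the call failed
        else:
            entry["bytes"] += ret  # on success, the return value is the byte count
    return dict(stats)


if __name__ == "__main__":
    for (name, fd), entry in sorted(summarize_io(RW_SAMPLE).items()):
        print(name, fd, entry)
```

For the sample, descriptor 9 receives three writes totalling 24 bytes, and one of its two reads fails with EAGAIN.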

Sometimes, as in the read() and write() calls above, we see only file descriptor numbers; the file names and paths remain unknown. In this case, we also need lsof, which lists the files a process has open. Here, “files” include not only regular files but also directories, block devices, dynamic libraries, network sockets, etc.

root@node:~# lsof -p 60274
COMMAND     PID USER   FD      TYPE             DEVICE  SIZE/OFF       NODE NAME
qemu-syst 60274 root  cwd       DIR                8,1      4096          2 /
qemu-syst 60274 root  rtd       DIR                8,1      4096          2 /
qemu-syst 60274 root  txt       REG                8,1  22519968    1587004 /usr/bin/qemu-system-x86_64

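Here, FD is the file descriptor: either a number, which matches the descriptor numbers shown by strace, or a label such as cwd (current working directory), rtd (root directory), or txt (program text). TYPE is the file type, and NAME is the file path.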
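On Linux, the same descriptor-to-path mapping is also exposed as symbolic links under /proc/<PID>/fd, which is convenient when only a few descriptor numbers from strace need resolving. A minimal sketch with a hypothetical function name follows. Reading the entries of another user's process generally requires root.

```python
import os
import tempfile


def fd_targets(pid):
    """Map each open file descriptor of a process to the target of its
    /proc/<pid>/fd symlink (a path, or a label such as socket:[12345])."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for entry in os.listdir(fd_dir):
        try:
            targets[int(entry)] = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            continue  # the descriptor was closed while we were listing
    return targets


if __name__ == "__main__":
    # Demonstrate on the current process, if /proc is available.
    if os.path.isdir("/proc/self/fd"):
        with tempfile.NamedTemporaryFile() as scratch:
            print(scratch.fileno(), fd_targets(os.getpid())[scratch.fileno()])
```

For the process traced above, fd_targets(60274) would show what descriptors 5, 9, and 20 refer to.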

Based on the specific files and application information obtained, we can further analyze the specific issues.

Stay tuned for more on the journey of Linux performance optimization~

Linux Performance Tuning Series

– Memory Section

Linux Performance Tuning: About Memory

Linux Performance Tuning: Why Has Swap Increased?

Linux Performance Tuning: Understanding Caches in Memory

Linux Performance Tuning: How to Quickly Locate Memory Leaks?

Linux Performance Tuning: Detailed Usage of Memory Analysis Tools memleak-bpfcc and valgrind

Linux Performance Tuning: A Comprehensive Guide to Troubleshooting Memory Issues

– Disk Section

Linux Performance Tuning: In-Depth Understanding of File Systems and Disks

Linux Performance Tuning: Detailed Explanation of Disk Workflow and Performance Metrics

Linux Performance Tuning: Further Discussion on Disk Performance Metrics and Process-Level I/O

Linux Performance Tuning: Detailed Explanation of Common Scenarios for FIO Performance Testing

Linux System Tuning: In-Depth Analysis of Increasing Disk Latency Issues
