Performance Metrics for Linux Servers


Source: taozj

https://taozj.org/201701/linux-performance-basic.html


A server running Linux exposes all sorts of performance metrics as it operates. Operations staff and system administrators are generally very sensitive to this data, but the metrics matter to developers too, especially when a program misbehaves: these clues often help locate and trace the problem quickly.

Here are some simple tools to view system-related parameters. Many tools work by analyzing data from /proc and /sys, while more detailed and professional performance monitoring and tuning may require more specialized tools (such as perf, systemtap, etc.) and techniques. After all, system performance monitoring is a vast field.


1. CPU and Memory Metrics

1.1 top

➜ ~ top

(screenshot: top output)

The three values at the end of the first line are the system's average load over the previous 1, 5, and 15 minutes, from which you can also read whether the load is rising, holding steady, or falling. When this value exceeds the number of CPU execution units, the CPU is saturated and has become a bottleneck.
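
To see how many execution units the load should be compared against, two standard commands suffice (a minimal check; nproc comes with coreutils):

➜ ~ nproc      # number of logical CPUs; compare the load average against this
➜ ~ uptime     # prints the same 1/5/15-minute load averages without the full top display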

The second line summarizes the system's task states. running is self-explanatory, covering both tasks currently on a CPU and tasks queued to run; sleeping tasks are waiting for an event (such as an I/O operation) to complete, and come in interruptible and uninterruptible flavors; stopped tasks have been paused, usually by SIGSTOP or by suspending a foreground task with Ctrl-Z; zombie tasks have already terminated and most of their resources have been reclaimed, but the task descriptor holding the exit status is only released once the parent process reads it. Such processes show up as defunct, and whether they stem from the parent exiting early or from missing wait calls, they deserve attention as a possible design flaw in the program (a quick way to list them follows the CPU breakdown below). The third line breaks CPU usage down as follows:

  • (us) user: Time the CPU spends in user mode with low nice values (high priority) (nice <= 0). Normally, as long as the server is not idle, most CPU time should be spent executing these types of programs.

  • (sy) system: Time the CPU spends in kernel mode, where the operating system transitions from user mode to kernel mode via system calls to perform specific services; typically, this value is small, but it can be larger when the server performs I/O-intensive operations.

  • (ni) nice: Time the CPU spends in user mode with high nice values (low priority) (nice > 0). By default, newly started processes have nice = 0 and will not be counted here unless manually modified via renice or setpriority().

  • (id) idle: Time the CPU spends in an idle state (executing kernel idle handler).

  • (wa) iowait: Time spent waiting for I/O to complete.

  • (hi) irq: Time consumed by the system handling hardware interrupts.

  • (si) softirq: Time consumed by the system handling soft interrupts. Remember that soft interrupts are divided into softirqs, tasklets (which are actually a special case of the former), and work queues. It is unclear which of these times are being counted here, as the execution of work queues is no longer in the interrupt context.

  • (st) steal: Only meaningful in a virtual machine, where the vCPU shares the physical CPU. This is the time the virtual machine spent waiting for the hypervisor to schedule it onto a physical CPU; during that time the hypervisor had handed the physical CPU to another guest, so the CPU resource was "stolen." This value is non-zero on my KVM VPS, but only at the 0.1 magnitude level. Could this be used to judge VPS overselling?
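
As promised above, here is a minimal way to list the defunct processes mentioned earlier, assuming the standard procps ps:

➜ ~ ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'    # STAT begins with Z for zombies; PPID identifies the parent that has not reaped them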

High CPU usage often points to a problem, and the breakdown above suggests where to look when a server's CPU usage is high:

  1. When the user usage rate is too high, it usually means that certain individual processes are consuming a lot of CPU. At this point, it is easy to find the program using top; if you suspect the program is abnormal, you can use perf and other methods to identify hotspot calling functions for further investigation.

  2. When the system usage rate is too high, if I/O operations (including terminal I/O) are frequent, it may cause this part of the CPU usage to be high, such as on file servers or database servers. Otherwise (e.g., >20%), it is likely that some parts of the kernel or driver modules have issues.

  3. When the nice usage rate is too high, it is usually intentional behavior. When the initiator of the process knows that certain processes consume a high CPU, they will set their nice value to ensure that they do not overwhelm other processes’ requests for CPU usage.

  4. When the iowait usage rate is too high, it usually indicates that certain programs have low I/O operation efficiency, or the performance of the I/O corresponding device is so low that read/write operations take a long time to complete.

  5. When the irq/softirq usage rate is too high, it is likely that some peripheral has a problem and is generating a large number of interrupt requests. Check the /proc/interrupts file to track down the source (see the example after this list).

  6. When the steal usage rate is too high, it is likely that the vendor has oversold the virtual machine!
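
For item 5, a simple way to keep an eye on /proc/interrupts is with watch, assumed available here:

➜ ~ watch -d cat /proc/interrupts    # -d highlights counters that changed between refreshes, exposing the noisy interrupt source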

The fourth and fifth lines give physical memory and virtual memory (swap) information: total = free + used + buff/cache. top now reports buffers and cached memory as a single combined figure, and the relationship between the two is rarely spelled out. Comparing the numbers shows that they correspond to the Buffers and Cached fields in /proc/meminfo: Buffers is the raw disk block cache, mainly caching file system metadata (such as superblock information), and is generally small (around 20M); Cached is the page cache for the contents of specific files, used to speed up file access.

avail Mem is a newer field indicating how much memory can be given to newly started programs without swapping, roughly free + buff/cache, which confirms the point above: free + buffers + cached is the truly available physical memory. Moreover, using swap space is not necessarily a bad thing, so the swap usage rate is not an alarming number in itself, but frequent swap in/out is a bad sign and indicates a shortage of physical memory.
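
These numbers can be checked against the kernel's own accounting; a quick sketch (MemAvailable requires kernel 3.14+, and the "available" column needs a reasonably recent procps):

➜ ~ grep -E '^(MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree)' /proc/meminfo
➜ ~ free -h    # the "available" column is the same MemAvailable estimate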

Finally, there is the per-process resource list, where the CPU usage rate is the sum across all CPU cores. top itself reads /proc heavily while running, so it usually ranks high in its own listing.

Although top is very powerful, it is typically used for real-time monitoring of system information in the console and is not suitable for long-term (days, months) monitoring of system load information. Additionally, it may miss short-lived processes and fail to provide statistical information.

1.2 vmstat

vmstat is the other commonly used system monitoring tool besides top. The screenshot below shows the system load while compiling boost with the -j4 option.

(screenshot: vmstat output during the build)

r is the number of runnable processes, which roughly matched the top data; b is the number of processes in uninterruptible sleep; swpd is the amount of virtual memory in use, corresponding to Swap-used in top; as the manual notes, buffers is normally much smaller than cached Mem, typically around 20M; bi and bo in the io section are the blocks received from and sent to disk per second (blocks/s); in in the system section is the number of interrupts per second (including clock interrupts), and cs is the number of context switches per second.

This reminds me that many people used to argue about whether the -j value for compiling the Linux kernel should be the number of CPU cores or cores + 1. Varying -j while compiling boost and the Linux kernel and watching vmstat, I found the context switch rate basically unchanged in both cases; only when -j was raised well beyond that did context switches grow significantly. So there seems to be no need to obsess over this parameter, although I have not measured the actual compile times. It is said that, outside of system startup or benchmarking, a context switch rate above 100000/s suggests a problem in the program.
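
A rough sketch of that experiment, assuming a source tree that supports parallel make; run the two commands in separate terminals and compare the cs column across different -j values:

➜ ~ vmstat 1           # terminal 1: watch the cs (context switches) and r columns
➜ ~ make -j$(nproc)    # terminal 2: rebuild, varying the -j value between runs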

1.3 pidstat

If you want to track one process comprehensively, nothing beats pidstat: stack space, page faults, voluntary and involuntary context switches are all at your fingertips. Its most useful parameter is -t, which lists detailed information for every thread in the process.

-r: Displays page fault and memory usage status. A page fault occurs when a program needs to access a page mapped in virtual memory that has not yet been loaded into physical memory. There are two main types of page faults:

  1. minflt/s refers to minor faults, where the needed physical page is already in physical memory for some reason (shared pages, caching mechanisms, etc.) but is not yet referenced in the current process's page table. The kernel only has to fill in the corresponding page table entry, so the cost is very small.

  2. majflt/s refers to major faults, where the kernel must allocate a free physical page (swapping out another page first if none is free), load the data from external storage into it, and then set up the page table entry. The cost is several orders of magnitude higher than a minor fault.
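
For example, to watch both fault types for a single process every second (the process name here is purely a placeholder):

➜ ~ pidstat -r -p $(pidof some_server) 1    # minflt/s and majflt/s for the hypothetical some_server process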

-s: Stack usage status, including StkSize, the stack space reserved for the thread, and StkRef, the stack space actually used. Using ulimit -s, I found the default stack size is 10240K on CentOS 6.x, while on CentOS 7.x and the Ubuntu series the default is 8192K.
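
To check both the limit and the per-thread usage (note: the -s report is only present in sysstat versions that still ship it):

➜ ~ ulimit -s                  # default stack size limit for new processes, in KB
➜ ~ pidstat -s -t -p 1234 1    # StkSize/StkRef per thread of pid 1234 (placeholder pid)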


-u: CPU usage status, with parameters similar to those mentioned earlier.

-w: Thread context switch counts, split into cswch/s, voluntary switches while waiting for resources, and nvcswch/s, involuntary switches forced by the expiry of the CPU time slice.

Having to run ps first to get the pid before using pidstat is cumbersome, so the killer option is -C: give it a string, and every process whose Command contains that string is reported and counted. The -l option displays the full program name and arguments:

➜ ~ pidstat -w -t -C "ailaw" -l

In this way, when checking a single task, especially a multithreaded one, pidstat is more effective than the commonly used ps!

1.4 Others

When you need to monitor individual CPUs, besides htop you can use mpstat to check whether the workload is balanced across the cores of an SMP processor and whether some hotspot thread is monopolizing a core:

➜ ~ mpstat -P ALL 1
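
If a hotspot thread does seem glued to one core, its allowed CPU set can be inspected with taskset (the pid is a placeholder):

➜ ~ taskset -cp 1234    # print the list of CPUs pid 1234 is allowed to run on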

If you want to directly monitor the resources occupied by a specific process, you can either use top -u taozj to filter out unrelated processes from other users or use the following method to select, where the ps command can customize the information to be printed:

while :; do ps -eo user,pid,ni,pri,pcpu,psr,comm | grep 'ailawd'; sleep 1; done

If you want to see the parent-child relationships, the following commonly used invocation displays the process tree, which is much more detailed and readable than pstree:

➜ ~ ps axjf

2. Disk I/O Metrics

iotop can visually display the real-time disk read/write rates of each process and thread. lsof shows not only which ordinary files are open and by whom, but also the open state of device files such as /dev/sda1. For example, when a partition cannot be unmounted, lsof reveals how that disk partition is being used, and adding the +fg parameter also displays the file open flags.
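
For instance, when umount reports the partition as busy, something along these lines (the device name is an example) identifies the culprits:

➜ ~ lsof /dev/sda1        # list processes holding files open on the partition
➜ ~ lsof +fg /dev/sda1    # same, with the file open flags (RW, APPEND, ...) added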

2.1 iostat

➜ ~ iostat -xz 1

Whether using iostat -xz 1 or sar -d 1, the important parameters for disks are:

  • avgqu-sz: The average queue length of I/O requests issued to the device. For a single disk, a value > 1 indicates that the device is saturated (except for logical devices backed by an array of multiple disks).

  • await (r_await, w_await): The average wait time (ms) for each device I/O request operation, including the time spent in the queue and the service time;

  • svctm: The average service time (ms) for I/O requests sent to the device. If svctm is very close to await, it indicates that there is almost no I/O wait, and the disk performance is good; otherwise, if the disk queue wait time is long, the disk response is poor.

  • %util: Device utilization, the proportion of each second spent doing I/O work. For a single disk, performance degrades once %util > 60% (reflected in a rising await), and the device is close to saturation as it approaches 100% (again excepting logical devices backed by multiple-disk arrays).

Additionally, although the monitored disk performance may be poor, it does not necessarily impact the application’s response, as the kernel typically uses asynchronous I/O techniques and read/write caching to improve performance. However, this is also constrained by the limitations of physical memory mentioned above.

The above parameters are also applicable to network file systems.

3. Network Metrics

The importance of network performance for a server is self-evident. The iptraf tool gives an intuitive view of a NIC's send/receive rates, and similar throughput information can be obtained more simply with sar -n DEV 1. Since network cards come with a nominal maximum rate, such as 100M or 1G cards, it is easy to work out the device's utilization.

Typically, the transmission rate of the network card is not the primary concern in network development; rather, it is the packet loss rate, retransmission rate, and network latency for specific UDP and TCP connections that are of interest.

3.1 netstat

➜ ~ netstat -s

Displays cumulative totals for each protocol since the system started. Although this information is rich and useful, the counters are cumulative, so unless you run it twice and compute the difference, or put it under watch to observe the trend of the values, netstat is more commonly used to inspect port and connection information:

netstat --all (-a) --numeric (-n) --tcp (-t) --udp (-u) --timers (-o) --listening (-l) --program (-p)

--numeric skips the reverse DNS lookups and speeds up the display; commonly used combinations include:

➜ ~ netstat -antp # List all TCP connections

➜ ~ netstat -nltp # List all local TCP listening sockets; note there is no -a here
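
As for the cumulative counters of netstat -s mentioned above, watching the deltas is often enough to spot a trend:

➜ ~ watch -d netstat -s    # -d highlights counters (retransmissions, errors, ...) that changed between refreshes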

3.2 sar

The sar tool is extremely powerful, covering everything from CPU and disk to page swapping; -n is used here mainly to analyze network activity. It further breaks the data down by layer and protocol, such as NFS, IP, ICMP, SOCK and so on, but we only care about TCP and UDP here. The commands below display the segment and datagram send/receive statistics, namely:

TCP:

➜ ~ sudo sar -n TCP,ETCP 1

  • active/s: TCP connections initiated by the local machine, such as through connect(), with TCP status transitioning from CLOSED -> SYN-SENT.

  • passive/s: TCP connections initiated by the remote end and accepted locally via accept(), with TCP state transitioning from LISTEN -> SYN-RCVD.

  • retrans/s (tcpRetransSegs): The number of TCP retransmissions per second, which typically occurs in cases of poor network quality or server overload, where packet loss triggers retransmission based on TCP’s acknowledgment mechanism.

  • isegerr/s (tcpInErrs): The number of erroneous packets received per second (e.g., checksum failures).

UDP:

➜ ~ sudo sar -n UDP 1

  • noport/s (udpNoPorts): The number of datagrams received per second that have no application listening on the specified destination port.

  • idgmerr/s (udpInErrors): The number of received datagrams per second that could not be delivered for reasons other than a missing port.

Of course, these data can indicate network reliability to some extent, but they only become meaningful when combined with specific business requirements.

3.3 tcpdump

tcpdump is undoubtedly a great tool. Everyone knows that when debugging locally, Wireshark is preferred, but what about when issues arise on the online server?

The reference materials in the appendix offer a solution: recreate the environment and capture with tcpdump; when the issue recurs (for example, a particular log line or state shows up), stop the capture. tcpdump's -C/-W parameters cap the size of the capture files: when the limit is reached, the saved packet data rotates automatically, so the amount stored stays manageable. Afterwards you can pull the capture offline and analyze it in Wireshark at leisure, which is quite enjoyable! Although tcpdump has no GUI, its capture capability is in no way inferior: you can filter by interface, host, port, protocol, and many other criteria, and the captured packets are complete and timestamped, so analyzing packets from a live program is straightforward.
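
A minimal capture along those lines (interface, host, and port are placeholders to adapt):

➜ ~ tcpdump -i eth0 -w trace.pcap -C 100 -W 10 'host 192.168.1.10 and tcp port 80'    # rotate at ~100 MB per file, keep at most 10 files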

Below is a small test: when Chrome starts, it automatically establishes three connections to the web server. Since the capture filter restricts dst port here, the server's response packets are filtered out, but taken into Wireshark the SYN and ACK handshake is still very clear! When using tcpdump, set the capture filter as narrowly as you can: it makes later analysis easier, and capturing affects the performance of the NIC and the system, which in turn can affect the live service.

(screenshot: Wireshark view of the captured Chrome connections)
