Detailed Explanation of the Linux Top Command

(Pretending to pull up the terminal by tapping the spacebar twice) As a Linux operator or developer, you have certainly faced the existential question: “The system is lagging, who is consuming the resources?” There is no need to panic. Just type top, and the screen that appears is the system’s “health report”. The command has been around for a long time: it dates back to the 1980s on Unix and was later ported to Linux, where it became a standard tool. Like a doctor’s stethoscope, it is a safe first call whenever you want to check resource usage, whether on a server or a personal computer.

Let’s talk about the “resource detective” in Linux: the top command. We will analyze its output directly, breaking down its “cryptic language” line by line, and by the end you will see how practical this tool is.

First Line: Basic System Information

• 10:00:00: The current system time, a useful reference point when you correlate what you see in top with your logs later.
• 122 days, 10:30: System uptime, here 122 days, 10 hours and 30 minutes. If the server restarts unexpectedly, this number “resets to zero”, so you can quickly spot an abnormal reboot.
• 2 users: The number of logged-in user sessions. For example, if you have one terminal open locally and another connected remotely, this shows 2, which also tells you whether anyone else is working on the system.
• load average: 0.08, 0.06, 0.06: The system load averaged over the last 1, 5, and 15 minutes. Note that this is not a CPU usage percentage but an indicator of how “busy” the system is: on Linux it counts processes that are running or waiting for a CPU plus processes in uninterruptible sleep (usually waiting on disk IO), so CPU and IO pressure both raise it, and memory pressure does too, indirectly, once the system starts swapping.
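To make the load-average fields concrete, here is a minimal Python sketch that pulls the three values out of top’s first line. The sample line is assembled from the values above in the usual procps layout (an assumption: spacing and wording vary between versions and locales), and dividing the 1-minute load by the CPU count is a common rule of thumb, not something top itself reports.

```python
import os
import re

# Hypothetical sample: top's first line rebuilt from the values in this article.
SAMPLE_HEADER = "top - 10:00:00 up 122 days, 10:30,  2 users,  load average: 0.08, 0.06, 0.06"

# Matches the three comma-separated load values after "load average:".
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load_average(line: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from a top header line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load average in: {line!r}")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen


def load_per_cpu(load: float, cpu_count: int) -> float:
    """Rule of thumb: near 1.0 means roughly one runnable or IO-blocked
    task per CPU; well above 1.0 means work is queuing up."""
    return load / cpu_count


if __name__ == "__main__":
    one, five, fifteen = parse_load_average(SAMPLE_HEADER)
    print(f"load 1m={one} 5m={five} 15m={fifteen}")
    cpus = os.cpu_count() or 1
    print(f"1-minute load per CPU here: {load_per_cpu(one, cpus):.2f}")
    # os.getloadavg() reads the same kernel figures directly (Unix only).
    if hasattr(os, "getloadavg"):
        print("live load average:", os.getloadavg())
```

On a live machine, os.getloadavg() gives the same three numbers without any text parsing; the regex is only needed when you are reading saved top output.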
For instance, if the 1-minute load suddenly spikes to 10, a process is very likely “up to something”. Load analysis deserves its own article; for now, treat this line as a “warning signal”.

Second Line: Process Statistics

• Tasks: 122 total: There are currently 122 processes on the system, like having 122 “workers” in the house.
• 1 running: 1 process is running or ready to run (using or waiting for a CPU); the rest are “resting” or “waiting in line for resources”.
• 121 sleeping: 121 processes are sleeping; for example, an idle document editor sleeps while it waits for input and consumes no CPU.
• 0 stopped: 0 processes are paused; a command suspended with Ctrl+Z counts here.
• 0 zombie: 0 zombie processes. A zombie is a “ghost”: a child process that has exited but whose parent has not yet collected its exit status, so it still occupies a slot in the process table. If a number appears here, investigate the corresponding parent process promptly; a zombie uses no CPU or memory, but if too many accumulate they can exhaust the available process IDs.

Third Line: CPU Workload Details

• 0.5%us: The percentage of CPU time spent running user-space code, such as your Java programs or Python scripts; a high value means business processes are “working hard”.
• 0.7%sy: The percentage of CPU time spent in the kernel, for example handling system calls for file IO or network requests; a high value may point to a kernel-side bottleneck, such as a driver issue.
• 0.0%ni: The percentage of CPU time spent running user processes whose priority has been lowered; the default nice value is 0, and a process only shows up here after its nice value has been raised with the nice or renice command.
• 98.8%id: The percentage of idle CPU time; the higher it is, the less busy the CPU. With 98.8% idle, the CPU is certainly not what is slowing the system down.
• 0.0%wa: The percentage of time the CPU sat idle while disk IO was outstanding (iowait). A high number means the CPU is “waiting for the disk”: if reads and writes are too slow, the CPU can only wait, and the system feels sluggish.
• 0.0%hi/0.0%si: The percentage of time spent servicing hardware and software interrupts. Hardware interrupts are signals from devices such as network cards, disks, keyboards, and mice; software interrupts are deferred work inside the kernel, such as processing network packets. If these are high, suspect a hardware fault or a flood of network traffic (a “network storm”).
• 0.0%st: “Stolen time” in a virtualized environment. On a cloud server, a high number means the hypervisor is giving your virtual CPU’s time to other virtual machines on the same physical host; for example, if a neighboring VM maxes out its CPU, yours “loses” time and your system lags.

Fourth Line: Physical Memory Balance Sheet

• 8388608k total: Total physical memory, 8GB (8388608k ÷ (1024×1024) = 8, since 1GB = 1024MB = 1024×1024k).
• 8256576k used: Memory currently in use. Note that this counts everything the kernel has handed out, including buffers and cache, so not all of it is actually occupied by processes.
• 132032k free: Memory that is completely unused. A small number here is not a cause for concern, because Linux deliberately puts spare memory to work as cache.
• 0k buffers: Memory used as buffers for block devices such as disks, holding raw block data and filesystem metadata so repeated accesses are faster. (The contents of regular files you read are kept in the page cache, shown as cached.)

Fifth Line: Swap Space Emergency Warehouse

• 1999864k total: Total swap space, roughly 1.9GB, serving as a “spare tire” for memory: when physical memory runs short, the kernel moves rarely used memory pages here.
• 0k used: Swap space in use. If this number is high, physical memory is insufficient and the system is “struggling”; sustained swapping causes severe lag, because disks are far slower than memory.
• 1999864k free: Free swap space. It equals the total, meaning swap is completely unused, a sign that memory is still ample.
• 4664332k cached: Although it appears on the Swap line, this is physical memory used as the page cache (file data the kernel keeps in RAM for faster access), not data stored in swap. The kernel can reclaim most of it when processes need memory, so a large value is not a concern.

Here is a key formula to remember:
actual memory used by processes = used – buffers – cached (removing buffers and cache leaves the memory truly occupied by processes)
actual available memory ≈ free + buffers + cached (most cached memory can be released for processes whenever needed, so this is the real “available memory”)
So don’t panic just because free is small; do the calculation to see whether memory is really sufficient.

Sixth Line: Process Resume Header

This line serves as the “manual” for the process list below; each field helps you track down the “resource thief”:

• PID: Process ID. You need it to kill a process or inspect its details (e.g., kill PID).
• USER: The process owner. For example, processes run by root are usually system services, while those run by ordinary users may be business programs.
• PR/NI: Process priority and nice value. PR is the priority the kernel actually schedules with, while NI is the user-adjustable nice value (-20 to 19; lower values mean higher priority).
• VIRT: The total virtual memory of the process, covering everything it has mapped: code, data, shared libraries, and pages that have been swapped out. A high number here is usually not a concern, because mapped virtual memory does not necessarily occupy physical RAM.
• RES: Resident memory, the physical memory the process currently occupies (not swapped out). This is the process’s “real memory usage”: a process with RES of 1GB is holding 1GB of physical memory, some of which may be shared with other processes.
• SHR: Shared memory, the part of RES that can be shared with other processes, such as shared library files; these pages are loaded once and reused instead of being duplicated for every process.
• S: Process state: R (running), S (sleeping), D (uninterruptible sleep, e.g., waiting for IO), T (stopped), Z (zombie). Checking the state lets you quickly judge whether a process is behaving normally.
• %CPU/%MEM: The percentage of CPU and memory the process uses; the higher the percentage, the bigger the “resource hog”. On a multi-core machine %CPU can exceed 100%, because a process can keep several CPUs busy at once.
• TIME+: Total CPU time the process has used, shown as minutes:seconds.hundredths. For example, a TIME+ of 10:00.00 means 10 minutes of CPU time in total; the longer the time, the more “CPU-intensive” the process.
• COMMAND: The process name, or its full command line (press c to toggle), for example java -jar app.jar, which shows exactly which program is running.

Two more shortcuts are well worth remembering: pressing Shift+M (uppercase M) sorts processes by memory usage in descending order, so you can see who is consuming the most memory; pressing Shift+P (uppercase P) sorts by CPU usage in descending order, immediately revealing the “CPU thief”.

These advanced uses of top can handle most everyday problems

Beyond the default interface, top has some “hidden skills” that are particularly useful for daily troubleshooting:

1. top -H: View the threads of processes. For example, if a Java process is maxing out the CPU, top -H -p processPID shows the CPU usage of every thread in that process. Find the thread ID with the highest usage, convert it to hexadecimal (printf "%x\n" threadID), and search for it in the jstack thread dump to pinpoint which code block is “causing trouble”. This is far more efficient than guessing.
2. top -p PID1,PID2: Show only the specified processes. For example, if you only care about nginx and mysql, use top -p 123,456 (where 123 is nginx’s PID and 456 is mysql’s) for a clean view without sifting through every process.
3. top -d 5: Set the refresh interval to 5 seconds (the default is 3 seconds).
If you are watching how a specific process changes over time, a longer interval means the screen refreshes less often, which is easier on the eyes.

Finally, I want to say: top is not just for “checking resources”, it is also the starting point for “understanding the system”

Many people glance at the CPU and memory percentages in top and then close it, which is a missed opportunity. The command is like a “barometer” for the system: a high load alongside a high wa may indicate IO issues; many zombie processes may point to a bug in a program that fails to reap its children; a high st may mean a “neighbor” on your cloud host is causing trouble. At first the fields may feel overwhelming and hard to remember, but with regular use you will find that every time you type top, you are “communicating” with the system. It won’t tell you directly where the problem is, but it lays out the clues for you to find. When we troubleshoot bugs we rarely know the answer right away; we follow clues through logs and monitoring, and top is the most fundamental and reliable “clue repository” in Linux. So don’t think of it as just a “viewing command”: think through what each line means and try out the shortcuts and parameters, and you will find that understanding the system is not that difficult.
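The memory formula given after the swap fields can be checked with a few lines of Python. This is a minimal sketch: the constants are the sample figures from this article (in KiB), the helper names are hypothetical, and the optional /proc/meminfo reader assumes a Linux kernel (3.14 or later) that reports MemAvailable, the kernel’s own estimate of available memory, which is usually more accurate than free + buffers + cached.

```python
import os
from typing import Optional

# Sample figures from the top output discussed in this article, in KiB.
MEM_TOTAL = 8388608
MEM_USED = 8256576
MEM_FREE = 132032
BUFFERS = 0
CACHED = 4664332

KIB_PER_GIB = 1024 * 1024


def used_by_processes(used: int, buffers: int, cached: int) -> int:
    """Memory truly held by processes: used minus buffers and page cache."""
    return used - buffers - cached


def available_estimate(free: int, buffers: int, cached: int) -> int:
    """Rough available memory: free plus reclaimable buffers and cache."""
    return free + buffers + cached


def read_mem_available(path: str = "/proc/meminfo") -> Optional[int]:
    """Return MemAvailable in KiB if the kernel reports it, else None."""
    if not os.path.exists(path):
        return None
    with open(path) as meminfo:
        for line in meminfo:
            key, _, rest = line.partition(":")
            if key == "MemAvailable":
                return int(rest.split()[0])  # the value is followed by "kB"
    return None


if __name__ == "__main__":
    real_used = used_by_processes(MEM_USED, BUFFERS, CACHED)
    available = available_estimate(MEM_FREE, BUFFERS, CACHED)
    print(f"used by processes: {real_used}k ({real_used / KIB_PER_GIB:.2f} GiB)")
    print(f"available:         {available}k ({available / KIB_PER_GIB:.2f} GiB)")
    # The two parts add back up to the total that top reports.
    assert real_used + available == MEM_TOTAL
    print("MemAvailable on this machine:", read_mem_available())
```

With the sample figures this reports 3592244k (about 3.43 GiB) used by processes and 4796364k (about 4.57 GiB) available, so the tiny free value of 132032k is no reason for alarm.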
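The top -H workflow from the advanced-uses section boils down to a hex conversion and a text search. Below is a minimal Python sketch of that step, equivalent to printf "%x\n" threadID followed by a search of the dump. It assumes the jstack dump prints each thread’s native ID in the form nid=0x plus the hex value; the dump excerpt, thread names, and thread ID are made up for illustration, and find_thread is a hypothetical helper, not part of any JDK tool.

```python
import re

# Made-up, abbreviated jstack excerpt for illustration only.
SAMPLE_DUMP = """\
"pool-1-thread-1" #14 prio=5 os_prio=0 tid=0x00007f3a1c001000 nid=0x1a2 waiting on condition
"pool-1-thread-2" #15 prio=5 os_prio=0 tid=0x00007f3a1c002800 nid=0x1a2b runnable
"""


def to_hex_tid(tid: int) -> str:
    """Same result as the shell command: printf "%x\\n" <tid>."""
    return format(tid, "x")


def find_thread(dump: str, tid: int) -> list[str]:
    """Return dump lines whose nid matches the decimal thread ID from top -H."""
    # \b stops nid=0x1a2 from matching inside nid=0x1a2b.
    pattern = re.compile(rf"\bnid=0x{to_hex_tid(tid)}\b")
    return [line for line in dump.splitlines() if pattern.search(line)]


if __name__ == "__main__":
    busiest_tid = 6699  # hypothetical thread ID read from the PID column of top -H
    print(to_hex_tid(busiest_tid))
    for line in find_thread(SAMPLE_DUMP, busiest_tid):
        print(line)
```

With the sample values it prints 1a2b followed by the pool-1-thread-2 line. The word-boundary match matters: a plain substring search for nid=0x1a2 would also hit nid=0x1a2b and point you at the wrong thread.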
