Server response is slow, services fail to start, disk space is running low… When faced with these common issues, blindly restarting the server not only fails to address the root cause but may also obscure the real problems.
This article compiles 10 key Linux commands, along with detailed usage methods and scenario descriptions, to help you efficiently troubleshoot core resource anomalies such as CPU, memory, ports, disk, and Swap.
In daily operations, quickly identifying issues is the first step to ensuring system stability. Linux systems provide a wealth of diagnostic tools; mastering the following command combinations lets you complete an initial fault analysis within about 5 minutes.
1. Identify High CPU Usage Processes
Command:
ps aux --sort=-%cpu | head -n 11
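Custom formatted variant:
ps -eo pid,%cpu,comm --no-headers --sort=-%cpu | head -n 10 | awk '{printf "PID: %6s | Process: %-20s | CPU: %6.1f%%\n", $1, $3, $2}'
Usage Instructions: The first command prints the header line plus the top 10 processes in descending order of CPU usage. Pay special attention to the %CPU and COMMAND columns. If a process consistently uses high CPU (e.g., >80%), further analysis of its behavior is required. Keep in mind that ps reports %CPU as CPU time divided by the time the process has been running, so it is an average over the process lifetime, not an instantaneous reading; confirm a suspect with top.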
Notes:
On multi-core systems, a single multi-threaded process's CPU usage may exceed 100% (e.g., a maximum of 400% on a 4-core system).
If the kswapd0 process has high CPU usage, it usually indicates insufficient memory, and memory and Swap usage should be checked first.
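The threshold check described above can be sketched as a small filter. This is a minimal illustration, not part of the original command set: the function name cpu_hogs and its default threshold of 80 are assumptions, and it expects "PID %CPU COMMAND" rows on standard input.

```shell
# cpu_hogs [THRESHOLD]
# Reads "PID %CPU COMMAND" rows on stdin and prints the rows whose %CPU is at
# or above THRESHOLD (hypothetical helper; the default of 80 is an assumption).
cpu_hogs() {
    threshold="${1:-80}"
    awk -v t="$threshold" '$2 + 0 >= t + 0 {
        printf "PID %s (%s) at %.1f%% CPU\n", $1, $3, $2
    }'
}

# Typical use, assuming procps ps:
#   ps -eo pid,%cpu,comm --no-headers | cpu_hogs 80
```

Placing the command name in the last column keeps the numeric field in a fixed position even when a command name contains spaces.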
2. View High Memory Usage Processes
Command:
ps aux --sort=-%mem | head -n 11
Custom formatted variant:
ps -eo pid,rss,comm --no-headers --sort=-rss | head -n 5 | awk '{printf "PID: %6s | Process: %-20s | MEM: %7.2f GB\n", $1, $3, $2/1024/1024}'
Usage Instructions:
Sort by physical memory (RSS) usage to identify memory hogs. Combine with free -h to check overall memory status, avoiding misjudgment of cache (buff/cache) as actual memory pressure.
Key Metrics:
%MEM: Percentage of physical memory used by the process
RSS: Actual size of physical memory used (KB)
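To separate real memory pressure from cache, one option is to look at the MemAvailable field in /proc/meminfo, which the kernel provides as an estimate of memory usable without swapping. The sketch below is an illustration only; the function name mem_available_pct is an assumption, and it reads meminfo-style text from standard input.

```shell
# mem_available_pct
# Reads /proc/meminfo-style text on stdin and prints MemAvailable as a
# percentage of MemTotal (hypothetical helper name).
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", avail * 100 / total }'
}

# Typical use:
#   mem_available_pct < /proc/meminfo
```

A low percentage here is a stronger signal of pressure than a small "free" value, because "free" excludes reclaimable cache.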
3. Check Which Process is Using a Specific Port
Command:
ss -tulnp | grep :<port>
For example, to check port 80:
ss -tulnp | grep :80
Usage Instructions:
When a service fails to start and indicates “Address already in use”, this command can quickly locate the PID and program name occupying the port.
Output Example:
tcp LISTEN 0 128 *:80 *:* users:(("nginx",pid=1234,fd=6))
Here, pid=1234 is the ID of the occupying process.
Alternative:
lsof -i :80 offers similar functionality, but ss is generally faster and is recommended for production environments.
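A plain grep :80 also matches ports such as 8080 or 8000. The following sketch matches the port exactly and extracts the program name and PID from the users:(...) field. The function name port_owner is an assumption, and the column position assumes the layout of ss -tulnp, where the local address is the fifth field.

```shell
# port_owner PORT
# Reads `ss -tulnp` output on stdin, keeps rows whose local address ends in
# exactly :PORT, and prints one "program pid" pair per owning process
# (hypothetical helper; assumes the local address is field 5).
port_owner() {
    awk -v port="$1" '
        $5 ~ (":" port "$") {
            line = $0
            while (match(line, /\("[^"]+",pid=[0-9]+/)) {
                entry = substr(line, RSTART + 2, RLENGTH - 2)  # name",pid=N
                split(entry, parts, /",pid=/)
                print parts[1], parts[2]
                line = substr(line, RSTART + RLENGTH)
            }
        }'
}

# Typical use (root is needed to see processes owned by other users):
#   ss -tulnp | port_owner 80
```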
4. List All Listening Ports and Corresponding Processes
Command:
ss -tulnp
Usage Instructions:
Comprehensively view the currently listening TCP/UDP ports and their associated processes, suitable for security audits or service status checks.
Parameter Explanation:
-t: TCP connections
-u: UDP connections
-l: Show only listening state
-n: Display addresses and ports in numeric form (to avoid DNS resolution delays)
-p: Show process information
5. Analyze Swap Usage and Occupying Processes
Step 1: Check Overall Swap Usage
free -h
If the used value in the Swap row is significantly greater than 0, the system has started using swap space, which may affect performance.
Step 2: Locate Specific Processes Using Swap (requires root privileges)
for file in /proc/[0-9]*/status; do awk '/^(Name|VmSwap):/{printf "%s ", $2} END{print ""}' "$file" 2>/dev/null; done | awk 'NF==2' | sort -k2 -nr | head
Typical Scenario:
A Java application with improperly set heap memory frequently triggers Full GC, causing a large amount of memory to be swapped out, resulting in high system IO and slow responses.
6. Check Disk Space Usage
Command:
df -h
Usage Instructions:
Displays the disk usage of each mount point in a human-readable format (GB/MB). Pay special attention to critical partitions such as /, /var, and /home.
Alert Thresholds:
Usage ≥90%: Immediate action required
Usage ≥95%: May lead to service write failures or even crashes
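The two thresholds above can be applied mechanically to df output. This is a minimal sketch: the function name disk_alerts and the WARNING/CRITICAL labels are assumptions, and it expects the POSIX-format output of df -P, where the usage percentage is the fifth field and the mount point the sixth.

```shell
# disk_alerts
# Reads `df -P` output on stdin and labels mount points at or above 90%
# (WARNING) and 95% (CRITICAL). Hypothetical helper; mount points that
# contain spaces are not handled.
disk_alerts() {
    awk 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= 95)      print "CRITICAL", $6, use "%"
        else if (use + 0 >= 90) print "WARNING", $6, use "%"
    }'
}

# Typical use:
#   df -P | disk_alerts
```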
7. Locate Large Files or Directories
Command (execute in the target directory):
du -sh * | sort -hr
Usage Process:
Use df -h to determine which partition is tight on space (e.g., /var)
Navigate to that directory: cd /var
Execute the above command to sort subdirectories by size (note that * does not match hidden files and directories)
Drill down until specific large files (e.g., logs, caches, core dumps) are located
Extended Tip:
List up to 20 files larger than 100MB across the entire system (ignoring permission errors):
find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null | head -n 20
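The find command above prints the first 20 matches it encounters, not necessarily the 20 largest. A sorted variant can be sketched as follows. The function name largest_files is an assumption, and the sketch relies on the -printf action of GNU find, which other find implementations may not provide.

```shell
# largest_files [DIR] [N]
# Prints the N largest regular files under DIR as "bytes<TAB>path", biggest
# first, without crossing into other filesystems (-xdev).
# Hypothetical helper; requires GNU find for -printf.
largest_files() {
    dir="${1:-.}"
    n="${2:-10}"
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null | sort -nr | head -n "$n"
}

# Typical use:
#   largest_files /var 20
```

Staying on one filesystem keeps the result aligned with the partition that df -h flagged as full.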
8. View System Load
Command:
uptime
Output Example:
19:30:01 up 10 days, load average: 4.20, 3.80, 2.90
Interpretation:
The three values represent the average load over the past 1 minute, 5 minutes, and 15 minutes, respectively.
For an N-core CPU system, a sustained load > N indicates a resource bottleneck
High load does not equal high CPU usage; it may also be caused by IO wait (D state processes)
Auxiliary Command:
Check the number of CPU cores:
nproc
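The comparison between load and core count can be written as a one-step calculation. The sketch below is illustrative; the function name load_status and the "overloaded"/"ok" labels are assumptions.

```shell
# load_status LOAD CORES
# Divides a load average by the number of cores and labels the result:
# a per-core ratio above 1 means more runnable or blocked tasks than cores.
# Hypothetical helper name and labels.
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        ratio = load / cores
        printf "%.2f per core: %s\n", ratio, (ratio > 1 ? "overloaded" : "ok")
    }'
}

# Typical use, taking the 1-minute load from /proc/loadavg:
#   load_status "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

With the sample output above, a 1-minute load of 4.20 on a 4-core machine gives a ratio slightly above 1, so it is worth checking whether the load is sustained.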
9. Detect Zombie Processes
Command:
ps aux | awk '$8 ~ /^[Zz]/'
Explanation:
Zombie processes are terminated child processes that have not been reaped by their parent process, with a status of Z. Although they do not consume CPU/memory, they occupy process table entries. A large number of zombie processes may prevent the system from creating new processes.
Handling Suggestions:
Usually requires restarting the parent process
If the parent process is init (PID=1), the system will automatically clean up without intervention
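Since the fix usually targets the parent, it helps to group zombies by parent PID. A minimal sketch follows; the function name zombie_parents is an assumption, and it expects "PID PPID STAT COMMAND" rows on standard input.

```shell
# zombie_parents
# Reads "PID PPID STAT COMMAND" rows on stdin and prints "count ppid" for
# every parent that has zombie children, largest count first.
# Hypothetical helper name.
zombie_parents() {
    awk '$3 ~ /^Z/ { count[$2]++ }
         END { for (ppid in count) print count[ppid], ppid }' |
        sort -k1,1nr -k2,2n
}

# Typical use:
#   ps -eo pid,ppid,stat,comm --no-headers | zombie_parents
```

The parent PIDs in the output are the processes to inspect or restart.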
10. Identify Abnormal Network Connections (Potential Attack Signs)
Command:
ss -tn | tail -n +2 | awk '{sub(/:[^:]*$/, "", $5); print $5}' | sort | uniq -c | sort -nr | head
Purpose:
Count the number of TCP connections from each remote IP to discover brute force attacks, DDoS, or abnormal crawling behavior.
Countermeasures:
Combine with firewalls (iptables/firewalld) or cloud platform security groups to block suspicious IPs
Check authentication logs: grep "Failed" /var/log/secure (CentOS) or /var/log/auth.log (Ubuntu)
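The per-IP count can also be sketched as a reusable function that tolerates IPv6 peers, whose addresses contain colons and are printed in brackets. The function name conn_counts is an assumption, and the sketch assumes the layout of ss -tn, where the peer address is the fifth field.

```shell
# conn_counts
# Reads `ss -tn` output (header line included) on stdin and prints
# "count address" per remote address, highest count first.
# Hypothetical helper; assumes the peer address is field 5.
conn_counts() {
    awk 'NR > 1 {
        peer = $5
        sub(/:[^:]*$/, "", peer)     # drop the trailing :port
        gsub(/^\[|\]$/, "", peer)    # unwrap bracketed IPv6 addresses
        count[peer]++
    }
    END { for (ip in count) print count[ip], ip }' | sort -k1,1nr -k2,2
}

# Typical use:
#   ss -tn | conn_counts | head
```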
General Troubleshooting Process Recommendations
Observe Symptoms: Service unavailable? Slow response? Write failures?
Check Load: uptime to assess overall system pressure
Resource-Specific Troubleshooting:
CPU → ps aux --sort=-%cpu
Memory/Swap → free -h + process memory sorting
Disk → df -h + du -sh
Network → ss -tulnp + connection statistics
Locate Processes: Further analyze with PID (logs, strace, lsof, etc.)
Handle and Validate: After fixing, continuously monitor metrics to see if they return to normal
Important Principles:
Do not rely on restarting to solve problems. Only by understanding the root cause can true system stability be achieved.
Why do I say this? Because on one project, resource issues came up often, the client kept asking for a restart to free up resources, and I could only comply silently.