The following ✅ “Linux Pre-Restart Checklist” and its post-restart counterpart are intended for production environments.
1
✅ Linux Pre-Restart Checklist
1. Current system load and resource usage
uptime / top / htop / vmstat
Record the current CPU, memory, and Load Average to compare if there is any improvement after the restart.
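To make the before/after comparison concrete, the baseline can be written to a file. The following is a minimal sketch; the function name save_baseline and the output path are hypothetical, and tools that are not installed are simply noted as unavailable.

```shell
# Save a pre-restart baseline (load, memory, vmstat sample) to a file.
# save_baseline and the directory layout are illustrative names only.
save_baseline() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    {
        echo "== date =="
        date
        echo "== load =="
        uptime 2>/dev/null || cat /proc/loadavg 2>/dev/null || echo "(not available)"
        echo "== memory =="
        free -m 2>/dev/null || echo "(free not available)"
        echo "== vmstat =="
        vmstat 1 3 2>/dev/null || echo "(vmstat not available)"
    } > "$out_dir/baseline.txt"
}
```

Possible usage: save_baseline /var/tmp/pre-restart before the restart, then run it again with a different directory afterwards and compare the two files.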
2. Disk space and IO status
df -h to check if the disk is nearly full.
iostat -x 1 5 or iotop to check for abnormal disk IO.
If the disk is full, the system may fail to boot after a restart.
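The disk-space check can be automated with a small filter over df output. This is a sketch under assumptions: the function name full_filesystems and the 90% default threshold are illustrative, and mount points containing spaces are not handled.

```shell
# Print "mountpoint usage" for filesystems at or above a usage threshold.
# Reads POSIX-format `df -P` output on stdin; the default threshold of 90
# is an arbitrary example value.
full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5              # Capacity column, e.g. "95%"
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

Possible usage: df -P | full_filesystems 90. No output means no filesystem reached the threshold.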
3. Log analysis
Check /var/log/messages, /var/log/syslog, and application logs.
Are there any error messages? For example, OOM Killer, disk I/O errors, abnormal service exits, etc.
Key commands:
dmesg | tail -n 50
journalctl -xe
4. Service status confirmation
systemctl list-units --state=failed
Check which services have crashed, note them down, and pay special attention to whether these services can start normally after the restart.
5. Network connection status
ss -tunlp or netstat -anp
Record the current number of network connections (listening ports, active connections).
Avoid port conflicts or listening failures after the restart.
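One way to record the listening ports so they can be compared after the restart is to reduce the ss output to protocol/port pairs. This is a sketch: the function name listening_ports is hypothetical, and it assumes the column layout produced by ss -H -tuln (Netid in the first column, local address and port in the fifth).

```shell
# Reduce `ss -H -tuln` output (read from stdin) to sorted, unique
# "protocol port" pairs. Assumes Netid is column 1 and the local
# address:port is column 5, which is the layout when both -t and -u are given.
listening_ports() {
    awk '{
        n = split($5, parts, ":")   # the port is the last ":"-separated field
        print $1, parts[n]
    }' | sort -u
}
```

Possible usage: ss -H -tuln | listening_ports > ports_before.txt before the restart, the same command into ports_after.txt afterwards, then diff the two files.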
6. Scheduled task check
crontab -l / ls /etc/cron*
Identify in advance any scheduled tasks that must run after the restart, so that you can confirm afterwards that they did.
7. Mount points and storage confirmation
mount / cat /etc/fstab
Check that the mount points for important storage (NFS, iSCSI, SAN, etc.) are configured correctly in /etc/fstab; a failed mount after the restart may prevent services from starting.
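As one possible fstab check, the sketch below lists network filesystem entries that carry neither the _netdev nor the nofail option, since such entries are a common reason for a boot to stall on an unreachable server. The function name risky_network_mounts is hypothetical, only the nfs, nfs4, and cifs types are matched, and iSCSI or SAN volumes (which appear with ordinary filesystem types such as ext4 or xfs) are not detected. Whether these options are appropriate depends on the environment.

```shell
# List entries in an fstab-format file whose type is nfs, nfs4, or cifs and
# whose options include neither _netdev nor nofail.
# Output: "mountpoint type options", one entry per line.
risky_network_mounts() {
    fstab="${1:-/etc/fstab}"
    awk '$1 !~ /^#/ && NF >= 4 && $3 ~ /^(nfs|nfs4|cifs)$/ {
        opts = "," $4 ","
        if (index(opts, ",_netdev,") == 0 && index(opts, ",nofail,") == 0)
            print $2, $3, $4
    }' "$fstab"
}
```

Possible usage: risky_network_mounts /etc/fstab. On systems with a recent util-linux, findmnt --verify offers a more general syntax check of the same file.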
8. Important configuration backup
Backup important configuration files such as: /etc/fstab, /etc/network/, /etc/hosts, application configurations, database configurations, etc.
Recommended command:
tar czvf config_backup_$(date +%F).tar.gz /etc /opt/app/config
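A backup is only useful if it can be read back, so the archive can be listed immediately after it is created. This is a minimal sketch; the function name backup_configs is hypothetical, and the paths in the usage line are the ones from the command above.

```shell
# Create a gzip-compressed tar archive of the given paths, then list it
# back to confirm that the archive is readable.
# Returns non-zero if either step fails.
backup_configs() {
    archive="$1"
    shift
    tar czf "$archive" "$@" && tar tzf "$archive" > /dev/null
}
```

Possible usage: backup_configs "config_backup_$(date +%F).tar.gz" /etc /opt/app/config. Copying the resulting archive to another host keeps it available if the server does not come back up.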
9. Coordination in high availability/clustering environments
If the server is a cluster node (e.g., MySQL master-slave, Redis Sentinel, K8s Node, HAProxy+Keepalived, etc.), notify other nodes or temporarily remove the node to avoid triggering master-slave switches or cluster alarms.
10. Confirm if there is unsaved data
Key information and temporary operational data (such as data on RAM disks or important files in tmp directories) must be saved in advance, because it may not survive the restart.
11. Confirm available remote connection methods
Ensure IPMI, iLO, KVM, and remote consoles are available.
Avoid being unable to rescue the system if it hangs after a restart.
12. Notify relevant parties
Notify application owners, business parties, monitoring teams, etc., to avoid false alarms or misunderstandings.
📋 Appendix: Restart execution suggestions
sync
shutdown -r now
Or a more cautious approach (especially for servers with high IO):
sync && sleep 5 && reboot
(Flush buffered writes to disk, wait 5 seconds, then restart, to reduce the risk of data loss.)
🔥 Tip
If you have time, take a snapshot or backup before restarting (especially in virtual machine environments like VMware/KVM) for added safety.
2
✅ Linux Post-Restart Checklist
1. Confirm if the system has booted normally
Check if you can successfully access the login interface via the console/remote terminal.
Check for any abnormal error messages during the boot process, such as:
File system check failed (fsck failed)
Unable to mount disk
Kernel panic
If there are abnormalities, intervene immediately using rescue mode or IPMI.
2. Check time synchronization
Confirm if the system time is accurate:
date
timedatectl status
Confirm if ntpd/chronyd is running normally:
systemctl status chronyd
3. Check disk mount status
Check if all mount points have been successfully mounted:
df -h
mount
Compare with the mount status before the restart to see if there are any losses (especially for NFS/iSCSI).
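The comparison with the pre-restart state can be scripted if the list of mount points was saved before the restart. This is a sketch; the function name missing_mounts and the file names are hypothetical, and each input file is assumed to hold one mount point per line.

```shell
# Print mount points that appear in the "before" list but not in the
# "after" list. Each file holds one mount point per line.
missing_mounts() {
    before="$1"
    after="$2"
    # -F fixed strings, -x whole-line match, -v invert: keep lines of
    # "before" that match no line of "after".
    grep -Fxv -f "$after" "$before"
    return 0
}
```

Possible usage: findmnt -rn -o TARGET > mounts_before.txt before the restart, findmnt -rn -o TARGET > mounts_after.txt afterwards, then missing_mounts mounts_before.txt mounts_after.txt. An empty result means every earlier mount point is present again.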
4. Check core service status
Confirm one by one if key business services have automatically started and are running normally:
systemctl status service_name
ps aux | grep service_name
For example: Nginx, MySQL, Redis, Docker, Kubernetes node services, etc.
5. Check network status
Is the IP address configured correctly:
ip a
Can you ping the gateway, DNS, and other key nodes?
Can the external network be accessed normally (if there are public network requirements)?
Check port listening:
ss -tunlp
6. Check CPU and memory load
Observe if the system has an abnormally high load:
top
uptime
Is the Load Average abnormally high (it should be low after a restart)?
7. Check logs
Quickly scan the system logs during the boot process:
journalctl -b -p 3
(-b limits output to the current boot; -p 3 shows messages of priority "err" and more severe)
Key focus:
Disk I/O errors
Network errors
Authentication failures
Service startup failures
8. Check firewall and SELinux status
Confirm if the firewall rules are correctly applied:
firewall-cmd --list-all
Check if SELinux is unexpectedly enabled/disabled (if required by the environment):
getenforce
9. Confirm if scheduled tasks have resumed
Check if crontab has loaded correctly:
crontab -l
systemctl status crond
(On Debian/Ubuntu the service is named cron rather than crond.)
10. Application health confirmation
Log into the application interface and perform functional access tests.
Check application logs to confirm there are no anomalies during startup (e.g., Tomcat, Nginx+PHP, SpringBoot applications).
📋 Final suggestions: small follow-up actions
Update the status of the monitoring system (e.g., Zabbix/Nagios/Prometheus):
Confirm that monitoring has returned to normal and is no longer reporting faults.
Briefly record the restart time, reason, and results (for future tracking and review).
🖐️ In summary:
“After a restart, the verification goals are: services start, data is normal, network is smooth, and the system is healthy.”
Public Account | SRE Road

