A Checklist of Key Items to Check Before and After Restarting a Linux System

The two checklists below cover what to verify before and after restarting a Linux server. They are most useful in production environments.

✅ Linux Pre-Restart Checklist

1. Current system load and resource usage

uptime / top / htop / vmstat

Record the current CPU usage, memory usage, and load average so you can compare them with the values after the restart.
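The baseline is easier to compare later if it is written to a file. A minimal sketch, assuming a hypothetical function name `save_baseline` and an output directory of your choosing; `uptime` and `free` are guarded in case they are not installed:

```shell
# save_baseline DIR: write a timestamped load/memory snapshot into DIR.
# The function name and file naming are illustrative, not a standard tool.
save_baseline() {
    dir=$1
    mkdir -p "$dir" || return 1
    out="$dir/baseline_$(date +%Y%m%d_%H%M%S).txt"
    {
        echo "== date =="; date
        if [ -r /proc/loadavg ]; then
            echo "== loadavg =="; cat /proc/loadavg
        fi
        # uptime and free may be absent on minimal systems, so guard them
        if command -v uptime >/dev/null 2>&1; then
            echo "== uptime =="; uptime
        fi
        if command -v free >/dev/null 2>&1; then
            echo "== memory (MiB) =="; free -m
        fi
    } > "$out"
    echo "$out"   # print the path so the caller can record it
}

# Example (the directory is a placeholder): save_baseline /root/restart_notes
```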

2. Disk space and IO status

df -h to check if the disk is nearly full.

iostat -x 1 5 or iotop to check for abnormal disk IO.

A full disk can prevent the system or its services from starting after a restart, so free up space before you reboot.
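To avoid reading the `df` output by eye, the check can be scripted. A minimal sketch, assuming a hypothetical helper `flag_full_filesystems` and a threshold you pick yourself; it parses the POSIX-format output of `df -P` and does not handle mount points that contain spaces:

```shell
# flag_full_filesystems THRESHOLD: read `df -P` output on stdin and print
# "mountpoint usage%" for every filesystem at or above THRESHOLD percent.
flag_full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                      # "91%" -> "91"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example: df -P | flag_full_filesystems 90
```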

3. Log analysis

Check /var/log/messages, /var/log/syslog, and application logs.

Are there any error messages? For example, OOM Killer, disk I/O errors, abnormal service exits, etc.

Key commands:

dmesg | tail -n 50
journalctl -xe

4. Service status confirmation

systemctl list-units --state=failed

Note which services are already in a failed state, and pay special attention to whether they start normally after the restart.

5. Network connection status

ss -tunlp or netstat -anp

Record the current listening ports and active connections, so you can spot port conflicts or listeners that fail to come back after the restart.
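One way to make that record comparable later is to reduce the `ss` output to one line per listener. A minimal sketch, assuming your `ss` supports `-H` (no header line); the helper name `list_listeners` and the output file name are hypothetical:

```shell
# list_listeners: read `ss -tunlH` output on stdin and print one sorted,
# de-duplicated "proto local_address:port" line per listening socket.
list_listeners() {
    awk '{ print $1, $5 }' | sort -u
}

# Example:
#   ss -tunlH | list_listeners > listeners_before.txt
```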

6. Scheduled task check

crontab -l / ls /etc/cron*

Confirm in advance which scheduled tasks need to run after the restart.

7. Mount points and storage confirmation

mount / cat /etc/fstab

Check that the mount points for important storage (NFS, iSCSI, SAN, etc.) are configured correctly in /etc/fstab; a mount that fails after the restart can prevent services from starting.

8. Important configuration backup

Back up important configuration files, such as /etc/fstab, /etc/network/, /etc/hosts, application configurations, and database configurations.

Recommended command:

tar czvf config_backup_$(date +%F).tar.gz /etc /opt/app/config
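A backup is only useful if the archive can be read back. A minimal sketch that wraps the command above and lists the archive afterwards as a sanity check; the function name `backup_configs` is hypothetical, and /opt/app/config is a placeholder for your own application path:

```shell
# backup_configs ARCHIVE PATH...: archive the given paths, then list the
# archive back to confirm it is readable before trusting it.
backup_configs() {
    archive=$1
    shift
    tar czf "$archive" "$@" || return 1
    # tar tzf reads the whole archive; a failure means it is truncated or corrupt
    tar tzf "$archive" >/dev/null || return 1
    echo "verified: $archive"
}

# Example:
#   backup_configs "config_backup_$(date +%F).tar.gz" /etc /opt/app/config
```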

9. Coordination in high availability/clustering environments

If the server is a cluster node (e.g., MySQL master-slave, Redis Sentinel, K8s Node, HAProxy+Keepalived, etc.), notify the operators of the other nodes or temporarily remove the node from the cluster, to avoid triggering an unplanned master-slave switch or cluster alarms.

10. Confirm if there is unsaved data

Save any important temporary data in advance, such as data on memory disks and important files in tmp directories, because it may be lost on restart.

11. Confirm available remote connection methods

Ensure that an out-of-band access method (IPMI, iLO, KVM, or another remote console) is available, so that you can rescue the system if it hangs after the restart.

12. Notify relevant parties

Notify application owners, business parties, monitoring teams, etc., to avoid false alarms or misunderstandings.

📋 Appendix: Restart execution suggestions

sync
shutdown -r now

Or a more cautious approach (especially for servers with high IO):

sync && sleep 5 && reboot

(Flush buffered writes to disk, wait 5 seconds, then restart, to reduce the risk of data loss.)

🔥 Tip

If you have time, take a snapshot or backup before restarting (especially in virtual machine environments like VMware/KVM) for added safety.

✅ Linux Post-Restart Checklist

1. Confirm if the system has booted normally

Check if you can successfully access the login interface via the console/remote terminal.

Check for any abnormal error messages during the boot process, such as:

File system check failed (fsck failed)

Unable to mount disk

Kernel panic

If there are abnormalities, intervene immediately using rescue mode or IPMI.

2. Check time synchronization

Confirm if the system time is accurate:

date

timedatectl status

Confirm if ntpd/chronyd is running normally:

systemctl status chronyd
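For scripted checks, the synchronization state can be read in machine-readable form. A minimal sketch, assuming a systemd version recent enough to provide `timedatectl show`; the function name `clock_synced` is hypothetical:

```shell
# clock_synced: succeed only if systemd reports the clock as NTP-synchronized.
# If timedatectl is missing or too old, this reports "not synchronized".
clock_synced() {
    [ "$(timedatectl show -p NTPSynchronized --value 2>/dev/null)" = "yes" ]
}

# Example:
#   if clock_synced; then echo "clock is synchronized"; else echo "clock is NOT synchronized"; fi
```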

3. Check disk mount status

Check if all mount points have been successfully mounted:

df -h
mount

Compare with the mount status before the restart to see whether any mounts are missing (especially NFS/iSCSI).
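Missing mounts can also be found by comparing /etc/fstab against the live mount table. A minimal sketch with a hypothetical helper `missing_mounts`; it compares mount point strings exactly, so an fstab entry written with a trailing slash would be reported as missing:

```shell
# missing_mounts FSTAB MOUNTS: print every mount point listed in FSTAB that
# does not appear in MOUNTS (normally /etc/fstab and /proc/mounts).
# Skips comments, swap/none targets, and entries marked noauto.
# MOUNTS must not be empty (/proc/mounts never is).
missing_mounts() {
    awk '
        NR == FNR { mounted[$2] = 1; next }       # first file: live mounts
        /^[ \t]*(#|$)/ { next }                   # comments and blank lines
        $2 == "none" || $2 == "swap" || $3 == "swap" { next }
        $4 ~ /(^|,)noauto(,|$)/ { next }
        !($2 in mounted) { print $2 }
    ' "$2" "$1"
}

# Example: missing_mounts /etc/fstab /proc/mounts
```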

4. Check core service status

Confirm one by one if key business services have automatically started and are running normally:

systemctl status service_name
ps aux | grep service_name

For example: Nginx, MySQL, Redis, Docker, Kubernetes node services, etc.
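Checking services one by one is easy to loop over. A minimal sketch using `systemctl is-active`; the function name `check_services` is hypothetical, and the unit names in the example are placeholders that differ between distributions:

```shell
# check_services NAME...: print the state of each unit and return
# non-zero if any of them is not active.
check_services() {
    rc=0
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "OK    $svc"
        else
            echo "DOWN  $svc"
            rc=1
        fi
    done
    return "$rc"
}

# Example: check_services nginx mysql redis docker
```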

5. Check network status

Is the IP address configured correctly:

ip a

Can you ping the gateway, DNS, and other key nodes?

Can the external network be accessed normally (if there are public network requirements)?

Check port listening:

ss -tunlp
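If a list of listeners was saved to a file before the restart, the ones that did not come back can be found with `comm`. A minimal sketch; the helper name `lost_entries` and the file names in the example are hypothetical:

```shell
# lost_entries BEFORE AFTER: print lines present in BEFORE but missing from
# AFTER. Both files are sorted first because comm requires sorted input.
lost_entries() {
    before_sorted=$(mktemp) || return 1
    after_sorted=$(mktemp) || return 1
    sort -u "$1" > "$before_sorted"
    sort -u "$2" > "$after_sorted"
    comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}

# Example:
#   lost_entries listeners_before.txt listeners_after.txt
```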


6. Check CPU and memory load

Observe if the system has an abnormally high load:

top
uptime

Is the Load Average abnormally high (it should be low after a restart)?

7. Check logs

Quickly scan the system logs during the boot process:

journalctl -b -p 3

(-b limits the output to the current boot; -p 3 shows messages of priority err and more severe)

Key focus:

Disk I/O errors

Network errors

Authentication failures

Service startup failures
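When the error list is long, grouping it by source shows where to look first. A minimal sketch that assumes the default `short` output format (timestamp, host, then `source[pid]:`); the helper name `count_by_source` is hypothetical, and continuation lines of multi-line messages are not handled specially:

```shell
# count_by_source: read `journalctl -o short` lines on stdin and print the
# number of messages per source (kernel, sshd, ...), busiest first.
count_by_source() {
    awk '/^-- / { next }                       # skip journalctl banner lines
         NF >= 5 {
             src = $5
             sub(/\[[0-9]+\]:$/, "", src)      # "sshd[123]:" -> "sshd"
             sub(/:$/, "", src)                # "kernel:"    -> "kernel"
             count[src]++
         }
         END { for (s in count) print count[s], s }' | sort -rn
}

# Example: journalctl -b -p 3 -o short --no-pager | count_by_source
```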

8. Check firewall and SELinux status

Confirm if the firewall rules are correctly applied:

firewall-cmd --list-all

Check that SELinux is in the mode your environment expects (Enforcing, Permissive, or Disabled):

getenforce

9. Confirm if scheduled tasks have resumed

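Check that the crontab entries are intact and the cron daemon is running (the service is named crond on RHEL-family systems and cron on Debian/Ubuntu):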

crontab -l
systemctl status crond

10. Application health confirmation

Log into the application interface and perform functional access tests.

Check application logs to confirm there are no anomalies during startup (e.g., Tomcat, Nginx+PHP, SpringBoot applications).
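Applications often need some time to warm up after boot, so a health check is more reliable with retries. A minimal sketch using `curl`; the function name `wait_for_http`, the URL, and the retry values are all hypothetical and should be replaced with your application's real health endpoint:

```shell
# wait_for_http URL [ATTEMPTS] [DELAY]: poll URL until it returns a
# successful HTTP status, or give up after ATTEMPTS tries.
wait_for_http() {
    url=$1
    attempts=${2:-10}
    delay=${3:-3}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP >= 400 as failure; -sS: quiet, but still print errors
        if curl -fsS -o /dev/null --max-time 5 "$url"; then
            echo "healthy after $i attempt(s): $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "still failing after $attempts attempt(s): $url" >&2
    return 1
}

# Example: wait_for_http http://localhost:8080/health 10 3
```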

📋 Final suggestions

Update the status in the monitoring system (e.g., Zabbix/Nagios/Prometheus): confirm that monitoring has returned to normal and is no longer reporting faults.

Briefly record the restart time, reason, and results (for future tracking and review).

🖐️ In summary:

“After a restart, the verification goals are: services start, data is normal, network is smooth, and the system is healthy.”


Public Account | SRE Road
