The following is a practical summary of backups for virtual machines, physical machines, operating systems, applications, files, and databases. It covers backup types, technical methods, tool selection, precautions, and recovery strategies, and is intended for enterprise-level system maintenance and disaster recovery.
1
Multidimensional Summary of Backup-Related Content
1. Overview of Backups

2. Practical Backup for Virtual Machines (using KVM/VMware as an example)
1. Online Hot Backup
Tools: VMware VDP/Veeam Backup & Replication, qemu-img snapshot (KVM)
Method:
Regularly create snapshots
Copy virtual disk files (.vmdk/.qcow2)
2. Cold Backup
Copy the entire virtual machine directory, including disks and configuration files, after shutting down.
3. Automation Strategy
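# KVM backup example: pause the guest, copy its disk, then resume it.
# The copy is crash-consistent only; in-guest application buffers are not flushed.
virsh suspend vm01
# -U (force share) lets qemu-img read an image that the paused QEMU process still has locked.
qemu-img convert -U -O qcow2 /var/lib/libvirt/images/vm01.qcow2 "/backup/vm01_$(date +%F).qcow2"
virsh resume vm01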
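The pause, copy, resume sequence can be wrapped so that the guest is resumed even when the copy fails. The following is a minimal sketch under stated assumptions, not a definitive implementation: backup_vm is a hypothetical helper name, and VIRSH and QEMU_IMG are hypothetical override variables that default to the real commands.

```shell
# Sketch: pause a KVM guest, copy its disk, and always resume it afterwards.
# VIRSH and QEMU_IMG are hypothetical overrides, useful for dry runs with stubs.
VIRSH=${VIRSH:-virsh}
QEMU_IMG=${QEMU_IMG:-qemu-img}

# backup_vm VM SRC_IMAGE DEST_DIR
backup_vm() {
    local vm="$1" src="$2" dest_dir="$3"
    local dest="$dest_dir/${vm}_$(date +%F).qcow2"
    local rc=0

    "$VIRSH" suspend "$vm" || return 1
    # -U (force share) reads an image still locked by the paused QEMU process.
    "$QEMU_IMG" convert -U -O qcow2 "$src" "$dest" || rc=$?
    # Resume the guest whether or not the copy succeeded.
    "$VIRSH" resume "$vm" || rc=1
    if [ "$rc" -ne 0 ]; then
        rm -f "$dest"   # do not keep a partial image
    fi
    return "$rc"
}
```

Example call: backup_vm vm01 /var/lib/libvirt/images/vm01.qcow2 /backup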
3. Practical Backup for Physical Machines
1. Full System Image
Tools: Clonezilla, Ghost, Acronis
Purpose: Quick recovery to specified hardware/bare metal environment
2. Key Data Partition Backup
dd if=/dev/sda of=/backup/sda_backup.img bs=64K
When imaging a disk or partition, make sure it is not in use by the running system, otherwise the image may be inconsistent. It is recommended to operate in a PE or LiveCD environment.
4. Operating System Backup
1. Backup of System Configuration Files
/etc, /boot, /var/log
Use rsync for regular incremental backups
rsync -avz /etc /boot /var/log /backup/os-config/
2. Export SELinux, Firewall, and Network Configuration
semanage export > selinux.rules
firewall-cmd --list-all > firewall.rules
nmcli connection show > net_config.txt
5. Application Service Backup
Application | Backup Content | Command/Method
Nginx | /etc/nginx, SSL certificates | tar or git version control
Tomcat | conf directory, webapps | zip compression or sync backup
Redis | dump.rdb, appendonly.aof | copy the data directory
Kafka | config directory, log directory | stop and copy, or sync with rsync
6. File-Level Backup
1. Using rsync
rsync -avz --delete /data/ /backup/data/
2. Using tar/gzip for scheduled packaging
tar -czf data_$(date +%F).tar.gz /data/
3. Example of Scheduled Backup Script (crontab)
# Run every day at 02:00 (a cron entry needs five time fields); log errors as well as output
0 2 * * * /usr/bin/rsync -avz /data/ /backup/data/ >> /var/log/backup.log 2>&1
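A scheduled job can overlap with its previous run when a backup takes longer than the interval. One way to guard against that is a lock directory, since mkdir either creates it or fails atomically. The sketch below is illustrative only: run_locked is a hypothetical helper name, and the exit code 75 for "skipped" is an arbitrary choice.

```shell
# run_locked LOCKDIR CMD...: run CMD unless a previous run still holds LOCKDIR.
run_locked() {
    local lock="$1"
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "$(date '+%F %T') previous backup still running, skipping" >&2
        return 75   # arbitrary code meaning "skipped"
    fi
    local rc=0
    "$@" || rc=$?
    rmdir "$lock"
    echo "$(date '+%F %T') backup finished with exit code $rc"
    return "$rc"
}
```

A cron script would then call, for example: run_locked /tmp/backup.lock /usr/bin/rsync -avz /data/ /backup/data/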
7. Database Backup
1. MySQL/MariaDB
# Logical Backup
mysqldump -uroot -p --all-databases > all.sql
# Physical Backup Tools: Percona XtraBackup, suitable for large data volume hot backups
2. PostgreSQL
pg_dumpall -U postgres > pg_backup.sql
3. MongoDB
mongodump --out /backup/mongodb/
4. Oracle
Use RMAN, exp/imp, Data Pump, etc.
8. Recommended General Backup Tools

9. Recovery Strategies and Drill Recommendations
Regularly drill the recovery process; no drill = no backup
Keep off-site backup copies to protect against ransomware and fire
Backup version retention strategy (7 days, 30 days, half a year)
Use checksums (e.g., md5sum) to verify backup validity
Implement backup reports and alerts (email, monitoring linkage)
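The checksum point can be sketched as a manifest that is written right after the backup and checked before any restore. This is a minimal sketch assuming GNU coreutils and findutils; write_manifest, verify_manifest, and the SHA256SUMS file name are hypothetical, and sha256sum is used here in place of md5sum.

```shell
# write_manifest DIR: record a SHA-256 checksum for every file under DIR.
write_manifest() {
    local dir="$1"
    ( cd "$dir" && find . -type f ! -name SHA256SUMS -print0 \
        | sort -z | xargs -0 -r sha256sum > SHA256SUMS )
}

# verify_manifest DIR: return non-zero if any listed file is missing or changed.
verify_manifest() {
    local dir="$1"
    ( cd "$dir" && sha256sum --quiet -c SHA256SUMS )
}
```

Storing the manifest with the off-site copy as well makes it possible to check that copy independently.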
10. Recommendations for Enterprise-Level Backup Strategies
Three Backup Principles (3-2-1):
Keep 3 copies
Use 2 different storage media
At least 1 copy stored off-site
Automation + auditing mechanism + disaster recovery plan formulation
Use snapshot technology and incremental/differential backups to reduce overhead
2
Summary of Considerations and Performance Impact During Backup Process
When performing backups for virtual machines, physical machines, operating systems, applications, files, and databases, the considerations during the backup process and their impact on performance are crucial, directly affecting the effectiveness of backups, business continuity, and system stability.
1. Backup Consistency
File system consistency: Avoid backing up during file writes; it is recommended to use LVM snapshots or mount as read-only.
Database consistency:
Use logical backup tools (e.g., mysqldump, pg_dump) or hot backup tools (e.g., Percona XtraBackup);
Add parameters such as --single-transaction (mysqldump) to ensure consistency.
Virtual machine consistency:
Use application-aware tools (e.g., Veeam);
Enable VMware Tools or QEMU guest agent to trigger consistent snapshots.
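For the database consistency point, a logical MySQL dump can be sketched as follows. It assumes credentials come from a client option file such as ~/.my.cnf; dump_all_databases is a hypothetical helper name and MYSQLDUMP is a hypothetical override variable. The dump is written to a temporary name first so that a failed run never leaves a truncated file that looks like a valid backup.

```shell
# MYSQLDUMP is a hypothetical override; it defaults to the real client.
MYSQLDUMP=${MYSQLDUMP:-mysqldump}

# dump_all_databases OUTFILE
dump_all_databases() {
    local out="$1"
    local tmp="$out.partial"
    # --single-transaction takes a consistent snapshot of InnoDB tables
    # without holding table locks for the duration of the dump.
    if "$MYSQLDUMP" --single-transaction --all-databases > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Note that --single-transaction only gives a consistent view for transactional engines such as InnoDB.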
2. Backup Window Planning
Avoid peak business hours;
For large data volumes, consider segmented or incremental backups, such as a weekly full backup plus daily incrementals;
For 7×24 businesses, use hot backup solutions (e.g., database hot backup, rsync real-time sync).
3. Network and IO Bandwidth Assessment
Avoid scheduling large traffic backup tasks during high network or disk load periods;
It is recommended to use a dedicated backup network port/VLAN;
Use QoS to control backup bandwidth usage to avoid affecting online business.
4. Selection of Backup Storage Media
Local Disk: Fast read/write, but easily lost together with the host;
NAS/iSCSI: Suitable for file-level centralized backups;
Object Storage (e.g., MinIO, Alibaba OSS): Highly scalable, suitable for long-term archiving.
5. Backup Security
Transmission encryption (rsync + ssh, sftp);
Backup data encryption (GPG, restic, Duplicity);
Set backup directory permissions to prevent encryption by ransomware.
6. Automation and Visualization
Use scripts or tools for scheduled backups (cron + shell, Ansible);
Provide logs, reports, alerts;
It is recommended to integrate with monitoring systems (e.g., Zabbix) for visual tracking.
7. Backup Retention Strategy
Retention Period: For example, full backups for 7 days, incremental for 30 days, monthly archiving for half a year;
Automatically clean old backups;
Regularly verify backup integrity (e.g., md5, sha256 checks).
2. Summary of Performance Impact of Backup Process on System

3. Special Impacts in High Availability Scenarios
In clustered or load-balanced environments (e.g., MySQL MGR, Redis Cluster, K8s), backup strategies need to consider:
Backing up from slave nodes to avoid performance impact on master nodes;
Implementing read-write separation architecture to reduce lock contention;
Monitoring disaster recovery node synchronization delays to avoid backing up dirty data.
4. Summary and Recommendations

3
How to Optimize Backup Strategies When Resources Are Limited
When resources are limited (e.g., insufficient disk space, limited network bandwidth, tight computing resources), the goal of optimizing backup strategies is to ensure the recoverability of critical data, reduce resource consumption, and improve efficiency. Here is a practical set of optimization ideas:
1. Clarify Priorities: Only back up critical data

2. Adopt Incremental + Differential Backup Strategies
1. Full Backup (Full)
Perform once a month/week;
Most resource-intensive, only serves as a baseline.
2. Differential Backup (Differential)
Every 2-3 days;
Back up data that has changed since the last full backup.
3. Incremental Backup (Incremental)
Daily or hourly;
Only back up changes since the last backup.
Illustration:
Full (Sunday)
└─ Differential (Wednesday)
   └─ Incremental (Thursday, Friday, Saturday)
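The three levels can be sketched with GNU tar's --listed-incremental snapshot files. This is one possible approach under stated assumptions (GNU tar, absolute paths for the backup directory), and the function and file names are hypothetical. The incremental run keeps updating its snapshot file, while the differential run always starts from a fresh copy of the full backup's snapshot.

```shell
# full_backup SRC DEST_DIR: level-0 archive plus a fresh snapshot file.
full_backup() {
    local src="$1" dest="$2"
    rm -f "$dest/full.snar"
    tar --listed-incremental="$dest/full.snar" -czf "$dest/full.tar.gz" -C "$src" . &&
        cp "$dest/full.snar" "$dest/incr.snar"
}

# differential_backup SRC DEST_DIR NAME: everything changed since the last FULL backup.
differential_backup() {
    local src="$1" dest="$2" name="$3"
    # Work on a throwaway copy so the full backup's snapshot is never advanced.
    cp "$dest/full.snar" "$dest/diff.snar" &&
        tar --listed-incremental="$dest/diff.snar" -czf "$dest/$name.tar.gz" -C "$src" .
}

# incremental_backup SRC DEST_DIR NAME: everything changed since the previous
# full or incremental run; incr.snar is advanced on every call.
incremental_backup() {
    local src="$1" dest="$2" name="$3"
    tar --listed-incremental="$dest/incr.snar" -czf "$dest/$name.tar.gz" -C "$src" .
}
```

Restoring then means extracting the full archive followed by either the latest differential or every incremental in order.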
3. Use Compression and Deduplication Techniques to Reduce Space Usage
Use gzip, xz, zstd, etc. for compression
tar -I 'zstd -19' -cf config_backup.tar.zst /etc/
Use tools that support deduplication (e.g., restic, BorgBackup)
Higher compression levels cost more CPU time for diminishing space savings: when resources are tight, it is recommended to use gzip -6 or zstd -3
4. Distributed or Staggered Backups, Execute During Off-Peak Hours
Back up different hosts at different times to distribute the load;
Make reasonable use of nighttime or off-peak business hours for backups;
Rotate backups among multiple nodes, for example:
HostA: Monday, Thursday
HostB: Tuesday, Friday
HostC: Wednesday, Saturday
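The rotation above can be encoded in a small lookup that a daily job consults. This is only a sketch: hosts_due is a hypothetical helper name, and the host names are the placeholders from the rotation.

```shell
# hosts_due DAY: print the host scheduled for ISO weekday DAY (1=Monday ... 7=Sunday).
hosts_due() {
    case "$1" in
        1|4) echo HostA ;;
        2|5) echo HostB ;;
        3|6) echo HostC ;;
        *)   ;;   # Sunday: nothing is scheduled in this rotation
    esac
}
```

A daily cron script could call hosts_due "$(date +%u)" and back up whatever host it prints.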
5. Use Low Resource Overhead Tools

6. Reduce Unnecessary Backup Content
Use exclusion parameters to reduce junk files:
rsync -av --exclude='*.log' --exclude='*.tmp' /data/ /backup/
Use a .backupignore list to maintain files to skip
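A skip list like .backupignore can be passed to the archiver as a pattern file. The sketch below uses GNU tar's --exclude-from (rsync offers an option of the same name); backup_with_ignore is a hypothetical helper, and it assumes the list lives at the top of the source directory, one pattern per line.

```shell
# backup_with_ignore SRC_DIR ARCHIVE: archive SRC_DIR, skipping patterns in SRC_DIR/.backupignore.
backup_with_ignore() {
    local src="$1" archive="$2"
    local ignore="$src/.backupignore"
    if [ -f "$ignore" ]; then
        tar --exclude-from="$ignore" -czf "$archive" -C "$src" .
    else
        tar -czf "$archive" -C "$src" .
    fi
}
```

Keeping the list in a file means exclusions can be reviewed and version-controlled instead of being buried in a command line.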
7. Adopt Backup Rotation and Cleanup Strategies When Space is Insufficient
Use fixed retention periods, such as:
Keep 3 full backups;
Keep incremental backups for 7 days;
Automatically clean old backups on a schedule
find /backup -type f -mtime +10 -delete
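Because find with -delete is irreversible, it can help to preview what would be removed, and to handle count-based rules such as keeping 3 full backups separately from age-based ones. A minimal sketch assuming GNU find and coreutils; prune_old and keep_newest are hypothetical helper names, and file names are assumed to contain no newlines.

```shell
# prune_old DIR DAYS [--apply]: list files older than DAYS days; delete them only with --apply.
prune_old() {
    local dir="$1" days="$2" mode="${3:-}"
    if [ "$mode" = "--apply" ]; then
        find "$dir" -type f -mtime +"$days" -print -delete
    else
        find "$dir" -type f -mtime +"$days" -print
    fi
}

# keep_newest DIR PATTERN N: delete all but the N newest files in DIR matching PATTERN.
keep_newest() {
    local dir="$1" pattern="$2" n="$3"
    find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@ %p\n' \
        | sort -rn | tail -n +"$((n + 1))" | cut -d' ' -f2- \
        | while IFS= read -r f; do rm -f -- "$f"; done
}
```

For example, prune_old /backup 10 prints the candidates, and prune_old /backup 10 --apply removes them.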
8. Asynchronous Remote Backup + Local Cache
Keep the last 7 days locally;
Transfer offline to remote NFS or USB disk;
Or use rclone to upload important backups to object storage (e.g., Alibaba OSS, MinIO)
9. Lightweight Backup Optimization Recommendations Table

10. Key Recommendations Summary
✅ Prioritize ensuring the minimal recoverable set (Minimal Recovery Set)
✅ Use lightweight incremental + compression tools
✅ Control backup frequency and time windows
✅ Automatically clean old data, automatic alerts
✅ Regularly test recovery, even if only testing key file restoration
4
The Importance of Backups
The importance of backups can be explained from multiple dimensions such as data security, business continuity, compliance, cost control, and disaster recovery capability. Here is a systematic summary:
1. Why are Backups Important?
1. Prevent Data Loss
Accidental deletion, misoperation (e.g., rm -rf);
Hardware failures (e.g., disk damage, power failure);
Software bugs or system crashes;
Malicious attacks (e.g., ransomware, hacker tampering);
Natural disasters (e.g., fire, flood, earthquake).
✅ Backups are the only guarantee for recovery after complete data loss.
2. Ensure Business Continuity
Data loss will directly interrupt system operation;
Slow data recovery can lead to customer loss and damage to brand reputation;
Effective backups can achieve recovery in minutes/hours, minimizing downtime (MTTR).
✅ Without backups, downtime or data corruption will prevent quick recovery of business operations.
3. Meet Regulatory and Audit Requirements
Industries such as finance, healthcare, and e-commerce require mandatory data retention;
Compliance requirements for data protection exist in standards like GB/T 22239-2019, GDPR, SOX, etc.;
Regular backups and retention records are the basis for passing audits.
✅ Backups are a fundamental requirement for compliant operations.
4. Reduce Long-Term Costs
The direct economic losses from data loss far exceed the cost of backups;
Effective backups can avoid repetitive work and costs caused by human/system errors;
Support tiered recovery to avoid resource waste caused by restoring all data.
✅ Small cost investment avoids large disaster costs.
5. Enhance Operations and Management Capabilities
Back up before system upgrades or migrations to ensure rollback capability;
Support data comparison, version tracking, and recovery of deleted files;
Multi-environment testing can be quickly set up using backup images (e.g., UAT).
✅ Backups are an important support for stable operations.
2. Serious Consequences of Not Backing Up

3. One-Sentence Summary
Backups are not optional; they are the “last line of defense” for enterprise information systems and stable operation.
5
Backup Strategy Specifications and Summary of Enterprise Backup Recovery Drills
The following is a systematic summary of “Backup Strategy Specifications” and “Enterprise Backup Recovery Drills”, suitable for IT operations, architecture management, and disaster recovery scenarios in small to medium-sized enterprises:
📦 1. Backup Strategy Specifications (Standardized Document Template)
1. Objectives and Principles

2. Classification Strategy Design

3. Backup Methods and Technology Selection

4. Storage and Security Policies
Backup storage media:
Local Disk: Temporary cache, fast recovery speed;
Remote NFS/SAN: Can be centrally managed;
Cloud/Object Storage (OSS/S3): Disaster recovery off-site storage;
Removable Media: Tape, USB hard drives for long-term archiving.
Security hardening measures:
Backup data encrypted storage;
Storage isolation + access control;
Use separate accounts + MFA;
To combat ransomware: One offline cold backup is indispensable.
5. Retention and Rotation Strategies
Rolling cleanup of old backups:
find /backup -type f -mtime +30 -delete
It is recommended to adopt the “3-2-1” strategy:
3 copies of data;
2 different media;
1 off-site offline backup.
🚨 2. Summary of Enterprise Backup Recovery Drills
1. Purpose of Recovery Drills
Verify that backups are valid and usable;
Familiarize the team with recovery processes and document them;
Check if recovery time is within RTO requirements;
Train operations personnel to respond to incidents.
2. Types of Recovery Drills
Type | Content
File-Level Recovery | Recover deleted files, roll back specific configurations
Database Recovery | Full recovery + binlog point-in-time recovery
System Recovery | Restore an entire virtual machine or bare-metal host to a specific snapshot state
Disaster Recovery Switch Drill | Switch business from the master node to the backup node
Simulated Disaster Recovery | Simulate downtime/intrusion/ransomware attack and execute the complete recovery process
3. Suggested Recovery Drill Process (Documented)
Prepare test environment: Isolate the drill environment so that production is not affected;
Confirm recovery plan: Select recovery target (time point/version);
Execute recovery operations: Follow standard procedures;
Verify integrity and consistency;
Record time taken, issues found, and improvement points;
Generate reports and archive.
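For a file-level drill, the restore, verify, and timing steps can be sketched as one function. This is an illustration only: drill_restore is a hypothetical helper, the archive is assumed to be a gzip-compressed tar, and REFERENCE_DIR is assumed to be a known-good copy of the data as it was at backup time.

```shell
# drill_restore ARCHIVE REFERENCE_DIR: restore ARCHIVE into a scratch directory,
# compare the result with REFERENCE_DIR, and report the elapsed seconds.
# Returns 0 on success, 1 if extraction failed, 2 if the restored data differs.
drill_restore() {
    local archive="$1" reference="$2"
    local scratch start end rc=0
    scratch=$(mktemp -d) || return 1
    start=$(date +%s)
    tar -xzf "$archive" -C "$scratch" || rc=1
    end=$(date +%s)
    if [ "$rc" -eq 0 ] && ! diff -r "$reference" "$scratch" >/dev/null; then
        rc=2
    fi
    echo "restore_seconds=$((end - start)) result=$rc"
    rm -rf "$scratch"
    return "$rc"
}
```

The printed line can be appended to the drill report, and restore_seconds compared against the RTO target.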
4. Suggested Frequency and Check Items
Content | Suggested Frequency
File/Database Recovery Testing | Once per quarter
Full Site/Disaster Recovery Drill | Every six months or annually
Off-Site Recovery Capability Verification | Once per year
Backup Integrity Check (hash) | Once per month
5. Common Recovery Issues and Responses
Issue Type | Description and Recommendations
Backup File Corruption | Regularly use md5sum checks to avoid silent failures
Database Recovery Failure | Check version compatibility, character set, permissions
Lack of Recovery Documentation | Write SOP + automation scripts
Slow Recovery | Optimize compression parameters, speed up the decompression process, use partial recovery
✅ Summary Recommendations
Backup strategies must be “written down + tested” to be truly reliable;
Drills are the best means to validate strategies and improve team coordination;
Having backups does not guarantee recovery; only drills provide true peace of mind.
Public Account | The Road to SRE
