Daily Linux: Understanding the rsync Command – Underlying Principles and Advanced Practices to Double File Synchronization Efficiency

1. Command Introduction and Principles

1.1 Introduction

rsync (Remote Sync) is an efficient file synchronization and transfer tool for Linux/Unix systems. It supports local synchronization and remote synchronization (over SSH or an rsync daemon), along with incremental transfer, permission preservation, and data compression. Its core advantage is that it can transfer only the differing parts of files, which greatly speeds up large-file and repeated synchronization.

1.2 Working Principle

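  1. Incremental Transfer: Uses a delta-transfer algorithm to send only the changed parts of a file. For purely local copies rsync defaults to --whole-file and copies changed files in full.

  2. Checksum Comparison: By default a quick check compares file size and modification time to decide which files need transferring; -c (--checksum) compares whole-file checksums instead. Block checksums then locate the changed parts inside a file.

  3. Temporary File Mechanism: Writes to a temporary file in the destination directory first, then renames it over the target on success, so a failed transfer never leaves a half-written file in place.

  4. Pipe Transfer: Remote synchronization runs over a remote shell such as SSH, which provides the encrypted channel. The rsync daemon protocol itself is not encrypted.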
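
The temporary-file mechanism in point 3 can be imitated in a few lines of shell. This is only a sketch of the idea, not rsync's actual code; the function name safe_replace and the demo file names are made up for illustration.

```shell
# safe_replace TARGET: read new content from stdin into a temp file that lives
# in the same directory as TARGET, then rename it over TARGET.
# A rename within one filesystem is atomic, so readers see either the old
# file or the complete new one, never a partial write.
safe_replace() {
    target=$1
    tmp=$(mktemp "$(dirname "$target")/.$(basename "$target").XXXXXX") || return 1
    if cat > "$tmp"; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo in a throwaway directory (hypothetical file names)
demo_dir=$(mktemp -d)
printf 'old\n' > "$demo_dir/file.txt"
printf 'new\n' | safe_replace "$demo_dir/file.txt"
cat "$demo_dir/file.txt"
```

Note that mktemp creates the file with restrictive permissions, so a real tool would also copy the intended mode and ownership before renaming.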

1.3 Core Algorithm: rsync Algorithm

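  1. The receiver splits its existing copy of the file into fixed-size blocks.

  2. It calculates a weak checksum (rolling checksum) and a strong checksum (MD5 in many versions; the exact hash depends on the rsync version) for each block and sends this list to the sender.

  3. The sender slides the rolling checksum across its own file at every byte offset, looking for blocks whose weak checksum and then strong checksum match.

  4. The sender transfers only the unmatched data plus references to the blocks the receiver already has.

  5. The receiver reassembles the file from its existing blocks and the received data.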
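
To make the "rolling" part of the weak checksum concrete, here is a small shell sketch. It assumes the simplified two-part sum (a, b) modulo 2^16 described in the rsync technical report; the function names weak_sum and roll are illustrative, and real rsync implements this in C with extra details. The point is that sliding the window by one byte needs only the outgoing and incoming byte, not a full recomputation.

```shell
M=65536   # checksum modulus (2^16)

# weak_sum BYTE...  -> prints "a b" for a block given as decimal byte values
#   a = sum of the bytes
#   b = sum of (remaining length * byte), both modulo M
weak_sum() {
    a=0; b=0; n=$#
    for x in "$@"; do
        a=$(( (a + x) % M ))
        b=$(( (b + n * x) % M ))
        n=$(( n - 1 ))
    done
    echo "$a $b"
}

# roll A B LEN OUT IN  -> prints the checksum of the window shifted by one byte,
# where OUT is the byte leaving the window and IN is the byte entering it
roll() {
    ra=$(( ($1 - $4 + $5 + M) % M ))
    rb=$(( (($2 - $3 * $4 + ra) % M + M) % M ))
    echo "$ra $rb"
}

# Decimal byte values of the sample string "abcde"
set -- $(printf 'abcde' | od -An -v -tu1)

first=$(weak_sum "$1" "$2" "$3" "$4")    # window "abcd"
rolled=$(roll $first 4 "$1" "$5")        # slide one byte to "bcde"
direct=$(weak_sum "$2" "$3" "$4" "$5")   # recompute "bcde" from scratch

echo "rolled: $rolled"
echo "direct: $direct"
```

Both lines print the same pair, which is why the sender can afford to test every byte offset: each step costs a few arithmetic operations, and the expensive strong checksum is computed only when the weak one matches.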

2. Basic Syntax

rsync [options] source_path destination_path

Common Options

# Basic behavior options
-a, --archive           # Archive mode, equivalent to -rlptgoD (preserves permissions, timestamps, owner, group, symlinks; not hard links, ACLs, or xattrs)
-r, --recursive         # Recursively copy directories
-v, --verbose           # Verbose output
-z, --compress          # Compress data during transfer
-h, --human-readable    # Output numbers in human-readable format

# Synchronization control options
-u, --update            # Skip files that are newer on the receiver (only update older files)
--progress              # Show transfer progress
--delete                # Delete files in the destination that are not in the source
--ignore-existing       # Skip files that already exist in the destination
--force                 # Force delete directories, even if not empty

# Remote operation options
-e, --rsh=COMMAND       # Specify remote shell (e.g., -e ssh)
-P                      # Equivalent to --partial --progress
--partial               # Keep partially transferred files
--bwlimit=RATE          # Limit socket I/O bandwidth (RATE is in units of 1024 bytes per second by default)

# Filtering options
--exclude=PATTERN       # Exclude matching files
--include=PATTERN       # Include matching files
--exclude-from=FILE     # Read exclude patterns from a file
--include-from=FILE     # Read include patterns from a file

3. Classic Use Cases

3.1 Local Directory Synchronization

Requirement: Synchronize local /home/user/docs to /backup/docs, preserving all metadata and displaying the detailed process.

Command:

# Synchronize two local directories
rsync -av /home/user/docs/ /backup/docs/
# Note the difference in trailing slashes:
rsync -av source/ dest/     # Copies the contents of source into dest
rsync -av source dest/      # Copies the source directory itself, creating dest/source
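
The trailing-slash rule is easy to check in a scratch directory. The sketch below uses throwaway paths created with mktemp (the directory names are made up) and only runs rsync if it is installed.

```shell
# Build a tiny source tree in a temporary location
demo=$(mktemp -d)
mkdir -p "$demo/source" "$demo/with_slash" "$demo/without_slash"
echo hello > "$demo/source/a.txt"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash: the contents of source/ land directly in the destination
    rsync -a "$demo/source/" "$demo/with_slash/"
    # No trailing slash: the directory itself is recreated inside the destination
    rsync -a "$demo/source" "$demo/without_slash/"
    # Show where a.txt ended up in each case
    find "$demo/with_slash" "$demo/without_slash" -type f
fi
```

The first destination should contain a.txt at its top level, the second should contain source/a.txt.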

3.2 Remote Backup Solution

Requirement: Synchronize local code to a remote server, enabling compression and displaying progress.

Command:

# Local to remote backup
rsync -avzP /local/code/ user@remote:/backup/code/
# Remote to local restore
rsync -avzP user@remote:/backup/code/ /local/restorecode/

3.3 Mirror Directory Synchronization

Requirement: Ensure the source directory /source and its mirror /mirror are identical, deleting files in the mirror that no longer exist in the source.

Command:

# Create an identical mirror (delete extra files in the destination)
rsync -av --delete /source/ /mirror/
# Remote mirror
rsync -avz --delete /local/source/ user@remote:/remote/mirror/

3.4 Exclude Specific File Patterns

Requirement: Synchronize /data to /backup, but exclude all .tmp files and any directory named logs (an unanchored pattern such as 'logs/' matches at every depth).

Command:

rsync -av --exclude='*.tmp' --exclude='logs/' /data/ /backup/
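
Before trusting a filter on real data, it can help to try it on a scratch tree. The sketch below is one way to do that; all paths and file names are invented for the demo, and -n (--dry-run) is used first so nothing is written until the file list looks right.

```shell
# Scratch tree with files that should and should not be copied
demo=$(mktemp -d)
mkdir -p "$demo/data/logs" "$demo/data/sub/logs" "$demo/backup"
echo keep > "$demo/data/report.txt"
echo keep > "$demo/data/sub/notes.txt"
echo skip > "$demo/data/scratch.tmp"
echo skip > "$demo/data/logs/app.log"
echo skip > "$demo/data/sub/logs/app.log"

if command -v rsync >/dev/null 2>&1; then
    # Dry run: lists what would be transferred without touching the destination
    rsync -avn --exclude='*.tmp' --exclude='logs/' "$demo/data/" "$demo/backup/"
    # Real run with the same filters
    rsync -a --exclude='*.tmp' --exclude='logs/' "$demo/data/" "$demo/backup/"
    find "$demo/backup" -type f | sort
fi
```

Only report.txt and sub/notes.txt should appear in the backup; both logs directories are skipped because the pattern is not anchored with a leading slash.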

4. Combining with Other Tools and Commands

4.1 Deep Integration with SSH

Requirement: Simplify remote synchronization operations to avoid repeated input of authentication information.

Steps: Edit local ~/.ssh/config:

# Give the server a memorable alias
Host remote-backup
    # Real IP or domain of the server
    HostName 192.168.9.100
    # Username used to log in to the server
    User backup-user
    # SSH port (default 22, changed to 2222 here)
    Port 2222
    # Private key used when connecting to this server
    IdentityFile ~/.ssh/id_rsa_backup

Subsequent synchronization commands are simplified to:

rsync -avz /local/path/ remote-backup:/remote/path/

4.2 Combining with cron: Scheduled Automatic Backups

Requirement: Automatically synchronize /var/www to the remote server every day at 2 AM.

Steps: Write script /root/backup.sh:

#!/bin/bash
rsync -avz --delete /var/www/ user@remote:/backup/www/

Configure cron job:

crontab -e
# Write the following content (executes daily at 2:00)
0 2 * * * /bin/bash /root/backup.sh >> /var/log/backup.log 2>&1
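
If a backup ever takes longer than a day, cron will start a second copy while the first is still running. One simple guard, sketched here under the assumption that a lock directory in /tmp is acceptable, relies on mkdir being atomic. The names run_exclusive and LOCK_DIR are hypothetical; where util-linux is available, flock(1) is a more robust alternative.

```shell
# Hypothetical lock location; any path writable by the cron user works
LOCK_DIR="${TMPDIR:-/tmp}/backup-www.lock"

# run_exclusive COMMAND...: run COMMAND only if no other instance holds the lock
run_exclusive() {
    # mkdir fails if the directory already exists, so only one caller can win
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}

# Demo: in the real script this would wrap the rsync call
run_exclusive echo "backup would run here"
```

A limitation of this sketch: if the script is killed before rmdir runs, the stale lock has to be removed by hand.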

4.3 Real-time Synchronization with inotify-tools

Requirement: Immediately synchronize local directory /app/data when changes occur.

Principle: inotifywait listens for file events (such as creation, modification) and triggers rsync synchronization.

Script example:

#!/bin/bash
# Prerequisite (run once, not on every start): install inotify-tools, e.g. on CentOS/RHEL:
#   yum install -y epel-release && yum install -y inotify-tools
SRC_DIR="/app/data"
DEST_DIR="user@remote:/remote/data"
# Listen for events: create, modify, delete, move
inotifywait -mrq --timefmt '%d/%m/%y %H:%M' \
  --format '%T: %e %w%f' \
  -e create,modify,delete,move "$SRC_DIR" | while read -r line
do
  # Trailing slashes sync the contents of SRC_DIR into DEST_DIR instead of creating DEST_DIR/data
  rsync -avz --partial --delete "$SRC_DIR/" "$DEST_DIR/"
  echo "$(date '+%F %T') Synchronization complete" >> /var/log/realtime-sync.log
done
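
The loop above runs rsync once per event, so saving a thousand files triggers a thousand synchronizations. A common remedy is to debounce: wait until the event stream has been quiet for a moment, then sync once. The sketch below assumes bash (it relies on read -t); the function name debounce is made up for illustration.

```shell
# debounce QUIET_SECONDS COMMAND...
# Reads event lines from stdin. After the first event it keeps draining
# further events until none arrives for QUIET_SECONDS, then runs COMMAND once.
debounce() {
    local quiet=$1
    shift
    local event
    while IFS= read -r event; do
        # Swallow the rest of the burst; read -t fails on timeout or end of input
        while IFS= read -r -t "$quiet" event; do :; done
        # Keep the command from consuming the event stream on stdin
        "$@" </dev/null
    done
}

# Demo: three events arrive in one burst, the command runs a single time
printf 'event1\nevent2\nevent3\n' | debounce 1 echo "sync triggered"
```

In the script above this could be wired up as `inotifywait -mrq -e create,modify,delete,move "$SRC_DIR" | debounce 2 rsync -az --delete "$SRC_DIR/" "$DEST_DIR/"`. Events that arrive while rsync is running stay queued in the pipe and trigger one more pass afterwards.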

5. Advanced Application Scenarios

5.1 Enterprise-level Backup System

#!/bin/bash
# Enterprise-level incremental backup script
enterprise_backup() {
    local source_dir="$1"
    local backup_host="$2"
    local backup_path="$3"
    local retention_days=30
    # Create date stamp
    local date_stamp
    date_stamp=$(date +%Y%m%d)
    local backup_dir="$backup_path/$date_stamp"
    echo "Starting backup: $source_dir -> $backup_host:$backup_dir"
    # Execute incremental backup: unchanged files are hard-linked against ../latest
    # (--link-dest is resolved relative to the destination directory)
    rsync -avz \
        --progress \
        --delete \
        --link-dest="../latest" \
        -e ssh \
        "$source_dir/" \
        "$backup_host:$backup_dir/" || return 1
    # Update latest symlink (only reached if rsync succeeded)
    ssh "$backup_host" "cd '$backup_path' && ln -sfn '$date_stamp' latest"
    # Clean up old backups
    ssh "$backup_host" "find '$backup_path' -mindepth 1 -maxdepth 1 -type d -name '20*' -mtime +$retention_days -exec rm -rf {} +"
    echo "Backup complete: $date_stamp"
}
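
One caveat with the find -mtime cleanup: rsync -a preserves the source directory's modification time on the snapshot directory, so the mtime may not reflect when the backup was taken. Since the snapshots are named YYYYMMDD, a possible alternative is to select expired ones by name. The sketch below only lists candidates; the function name list_expired and the demo dates are made up, and on Linux the cutoff could be computed with GNU date, for example `date -d '30 days ago' +%Y%m%d`.

```shell
# list_expired BACKUP_ROOT CUTOFF
# Prints every snapshot directory under BACKUP_ROOT whose YYYYMMDD name
# is older than CUTOFF (also YYYYMMDD). The "latest" symlink never matches.
list_expired() {
    root=$1
    cutoff=$2
    for dir in "$root"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue          # also skips the unexpanded glob
        name=${dir##*/}
        if [ "$name" -lt "$cutoff" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Demo with a throwaway backup root and invented snapshot dates
root=$(mktemp -d)
mkdir "$root/20240101" "$root/20240215" "$root/20240301"
ln -s 20240301 "$root/latest"
list_expired "$root" 20240201
```

Reviewing this list before piping it into a delete command is a cheap safeguard against removing the wrong snapshots.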

5.2 Real-time File Synchronization

#!/bin/bash
# Use inotify-tools for real-time synchronization
real_time_sync() {
    local source_dir="$1"
    local dest_dir="$2"
    # Check if inotify-tools is installed
    if ! command -v inotifywait > /dev/null; then
        echo "Error: Please install inotify-tools"
        return 1
    fi
    echo "Starting real-time monitoring: $source_dir"
    inotifywait -m -r -e modify,create,delete,move "$source_dir" |
    while read -r path action file; do
        echo "Detected change: $path$file - $action"
        rsync -av --delete "$source_dir/" "$dest_dir/"
    done
}
# More efficient version using --exclude-from to skip editor, temp, and VCS files that would trigger needless syncs
efficient_realtime_sync() {
    local source="$1"
    local dest="$2"
    local exclude_file="/tmp/rsync_excludes"
    # Create exclude list
    cat > "$exclude_file" << EOF
*.swp
*.tmp
.cache/
.git/
EOF
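    # Without -m, inotifywait exits after the first event, so the loop re-arms the watch after each sync
    while inotifywait -r -e modify,create,delete,move "$source"; do
        rsync -av --delete --exclude-from="$exclude_file" "$source/" "$dest/"
    done
}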

6. Comparison with Other Commands

# rsync vs scp
rsync -av source/ dest/     # Incremental sync: skips unchanged files
scp -r source/ dest/        # Simple copy: transfers everything each time

# rsync vs cp
rsync -av source/ dest/     # Smart sync: copies only what changed
cp -r source/ dest/         # Simple copy: copies everything and overwrites existing files

# rsync vs tar
rsync -av source/ dest/     # File-level sync
tar cf - source | ssh dest tar xf -  # Streams the whole tree as one archive over SSH (full copy each time)

7. Conclusion

rsync is the de facto standard tool for file synchronization and backup in Linux systems, with core advantages in incremental transfer, metadata preservation, and flexible synchronization strategies. Key usage points include:

  • Basic synchronization: Use -a to preserve metadata, -v for verbose output, and -z for compression.

  • Secure synchronization: Prefer SSH for remote sync (encrypted by default); use --bwlimit to cap bandwidth on shared links.

  • Advanced scenarios: Use --link-dest for incremental backups and inotify-tools for real-time sync; reserve daemon mode for trusted internal networks, since the rsync daemon protocol is not encrypted.

  • Risk control: Use --delete with caution (preview with --dry-run first) and check permissions in advance.

Mastering rsync can significantly enhance file management efficiency, making it an indispensable tool for both personal data backup and enterprise-level data synchronization.
