Daily Linux: Still Using Compression Software? The tar Command is the True King of Packaging and Unpacking in Linux!

1. Command Introduction and Principles

1.1 Introduction

tar (short for tape archive) is the classic archiving tool on Linux. It was originally designed for writing backups to magnetic tape and is now widely used for packaging files, compressing them, and managing archives. It bundles multiple files or directories into a single archive file. tar itself does not compress data: it passes the archive stream to an external compressor such as gzip, bzip2, xz, or zstd when asked to.

1.2 Working Principle

  • File Collection: Organizes multiple files and directories into a continuous byte stream

  • Header Information: Writes a 512-byte metadata header for each file (name, permissions, owner, size, timestamps, etc.)

  • Data Storage: Stores each file's contents right after its header, padded to a multiple of 512 bytes

  • Compression Processing: An optional compression layer compresses the whole archive stream rather than individual files

  • Stream Output: Supports standard input and output, facilitating pipeline operations
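To see the header and data layout described above, the sketch below archives one small file and reads a few header fields back with standard utilities. It assumes Linux with GNU coreutils (`stat -c`) and GNU tar or bsdtar; the file names are made up. The offsets used (member name in bytes 0 to 99, the "ustar" magic at byte 257) come from the POSIX ustar header layout.

```bash
#!/bin/bash
# Sketch: inspect the raw layout of a one-file tar archive.
# Assumes Linux with GNU coreutils (stat -c) and GNU tar or bsdtar; names are made up.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

printf 'hello tar\n' > hello.txt
tar -cf demo.tar hello.txt

# Header, data and end-of-archive marker are all whole 512-byte blocks
size=$(stat -c %s demo.tar)
echo "archive size: $size bytes, remainder mod 512: $(( size % 512 ))"

# Bytes 0-99 of the first header: the member name, padded with NUL bytes
name=$(head -c 100 demo.tar | tr -d '\0')
echo "first member name: $name"

# Bytes 257-261 of the first header: the "ustar" magic string
magic=$(dd if=demo.tar bs=1 skip=257 count=5 2>/dev/null)
echo "magic: $magic"
```

Because every piece is block-aligned, tar can walk an archive header by header without an index, which is also why it streams well through pipes.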

2. Basic Syntax

tar [options] [archive file] [files or directories...]

Common Options

# Main operation modes (choose exactly one)
-c, --create        # Create a new archive
-x, --extract       # Extract files from an archive
-t, --list          # List archive contents
-r, --append        # Append files to the end of an archive
-u, --update        # Append only files newer than their copies in the archive
# Compression options
-z, --gzip          # Use gzip compression/decompression (.tar.gz, .tgz)
-j, --bzip2         # Use bzip2 compression/decompression (.tar.bz2)
-J, --xz            # Use xz compression/decompression (.tar.xz)
--zstd              # Use zstd compression/decompression (.tar.zst)
-Z, --compress      # Use compress compression/decompression (.tar.Z)
-a, --auto-compress # Choose the compression program from the archive suffix
# File operations
-f, --file=ARCHIVE  # Specify the archive filename ("-" means stdin/stdout)
-v, --verbose       # Show detailed processing information
-C, --directory=DIR # Change to DIR before processing the following names
--exclude=PATTERN   # Exclude files matching PATTERN
--exclude-from=FILE # Read exclude patterns from FILE (one per line)
# Permissions and attributes
-p, --preserve-permissions  # Extract permissions exactly as archived (default for root)
--same-owner                # Try to restore the archived owner (default for root)
--no-same-owner             # Extract files as the current user (default for ordinary users)
--no-same-permissions       # Apply the user's umask on extraction (default for ordinary users)
# Other important options
--totals                    # Print the total bytes written after processing
--checkpoint[=N]            # Print a progress message every N records (default 10)
--verify                    # Verify the archive after writing it (not for compressed archives)
--wildcards                 # Allow wildcards in member names given on the command line
-T, --files-from=FILE       # Read the names of files to process from FILE
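Two of the options above are easy to misread: -a picks the compressor from the archive name, and -C changes directory before the following names are read, which controls the paths stored in the archive. The sketch below shows both on a scratch tree; it assumes GNU tar and gzip, and all directory names are made up.

```bash
#!/bin/bash
# Sketch: -a (auto-compress) and -C (change directory) on a scratch tree.
# Assumes GNU tar and gzip; all names are made up for the demo.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p src/project
echo 'data' > src/project/file.txt

# Without -C, member names keep the src/ prefix
tar -czf with_prefix.tar.gz src/project
# -C src makes tar change into src/ first, so members start at project/;
# -a picks gzip because the archive name ends in .tar.gz
tar -caf no_prefix.tar.gz -C src project

tar -tzf with_prefix.tar.gz
tar -tzf no_prefix.tar.gz
gzip -t no_prefix.tar.gz && echo "no_prefix.tar.gz is a valid gzip file"
```

Keeping stored paths short with -C means the archive extracts cleanly wherever it is unpacked.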

3. Classic Usage Scenarios

3.1 Creating Archive Files

# Create an uncompressed tar archive
tar -cvf project.tar project/
# Create a gzip compressed archive
tar -czvf project.tar.gz project/
# Create a bzip2 compressed archive
tar -cjvf project.tar.bz2 project/
# Create an xz compressed archive (high compression ratio)
tar -cJvf project.tar.xz project/
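To compare the compression options on your own data, the sketch below builds the same archive with each compressor that is installed and prints the resulting sizes. It assumes GNU tar and coreutils; the sample file of numbers is made up and highly compressible, so the exact sizes and their ranking will differ for real data.

```bash
#!/bin/bash
# Sketch: compare archive sizes for the compressors available on this machine.
# Assumes GNU tar and coreutils; the sample data is made up.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir project
seq 1 20000 > project/numbers.txt   # hypothetical, highly compressible sample

tar -cf project.tar project/
# Each spec is "tar flag : file suffix : compressor program"
for spec in z:gz:gzip j:bz2:bzip2 J:xz:xz; do
    IFS=: read -r flag suffix prog <<< "$spec"
    if command -v "$prog" > /dev/null; then
        tar -c"$flag"f "project.tar.$suffix" project/
    fi
done
# Byte count of each archive that was created
wc -c project.tar*
```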

3.2 Extracting Archive Files

# Extract tar archive
tar -xvf archive.tar
# Extract gzip compressed archive
tar -xzvf archive.tar.gz
# Extract bzip2 compressed archive
tar -xjvf archive.tar.bz2
# Extract to a specified directory
tar -xzvf archive.tar.gz -C /target/directory/
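A few extraction variants are worth knowing: GNU tar detects the compression format by itself when reading an archive file, a single member can be extracted by naming it exactly as `tar -t` lists it, and --strip-components drops leading path components. The sketch below assumes GNU tar and uses a throwaway archive with made-up names. Note that the -C target directory must already exist.

```bash
#!/bin/bash
# Sketch: extraction variants on a throwaway archive (GNU tar assumed; names are made up).
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p project/docs
echo 'readme' > project/docs/README
echo 'main' > project/main.c
tar -czf archive.tar.gz project/

# 1. No -z needed: tar detects gzip compression when reading from a file
mkdir -p out1
tar -xf archive.tar.gz -C out1

# 2. Extract a single member, named exactly as "tar -t" lists it
mkdir -p out2
tar -xzf archive.tar.gz -C out2 project/docs/README

# 3. Drop the leading "project/" path component while extracting
mkdir -p out3
tar -xzf archive.tar.gz --strip-components=1 -C out3

find out1 out2 out3 -type f | sort
```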

3.3 Viewing Archive Contents

# List member names only
tar -tf archive.tar
# List compressed archive contents
tar -tzvf archive.tar.gz
# Detailed listing (permissions, owner, size, modification time), paged with less
tar -tvf archive.tar | less
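Besides listing, you can look inside a member without unpacking anything to disk: -O (--to-stdout) writes the extracted contents to standard output. The sketch below assumes GNU tar; the configuration file names are made up.

```bash
#!/bin/bash
# Sketch: read one member of an archive without extracting it (GNU tar assumed).
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p conf_demo
echo 'port=8080' > conf_demo/app.conf
echo 'debug=false' > conf_demo/extra.conf
tar -czf conf.tar.gz conf_demo/
rm -r conf_demo            # only the archive is left

# Member names only
tar -tzf conf.tar.gz
# Print one member to stdout (-O) without writing anything to disk
tar -xzf conf.tar.gz -O conf_demo/app.conf
```

This is handy for checking a single config file inside a large backup before deciding to restore it.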

3.4 Incremental Operations

# Append files to an existing archive (uncompressed archives only)
tar -rvf archive.tar newfile.txt
# Append only files newer than their copies in the archive (the old copies stay in the archive)
tar -uvf archive.tar project/
# Delete members from an archive in place (GNU tar; not possible for compressed archives)
tar --delete -f archive.tar newfile.txt
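The sketch below runs append, update, and delete on an uncompressed scratch archive to show what each does to the member list. It assumes GNU tar (bsdtar does not implement --delete); names are made up. Note that -u appends a second copy of a changed file rather than replacing the first, and the later copy wins on extraction.

```bash
#!/bin/bash
# Sketch: append, update and delete on an uncompressed archive (GNU tar assumed).
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir project
echo v1 > project/a.txt
tar -cf archive.tar project/

echo 'new' > newfile.txt
tar -rf archive.tar newfile.txt          # append one file

sleep 1                                  # header mtimes have 1-second resolution
echo v2 > project/a.txt
tar -uf archive.tar project/             # appends a second, newer copy of project/a.txt

tar --delete -f archive.tar newfile.txt  # rewrites the archive without that member
tar -tf archive.tar

# On extraction the later copy of project/a.txt replaces the earlier one
mkdir out
tar -xf archive.tar -C out
cat out/project/a.txt
```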

4. Combining with Other Tools and Commands

4.1 Combining with find

# Find specific files and append them to an archive ({} + passes many files to each tar call)
find . -name "*.log" -exec tar -rvf logs.tar {} +
# Generate file list using find
find /var/log -name "*.log" -mtime -7 > filelist.txt
tar -czvf recent_logs.tar.gz -T filelist.txt
# Exclude certain file types
find . -type f ! -name "*.tmp" | tar -czvf backup.tar.gz -T -
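Piping a plain newline-separated list breaks on file names that contain newlines. A safer variant uses NUL separators on both sides: find's -print0 and tar's --null. The sketch below assumes GNU find and GNU tar (which needs --null before -T); the file names are made up.

```bash
#!/bin/bash
# Sketch: NUL-separated file list from find to tar (GNU find and GNU tar assumed).
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir src
echo keep > "src/report one.txt"    # a name containing a space
echo keep > src/notes.txt
echo junk > src/scratch.tmp

# -print0 and --null use NUL separators, so any character in a file name is safe;
# "-T -" reads the name list from stdin
find src -type f ! -name '*.tmp' -print0 | tar -czf backup.tar.gz --null -T -
tar -tzf backup.tar.gz
```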

4.2 Combining with ssh for Remote Operations

# Remote backup: locally package and transfer to remote server
tar -czf - /important/data | ssh user@remote "cat > /backup/backup.tar.gz"
# Remote restore: get and extract from remote server
ssh user@remote "tar -czf - /remote/data" | tar -xzf - -C /local/restore/
# List remote files without copying an archive first
ssh user@remote "tar -czf - /path/to/files" | tar -tzvf -
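The ssh examples rely on one tar writing an archive to stdout (-f -) and another reading it from stdin. The same pattern works locally, for example to copy a directory tree while keeping permissions; over ssh, one of the two tar processes simply runs on the remote host. The sketch below assumes GNU tar and GNU coreutils; paths are made up.

```bash
#!/bin/bash
# Sketch: stream a directory tree from one tar to another through a pipe.
# Assumes GNU tar and coreutils; paths are made up.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p source/sub dest
echo hi > source/sub/f.txt
chmod 640 source/sub/f.txt

# Writer: archive the contents of source/ to stdout; reader: extract from stdin into dest/
tar -cf - -C source . | tar -xpf - -C dest
ls -lR dest
```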

4.3 Combining with gpg for Encryption

# Create an encrypted archive
tar -czf - sensitive_data/ | gpg -c > backup.tar.gz.gpg
# Extract encrypted archive
gpg -d backup.tar.gz.gpg | tar -xzf -
# Use asymmetric encryption (replace recipient@example.com with the recipient's key ID or email)
tar -czf - data/ | gpg -e -r recipient@example.com > backup.tar.gz.gpg

4.4 Automating Usage in Scripts

#!/bin/bash
# Automated backup script
automated_backup() {
    local backup_dir="/backup"
    local source_dirs=("/etc" "/home" "/var/www")
    local timestamp
    timestamp=$(date +%Y%m%d_%H%M%S)
    local archive="$backup_dir/backup_$timestamp.tar.gz"
    local status=0
    # Create backup directory
    mkdir -p "$backup_dir"
    # Perform backup (tar strips the leading "/" from member names)
    echo "Starting system backup..."
    tar -czpf "$archive" \
        --exclude="/home/*/.cache" \
        --exclude="/var/www/*/tmp" \
        "${source_dirs[@]}" || status=$?
    # GNU tar exit status: 0 = OK, 1 = some files changed while being read, 2 = fatal error
    if [ "$status" -gt 1 ]; then
        echo "tar failed with exit status $status" >&2
        return 1
    fi
    # Verify backup: reading the full listing proves the archive can be read end to end
    if tar -tzf "$archive" > /dev/null; then
        echo "Backup successful: $archive"
        # Clean up backups older than about a week
        find "$backup_dir" -name "backup_*.tar.gz" -mtime +7 -delete
    else
        echo "Backup verification failed!" >&2
        return 1
    fi
}
automated_backup
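The verification step above only checks that the archive can be read. GNU tar's -d (--compare, --diff) option goes further and compares each archived member with the file on disk (size, mode, modification time, contents). The sketch below assumes GNU tar and a throwaway directory; on a live system such as /etc, expect some differences simply because files change after the backup.

```bash
#!/bin/bash
# Sketch: compare an archive against the filesystem with tar -d (GNU tar assumed).
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
echo one > data/a.txt
tar -czf backup.tar.gz data/

# tar -d exits 0 when everything matches and 1 when it finds differences
if tar -dzf backup.tar.gz; then before=match; else before=differs; fi
echo "one more line" >> data/a.txt
if tar -dzf backup.tar.gz; then after=match; else after=differs; fi
echo "before edit: $before, after edit: $after"
```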

5. Advanced Application Scenarios

5.1 Incremental Backup System

#!/bin/bash
# Incremental backup implementation
incremental_backup() {
    local full_backup="/backup/full_backup.tar.gz"
    local incremental_base="/backup/last_backup.time"   # GNU tar snapshot file used by --listed-incremental
    local incremental_backup="/backup/incremental_$(date +%Y%m%d_%H%M%S).tar.gz"
    if [ ! -f "$full_backup" ]; then
        echo "Creating full backup..."
        tar -czf "$full_backup" --listed-incremental="$incremental_base" /data
    else
        echo "Creating incremental backup..."
        tar -czf "$incremental_backup" --listed-incremental="$incremental_base" /data
    fi
    echo "Backup completed"
}
# Timestamp-based incremental backup
timestamp_backup() {
    local last_run_file="/var/lib/last_backup"   # /var/run is usually a tmpfs and is cleared at boot
    local current_time=$(date +%s)
    if [ -f "$last_run_file" ]; then
        local last_time=$(cat "$last_run_file")
        # Find files modified since last backup (-newer expects a reference file; GNU find's -newermt accepts "@epoch")
        find /data -type f -newermt "@$last_time" > /tmp/changed_files
        if [ -s /tmp/changed_files ]; then
            tar -czf "/backup/changes_$current_time.tar.gz" -T /tmp/changed_files
        fi
    else
        # First run, create full backup
        tar -czf "/backup/full_$current_time.tar.gz" /data
    fi
    echo "$current_time" > "$last_run_file"
}
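The --listed-incremental approach above can be tried on a scratch directory. The sketch below makes a level-0 archive, changes one file, makes a level-1 archive, and then restores both levels in order. It assumes GNU tar (the snapshot format is GNU-specific); names are made up. Because each run updates the same snapshot file, restoring requires the full archive plus every incremental, in order.

```bash
#!/bin/bash
# Sketch: GNU tar incremental dumps with a snapshot file, then a restore.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
echo a > data/a.txt
echo b > data/b.txt
sleep 1

# Level 0: the snapshot file does not exist yet, so everything is archived
tar -cf level0.tar --listed-incremental=data.snar data
sleep 1
echo changed >> data/b.txt

# Level 1: only files changed since the snapshot (plus directory entries) are archived
tar -cf level1.tar --listed-incremental=data.snar data
tar -tf level1.tar

# Restore: extract every level in order; /dev/null means "no snapshot to update"
mkdir restore
tar -xf level0.tar --listed-incremental=/dev/null -C restore
tar -xf level1.tar --listed-incremental=/dev/null -C restore
cat restore/data/b.txt
```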

5.2 Multi-Volume Archiving (Splitting Large Files)

#!/bin/bash
# Large archive split into fixed-size parts with split(1)
# (GNU tar's own -M multi-volume mode cannot be combined with compression)
split_archive() {
    local source_dir="$1"
    local part_size="100M"  # Each part 100MB
    local base_name="large_archive"
    # Create split archive: parts are named large_archive.tar.gz.partaa, .partab, ...
    tar -czf - "$source_dir" | split -b "$part_size" - "${base_name}.tar.gz.part"
    echo "Archive split into ${base_name}.tar.gz.part*"
}
# Merging split archives (the glob expands in sorted order: partaa, partab, ...)
merge_archive() {
    local output_file="$1"
    cat large_archive.tar.gz.part* > "$output_file"
    echo "Archive merged into $output_file"
}
# Directly process split archives
process_split_archive() {
    # Directly extract split archive (no need to merge first)
    cat large_archive.tar.gz.part* | tar -xzf -
}
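To check that splitting and reassembly are lossless, the sketch below runs the same pipeline with a small part size on random data, reassembles the parts, and compares the result with the original. It assumes GNU tar and GNU coreutils (split, cmp); the data and names are made up.

```bash
#!/bin/bash
# Sketch: split a compressed archive into parts and verify the round trip.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
head -c 300000 /dev/urandom > data/blob.bin   # random bytes barely compress, so several parts result

# Same pipeline as split_archive, with a small part size so the demo stays quick
tar -czf - data | split -b 100K - large_archive.tar.gz.part
ls large_archive.tar.gz.part*

# The glob expands in sorted order (partaa, partab, ...), the order split wrote them
cat large_archive.tar.gz.part* > merged.tar.gz
mkdir restore
cat large_archive.tar.gz.part* | tar -xzf - -C restore
cmp data/blob.bin restore/data/blob.bin && echo "round trip OK"
```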

5.3 Advanced Exclusion and Filtering

#!/bin/bash
# Smart backup exclusion
smart_backup() {
    local exclude_file="/etc/backup_excludes"
    # Create exclusion list: one pattern per line. GNU tar treats every line as a
    # pattern, so the group labels live here instead of inside the file:
    # cache/temporary files, compressed logs, version control dirs, pseudo-filesystems
    cat > "$exclude_file" << 'EOF'
*.tmp
*.cache
__pycache__
node_modules
*.log.gz
*.log.bz2
.git
.svn
.hg
/proc
/sys
/dev
/tmp
/run
EOF
    # Perform backup (excluding /backup so older backups are not archived again)
    tar -czpf "/backup/smart_backup_$(date +%Y%m%d).tar.gz" \
        --exclude-from="$exclude_file" \
        --exclude="/var/cache" \
        --exclude="/backup" \
        /
}
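How --exclude-from patterns match can be checked on a small tree before pointing the script at /. The sketch below assumes GNU tar, whose exclusion patterns are unanchored by default, so a bare name such as node_modules matches that directory at any depth along with everything under it. The project layout is made up.

```bash
#!/bin/bash
# Sketch: test an exclude file on a scratch tree (GNU tar assumed; layout is made up).
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p proj/src proj/node_modules/lib proj/.git
echo 'print(1)' > proj/src/main.py
echo 'tmp' > proj/src/scratch.tmp
echo 'dep' > proj/node_modules/lib/index.js
echo 'ref' > proj/.git/HEAD

# One pattern per line, no comment lines
printf '%s\n' '*.tmp' 'node_modules' '.git' > excludes.txt
tar -czf proj.tar.gz --exclude-from=excludes.txt proj
tar -tzf proj.tar.gz
```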

6. Comparison with Other Commands

# tar vs zip
tar -czf archive.tar.gz dir/    # Preserves Unix permissions and ownership; compresses the whole stream, so redundancy across files is exploited
zip -r archive.zip dir/         # Cross-platform compatible, but compresses each file separately and does not preserve all Unix attributes
# tar vs cpio
tar -czf archive.tar.gz dir/    # Simple syntax, widely used
find dir/ | cpio -ov > archive.cpio  # Reads the file list from stdin, so find decides exactly what is archived
# tar vs rsync
tar -czf backup.tar.gz /data    # Creates a snapshot in time
rsync -av /data/ backup/        # Incremental sync, maintains directory structure
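The compression-ratio difference between tar.gz and zip comes largely from compressing one continuous stream versus compressing each file on its own. The sketch below illustrates this on many similar files by comparing one tar.gz against the sum of individually gzipped files; per-file gzip is used only as a stand-in for zip's per-member compression, so real zip sizes will differ somewhat. It assumes GNU tar, gzip, and coreutils; the data is made up.

```bash
#!/bin/bash
# Sketch: whole-stream compression (tar.gz) vs per-file compression (zip-like).
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir files
for i in $(seq 1 50); do
    seq 1 500 > "files/copy_$i.txt"     # 50 identical text files (made-up data)
done

# tar + gzip: one compressed stream, so repeats across files are compressed away
tar -czf files.tar.gz files/
whole=$(wc -c < files.tar.gz)

# Per-file gzip, standing in for zip's per-member compression
per_file=0
for f in files/*.txt; do
    size=$(gzip -c "$f" | wc -c)
    per_file=$(( per_file + size ))
done
echo "tar.gz: $whole bytes; sum of per-file gzip: $per_file bytes"
```

The trade-off is that zip can extract one member without decompressing everything before it, while a tar.gz must be read from the start.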

7. Conclusion

By mastering the tar command, you can build reliable data backup and archiving solutions. Whether for simple file packaging or complex enterprise-level backup systems, tar provides a powerful and flexible toolkit. Dedicated backup tools such as restic and borg add deduplication, encryption, and snapshot management that tar lacks, but tar's ubiquity and reliability make it an indispensable foundational tool in the Linux environment.
