Linux File System and I/O Performance Optimization: A Complete Link from VFS to Storage Devices

Ultimate Optimization of Storage Performance: In the data-driven era, I/O performance directly determines application response speed and user experience. This article delves into the Linux 5.15.4 kernel file system subsystem, revealing the secrets of performance optimization from the VFS virtual file system to the underlying storage devices.

🎯 The Business Value of I/O Performance

The Direct Impact of Storage Performance on Business

Real Data Statistics:

  • Database Applications: I/O latency reduced by 50%, query performance improved by 3 times
  • Big Data Analysis: After storage optimization, ETL processing time reduced by 60%
  • Web Applications: Page load time reduced from 3 seconds to 800ms, conversion rate increased by 25%
  • Video Streaming: I/O optimization reduced buffering time by 90%, user satisfaction increased by 40%

Case Study of an E-commerce Platform: Through file system and I/O optimization, achieved:

  • Product search response time reduced from 500ms to 80ms
  • Order processing capacity increased by 200%
  • Storage costs reduced by 35%
  • No I/O failures during Double 11, supporting peak 100,000 QPS

🏗️ In-depth Analysis of Linux File System Architecture

VFS (Virtual File System) Hierarchy

Application
    ↓
System Call Interface (read/write/open/close)
    ↓
VFS Virtual File System Layer
    ↓
Specific File System (ext4/xfs/btrfs)
    ↓
Block Device Layer (Block Layer)
    ↓
I/O Scheduler (I/O Scheduler)
    ↓
Device Driver
    ↓
Storage Hardware (SSD/HDD)

Core Data Structure Deep Dive

// include/linux/fs.h (line 623)
struct inode {
    umode_t i_mode;                     /* File type and permissions */
    kuid_t i_uid;                       /* User ID */
    kgid_t i_gid;                       /* Group ID */
    
    const struct inode_operations *i_op; /* inode operation function set */
    struct super_block *i_sb;           /* Super block pointer */
    struct address_space *i_mapping;    /* Address space mapping */
    
    /* File statistics */
    unsigned long i_ino;                /* inode number */
    unsigned int i_nlink;               /* Hard link count */
    loff_t i_size;                      /* File size */
    
    /* Timestamps */
    struct timespec64 i_atime;          /* Access time */
    struct timespec64 i_mtime;          /* Modification time */
    struct timespec64 i_ctime;          /* Status change time */
    
    /* Locks and synchronization */
    spinlock_t i_lock;                  /* Protects i_blocks, i_bytes, etc. */
    struct rw_semaphore i_rwsem;        /* Read-write semaphore */
    
    /* Block information */
    u8 i_blkbits;                       /* Block size bits */
    blkcnt_t i_blocks;                  /* Number of blocks */
    
    /* State and reference count */
    unsigned long i_state;              /* inode state */
    atomic_t i_count;                   /* Reference count */
    atomic_t i_writecount;              /* Writer count */
    
    /* File operations */
    const struct file_operations *i_fop; /* File operation function set */
    
    /* Address space */
    struct address_space i_data;        /* Device address space */
    
    void *i_private;                    /* Private data pointer */
} __randomize_layout;

2. VFS Read/Write Operation Process

// fs/read_write.c (line 465)
ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
    ssize_t ret;
    
    /* Permission check */
    if (!(file->f_mode & FMODE_READ))
        return -EBADF;
    if (!(file->f_mode & FMODE_CAN_READ))
        return -EINVAL;
    if (unlikely(!access_ok(buf, count)))
        return -EFAULT;
    
    /* Area validation */
    ret = rw_verify_area(READ, file, pos, count);
    if (ret)
        return ret;
    if (count > MAX_RW_COUNT)
        count = MAX_RW_COUNT;
    
    /* Execute actual read */
    if (file->f_op->read)
        ret = file->f_op->read(file, buf, count, pos);
    else if (file->f_op->read_iter)
        ret = new_sync_read(file, buf, count, pos);
    else
        ret = -EINVAL;
    
    /* Update statistics */
    if (ret > 0) {
        fsnotify_access(file);
        add_rchar(current, ret);
    }
    inc_syscr(current);
    
    return ret;
}

⚡ In-depth Analysis of I/O Performance Optimization Techniques

1. Page Cache Optimization

How Page Cache Works:

// mm/filemap.c
struct page *find_get_page(struct address_space *mapping, pgoff_t offset)
{
    struct page *page;
    
    rcu_read_lock();
    page = find_get_entry(mapping, offset);
    if (xa_is_value(page))
        page = NULL;
    rcu_read_unlock();
    
    return page;
}

// Page cache read optimization
static ssize_t generic_file_buffered_read(struct kiocb *iocb,
                                         struct iov_iter *iter,
                                         ssize_t written)
{
    struct file *filp = iocb->ki_filp;
    struct address_space *mapping = filp->f_mapping;
    struct inode *inode = mapping->host;
    struct file_ra_state *ra = &filp->f_ra;
    
    /* Prefetch optimization */
    page_cache_sync_readahead(mapping, ra, filp, index, last_index - index);
    
    /* Find page cache */
    page = find_get_page(mapping, index);
    if (!page) {
        /* Cache miss, trigger page allocation and read */
        goto page_not_up_to_date;
    }
    
    /* Cache hit, return data directly */
    return copy_page_to_iter(page, offset, bytes, iter);
}

2. Readahead Optimization

Adaptive Readahead Algorithm:

#!/bin/bash
# Readahead parameter optimization script

DEVICE="/dev/sda"
MOUNT_POINT="/data"

# Get current readahead settings
CURRENT_RA=$(blockdev --getra $DEVICE)
echo "Current readahead size: ${CURRENT_RA} sectors"

# Adjust readahead size based on workload type
case "$1" in
    "database")
        # Database workload: small random I/O, reduce readahead
        NEW_RA=256
        ;;
    "streaming")
        # Streaming workload: large sequential I/O, increase readahead
        NEW_RA=4096
        ;;
    "web")
        # Web server: mixed workload, moderate readahead
        NEW_RA=1024
        ;;
    *)
        NEW_RA=2048
        ;;
esac

# Apply new readahead settings
blockdev --setra $NEW_RA $DEVICE
echo "New readahead size: ${NEW_RA} sectors"

# Adjust filesystem-level readahead
echo $NEW_RA > /sys/block/$(basename $DEVICE)/queue/read_ahead_kb

3. I/O Scheduler Optimization

Comparison of Different Scheduler Characteristics:

#!/bin/bash
# I/O scheduler performance testing and optimization script

DEVICE="sda"
TEST_FILE="/tmp/io_test"
SCHEDULERS=("mq-deadline" "kyber" "bfq" "none")

echo "I/O Scheduler Performance Testing"
echo "=================="

for scheduler in "${SCHEDULERS[@]}"; do
    echo "Testing scheduler: $scheduler"
    
    # Set scheduler
    echo $scheduler > /sys/block/$DEVICE/queue/scheduler
    
    # Sequential read test
    echo "Sequential read test:"
    dd if=$TEST_FILE of=/dev/null bs=1M count=1000 2>&1 | grep -E "(copied|MB/s)"
    
    # Random read test
    echo "Random read test:"
    fio --name=random_read --ioengine=libaio --rw=randread --bs=4k \
        --numjobs=4 --iodepth=32 --runtime=30 --filename=$TEST_FILE \
        --group_reporting --minimal | awk -F';' '{print "IOPS:", $8, "Latency:", $40"us"}'
    
    echo "---"
done

# Recommend scheduler based on workload
echo "Scheduler selection recommendations:"
echo "- Database/OLTP: mq-deadline (low latency)"
echo "- Big Data/Batch Processing: none (high throughput)"
echo "- Desktop/Interactive: bfq (fairness)"
echo "- SSD/NVMe: kyber (multi-queue optimization)"

🚀 Practical File System Performance Optimization

ext4 File System Optimization

Mount Parameter Optimization:

#!/bin/bash
# ext4 file system performance optimization

DEVICE="/dev/sda1"
MOUNT_POINT="/data"

# High-performance mount options
MOUNT_OPTIONS="defaults,noatime,nodiratime,barrier=0,data=writeback,commit=60,delalloc"

# Database optimization mount options
DB_MOUNT_OPTIONS="defaults,noatime,nodiratime,barrier=1,data=ordered,commit=5"

# Web server optimization mount options
WEB_MOUNT_OPTIONS="defaults,relatime,barrier=1,data=ordered,commit=30"

echo "ext4 file system optimization configuration"
echo "==================="

case "$1" in
    "performance")
        OPTIONS=$MOUNT_OPTIONS
        echo "High-performance configuration (suitable for high throughput scenarios)"
        ;;
    "database")
        OPTIONS=$DB_MOUNT_OPTIONS
        echo "Database configuration (suitable for transactional workloads)"
        ;;
    "web")
        OPTIONS=$WEB_MOUNT_OPTIONS
        echo "Web server configuration (suitable for web applications)"
        ;;
    *)
        OPTIONS=$MOUNT_OPTIONS
        ;;
esac

# Apply mount options
umount $MOUNT_POINT 2>/dev/null || true
mount -o $OPTIONS $DEVICE $MOUNT_POINT

echo "Mount options: $OPTIONS"
echo "Mount point: $MOUNT_POINT"

# Verify mount options
mount | grep $MOUNT_POINT

XFS File System Optimization

XFS Specific Optimization:

#!/bin/bash
# XFS file system performance tuning

DEVICE="/dev/sdb1"
MOUNT_POINT="/data"

echo "XFS file system optimization"
echo "==============="

# Create optimized XFS file system
mkfs.xfs -f -d agcount=8,su=64k,sw=4 -l size=128m,su=64k $DEVICE

# High-performance mount options
XFS_OPTIONS="defaults,noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc"

# Mount file system
mount -o $XFS_OPTIONS $DEVICE $MOUNT_POINT

echo "XFS optimization parameters description:"
echo "- agcount=8: 8 allocation groups, improving concurrency"
echo "- su=64k,sw=4: stripe unit 64KB, stripe width 4"
echo "- logbufs=8: 8 log buffers"
echo "- logbsize=256k: log buffer size 256KB"
echo "- largeio: enable large I/O optimization"
echo "- inode64: use 64-bit inode numbers"
echo "- swalloc: stripe-aware allocation"

# Runtime tuning
echo "Runtime tuning parameters:"
echo 0 > /proc/sys/vm/swappiness
echo 1 > /proc/sys/vm/vfs_cache_pressure
echo 15 > /proc/sys/vm/dirty_background_ratio
echo 40 > /proc/sys/vm/dirty_ratio

📊 I/O Performance Monitoring and Analysis

System-level I/O Monitoring

Comprehensive I/O Performance Monitoring Script:

#!/bin/bash
# I/O performance monitoring script

DEVICE=${1:-"sda"}
INTERVAL=${2:-5}
DURATION=${3:-60}

echo "I/O Performance Monitoring - Device: $DEVICE"
echo "Monitoring interval: ${INTERVAL} seconds, Duration: ${DURATION} seconds"
echo "========================================"

# Create result file
RESULT_FILE="io_performance_$(date +%Y%m%d_%H%M%S).log"

{
    echo "Timestamp,Read IOPS,Write IOPS,Read MB/s,Write MB/s,Average Queue Length,Utilization%,Average Wait Time ms"
    
    for ((i=0; i<DURATION; i+=INTERVAL)); do
        TIMESTAMP=$(date +%s)
        
        # Use iostat to get detailed I/O statistics
        IOSTAT_OUTPUT=$(iostat -x $INTERVAL 1 | grep -w $DEVICE | tail -1)
        
        if [[ -n "$IOSTAT_OUTPUT" ]]; then
            # Parse iostat output
            read -r device rrqm_s wrqm_s r_s w_s rkB_s wkB_s \
                 avgrq_sz avgqu_sz await r_await w_await svctm util <<< "$IOSTAT_OUTPUT"
            
            # Calculate IOPS and MB/s
            READ_IOPS=$r_s
            WRITE_IOPS=$w_s
            READ_MBS=$(echo "scale=2; $rkB_s / 1024" | bc -l)
            WRITE_MBS=$(echo "scale=2; $wkB_s / 1024" | bc -l)
            
            echo "$TIMESTAMP,$READ_IOPS,$WRITE_IOPS,$READ_MBS,$WRITE_MBS,$avgqu_sz,$util,$await"
        fi
        
        sleep $INTERVAL
    done
} > $RESULT_FILE

echo "Monitoring complete, results saved to: $RESULT_FILE"

# Generate performance analysis report
{
    echo "I/O Performance Analysis Report"
    echo "=============="
    echo "Monitored Device: $DEVICE"
    echo "Monitoring Duration: ${DURATION} seconds"
    echo
    
    # Calculate averages and peaks
    tail -n +2 $RESULT_FILE | awk -F',' '
    BEGIN {
        read_iops_sum = 0; write_iops_sum = 0;
        read_mbs_sum = 0; write_mbs_sum = 0;
        util_sum = 0; await_sum = 0;
        count = 0;
        max_read_iops = 0; max_write_iops = 0;
        max_util = 0; max_await = 0;
    }
    {
        read_iops_sum += $2; write_iops_sum += $3;
        read_mbs_sum += $4; write_mbs_sum += $5;
        util_sum += $7; await_sum += $8;
        count++;
        
        if ($2 > max_read_iops) max_read_iops = $2;
        if ($3 > max_write_iops) max_write_iops = $3;
        if ($7 > max_util) max_util = $7;
        if ($8 > max_await) max_await = $8;
    }
    END {
        if (count > 0) {
            printf "Average Read IOPS: %.2f\n", read_iops_sum/count;
            printf "Average Write IOPS: %.2f\n", write_iops_sum/count;
            printf "Average Read Bandwidth: %.2f MB/s\n", read_mbs_sum/count;
            printf "Average Write Bandwidth: %.2f MB/s\n", write_mbs_sum/count;
            printf "Average Utilization: %.2f%%\n", util_sum/count;
            printf "Average Wait Time: %.2f ms\n", await_sum/count;
            printf "\nPeak Data:\n";
            printf "Peak Read IOPS: %.2f\n", max_read_iops;
            printf "Peak Write IOPS: %.2f\n", max_write_iops;
            printf "Peak Utilization: %.2f%%\n", max_util;
            printf "Peak Wait Time: %.2f ms\n", max_await;
        }
    }'
} >> $RESULT_FILE

echo "Performance analysis report has been added to the result file"

🔧 Enterprise Storage Optimization Solutions

Database Storage Optimization

MySQL I/O Optimization Configuration:

-- MySQL storage engine optimization configuration
-- InnoDB buffer pool optimization
SET GLOBAL innodb_buffer_pool_size = 8589934592;  -- 8GB
SET GLOBAL innodb_buffer_pool_instances = 8;

-- I/O related optimizations
SET GLOBAL innodb_io_capacity = 2000;
SET GLOBAL innodb_io_capacity_max = 4000;
SET GLOBAL innodb_read_io_threads = 8;
SET GLOBAL innodb_write_io_threads = 8;

-- Log optimization
SET GLOBAL innodb_log_file_size = 1073741824;     -- 1GB
SET GLOBAL innodb_log_buffer_size = 67108864;     -- 64MB
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Flush strategy optimization
SET GLOBAL innodb_flush_method = 'O_DIRECT';
SET GLOBAL innodb_flush_neighbors = 0;            -- SSD optimization

-- Concurrency optimization
SET GLOBAL innodb_thread_concurrency = 0;
SET GLOBAL innodb_concurrency_tickets = 5000;

Big Data Storage Optimization

Hadoop HDFS Optimization:

<!-- hdfs-site.xml optimization configuration -->
<configuration>
    <!-- Block size optimization -->
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value> <!-- 256MB -->
    </property>
    
    <!-- Replication count optimization -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    
    <!-- I/O optimization -->
    <property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>8192</value>
    </property>
    
    <property>
        <name>dfs.datanode.handler.count</name>
        <value>64</value>
    </property>
    
    <!-- Cache optimization -->
    <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>true</value>
    </property>
    
    <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/lib/hadoop-hdfs/dn_socket</value>
    </property>
</configuration>

💡 Summary and Best Practices

Core Principles of I/O Optimization

  1. Reduce I/O Operations: Batch operations, caching strategies, readahead mechanisms
  2. Increase Concurrency: Asynchronous I/O, multi-queue, parallel processing
  3. Optimize Access Patterns: Sequential access, aligned I/O, appropriate block sizes
  4. Select Appropriate File Systems: Choose based on workload characteristics
  5. Hardware Matching: SSD vs HDD, RAID configurations, network storage

Performance Optimization Checklist

System-level Optimization:

  • [ ] I/O scheduler selection and configuration
  • [ ] File system type and mount parameters
  • [ ] Page cache and readahead settings
  • [ ] Virtual memory parameter tuning
  • [ ] Storage device queue depth

Application-level Optimization:

  • [ ] Database buffer pool configuration
  • [ ] Application I/O pattern optimization
  • [ ] Batch operations and transaction optimization
  • [ ] Caching strategy implementation
  • [ ] Asynchronous I/O usage

Monitoring and Maintenance:

  • [ ] I/O performance metrics monitoring
  • [ ] Storage capacity planning
  • [ ] Performance bottleneck identification
  • [ ] Regular performance testing
  • [ ] Fault warning mechanisms

Linux file system and I/O optimization is a complex system engineering task that requires full-stack optimization from the kernel to applications. By deeply understanding the working principles of file systems and combining them with actual business scenarios, significant improvements in storage system performance and reliability can be achieved.

Follow the “Cloud and Digitalization” public account for more practical experience in Linux storage optimization.

This article is based on an analysis of the Linux 5.15.4 kernel source code, providing optimization solutions validated in production environments.

Leave a Comment