In an operating system, disk I/O (Input/Output) sits on the critical path of every data read and write. On Linux, disk I/O performance directly affects the system's overall responsiveness and data processing throughput.
This article walks through the Linux disk I/O read and write process: the basic concepts, the specific steps of read and write operations, and related optimization techniques.
Disk I/O refers to the transfer of data between disk and memory in a Linux system. It underlies file read and write operations and is an important way for the operating system to interact with external storage devices.
In Linux, applications request disk I/O through system calls (such as read() and write()), and the kernel carries out the actual transfer.
1. Buffered I/O
Buffered I/O is the most common disk I/O method in Linux. On a read, data is first brought into the kernel's page cache and then copied from the page cache to user space. Writes go the other way: data is first copied into the page cache, and the kernel decides when to flush it to the disk.
(1) Read Operation Process:
The application calls the read() system call to request data reading.
The kernel checks whether the required data exists in the page cache.
If it exists, directly copy data from the page cache to the user space.
If it does not exist, read data from the disk into the page cache via DMA (Direct Memory Access), and then copy data from the page cache to the user space.
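The read path above can be exercised with a short Python sketch. The function names read_file_buffered and drop_cached_pages are hypothetical and chosen only for illustration. The second function uses posix_fadvise() with POSIX_FADV_DONTNEED, which is only a hint: the kernel may evict the file's clean pages so that the next read becomes a cache miss, but it is not obliged to.

```python
import os


def read_file_buffered(path, chunk_size=64 * 1024):
    """Read a whole file with plain read() calls, served through the page cache."""
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            # Each call copies data from the page cache into a new bytes object.
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty result means end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


def drop_cached_pages(path):
    """Hint that this file's cached pages are no longer needed (advisory only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, length=0 covers the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

Timing read_file_buffered() on a large file before and after drop_cached_pages() is one way to observe the difference between a cache hit and a read that has to go to the disk.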
(2) Write Operation Process:
The application calls the write() system call to request data writing.
Data is copied from the user space to the page cache.
The kernel decides when to flush the data from the page cache to the disk (usually through a background flush thread).
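A minimal Python sketch of the write steps above follows. The function name write_buffered and its durable parameter are hypothetical. By default write() returns as soon as the data sits in the page cache; calling fsync() is the standard way for an application to wait until the kernel has flushed that file's dirty pages to the device instead of leaving the timing to the background flush thread.

```python
import os


def write_buffered(path, data, durable=False):
    """Write data through the page cache; optionally wait for it to reach the disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # write() copies from user space into the page cache and may be partial.
            written = os.write(fd, view)
            view = view[written:]
        if durable:
            # Block until the dirty pages of this file have been written back.
            os.fsync(fd)
    finally:
        os.close(fd)
```

With durable=False the call returns quickly, but data still held only in the page cache can be lost on a power failure. With durable=True the call pays the latency of the device in exchange for that guarantee.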
2. Direct I/O
Direct I/O is a disk I/O method that bypasses the kernel page cache. Data is transferred directly between the user space buffer and the disk, which avoids the extra copy between kernel space and user space.
(1) Read Operation Process:
The application opens the file with the O_DIRECT flag.
The read() system call is invoked, and data is directly copied from the disk to the user space buffer via DMA.
(2) Write Operation Process:
The application opens the file with the O_DIRECT flag.
The write() system call is invoked, and data is directly written from the user space buffer to the disk via DMA.
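The Direct I/O steps can be sketched in Python as follows. One detail the steps leave implicit is that O_DIRECT generally requires the buffer address, the file offset, and the transfer length to be aligned to the device's logical block size. The sketch assumes 4096 bytes is sufficient (BLOCK is an assumption, not a value queried from the device) and uses an anonymous mmap as a page-aligned buffer. The function name direct_write_read is hypothetical. Some filesystems reject O_DIRECT, in which case open() raises OSError.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on device and filesystem


def direct_write_read(path, payload):
    """Write one aligned block with O_DIRECT, then read it back the same way."""
    if len(payload) > BLOCK:
        raise ValueError("payload must fit in one block")

    # An anonymous mmap is page-aligned, which satisfies the buffer alignment rule.
    buf = mmap.mmap(-1, BLOCK)
    buf.write(payload.ljust(BLOCK, b"\0"))  # pad to a full block

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT, 0o644)
    try:
        os.write(fd, buf)  # transfer length is exactly one block
    finally:
        os.close(fd)
        buf.close()

    out = mmap.mmap(-1, BLOCK)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        n = os.readv(fd, [out])  # reads straight into the aligned buffer
        data = out[:n]
    finally:
        os.close(fd)
        out.close()
    return data
```

Because the page cache is bypassed, applications that use Direct I/O (databases are the usual example) typically maintain their own cache in user space.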
3. Memory Mapping with mmap
mmap maps the contents of a file into the address space of a process. Through the mapping, an application can read and modify file contents via pointers, without issuing an explicit read() or write() system call for each access.
Operation Process:
The application calls the mmap() system call to map the file into the process’s address space.
The application directly manipulates the mapped memory area via pointers to read and write files.
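When the file contents are modified through a shared mapping (MAP_SHARED), the kernel marks the affected pages dirty and writes them back to the disk at an appropriate time; msync() forces the writeback. Copy-on-write applies only to private mappings (MAP_PRIVATE), whose modifications are never written back to the file.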
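The mmap workflow can be sketched with Python's mmap module, which on Unix creates a shared, read/write mapping by default. The function name patch_file_in_place is hypothetical. The slice assignment is an ordinary memory store into the mapped pages, and flush() corresponds to msync(), which is optional here because the kernel would write the dirty pages back on its own eventually.

```python
import mmap


def patch_file_in_place(path, offset, new_bytes):
    """Overwrite part of a file through a shared memory mapping; return the old bytes."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file (the file must not be empty).
        with mmap.mmap(f.fileno(), 0) as mm:
            end = offset + len(new_bytes)
            if offset < 0 or end > len(mm):
                raise ValueError("patch does not fit inside the file")
            before = mm[offset:end]    # read through the mapping
            mm[offset:end] = new_bytes  # write through the mapping, no write() call
            mm.flush()                  # force writeback of the dirty pages now
    return before
```

A mapping cannot grow the file, which is why the sketch only replaces bytes that already exist.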
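With the read and write paths in mind, the following techniques can improve disk I/O performance.
1. Adjusting File System Mount Options:
Mount with noatime so that reads no longer trigger access-time updates, which are themselves disk writes. noatime already implies nodiratime, so specifying both is harmless but redundant.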
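Before changing mount options it helps to know which file systems are currently mounted without noatime. The sketch below parses text in the format of /proc/mounts (device, mount point, type, options, and two numeric fields). The function name mounts_without_noatime is hypothetical, and the filter on /dev/ is an assumption used to skip pseudo file systems such as proc and sysfs.

```python
def mounts_without_noatime(mounts_text):
    """Return mount points of /dev/ devices whose options do not include noatime."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fs_type, options = fields[:4]
        if not device.startswith("/dev/"):
            continue  # assumption: only block-device mounts are of interest
        if "noatime" not in options.split(","):
            result.append(mount_point)
    return result
```

On a live system the input would come from reading /proc/mounts, for example mounts_without_noatime(open("/proc/mounts").read()).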
2. Choosing the Right I/O Scheduler:
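Select an I/O scheduler that suits the workload. Kernels that use the multi-queue block layer offer none, mq-deadline, bfq, and kyber; the older single-queue schedulers noop, deadline, and cfq were removed in Linux 5.0.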
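The kernel exposes the scheduler of each block device in /sys/block/<device>/queue/scheduler, listing the available schedulers with the active one in square brackets. The sketch below parses that format. The function name parse_scheduler is hypothetical.

```python
def parse_scheduler(text):
    """Parse a sysfs scheduler line into (active, available)."""
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets that mark the active scheduler
            active = token
        available.append(token)
    return active, available
```

For a device named sda (a placeholder), the input would be the contents of /sys/block/sda/queue/scheduler. Writing one of the listed names to the same file as root switches the scheduler until the next reboot.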
3. Adjusting Read-Ahead Cache Size:
Optimize sequential I/O read performance by adjusting the read_ahead_kb parameter.
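The read_ahead_kb parameter lives in /sys/block/<device>/queue/read_ahead_kb and holds the read-ahead window in kilobytes. The sketch below reads and sets it. The function names and the sysfs_root parameter are hypothetical, the latter added only so that the code can be tried against a scratch directory. Writing to the real file requires root, and a suitable value depends on the workload, so measure before and after.

```python
import os


def read_ahead_path(device, sysfs_root="/sys"):
    """Build the path of the read_ahead_kb file for a block device such as 'sda'."""
    return os.path.join(sysfs_root, "block", device, "queue", "read_ahead_kb")


def get_read_ahead_kb(device, sysfs_root="/sys"):
    """Return the current read-ahead window in KiB."""
    with open(read_ahead_path(device, sysfs_root)) as f:
        return int(f.read().strip())


def set_read_ahead_kb(device, kb, sysfs_root="/sys"):
    """Set the read-ahead window in KiB (needs root on a real system)."""
    if kb < 0:
        raise ValueError("read-ahead size must not be negative")
    with open(read_ahead_path(device, sysfs_root), "w") as f:
        f.write(str(kb))
```

Larger values tend to help long sequential scans, while workloads dominated by random reads may see no benefit, because read-ahead then fetches data that is never used.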
4. Using Faster Storage Devices:
Consider using NVMe SSDs, RAID configurations, or distributed storage systems (such as Ceph, GlusterFS) to improve I/O performance.
5. Application Layer Caching:
Use application layer caching (such as Redis, Memcached) to reduce the number of database accesses, and with it the disk I/O that the database generates.
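The idea behind application layer caching can be shown with a small cache-aside sketch. It uses an in-process dictionary as a stand-in for Redis or Memcached, so it illustrates the access pattern rather than either product's API. The class name TTLCache, its methods, and the loader callback are all hypothetical.

```python
import time


class TTLCache:
    """In-process cache-aside helper: serve from memory, fall back to a loader."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (value, expiry time)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]      # cache hit: the backing store is not touched
        value = loader(key)      # cache miss: go to the database (and its disk I/O)
        self._entries[key] = (value, now + self._ttl)
        return value
```

Repeated requests for the same key within the TTL are answered from memory, so only the first request and the ones after expiry reach the database.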
The disk I/O read and write process in Linux involves collaboration among multiple layers and components, including user space, kernel space, file systems, the generic block layer, and device layers.
By understanding how these layers and components work, we can better optimize the disk I/O performance of Linux systems.