Mastering Linux File I/O: In-Depth Exploration and Practical Skills from Open to Write!

1. Classic Review of C File Interfaces

When using C, we typically use functions like fopen, fwrite, fread, and fclose to access files.

1.1 fwrite

For example, I need to write some information to a file:

#include <stdio.h>
#include <string.h>
int main(){
    FILE* fp = fopen("test.txt","w");
    if(fp == NULL)
    {
        perror("fopen failed");
        return 1;
    }
    const char* str = "i am yui~\n";
    size_t len = strlen(str);//strlen returns size_t
    int num = 5;
    while(num--)
    {
        fwrite(str,len,1,fp);//Write one item of len bytes
    }
    fclose(fp);
    return 0;
}

Execution Result:

ubuntu@VM-20-9-ubuntu:~/FILETEST$ cat test.txt
i am yui~
i am yui~
i am yui~
i am yui~
i am yui~

1.2 fread

Now let’s read the content from the file:

#include <stdio.h>
#include <string.h>
int main(){
    FILE* fp = fopen("test.txt","r");
    if(!fp)
    {
        perror("fopen failed");
        return 1;
    }
    const char* str = "i am yui~\n";
    char s[1024];
    size_t len = strlen(str);
    size_t n;
    //fread returns the number of items read; 0 means EOF or an error
    while((n = fread(s,1,len,fp)) > 0)
    {
        s[n] = 0;
        printf("%s",s);
    }
    fclose(fp);
    return 0;
}

Output Result:

i am yui~
i am yui~
i am yui~
i am yui~
i am yui~

2. System File I/O

In addition to using the above C interfaces, we can also use system interfaces to access files.

System File I/O refers to reading and writing files at the operating system level. In Linux and other Unix-like systems, system file I/O is done through system calls. Compared to the C standard library's file I/O functions (like fopen, fread, fwrite), system file I/O provides lower-level control and skips the library's buffering layer, but the operations are slightly more complex, and every call enters the kernel, so many small reads or writes can be slower than the buffered library functions.

To better understand system file I/O, I will implement the above functionality using system interfaces and explain it.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
int main(){
    umask(0);//Clear the umask so the 0644 mode below is applied unchanged
    int fd  = open("myfile",O_WRONLY|O_CREAT,0644);
    if(fd<0)
    {
        perror("open failed");
        return 1;
    }
    int num = 5;
    const char* str = "i am yui\n";
    size_t len = strlen(str);
    while(num--)
    {
        if(write(fd,str,len) < 0)//write returns -1 on failure
        {
            perror("write failed");
            break;
        }
    }
    close(fd);
    return 0;
}

3. The open Function

Since we are now opening files through the system interface, we use the open function instead of fopen.

The open function is a system call in Unix and Unix-like operating systems used to open a file and return a file descriptor. This file descriptor is used for subsequent file operations such as reading, writing, and closing. Compared to the C standard library’s fopen function, open provides lower-level control and is more suitable for system-level programming.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

3.1 Parameter Introduction:

pathname: The path of the file to be opened.
flags: Flags that specify the file open mode and behavior, determining how the file is opened.
mode: The permission bits for a newly created file (for example 0644). It takes effect only when the call creates a file with the O_CREAT flag, and the process umask is applied to it, so the resulting permissions are mode & ~umask.

We need to discuss the flags in detail.

Access Modes (must include one):

O_RDONLY: Open the file in read-only mode.
O_WRONLY: Open the file in write-only mode.
O_RDWR: Open the file in read-write mode.
Exactly one of O_RDONLY, O_WRONLY, and O_RDWR must be specified; it controls whether the descriptor can be used for reading, writing, or both.

File Creation and Control:

O_CREAT: Create the file if it does not exist. When this flag is given, the mode parameter must be supplied to specify the new file's permissions.
O_EXCL: Must be used in combination with O_CREAT. If the file already exists, open fails with errno set to EEXIST instead of opening it. This combination is often used to create unique files.
O_TRUNC: If the file exists and is opened in write mode (O_WRONLY or O_RDWR), the file length will be truncated to 0.
O_APPEND: Append mode. Before each write, the file offset is moved to the end of the file, so data is always added at the end. Suitable for scenarios like logging where appending is required.

Non-blocking and Synchronization Control:

O_NONBLOCK: Open the file in non-blocking mode. It matters mainly for FIFOs and device files, where an open or read could otherwise wait; calls return immediately instead of blocking.
O_SYNC: Synchronous write mode. Each write returns only after the data and the associated file metadata have been transferred to the storage device, rather than merely being placed in the kernel's cache. Suitable for scenarios with high data persistence requirements, at the cost of slower writes.
O_DSYNC: Data synchronization, similar to O_SYNC, but it only waits for the data and the metadata needed to read it back, not other file metadata (like last modified time).
O_RSYNC: Synchronous read mode, which applies the same guarantees to read operations. Linux does not implement it separately; glibc defines it with the same value as O_SYNC.

These flags are combined with the bitwise OR operator (|). The implementation packs the options into a single integer used as a bit mask, so each creation or control flag occupies its own bit and can be set or tested independently without storing every combination separately. The access mode is the exception: it is a small field extracted with the O_ACCMODE mask, which is why exactly one of the three access modes is given.

For the write example in section 2, O_WRONLY|O_CREAT is all we need.

3.2 Return Value (File Descriptor)

On success, open returns a file descriptor (a non-negative integer) for subsequent file operations.

On failure, it returns -1 and sets errno to indicate the reason for the error.

This return value deserves a closer look.

The file descriptor (File Descriptor, FD) is an integer assigned by the operating system to represent each open file or I/O resource. In Unix and Unix-like systems (like Linux), the file descriptor serves as a bridge for processes and the kernel to perform file or resource operations; nearly all I/O operations are completed through file descriptors.

A file descriptor is a non-negative integer, and each process has a file descriptor table to manage file descriptors. When a file is opened, the operating system assigns a file descriptor to identify that file. This file descriptor can be used for subsequent read, write, and close operations. File descriptors can represent not only files but also other I/O resources, such as pipes, network sockets, device files, etc.

Each process typically has three default file descriptors known as standard file descriptors:

Standard Input (stdin): File descriptor 0, used for reading data from the user or input source.
Standard Output (stdout): File descriptor 1, used for outputting data to the terminal or output source.
Standard Error (stderr): File descriptor 2, used for outputting error messages to the terminal.

By default, 0, 1, and 2 are connected to the terminal: 0 to the keyboard, and both 1 and 2 to the display.

With this understanding, we can directly use file descriptors to output data to the display.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
int main(){
    char buf[1024];
    ssize_t s = read(0,buf,sizeof(buf)-1);//Read from the keyboard, leaving room for the terminator
    if(s>0)
    {
        buf[s] = 0;
        write(1,buf,strlen(buf));//Standard output
        write(2,buf,strlen(buf));//Standard error
    }
    return 0;
}

Execution Result:

ubuntu@VM-20-9-ubuntu:~/FILETEST$ ./a.out
hello world
hello world
hello world

From this code, we can further clarify the concept that everything is a file in Linux.

Some Low-Level Knowledge:

The file descriptor is a small integer starting from 0. When we open a file, the operating system creates a data structure in memory to describe the target file: the file structure, which represents an opened file object. When a process executes the open system call, the process must be associated with that file. In the Linux kernel, each process has a pointer, files, to a files_struct table, the most important part of which is an array of pointers, each element pointing to a file structure. Essentially, the file descriptor is an index into this array, so as long as we have the file descriptor, we can find the corresponding file.

3.2.1 File Descriptor Allocation Rules

First, let’s look at the code:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
int main(){
    int fd = open("myfile",O_RDONLY);//Open in read-only mode
    if(fd<0)
    {
        perror("open");
        return 1;
    }
    printf("%d\n",fd);
    close(fd);
    return 0;
}

Execution Result:

ubuntu@VM-20-9-ubuntu:~/FILETEST$ ./a.out
3

The result is 3.

Now, what happens if we close file descriptor 0?

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
int main(){
    close(0);//Close file descriptor 0
    int fd = open("myfile",O_RDONLY);//Open in read-only mode
    if(fd<0)
    {
        perror("open");
        return 1;
    }
    printf("%d\n",fd);
    close(fd);
    return 0;
}

Print Result:

ubuntu@VM-20-9-ubuntu:~/FILETEST$ ./a.out
0

The result is 0. Did you guess it?

This shows the allocation rule for file descriptors: the smallest index in the process's file descriptor array that is not currently in use becomes the new file descriptor.

Finally, let’s look at redirection.

3.2.2 Redirection

Now we will close file descriptor 1 (standard output), then open a file and print something. Let's see what happens.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
int main(){
    close(1);//Close file descriptor 1
    int fd = open("myfile",O_WRONLY|O_CREAT,0644);//Open in write mode
    if(fd<0)
    {
        perror("open");
        return 1;
    }
    printf("fd:%d\n",fd);
    fflush(stdout);//Flush the stdio buffer while descriptor 1 is still open
    close(fd);
    return 0;
}

When we open myfile, we find that the file contains fd:1.

In other words, the content that should have been displayed on the screen has been written into the myfile file. We call this phenomenon redirection. The common shell redirections are > (output, overwriting the file), >> (output, appending to the file), and < (input).

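The Essence of Redirection:

printf always writes to stdout, and stdout always uses file descriptor 1. After close(1), index 1 is the smallest unused slot in the file descriptor array, so open returns 1 and that slot now points to myfile. The upper layer still only knows descriptor 1; redirection simply changes which file that index refers to.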

4. The write Function

The write function is a system call in Unix and Linux systems that writes data from a user-space buffer to whatever a file descriptor refers to (a regular file, pipe, network socket, device, and so on). It is a low-level I/O operation that bypasses the standard I/O library's buffer and hands the data directly to the kernel. It is declared in <unistd.h>.

Syntax:

ssize_t write(int fd, const void *buf, size_t count);

Parameter Description:

fd: The file descriptor indicating the target file or device to write to (e.g., STDOUT_FILENO indicates standard output).
buf: Buffer pointer pointing to the data to be written.
count: The number of bytes to write, specifying how many bytes to read from buf to write to fd.

Return Value:

On success, returns the number of bytes actually written (of type ssize_t), which can be less than count, for example when writing to a pipe or socket. On failure, returns -1 and sets the errno variable to indicate the reason for the error.

5. The read Function

The read function is a system call in Unix and Linux systems used to read data from files or other input resources (such as pipes, network sockets, etc.) into a user-provided buffer. Similar to write, read directly obtains data from the file descriptor without going through the standard I/O buffer, suitable for low-level I/O operations.

Syntax:

ssize_t read(int fd, void *buf, size_t count);

Parameter Description:

fd: The file descriptor indicating the file or input resource to read from (e.g., STDIN_FILENO indicates standard input).
buf: Buffer pointer pointing to where the data will be stored after reading.
count: The maximum number of bytes to read; it must not exceed the size of buf.

Return Value:

On success, returns the number of bytes actually read (of type ssize_t), which can be less than count. A return value of 0 indicates that the end of the file (EOF) has been reached. On failure, returns -1 and sets errno to indicate the reason for the error.

6. Summary

fopen, fclose, fread, fwrite are all functions from the C standard library, that is, library functions.

open, close, read, write are all interfaces provided by the system, that is, system call interfaces.

Internally, these library functions call the system interfaces.

The f* family of functions can therefore be regarded as wrappers around the system calls that add buffering and portability, which makes them more convenient for application development.
