Linux C/C++: Processes and File Systems

In the world of Unix-like operating systems, understanding processes and file handling is fundamental for every system programmer. Today, we will explore an interesting intersection of these two concepts: what happens to open file descriptors when a process forks? This topic may seem niche, but it significantly impacts how we design and implement multiprocess applications.

Table of Contents

  1. Introduction
  2. Basics: Processes and File Descriptors
  3. Fork and File Descriptors
  4. Example 1: Writing to a File After Fork
  5. Buffering Issues
  6. Example 2: Using Low-Level I/O Functions
  7. Example 3: Concurrent Reading After Fork
  8. Best Practices and Considerations
  9. Conclusion

Introduction

In Unix-like systems, when a process forks, it creates a child process that is almost identical to the parent process. But what happens to the open file descriptors? Are they shared between the parent and child processes? What occurs if one process closes a file descriptor? These questions will be explored in this article.

Basics: Processes and File Descriptors

Before diving into the specifics, let’s quickly review what processes and file descriptors are:

  • A process is an instance of a running program. It has its own memory space, system resources, and state.
  • A file descriptor is a small non-negative integer that a process uses as a handle to access files or other input/output resources, such as pipes or network sockets.

In C, we manipulate file descriptors directly using system calls like open(), read(), write(), and close(). Higher-level standard library functions, such as fopen(), fprintf(), fscanf(), and fclose(), work with FILE pointers; each FILE stream wraps a file descriptor internally and adds a user-space buffer on top of it.

Fork and File Descriptors

When a process forks, the child process inherits a copy of each of the parent’s open file descriptors. The parent and child each have their own file descriptor table, but the copied descriptors refer to the same underlying open file descriptions in the kernel, which hold the file position and status flags. This means both processes can access the same open files, pipes, or sockets.

The key points to remember are:

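  1. File descriptors are inherited by the child process.
  2. The file position (the offset where the next read or write happens) is shared between the parent and child processes, because it is stored in the shared open file description rather than in the descriptor itself.
  3. Closing a file descriptor in one process closes only that process’s copy; the other process’s descriptor stays open and usable.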

Let’s explore these concepts through some examples.

Example 1: Writing to a File After Fork

Let’s start with a simple example where we open a file, fork a process, and then have both the parent and child processes write to the file.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

int main() {
    int fd = open("test.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        exit(1);
    }

    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {
        // Child process
        const char child_msg[] = "Hello from child!\n";
        write(fd, child_msg, sizeof child_msg - 1);  // 18 bytes, without the '\0'
        close(fd);
        exit(0);
    } else {
        // Parent process
        wait(NULL);  // Wait for child process to finish
        const char parent_msg[] = "Hello from parent!\n";
        write(fd, parent_msg, sizeof parent_msg - 1);  // 19 bytes, without the '\0'
        close(fd);
    }

    return 0;
}

In this example, we:

  1. Open a file named “test.txt” in write-only mode, creating it if it does not exist, and truncating it if it does.
  2. Fork the process.
  3. In the child process, write a message and close the file descriptor.
  4. In the parent process, wait for the child process to finish, then write its own message and close the file descriptor.

If you run this program and then check the contents of “test.txt”, you will see:

Hello from child!
Hello from parent!

This indicates:

  • Both processes can write to the file through their own copies of the same file descriptor.
  • Closing the file descriptor in the child process does not prevent the parent process from writing.
  • The file position is shared, so the parent process’s write continues from where the child process left off; the wait() call is what guarantees that the child writes first.

Buffering Issues

When using higher-level I/O functions like fprintf(), we need to be aware of buffering issues. Let’s look at an example that illustrates this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    FILE *fp = fopen("buffered.txt", "w");
    if (fp == NULL) {
        perror("fopen");
        exit(1);
    }

    fprintf(fp, "Hello");  // This may be buffered

    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {
        // Child process
        fprintf(fp, " from child!\n");
        fclose(fp);
        exit(0);
    } else {
        // Parent process
        wait(NULL);
        fprintf(fp, " from parent!\n");
        fclose(fp);
    }

    return 0;
}

If you run this program, you might be surprised to find that “buffered.txt” contains:

Hello from child!
Hello from parent!

“Hello” was duplicated because it was still sitting in the stdio buffer when fork() ran. The child received a copy of the parent’s memory, including that unflushed buffer, so each process wrote its own copy of “Hello” when fclose() flushed its buffer. To avoid this, call fflush(fp) before the fork so that the buffer’s contents reach the file exactly once.

Example 2: Using Low-Level I/O Functions

To avoid buffering issues, we can use the low-level system calls open(), read(), and write(), which do not buffer data in user space. Let’s modify the previous example:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

int main() {
    int fd = open("unbuffered.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        exit(1);
    }

    write(fd, "Hello", 5);

    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {
        // Child process
        write(fd, " from child!\n", 13);
        close(fd);
        exit(0);
    } else {
        // Parent process
        wait(NULL);
        write(fd, " from parent!\n", 14);
        close(fd);
    }

    return 0;
}

Now, when you run this program and check “unbuffered.txt”, you will see:

Hello from child!
 from parent!

This output indicates:

  1. The initial “Hello” was written only once.
  2. The child and then the parent each continued writing from the shared file offset, so neither overwrote the other.
  3. Because write() does no user-space buffering, no data was duplicated.

Example 3: Concurrent Reading After Fork

Now, let’s explore what happens when both the parent and child processes read from the same inherited file descriptor at the same time. This example reads the file one byte at a time to make the interleaving of reads visible:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

#define BUFFER_SIZE 1024

int main() {
    int fd = open("input.txt", O_RDONLY);
    if (fd == -1) {
        perror("open");
        exit(1);
    }

    char buffer[BUFFER_SIZE];
    int bytes_read = 0;

    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {
        // Child process
        while (read(fd, buffer + bytes_read, 1) == 1) {
            bytes_read++;
            if (bytes_read == BUFFER_SIZE - 1) break;
        }
        buffer[bytes_read] = '\0';
        printf("Child read: %s\n", buffer);
        close(fd);
        exit(0);
    } else {
        // Parent process
        while (read(fd, buffer + bytes_read, 1) == 1) {
            bytes_read++;
            if (bytes_read == BUFFER_SIZE - 1) break;
        }
        buffer[bytes_read] = '\0';
        printf("Parent read: %s\n", buffer);
        wait(NULL);
        close(fd);
    }

    return 0;
}

If you run this program with an input file containing some text, you will notice:

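  1. The child’s and parent’s reads may interleave; the exact split depends on scheduling, and one process may even read the whole file before the other gets a turn.
  2. Neither process is guaranteed a complete copy of the file; between them, they split its contents.
  3. The file position is shared between the two processes, so each read advances the file position for both.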

This behavior can lead to race conditions, which are typically undesirable in real applications. If you need both processes to read the entire file, open the file separately in each process after the fork, so that each one gets its own file position, or read the entire file before forking.

Best Practices and Considerations

Based on what we have learned, here are some best practices and considerations when using file descriptors and fork:

  1. Be aware of buffering issues: If you are using buffered I/O (like fprintf()), flush the buffer with fflush() before forking to avoid duplicated output.
  2. Close unnecessary file descriptors: In each process, close any file descriptors it does not need. This is especially important for long-running processes to avoid resource leaks.
  3. Use separate file descriptors for independent access: If the parent and child processes need to read or write the same file independently, it is usually best to open the file separately in each process after the fork.
  4. Be cautious of shared file positions: Remember that the file position is shared between parent and child processes. If both processes read from or write to an inherited descriptor at the same time, the result depends on scheduling, which can lead to race conditions.
  5. Consider using memory-mapped files: For large files that need to be accessed by multiple processes, consider using memory-mapped files (mmap()) instead of traditional file I/O.
  6. Use appropriate flags when opening files: Depending on your needs, you may want to use flags like O_APPEND, which moves the file offset to the end of the file before each write() as a single atomic step.
  7. Be aware of the differences between system calls and library functions: System calls like open(), read(), and write() work directly with file descriptors and do not involve user-space buffering, while library functions like fopen(), fprintf(), and fscanf() buffer data in user space for efficiency.

Conclusion

Understanding the behavior of file descriptors when a process forks is crucial for writing robust multiprocess applications in Unix-like systems. We have seen that the child inherits copies of the parent’s file descriptors in its own file descriptor table, but those copies still refer to the same open files in the kernel, so the file position is shared. This allows for flexible and powerful designs but also requires careful consideration to avoid issues such as duplicated buffered output or unintended sharing of file positions.

By using appropriate I/O functions, managing buffers correctly, and being mindful of shared file positions, you can effectively handle files in multiprocess applications. Remember, the key is to understand the behavior of your system calls and library functions and design your applications accordingly.

As you continue to work with processes and file I/O in C, you will encounter many more interesting scenarios and edge cases. Always thoroughly test your code, especially when dealing with concurrent access to shared resources like files. Happy coding!