I/O multiplexing in Linux is a mechanism for monitoring multiple file descriptors at once, so that a program can wait for events on any of them in a single call instead of blocking on one descriptor at a time.
I/O multiplexing is primarily implemented through three system calls: select, poll, and epoll. Applications can monitor the state changes of multiple file descriptors, such as read, write, or exceptional states.
The core of multiplexing lies in handling multiple I/O requests through a single system call, reducing process switching and blocking, thus improving efficiency.
In traditional I/O models, when a process needs to read data from a file descriptor (such as a network socket, file, or pipe), it typically enters a blocking state until the data is ready.
This model is suitable for single I/O operations, but when handling multiple I/O sources, using blocking mode can lead to inefficiency, as the blocking of one I/O operation can suspend the entire application.
I/O multiplexing can avoid this problem, allowing applications to handle multiple I/O events simultaneously.
For example, in a high-concurrency network server, I/O multiplexing can monitor multiple client connection requests and data transmissions simultaneously without the need to create a separate thread or process for each client.
Application Scenarios:
- High-Concurrency Servers: I/O multiplexing is particularly suitable for servers that need to handle a large number of connections simultaneously, such as HTTP servers and WebSocket servers.
- Real-Time Data Processing: In data stream processing applications (such as log processing and data collection), multiplexing can efficiently manage inputs from multiple data sources.
- Graphical User Interfaces: GUI programs can also utilize multiplexing to handle multiple user events, such as keyboard input, mouse clicks, and window updates.
1. select() System Call
select() is a system call that performs I/O multiplexing operations, allowing programs to monitor the state changes of multiple file descriptors for efficient I/O operations.
Calling select() blocks the process until at least one monitored file descriptor becomes ready (readable, writable, or in an exceptional condition), the timeout expires, or a signal interrupts the call.
Its function prototype is as follows:
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
Parameter Details:
- nfds: This parameter specifies the range of file descriptors to monitor, typically set to the maximum value of all monitored file descriptors plus 1. File descriptors start from 0, so the maximum file descriptor number needs to be incremented by 1.
- readfds: A pointer to a set of file descriptors used to detect if any file descriptors have become readable. If a file descriptor is readable, select() will return its ready state.
- writefds: A pointer to a set of file descriptors used to detect if any file descriptors have become writable. If a file descriptor is writable, select() will return its ready state.
- exceptfds: A pointer to a set of file descriptors used to detect if any file descriptors have encountered exceptions (such as out-of-band data). This does not indicate an error with the file descriptor but is used to detect certain unconventional situations.
- timeout: Sets the blocking behavior of select(). It points to a struct timeval with two members: tv_sec (seconds) and tv_usec (microseconds). If timeout is NULL, select() blocks indefinitely until a file descriptor becomes ready. If no file descriptor becomes ready before the timeout expires, select() returns 0. On Linux, select() may update the structure to reflect the time remaining, so set it again before each call.
In the select() function, readfds, writefds, and exceptfds are pointers to fd_set types, representing sets of file descriptors.
The fd_set data type is internally implemented as a bitmask to store multiple file descriptors, but users do not need to understand its specific implementation details.
Linux provides four macros to manipulate these file descriptor sets:
- FD_ZERO(fd_set *set): Initializes the file descriptor set, clearing it.
- FD_SET(int fd, fd_set *set): Adds the file descriptor fd to the set.
- FD_CLR(int fd, fd_set *set): Removes the file descriptor fd from the set.
- FD_ISSET(int fd, fd_set *set): Checks if the file descriptor fd is in the set, returning true if it is.
For example:
    fd_set fset;
    FD_ZERO(&fset);     // Initialize the set
    FD_SET(3, &fset);   // Add file descriptor 3
    FD_SET(4, &fset);   // Add file descriptor 4
    FD_SET(5, &fset);   // Add file descriptor 5
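The macros are usually combined with the calculation of nfds. The following is a minimal sketch of that pattern; the helper name build_fd_set is hypothetical and not part of any library, and it assumes every descriptor passed in is non-negative and below FD_SETSIZE.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: fill `set` with the descriptors in `fds` and
 * return the value to pass as select()'s nfds argument (highest fd + 1).
 * Assumes every descriptor is in the range [0, FD_SETSIZE). */
static int build_fd_set(const int *fds, size_t count, fd_set *set)
{
    int max_fd = -1;

    FD_ZERO(set);                   /* always start from an empty set */
    for (size_t i = 0; i < count; i++) {
        FD_SET(fds[i], set);        /* mark fds[i] as monitored */
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }
    return max_fd + 1;
}
```

For the descriptors 3, 4, and 5 used above, the helper returns 6, which is the nfds value select() expects.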
Return Value Details:
- Return value of -1: Indicates an error occurred, and errno will be set. Common errors include:
- EBADF: An invalid file descriptor in the set.
- EINTR: The system call was interrupted by a signal.
- EINVAL: The nfds parameter is invalid.
- ENOMEM: Insufficient system memory.
- Return value of 0: Indicates that no file descriptors were ready within the specified timeout.
- Positive integer return value: One or more file descriptors are ready. The value is the total number of ready descriptors across all three sets; use FD_ISSET() to find out which ones.
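EINTR is usually not a real failure, so callers often retry. The sketch below shows one way to do that for a single descriptor; the function name wait_readable is hypothetical, and for simplicity it restarts the full timeout after each interruption, which may not suit every program.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical wrapper: wait up to `timeout_sec` seconds for `fd` to
 * become readable, retrying when a signal interrupts the call.
 * Returns select()'s result: -1 on error, 0 on timeout, 1 if readable. */
static int wait_readable(int fd, int timeout_sec)
{
    for (;;) {
        fd_set readfds;
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        int ret = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ret == -1 && errno == EINTR)
            continue;   /* interrupted by a signal: rebuild and retry */
        return ret;
    }
}
```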
Considerations when using select():
- Maximum capacity limit of file descriptor sets: The capacity of fd_set is fixed by the constant FD_SETSIZE, which is 1024 on Linux with glibc. The limit applies to descriptor values, not only to how many are monitored: a descriptor numbered 1024 or higher cannot be placed in an fd_set. In that case, use poll() or epoll().
- Reinitialize file descriptor sets on repeated calls to select(): select() overwrites the sets in place so that only the ready descriptors remain. The sets must therefore be rebuilt (or restored from a saved copy) before each call; otherwise descriptors that were not ready last time are silently dropped from monitoring.
- Two ways to handle timeouts: Setting timeout to NULL indicates indefinite blocking until a file descriptor is ready. Setting both members of the struct timeval pointed to by timeout to 0 indicates non-blocking mode, meaning it will return results immediately.
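A common way to satisfy the reinitialization rule is to keep a "master" set and hand select() a copy each time. The following sketch combines that with a zero timeout for a non-blocking check; the name poll_readable is hypothetical.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: check which descriptors in `master` are readable
 * right now. `master` is left untouched; `ready` receives the result. */
static int poll_readable(const fd_set *master, int nfds, fd_set *ready)
{
    /* Both members zero: select() returns immediately. */
    struct timeval zero = { .tv_sec = 0, .tv_usec = 0 };

    *ready = *master;   /* select() overwrites its sets, so pass a copy */
    return select(nfds, ready, NULL, NULL, &zero);
}
```

Because only the copy is modified, the same master set can be reused on every loop iteration.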
Here is a simple example demonstrating how to use select() to detect the readability of standard input:
    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    int main(void) {
        fd_set readfds;
        struct timeval timeout;
        int ret;

        FD_ZERO(&readfds);
        FD_SET(STDIN_FILENO, &readfds); // Monitor standard input (file descriptor 0)

        timeout.tv_sec = 5;             // Timeout of 5 seconds
        timeout.tv_usec = 0;

        ret = select(STDIN_FILENO + 1, &readfds, NULL, NULL, &timeout);
        if (ret == -1) {
            perror("select error");
        } else if (ret == 0) {
            printf("Timeout: No data within 5 seconds.\n");
        } else {
            if (FD_ISSET(STDIN_FILENO, &readfds)) {
                printf("Data is available on standard input.\n");
            }
        }
        return 0;
    }
In this example, select() monitors standard input for readability and waits up to 5 seconds. If input arrives in time, the program reports that data is available; otherwise it reports a timeout.
Although select() is useful in many scenarios, it also has its limitations:
- Performance issues: With many file descriptors, select() becomes inefficient: the sets are copied between user space and the kernel on every call, and both the kernel and the application must scan every descriptor up to nfds.
- File descriptor count limit: FD_SETSIZE limits the maximum number of file descriptors that can be monitored.
- Reinitialization: The file descriptor set must be reinitialized on each call.
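Since calling FD_SET() with a descriptor outside the supported range is undefined behavior, it can help to check the value first. A minimal sketch, with the hypothetical name fd_set_checked:

```c
#include <sys/select.h>

/* Hypothetical guard: add `fd` to `set` only if it fits in an fd_set.
 * Returns 0 on success, -1 if fd is outside the range [0, FD_SETSIZE). */
static int fd_set_checked(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;      /* FD_SET() here would be undefined behavior */
    FD_SET(fd, set);
    return 0;
}
```

A caller that receives -1 would typically fall back to poll() or epoll() for that descriptor.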
2. poll() System Call
poll() provides a way to perform I/O multiplexing similar to select(), but with differences in interface and usage.
poll() uses an array of struct pollfd types to monitor the readiness of file descriptors.
Its prototype is as follows:
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
Parameter Explanation:
- fds: A pointer to an array of struct pollfd types, where each element represents a file descriptor and the events of interest.
- nfds: Specifies the number of elements in the array. The type nfds_t is an unsigned integer.
- timeout: Determines the blocking behavior of poll(), in milliseconds, with the following rules:
- timeout = -1: Blocks indefinitely until a file descriptor is ready or a signal is caught (similar to select() with timeout set to NULL).
- timeout = 0: Non-blocking call, checks the state of file descriptors once.
- timeout > 0: Blocks for up to timeout milliseconds, returning if a timeout occurs.
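The timeout = 0 case is handy as a quick readiness probe. The sketch below wraps it for a single descriptor; is_readable_now is a hypothetical name chosen for illustration.

```c
#include <poll.h>

/* Hypothetical helper: returns 1 if `fd` is readable right now, 0 if not,
 * and -1 if poll() itself fails. */
static int is_readable_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    int ret = poll(&pfd, 1, 0);     /* timeout = 0: never blocks */
    if (ret == -1)
        return -1;
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```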
The pollfd structure is defined as follows:
    struct pollfd {
        int   fd;       /* File descriptor */
        short events;   /* Requested events */
        short revents;  /* Returned events */
    };
- fd: The file descriptor. Setting it to a negative value can ignore this file descriptor.
- events: Indicates the types of events we are interested in, using bitmasking.
- revents: Set by the kernel, indicating the actual events that occurred.
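The events and revents fields support various flags. Common ones include:
- POLLIN: Data is available to read.
- POLLPRI: Urgent (out-of-band) data is available to read.
- POLLOUT: Writing is possible.
- POLLERR: An error occurred (reported in revents only).
- POLLHUP: The peer hung up (reported in revents only).
- POLLNVAL: The descriptor is not open (reported in revents only).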
These flags can be combined using bitwise OR operations, for example, events = POLLIN | POLLOUT, indicating simultaneous monitoring of readable and writable events.
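Just as flags are combined with bitwise OR, they are tested with bitwise AND. The following sketch turns a revents value into a readable string; describe_revents is a hypothetical helper, and it covers only a handful of the standard poll flags.

```c
#include <poll.h>
#include <stdio.h>

/* Hypothetical helper: write the names of the flags set in `revents`
 * into `buf`, separated by spaces. Only a few common flags are covered. */
static void describe_revents(short revents, char *buf, size_t len)
{
    static const struct { short flag; const char *name; } table[] = {
        { POLLIN,   "POLLIN"   },
        { POLLOUT,  "POLLOUT"  },
        { POLLERR,  "POLLERR"  },
        { POLLHUP,  "POLLHUP"  },
        { POLLNVAL, "POLLNVAL" },
    };
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        /* Test each bit individually; stop appending once buf is full. */
        if ((revents & table[i].flag) && used < len) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? " " : "", table[i].name);
            if (n < 0)
                break;
            used += (size_t)n;
        }
    }
}
```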
The following example demonstrates how to use poll() to monitor the readable events of a file descriptor:
    #include <poll.h>
    #include <stdio.h>

    int main(void) {
        struct pollfd fds[1];
        fds[0].fd = 0;              // Standard input
        fds[0].events = POLLIN;     // Monitor readable events

        int timeout = 5000;         // 5 seconds timeout
        int ret = poll(fds, 1, timeout);
        if (ret == -1) {
            perror("poll");
            return 1;
        } else if (ret == 0) {
            printf("Timeout: No data readable.\n");
        } else {
            if (fds[0].revents & POLLIN) {
                printf("Data is available on standard input.\n");
            }
        }
        return 0;
    }
Advantages and Limitations of poll():
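- Advantages: poll() is not bound by FD_SETSIZE, so it can monitor any number of file descriptors of any value. Because results are returned in revents and the events field is left untouched, the array does not need to be rebuilt before each call, and its array form makes it easier to adjust the set of file descriptors dynamically.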
- Limitations: Similar to select(), poll()’s performance also decreases when handling a large number of file descriptors, as it requires a linear scan of the entire array to find ready file descriptors.
Considerations:
- The return value of poll() indicates the number of ready file descriptors, but specific events must be checked through the revents field.
- If a file descriptor is set to a negative value in poll(), that entry will be ignored, suitable for dynamically adjusting the monitoring list at runtime.
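Both points can be sketched in a few lines. The helper names disable_slot and count_ready below are hypothetical; the first uses the negative-fd convention to switch off an entry without reshuffling the array, and the second scans revents the way a caller would after poll() returns.

```c
#include <poll.h>

/* Hypothetical helper: stop monitoring slot `i` without shrinking the
 * array. poll() skips entries whose fd is negative and reports
 * revents == 0 for them. */
static void disable_slot(struct pollfd *fds, nfds_t i)
{
    fds[i].fd = -1;
    fds[i].revents = 0;
}

/* Hypothetical helper: count entries with at least one returned event.
 * After a successful poll(), this matches its positive return value. */
static int count_ready(const struct pollfd *fds, nfds_t n)
{
    int ready = 0;

    for (nfds_t i = 0; i < n; i++) {
        if (fds[i].revents != 0)
            ready++;
    }
    return ready;
}
```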
3. epoll System Call
epoll is an efficient I/O multiplexing mechanism designed for large numbers of file descriptors; it improves on both select and poll.
The main advantage of epoll lies in its performance support for large-scale concurrent connections and the efficiency of event notifications.
epoll was introduced in Linux kernel version 2.5.44 and is only available on Linux systems.
epoll adopts an event-driven model and consists of three main system calls:
- epoll_create or epoll_create1: Creates an epoll instance.
- epoll_ctl: Adds, deletes, or modifies file descriptors in the epoll instance.
- epoll_wait: Waits for events to occur and returns a list of ready file descriptors.
3.1 epoll_create / epoll_create1 System Call
These functions are used to create a new epoll instance.
int epoll_create(int size);
Parameter: The size parameter was originally a hint for the expected number of file descriptors. Since Linux 2.6.8 the kernel ignores it, but it must still be greater than 0.
int epoll_create1(int flags);
Parameter: The flags parameter can typically be 0 or EPOLL_CLOEXEC, the latter setting the close-on-exec flag for the file descriptor.
On success, these functions return a new epoll file descriptor; on failure, they return -1.
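As a small illustration of the flags parameter, the sketch below creates an instance with EPOLL_CLOEXEC and then reads the flag back with fcntl(). The helper names create_epoll_cloexec and has_cloexec are hypothetical.

```c
#include <fcntl.h>
#include <sys/epoll.h>

/* Hypothetical helper: create an epoll instance whose descriptor is
 * closed automatically across exec(). Returns the fd, or -1 on error. */
static int create_epoll_cloexec(void)
{
    return epoll_create1(EPOLL_CLOEXEC);
}

/* Hypothetical helper: returns 1 if `fd` has the close-on-exec flag set,
 * 0 if it does not, and -1 on error. */
static int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

The epoll descriptor is an ordinary file descriptor, so it should be released with close() when no longer needed.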
3.2 epoll_ctl System Call
Used to manage file descriptors in the epoll instance. The function prototype is as follows:
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Parameter Explanation:
- epfd: The epoll instance file descriptor returned by epoll_create.
- op: The operation type, which can be one of the following three values:
- EPOLL_CTL_ADD: Adds a new file descriptor to the epoll instance.
- EPOLL_CTL_MOD: Modifies the events of an existing file descriptor.
- EPOLL_CTL_DEL: Deletes a file descriptor from the epoll instance.
- fd: The file descriptor to operate on.
- event: A pointer to an epoll_event structure specifying the events of interest. It is ignored for EPOLL_CTL_DEL.
struct epoll_event is defined as follows:
    struct epoll_event {
        uint32_t     events;    /* Events to listen for */
        epoll_data_t data;      /* Associated user data */
    };
Field Explanation:
- events: Indicates the types of events of interest, which can be a combination of one or more of the following flags:
- EPOLLIN: Data is readable.
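- EPOLLOUT: The descriptor is ready for writing.
- EPOLLERR: An error occurred. epoll always reports this condition, so it does not need to be set in events.
- EPOLLET: Enables edge-triggered mode. Notifications are delivered only when the state changes, which reduces repeated wakeups, but the descriptor should be non-blocking and the handler must read or write until the call fails with EAGAIN.
- EPOLLONESHOT: Disables the descriptor after one event is delivered. It stays registered but reports nothing further until it is re-armed with EPOLL_CTL_MOD.
- data: A union (epoll_data_t) that the kernel returns unchanged with each event. It can hold a file descriptor (data.fd) or a pointer to custom data (data.ptr) that identifies the source of the event.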
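With EPOLLET, a handler that stops reading early may never be notified about the leftover data. The sketch below shows the usual remedy: make the descriptor non-blocking and read until EAGAIN. The names set_nonblocking and drain_fd are hypothetical, and the data is simply discarded here for brevity.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put `fd` into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper for edge-triggered handlers: read until the kernel
 * reports EAGAIN, so no buffered data is left behind. Returns the number
 * of bytes consumed, or -1 on a real error. Stops early at end-of-file. */
static long drain_fd(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;             /* a real handler would process buf */
        } else if (n == 0) {
            return total;           /* end of file or peer closed */
        } else if (errno == EINTR) {
            continue;               /* interrupted: try again */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;           /* nothing left to read for now */
        } else {
            return -1;
        }
    }
}
```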
3.3 epoll_wait System Call
Used to wait for events on file descriptors. The function prototype is as follows:
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Parameter Explanation:
- epfd: The epoll instance file descriptor.
- events: A pointer to an array of epoll_event structures used to store ready events.
- maxevents: The size of the events array, indicating the maximum number of events to return.
- timeout: Specifies the timeout period for waiting, in milliseconds.
- timeout = -1: Waits indefinitely until an event occurs.
- timeout = 0: Returns immediately, does not block.
- timeout > 0: Waits for up to timeout milliseconds.
The return value is the number of ready events, 0 if the timeout expired with no events, or -1 on error.
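As with select(), epoll_wait() can fail with EINTR when a signal arrives, which is rarely a real error. A minimal retry wrapper might look like the sketch below; ep_wait_retry is a hypothetical name, and the full timeout is restarted after each interruption.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical wrapper: call epoll_wait(), retrying if a signal
 * interrupts it. Returns the number of ready events, 0 on timeout,
 * or -1 on any other error. */
static int ep_wait_retry(int epfd, struct epoll_event *events,
                         int maxevents, int timeout_ms)
{
    int n;

    do {
        n = epoll_wait(epfd, events, maxevents, timeout_ms);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

Only the first n entries of the events array are valid after the call returns.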
The following demonstrates a basic example of using epoll to monitor standard input:
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    #define MAX_EVENTS 5

    int main(void) {
        int epfd = epoll_create1(0);
        if (epfd == -1) {
            perror("epoll_create1");
            exit(EXIT_FAILURE);
        }

        struct epoll_event event;
        event.events = EPOLLIN;         // Monitor readable events
        event.data.fd = STDIN_FILENO;   // Standard input
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &event) == -1) {
            perror("epoll_ctl: STDIN_FILENO");
            exit(EXIT_FAILURE);
        }

        struct epoll_event events[MAX_EVENTS];
        int timeout = 10000;            // 10 seconds timeout
        int nfds = epoll_wait(epfd, events, MAX_EVENTS, timeout);
        if (nfds == -1) {
            perror("epoll_wait");
            exit(EXIT_FAILURE);
        }

        // Only the first nfds entries are valid; 0 means the wait timed out
        for (int i = 0; i < nfds; i++) {
            if (events[i].data.fd == STDIN_FILENO) {
                printf("Data is available on standard input.\n");
            }
        }

        close(epfd);
        return 0;
    }
The advantages of epoll:
- High Performance: By using the kernel’s event notification mechanism, it avoids traversing the entire file descriptor set.
- Edge Triggered Support: Improves event handling efficiency in high-concurrency situations.
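The difference between the default level-triggered mode and edge-triggered mode can be observed with a small experiment. The sketch below is illustrative only; count_notifications is a hypothetical name, and it assumes fd already has unread data when it is called.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical experiment: register `fd` with the given extra flags
 * (0 for level-triggered, EPOLLET for edge-triggered), then call
 * epoll_wait() twice without reading anything. Returns how many of the
 * two calls reported the fd, or -1 on error. */
static int count_notifications(int fd, uint32_t extra_flags)
{
    struct epoll_event ev = { .events = EPOLLIN | extra_flags,
                              .data = { .fd = fd } };
    struct epoll_event out;
    int hits = 0;

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        return -1;
    }
    for (int i = 0; i < 2; i++) {
        int n = epoll_wait(epfd, &out, 1, 0);   /* timeout 0: no blocking */
        if (n == -1) {
            close(epfd);
            return -1;
        }
        hits += n;
    }
    close(epfd);
    return hits;
}
```

With unread data pending, level-triggered mode reports the descriptor on both calls, while edge-triggered mode reports it only on the first.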
Applicable Scenarios:
- Network servers, especially in high-concurrency scenarios.
- Multitasking systems with a large number of I/O operations.
In summary, epoll is more efficient than select and poll, suitable for large-scale I/O concurrency applications, and provides flexible event control capabilities.