Understanding the Relationship Between Sockets and TCP/IP

Fact One: File Descriptor Limits

Anyone who writes network server programs on Linux knows that each TCP connection occupies a file descriptor. Once the available file descriptors are exhausted, new connection attempts fail with an error such as “Socket/File: Can’t open so many files” (the standard Linux message for this condition, errno EMFILE, is “Too many open files”).

At this point, you need to understand the operating system’s limit on the maximum number of open files.

Process Limit

Executing ulimit -n typically outputs 1024, which means a process can open at most 1024 files. With this default configuration, a single process can therefore hold at most about a thousand concurrent TCP connections (slightly fewer in practice, because standard input, output, and error and the listening socket also use descriptors).
Temporary modification: ulimit -n 1000000. This only applies to the current shell session and the processes started from it, and it is lost after the user logs out or the system reboots. Raising the value above the hard limit requires root privileges.
Permanent modification: Edit the /etc/security/limits.conf file and add the following lines:

* soft nofile 1000000

* hard nofile 1000000
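
The per-process limit can also be inspected from inside a program. The following is a minimal sketch, assuming a Unix-like system where Python's resource module is available. The function names and the trial limit of 64 are made up for this illustration. It reads the current limit, temporarily lowers the soft limit, and creates sockets until the kernel refuses, which should surface as errno EMFILE.

```python
import errno
import resource
import socket


def current_nofile_limit():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def sockets_until_exhausted(soft_limit=64):
    """Temporarily lower the soft limit, then create sockets until one fails.

    Returns (number of sockets created, errno of the failure or None).
    The value 64 is an arbitrary small limit chosen for the demonstration.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)  # soft may never exceed hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    opened = []
    failure = None
    try:
        # The loop bound guarantees termination even if no error occurs.
        while len(opened) < soft_limit + 10:
            opened.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as exc:
        failure = exc.errno
    finally:
        for sock in opened:
            sock.close()
        # Restore the original limits so the rest of the program is unaffected.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(opened), failure


if __name__ == "__main__":
    print("soft/hard limit:", current_nofile_limit())
    count, err = sockets_until_exhausted()
    print("sockets created before failure:", count)
    print("failure:", errno.errorcode.get(err, err))
```

Each socket consumes one descriptor, so the count stays below the soft limit. A real server hits the same wall at whatever value ulimit -n reports.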

Global Limit

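Executing cat /proc/sys/fs/file-nr outputs three numbers, for example 9344 0 592026, which are: 1. the number of allocated file handles, 2. the number of allocated but unused file handles, 3. the system-wide maximum number of file handles. In the 2.6 kernel the second value is always 0. This is not an error; it simply means that all allocated file handles are in use.
We can raise the maximum by editing the /etc/sysctl.conf file with root permissions (running sysctl -p afterwards applies the change without a reboot). Note that the two conntrack entries raise the netfilter connection-tracking limit rather than the file handle limit, and their names depend on the kernel version; newer kernels use net.netfilter.nf_conntrack_max instead.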

fs.file-max = 1000000

net.ipv4.ip_conntrack_max = 1000000

net.ipv4.netfilter.ip_conntrack_max = 1000000
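
The three fields of file-nr are easy to read programmatically. This is a small sketch, assuming a Linux system that exposes /proc/sys/fs/file-nr. The function names are hypothetical, and the sample string simply reuses the example values 9344, 0, and 592026.

```python
from pathlib import Path


def parse_file_nr(text):
    """Split the contents of file-nr into (allocated, unused, maximum)."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read and parse the system-wide file handle counters (Linux only)."""
    return parse_file_nr(Path(path).read_text())


if __name__ == "__main__":
    allocated, unused, maximum = parse_file_nr("9344\t0\t592026\n")
    print(f"allocated={allocated} unused={unused} maximum={maximum}")
    print(f"handles still available: {maximum - allocated}")
```

On a live Linux machine, read_file_nr() returns the current values, which makes it simple to monitor how close the system is to fs.file-max.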

Fact Two: Port Number Range Limit?

On most operating systems, port numbers below 1024 are reserved for the system, while those from 1024 to 65535 are available to users. Since each TCP connection occupies a port number, it seems to follow that a server can have at most a little over 60,000 concurrent connections. I believe many readers have had this misunderstanding (I used to think this way too).

Let’s analyze it.

How a TCP connection is identified: The system uses a 4-tuple to uniquely identify a TCP connection: {local ip, local port, remote ip, remote port}. See the explanation of accept in Chapter 4 of UNIX Network Programming, Volume 1: its second parameter, cliaddr, returns the client’s IP address and port number. On the server side, only the single port used when binding is occupied, which shows that the 65535 port range is not a limit on the server’s concurrency.
Maximum TCP connections for the server: The server usually listens on a fixed local port, waiting for client connection requests. Leaving aside address reuse (the UNIX SO_REUSEADDR option), the local listening port is exclusive even if the server has multiple IPs. Therefore, in the server’s TCP connection 4-tuple, only remote ip (the client IP) and remote port (the client port) are variable, and the maximum number of TCP connections equals the number of client IPs multiplied by the number of client ports. For IPv4, disregarding IP address classification and other factors, this is approximately 2³² (number of IPs) × 2¹⁶ (number of ports), so the theoretical maximum for a single server is about 2⁴⁸. In practice the real ceiling is set by resources such as memory and the file descriptor limits described under Fact One.
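
The 4-tuple and the arithmetic above can be written down directly. The sketch below is illustrative only: the ConnectionKey type and the function name are hypothetical, and the addresses come from the ranges reserved for documentation.

```python
from typing import NamedTuple


class ConnectionKey(NamedTuple):
    """The 4-tuple that identifies one TCP connection."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


def max_connections_per_listening_port(ip_bits=32, port_bits=16):
    """Theoretical number of distinct (remote ip, remote port) pairs for IPv4."""
    return (2 ** ip_bits) * (2 ** port_bits)


if __name__ == "__main__":
    # Three connections to the same server address and port 80.
    # They differ only in the remote half of the tuple.
    keys = {
        ConnectionKey("192.0.2.10", 80, "198.51.100.7", 50001),
        ConnectionKey("192.0.2.10", 80, "198.51.100.7", 50002),
        ConnectionKey("192.0.2.10", 80, "203.0.113.9", 50001),
    }
    print(len(keys), "distinct connections share local port 80")
    print(max_connections_per_listening_port())  # 2**48 = 281474976710656
```

Because the key includes the remote address and port, any number of connections can share local port 80 without ambiguity.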

To write network programs, one must use sockets, as every programmer knows. In interviews we also often ask whether the candidate knows socket programming. Most people answer that socket programming basically consists of listen, accept, send, write, and so on. True, it is much like ordinary file operations; anyone who has written such code knows it.

When we talk about network programming, we almost always mean TCP/IP, as if no other network protocols existed. Regarding TCP/IP, we know about TCP and UDP: the former guarantees correct, reliable delivery, while the latter tolerates data loss. We also know that before establishing a connection, one must know the other party’s IP address and port number. Beyond that, ordinary programmers may not know much more, and this knowledge is often sufficient. At most, when writing service programs, we use multithreading to handle concurrent access.

We also know the following facts:
1. A given port number cannot be shared by multiple programs. For example, if IIS occupies port 80, Apache cannot also use port 80.
2. Many firewalls only allow packets addressed to specific destination ports to pass.
3. When a service program listens on a port and accepts a connection request, a new socket is created to handle that request.

This leads to a question that puzzled me for a long time. If a socket is created and bound to port 80, does that mean the socket occupies port 80? If so, what port does the new socket created by accept use? (I always assumed the system would assign it a free port.) If it is some free port, it certainly is not port 80, so the destination port of subsequent TCP packets would not be 80, and the firewall would definitely block them. Yet firewalls do not block such connections, and this is the most common way to handle connection requests. So why does the firewall not block them? How does it determine that the connection came about because of a connection to port 80?

Later, I carefully studied the principles of the TCP/IP protocol stack and gained a deeper understanding of many concepts. TCP and UDP both belong to the transport layer, which is built on top of the IP layer (the network layer). The IP layer is mainly responsible for packet transmission between nodes (End to End); here, a node is a network device such as a computer. Because the IP layer only delivers data to a node and cannot distinguish between the applications running on it, TCP and UDP add port information on top of it, so that a port identifies an application on a node. Apart from adding port information, UDP does essentially no further processing of the IP layer data. TCP adds more elaborate transmission control, such as sliding windows and acknowledgment and retransmission mechanisms, to achieve reliable data transmission. However steady the TCP data stream appears to the application layer, the underlying transmission consists of individual IP packets that TCP has to reassemble.

So I had reason to believe that a firewall has little information to judge a TCP packet by, apart from the IP addresses and port numbers. Moreover, the port exists to distinguish between applications, so that arriving IP packets are delivered to the right one.

TCP/IP is merely a protocol stack. Like an operating system’s internal mechanisms, it needs a concrete implementation and must also expose an interface to the outside. Just as operating systems provide standard programming interfaces, such as the Win32 programming interface, TCP/IP must provide a programming interface too, and that is the socket programming interface. So that is how it fits together!

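
Fact 1 above, that a port identifies one application and cannot be claimed twice, can be observed directly. The following is a minimal sketch, assuming a system where neither socket sets SO_REUSEADDR or SO_REUSEPORT (Python does not set them by default). The function name is hypothetical, and the port is chosen by the operating system rather than fixed at 80, so that no special privileges are needed.

```python
import errno
import socket


def second_bind_errno(host="127.0.0.1"):
    """Bind and listen on a free port, then try to bind a second socket to it.

    Returns the errno raised by the second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 asks the OS for any free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as the listener
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    err = second_bind_errno()
    print("second bind failed with:", errno.errorcode.get(err, err))
```

On Linux the second bind is expected to fail with EADDRINUSE, the same error Apache would see if IIS already held port 80.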
In the socket programming interface, the designers introduced a very important concept: the socket. A socket is similar to a file descriptor; in fact, in BSD systems it is stored in the same per-process descriptor table as file descriptors, and the socket value is simply an index into that table. We have met this pattern many times: file descriptors, window handles, and so on. Such handles represent specific objects in the system and are passed as parameters to the functions that operate on those objects. This is essentially a C idiom; in C++ the same role is played by the this pointer, that is, the object pointer.

Now we understand that sockets do not necessarily have a direct relationship with TCP/IP. The socket programming interface was designed to accommodate other network protocols as well. The socket merely makes the TCP/IP protocol stack convenient to use: it abstracts TCP/IP into a few basic function interfaces, such as create, listen, accept, connect, read, and write.

Now we understand that if a program creates a socket and lets it listen on port 80, it is essentially declaring to the TCP/IP protocol stack that it occupies port 80. From then on, all TCP packets addressed to port 80 are delivered to that program (because the program uses the socket programming interface, they are first handled by the socket layer). The accept function abstracts the TCP connection establishment process. The new socket it returns represents the connection that was just created, and a connection carries two pieces of information: the source IP and source port, and the destination IP and destination port. Therefore, accept can produce many different sockets whose destination IP and destination port are all the same; only the source IP and source port vary.

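
That a socket is just an entry in the process's descriptor table can be seen from a program. This is a small sketch, assuming a Unix-like system such as Linux or BSD; the function name is hypothetical. It opens an ordinary temporary file and a TCP socket and returns the small integers the kernel assigned to each.

```python
import os
import socket
import stat
import tempfile


def descriptor_numbers():
    """Open a regular file and a TCP socket; return their descriptor numbers
    and whether the kernel reports the second descriptor as a socket."""
    with tempfile.TemporaryFile() as f, \
            socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        file_fd = f.fileno()
        sock_fd = s.fileno()
        # fstat works on any descriptor; the mode tells the object type.
        is_socket = stat.S_ISSOCK(os.fstat(sock_fd).st_mode)
        return file_fd, sock_fd, is_socket


if __name__ == "__main__":
    file_fd, sock_fd, is_socket = descriptor_numbers()
    print("file descriptor:", file_fd)
    print("socket descriptor:", sock_fd, "(socket)" if is_socket else "")
```

Both values are plain integers drawn from the same numbering space, which is why sockets count against the open-file limits described under Fact One.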
In this way, these sockets can all have the same destination port of 80, while the socket layer can still match incoming IP packets to the right socket based on the source/destination pair, which completes the encapsulation of operations on the TCP/IP protocol. The firewall’s rules for processing IP packets are likewise clear and straightforward: every packet of such a connection is still addressed to port 80, so none of the complicated scenarios I had imagined ever arise. The crucial point is that a socket is an abstraction for operating on the TCP/IP protocol stack, not a simple one-to-one mapping to a port!
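
The behavior of accept can be checked on the loopback interface. The following is a minimal sketch rather than a real server: the function name is hypothetical, the listening port is chosen by the operating system instead of being 80, and the clients run in the same process for simplicity.

```python
import socket


def accept_demo(num_clients=3, host="127.0.0.1"):
    """Accept several loopback connections on one listening port.

    Returns (listening port, list of (local address, remote address) pairs),
    one pair per socket returned by accept.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clients, accepted = [], []
    try:
        listener.bind((host, 0))  # port 0 asks the OS for any free port
        listener.listen(num_clients)
        listen_port = listener.getsockname()[1]
        for _ in range(num_clients):
            # The kernel completes the handshake and queues the connection,
            # so connecting and accepting in one thread does not deadlock.
            clients.append(socket.create_connection((host, listen_port)))
            conn, _peer = listener.accept()
            accepted.append(conn)
        pairs = [(conn.getsockname(), conn.getpeername()) for conn in accepted]
        return listen_port, pairs
    finally:
        for sock in clients + accepted + [listener]:
            sock.close()


if __name__ == "__main__":
    port, pairs = accept_demo()
    print("listening port:", port)
    for local, remote in pairs:
        print("accepted socket: local", local, "remote", remote)
```

Every accepted socket reports the listening port as its local port, and only the remote port differs, which matches the description above: no new server-side port is allocated per connection.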
