Source: https://juejin.im/post/6844903490595061767
Author: Ruheng
1. TCP/IP Model
The figure above uses the HTTP protocol as an example to illustrate each layer in detail.
2. Data Link Layer
- Framing: encapsulating the network-layer datagram with a header and trailer into a frame; the frame header includes the source and destination MAC addresses.
- Transparent transmission: achieved through zero-bit stuffing and escape characters, so that any bit pattern in the payload can be carried.
- Reliable transmission: rarely implemented on links with low error rates, but wireless links (WLAN) do provide reliable transmission.
- Error detection (CRC): the receiver checks each frame for errors; if an error is found, the frame is discarded.
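The CRC check described above can be sketched in a few lines of Python. Here CRC-32 (via the standard `zlib` module) stands in for whatever polynomial the actual link layer uses; this is an illustrative sketch, not a real frame format:

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    # Sender appends a CRC-32 checksum as the frame trailer (the FCS field).
    fcs = zlib.crc32(payload).to_bytes(4, "big")
    return payload + fcs

def check_frame(frame: bytes) -> bool:
    # Receiver recomputes the CRC over the payload; on mismatch, discard.
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs

frame = make_frame(b"network layer datagram")
assert check_frame(frame)                          # intact frame passes

corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit "in transit"
assert not check_frame(corrupted)                  # error detected; frame dropped
```

Note that CRC only detects errors; as the bullet above says, the receiver simply discards the bad frame rather than requesting retransmission.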
3. Network Layer
1. IP Protocol
The IP protocol is the core of the TCP/IP suite: all TCP, UDP, ICMP, and IGMP data are transmitted encapsulated in IP datagrams. It is important to note that IP is not a reliable protocol: it provides no mechanism for handling undelivered data. That responsibility falls to higher-layer protocols such as TCP.
1.1 IP Address
1.2 IP Protocol Header
2. ARP and RARP Protocols
3. ICMP Protocol
The IP protocol is not a reliable protocol; it does not guarantee data delivery. Therefore, the task of ensuring data delivery should be handled by other modules, one of which is the ICMP (Internet Control Message Protocol). ICMP is not a high-level protocol but an IP layer protocol.
When an error occurs while transmitting an IP datagram, such as a host or route being unreachable, the ICMP protocol encapsulates the error information and sends it back to the host, giving the host a chance to handle the error. This is why protocols built on top of the IP layer can be made reliable.
4. Ping
Ping can be said to be the most famous application of ICMP and is part of the TCP/IP protocol suite. Using the "ping" command, we can check whether the network is reachable, which helps analyze and diagnose network faults.
For example, when we cannot access a particular website, we usually ping the site. The ping command will echo back some useful information. General information is as follows:
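As an illustration of what a ping implementation does before it ever touches the network, the sketch below builds an ICMP echo-request message with the RFC 1071 Internet checksum. Actually sending it would require a raw socket and elevated privileges, which is omitted here:

```python
import struct

def internet_checksum(data: bytes) -> int:
    # RFC 1071: one's-complement sum of 16-bit words, then complemented.
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    # Type 8 (echo request), code 0; checksum covers the whole message.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

pkt = build_echo_request(0x1234, 1, b"ping")
# A message with a correct checksum verifies to zero end-to-end.
assert internet_checksum(pkt) == 0
```

The destination host copies the payload back in an echo reply (type 0), and the sender matches replies to requests via the identifier and sequence fields.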
5. Traceroute
Traceroute is an important tool for detecting the routing situation between a host and the destination host, and it is also the most convenient tool.
The principle of Traceroute is quite elegant. Given the destination host's IP, it first sends a UDP packet with TTL=1 toward the destination. The first router to receive the packet decrements the TTL to 0, discards the packet, and returns an ICMP time-exceeded message to the source host. The host then sends a UDP packet with TTL=2, which elicits a time-exceeded message from the second router, and so on. Eventually a probe reaches the destination host itself, which replies with an ICMP port-unreachable message (the probe deliberately targets an unlikely UDP port), signaling that the trace is complete. In this way, Traceroute learns the IP address of every router along the path.
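The TTL mechanism behind this can be modeled without touching the network. The sketch below is a hypothetical in-memory simulation, with the router IPs made up for illustration:

```python
def traceroute_sim(path, max_probes=30):
    # 'path' is the ordered list of router IPs ending at the destination.
    # Each router decrements the TTL; at TTL=0 it "returns" an ICMP
    # time-exceeded message (port-unreachable, if it is the destination).
    discovered = []
    for ttl in range(1, max_probes + 1):
        remaining = ttl
        for hop in path:
            remaining -= 1
            if remaining == 0:
                discovered.append(hop)   # this hop reveals itself to the host
                break
        if discovered and discovered[-1] == path[-1]:
            break                        # destination reached; stop probing
    return discovered

hops = traceroute_sim(["10.0.0.1", "172.16.0.1", "93.184.216.34"])
assert hops == ["10.0.0.1", "172.16.0.1", "93.184.216.34"]
```

Each iteration mirrors one round of real Traceroute: raising the TTL by one exposes exactly one additional hop.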
6. TCP/UDP
Message-oriented
In message-oriented transmission, the application layer hands UDP a message of a given length, and UDP sends it as is, one complete message at a time. The application must therefore choose an appropriate message size: if the message is too long, the IP layer has to fragment it, reducing efficiency; if it is too short, the relative overhead of the IP and UDP headers becomes large.
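UDP's preservation of message boundaries is easy to observe over the loopback interface. A minimal sketch, assuming local UDP sockets are permitted:

```python
import socket

def udp_roundtrip(messages):
    # Message-oriented: each sendto() produces exactly one datagram, and
    # each recvfrom() returns exactly one whole message -- boundaries are
    # preserved, unlike TCP's byte stream.
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv.bind(("127.0.0.1", 0))          # let the OS pick a free port
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for msg in messages:
        send.sendto(msg, recv.getsockname())
    out = [recv.recvfrom(65535)[0] for _ in messages]
    send.close()
    recv.close()
    return out

# Two sends arrive as two distinct datagrams, never merged into one stream.
assert udp_roundtrip([b"first", b"second message"]) == [b"first", b"second message"]
```

With TCP, the same two writes could arrive concatenated or split arbitrarily, which is exactly the byte-stream behavior described next.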
Byte stream-oriented
When Should TCP Be Used?
When Should UDP Be Used?
7. DNS
8. Establishing and Terminating TCP Connections
1. Three-Way Handshake
Why Three-Way Handshake?
2. Four-Way Handshake
After the client and server establish a TCP connection through the three-way handshake, when data transmission is complete, the TCP connection must be terminated. This is where the mysterious “four-way handshake” comes into play.
First Handshake: Host 1 (which can be the client or server) sets the Sequence Number and sends a FIN segment to Host 2; at this point, Host 1 enters FIN_WAIT_1 state; this indicates that Host 1 has no data to send to Host 2;
Second Handshake: Host 2 receives the FIN segment sent by Host 1 and replies with an ACK segment, with the Acknowledgment Number equal to the Sequence Number plus 1; Host 1 enters FIN_WAIT_2 state; Host 2 informs Host 1 that it “agrees” to the shutdown request;
Third Handshake: Host 2 sends a FIN segment to Host 1, requesting to close the connection, while Host 2 enters LAST_ACK state;
Fourth Handshake: Host 1 receives the FIN segment sent by Host 2 and replies with an ACK segment, then enters the TIME_WAIT state; after Host 2 receives the ACK from Host 1, it closes the connection. Host 1 then waits for 2MSL; if no further segment arrives in that time, it concludes that Host 2 has closed normally and closes its side of the connection as well.
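The active closer's side of the four steps above can be summarized as a toy state table. This is a simplified sketch that ignores simultaneous close and other corner cases:

```python
# State transitions of the host that initiates the close (Host 1 above).
TRANSITIONS = {
    ("ESTABLISHED", "send FIN"):    "FIN_WAIT_1",
    ("FIN_WAIT_1",  "recv ACK"):    "FIN_WAIT_2",
    ("FIN_WAIT_2",  "recv FIN"):    "TIME_WAIT",   # it also sends the final ACK
    ("TIME_WAIT",   "2MSL elapse"): "CLOSED",
}

def run(events, state="ESTABLISHED"):
    # Replay a sequence of events and return the resulting state.
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

assert run(["send FIN", "recv ACK", "recv FIN", "2MSL elapse"]) == "CLOSED"
```

The table makes the 2MSL wait explicit: TIME_WAIT only transitions to CLOSED after the timer elapses, never directly on receiving the peer's FIN.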
Why Four-Way Handshake?
Why Wait 2MSL?
MSL: Maximum Segment Lifetime, which is the longest time any segment can exist in the network before being discarded. There are two reasons for this:
- To ensure that TCP's full-duplex connection can be closed reliably.
- To ensure that any delayed duplicate segments from this connection disappear from the network.
The first point: If Host 1 directly enters the CLOSED state, due to the unreliability of the IP protocol or other network reasons, if Host 2 does not receive Host 1’s last ACK response, Host 2 will continue to send FIN after timing out. Since Host 1 is already CLOSED, it will not find a corresponding connection for the retransmitted FIN. Therefore, Host 1 does not directly enter CLOSED; it must remain in TIME_WAIT, ensuring that it receives the FIN again to confirm the other party received the ACK, thereby correctly closing the connection.
The second point: If Host 1 directly enters CLOSED and then initiates a new connection to Host 2, we cannot guarantee that the new connection’s port number is different from that of the closed connection. This means that the new connection may have the same port number as the old connection. Generally, this will not cause any issues, but special cases can occur: if the new connection and the already closed old connection have the same port number, and some delayed data from the previous connection arrives after the new connection is established, TCP will mistakenly consider that delayed data as belonging to the new connection, causing confusion with the actual new connection’s packets. Therefore, the TCP connection must remain in TIME_WAIT for 2MSL to ensure that all data from this connection disappears from the network.
9. TCP Flow Control
If the sender sends data too quickly, the receiver may not be able to keep up, leading to data loss. Flow control is about ensuring that the sender’s transmission rate is not too fast, allowing the receiver enough time to receive.
Using the sliding window mechanism can conveniently implement flow control on a TCP connection.
Assuming A sends data to B. During connection establishment, B informs A: “My receiving window is rwnd = 400” (where rwnd indicates the receiver window). Therefore, the sender’s sending window cannot exceed the value of the receiving window provided by the receiver. Note that the TCP window is measured in bytes, not segments. Assume each segment is 100 bytes long, and the initial value of the data segment sequence number is set to 1. Uppercase ACK indicates the acknowledgment bit in the header, while lowercase ack indicates the acknowledgment field value.
From the image, we can see that B performed three flow control actions. The first reduced the window to rwnd = 300, the second to rwnd = 100, and finally to rwnd = 0, indicating that the sender is no longer allowed to send data. This state of pausing the sender will continue until Host B sends a new window value.
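The window-shrinking sequence described above can be replayed with a small simulation. This is a sketch assuming 100-byte segments and the advertised windows 400, 300, 100, and 0 from the example; real TCP tracks in-flight bytes per ACK rather than per announcement:

```python
def simulate(total, advertised, mss=100):
    # Each advertised rwnd caps how many new bytes the sender may transmit
    # before the next window announcement from the receiver arrives.
    sent = 0
    history = []
    for rwnd in advertised:
        burst = min(rwnd, total - sent)
        burst -= burst % mss          # send only whole 100-byte segments
        sent += burst
        history.append(sent)
    return history

# B announces rwnd = 400, then shrinks it to 300, 100, and finally 0.
assert simulate(total=1000, advertised=[400, 300, 100, 0]) == [400, 700, 800, 800]
```

The final two entries are equal: once rwnd reaches 0, the sender is frozen at 800 bytes sent until B advertises a nonzero window again.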
TCP sets a persistence timer for each connection. Whenever one side of the connection receives a zero-window notification from the other side, it starts the persistence timer. When the timer expires, that side sends a zero-window probe segment (carrying 1 byte of data); the receiver of the probe replies with its current window value. If the window is still zero, the probing side resets the persistence timer and waits again.
10. TCP Congestion Control
The sender maintains a congestion window cwnd (congestion window) as a state variable. The size of the congestion window depends on the degree of network congestion and changes dynamically. The sender sets its sending window equal to the congestion window.
The principle for controlling the congestion window is: as long as there is no congestion in the network, the congestion window increases, allowing more packets to be sent. However, if congestion occurs, the congestion window decreases to reduce the number of packets injected into the network.
Slow start algorithm:
When a host starts sending data, if it injects a large number of bytes into the network immediately, it may cause network congestion, as the load situation of the network is unknown. Therefore, a better method is to probe first, gradually increasing the sending window from small to large, meaning the congestion window is gradually increased.
Usually, when starting to send segments, the congestion window cwnd is set to a value of one maximum segment size (MSS). After receiving acknowledgment for a new segment, the congestion window is increased by at most one MSS. Using this method to gradually increase the sender’s congestion window cwnd allows packets to be injected into the network at a more reasonable rate.
With each transmission round, the congestion window cwnd doubles. The time for one transmission round is actually the round-trip time RTT. However, the term “transmission round” emphasizes that all segments allowed by the congestion window cwnd are sent out continuously, and acknowledgment for the last byte sent is received.
Also, the “slow” in slow start does not refer to the slow rate of increase of cwnd but rather to the fact that when TCP starts sending segments, it sets cwnd=1, causing the sender to send only one segment at the beginning (to probe the network’s congestion situation) before gradually increasing cwnd.
To prevent the congestion window cwnd from growing too large and causing network congestion, a slow start threshold ssthresh state variable is also set. The usage of the slow start threshold ssthresh is as follows:
- When cwnd < ssthresh, use the slow start algorithm described above.
- When cwnd > ssthresh, stop using slow start and switch to the congestion avoidance algorithm.
- When cwnd = ssthresh, either slow start or congestion avoidance may be used.

Congestion avoidance
Gradually increase the congestion window cwnd slowly, increasing the sender’s congestion window cwnd by 1 for each round-trip time RTT, rather than doubling it. This means that the congestion window cwnd increases at a much slower rate than the slow start algorithm’s congestion window growth rate.
Whether in the slow start phase or the congestion avoidance phase, as long as the sender detects network congestion (indicated by the lack of acknowledgment), it should set the slow start threshold ssthresh to half the sender’s window value at the time of congestion (but not less than 2). Then, the congestion window cwnd is reset to 1, executing the slow start algorithm.
The purpose of this is to quickly reduce the number of packets sent to the network, giving congested routers enough time to process the packets queued in their buffers.
The following image illustrates the above congestion control process with specific values. Now the sending window size is equal to the congestion window.
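The interplay of slow start, congestion avoidance, and timeout reaction can also be traced numerically. This is a simplified sketch that measures cwnd in segments and injects a loss in a chosen round; the initial ssthresh of 16 follows the common textbook example:

```python
def evolve_cwnd(rounds, ssthresh=16, loss_at=None):
    # Slow start doubles cwnd each transmission round until ssthresh;
    # congestion avoidance then adds 1 per round; on detecting a loss,
    # ssthresh is halved (floor 2) and cwnd restarts from 1.
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if loss_at is not None and r == loss_at:
            ssthresh = max(cwnd // 2, 2)   # multiplicative decrease
            cwnd = 1                       # back to slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh) # slow start (exponential)
        else:
            cwnd += 1                      # congestion avoidance (linear)
    return trace

# Exponential growth to 16, linear growth after, then a loss in round 6.
assert evolve_cwnd(9, ssthresh=16, loss_at=6) == [1, 2, 4, 8, 16, 17, 18, 1, 2]
```

After the loss, ssthresh becomes 9 (half of the window of 18 at the time of congestion), so the renewed slow start will hand over to congestion avoidance earlier than before.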
2. Fast Retransmission and Fast Recovery
Fast Retransmission
The fast retransmission algorithm requires that the receiver immediately sends duplicate acknowledgments for each out-of-order segment it receives (to inform the sender as early as possible that a segment has not reached the other party) rather than waiting until it sends data to accompany the acknowledgment.
After receiving M1 and M2, the receiver sends acknowledgments for both. Now suppose the receiver did not receive M3 but then received M4.
Clearly, the receiver cannot acknowledge M4 because it is an out-of-order segment. According to the reliable transmission principle, the receiver can either do nothing or send an acknowledgment for M2 at an appropriate time.
However, according to the fast retransmission algorithm, the receiver should promptly send a duplicate acknowledgment for M2; this allows the sender to know as early as possible that segment M3 has not reached the receiver. The sender then sends M5 and M6. After receiving these two segments, the receiver will again send a duplicate acknowledgment for M2. Thus, the sender receives four acknowledgments for M2 from the receiver, three of which are duplicate acknowledgments.
The fast retransmission algorithm also stipulates that as soon as the sender receives three duplicate acknowledgments, it should immediately retransmit the segment M3 that has not been acknowledged, without waiting for the retransmission timer for M3 to expire.
By retransmitting unacknowledged segments early, the overall network throughput can be increased by about 20%.
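The duplicate-acknowledgment logic can be sketched as follows. This is a simplified model in which segments are numbered 1, 2, 3, ... and M3 is lost, matching the M1–M6 scenario above:

```python
def receiver_acks(received):
    # The receiver always acks the highest in-order segment; an
    # out-of-order arrival triggers an immediate duplicate ack.
    expected, acks = 1, []
    for seg in received:
        if seg == expected:
            expected += 1
        acks.append(expected - 1)   # cumulative ack: last in-order segment
    return acks

def fast_retransmit(acks):
    # The sender retransmits as soon as it sees three duplicate acks,
    # without waiting for the retransmission timer to expire.
    dup, last = 0, None
    for ack in acks:
        if ack == last:
            dup += 1
            if dup == 3:
                return ack + 1      # the segment presumed lost
        else:
            dup, last = 0, ack
    return None

# M3 is lost: M1, M2 arrive in order, then M4, M5, M6 each trigger a
# duplicate ack for M2 -- three duplicates, so M3 is retransmitted.
acks = receiver_acks([1, 2, 4, 5, 6])
assert acks == [1, 2, 2, 2, 2]
assert fast_retransmit(acks) == 3
```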
Fast Recovery
- When the sender receives three duplicate acknowledgments in a row, it executes the "multiplicative decrease" algorithm, halving the slow start threshold ssthresh.
- Unlike slow start, the congestion window cwnd is not reset to 1; instead, it is set to the halved ssthresh value, and the congestion avoidance ("additive increase") algorithm then runs, letting the congestion window grow slowly and linearly.
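The fast-recovery adjustment amounts to a two-line rule; a minimal sketch:

```python
def on_three_dup_acks(cwnd):
    # Multiplicative decrease: ssthresh becomes half the current window
    # (but not less than 2); fast recovery then sets cwnd to the new
    # ssthresh rather than back to 1, and congestion avoidance resumes.
    ssthresh = max(cwnd // 2, 2)
    return ssthresh, ssthresh  # (new ssthresh, new cwnd)

assert on_three_dup_acks(24) == (12, 12)
```

Because three duplicate acks prove that segments are still getting through, this milder reaction avoids the full restart that a timeout would trigger.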