Source: https://juejin.im/post/6844903490595061767
Author: Ruheng
1. TCP/IP Model
The figure above uses the HTTP protocol as an example to illustrate each layer in detail.
2. Data Link Layer
- Framing: encapsulating the network-layer datagram with a header and trailer into a frame; the frame header includes the source and destination MAC addresses.
- Transparent transmission: achieved through zero-bit stuffing and escape characters, so that any bit pattern in the payload can be carried.
- Reliable transmission: rarely implemented on links with low error rates, but wireless links (WLAN) do provide reliable transmission.
- Error detection (CRC): the receiver checks each frame for errors; if an error is found, the frame is discarded.
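The CRC check described above can be sketched in a few lines of Python. Here CRC-32 (via the standard `zlib` module) stands in for whatever polynomial the actual link layer uses; this is an illustrative sketch, not a real frame format:

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    # Sender appends a CRC-32 checksum as the frame trailer (the FCS field).
    fcs = zlib.crc32(payload).to_bytes(4, "big")
    return payload + fcs

def check_frame(frame: bytes) -> bool:
    # Receiver recomputes the CRC over the payload; on mismatch, discard.
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs

frame = make_frame(b"network layer datagram")
assert check_frame(frame)                          # intact frame passes

corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit "in transit"
assert not check_frame(corrupted)                  # error detected; frame dropped
```

Note that CRC only detects errors; as the bullet above says, the receiver simply discards the bad frame rather than requesting retransmission.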
3. Network Layer
1. IP Protocol
The IP protocol is the core of the TCP/IP suite: all TCP, UDP, ICMP, and IGMP data are transmitted encapsulated in IP datagrams. It is important to note that IP is not a reliable protocol: it provides no mechanism for handling undelivered data. That responsibility falls to higher-layer protocols such as TCP.
1.1 IP Address
1.2 IP Protocol Header
2. ARP and RARP Protocols
3. ICMP Protocol
The IP protocol is not a reliable protocol; it does not guarantee data delivery. Therefore, the task of ensuring data delivery should be handled by other modules, one of which is the ICMP (Internet Control Message Protocol). ICMP is not a high-level protocol but an IP layer protocol.
When an error occurs while transmitting an IP datagram, such as a host or route being unreachable, the ICMP protocol encapsulates the error information and sends it back to the host, giving the host a chance to handle the error. This is why protocols built on top of the IP layer can be made reliable.
4. Ping
Ping can be said to be the most famous application of ICMP and is part of the TCP/IP protocol suite. Using the "ping" command, we can check whether the network is reachable, which helps analyze and diagnose network faults.
For example, when we cannot access a particular website, we usually ping the site. The ping command will echo back some useful information. General information is as follows:
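As an illustration of what a ping implementation does before it ever touches the network, the sketch below builds an ICMP echo-request message with the RFC 1071 Internet checksum. Actually sending it would require a raw socket and elevated privileges, which is omitted here:

```python
import struct

def internet_checksum(data: bytes) -> int:
    # RFC 1071: one's-complement sum of 16-bit words, then complemented.
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    # Type 8 (echo request), code 0; checksum covers the whole message.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

pkt = build_echo_request(0x1234, 1, b"ping")
# A message with a correct checksum verifies to zero end-to-end.
assert internet_checksum(pkt) == 0
```

The destination host copies the payload back in an echo reply (type 0), and the sender matches replies to requests via the identifier and sequence fields.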
5. Traceroute
Traceroute is an important tool for detecting the routing situation between a host and the destination host, and it is also the most convenient tool.
The principle of Traceroute is quite elegant. Given the destination host's IP, it first sends a UDP packet with TTL=1 toward the destination. The first router to receive the packet decrements the TTL to 0, discards the packet, and returns an ICMP time-exceeded message to the source host. The host then sends a UDP packet with TTL=2, which elicits a time-exceeded message from the second router, and so on. Eventually a probe reaches the destination host itself, which replies with an ICMP port-unreachable message (the probe deliberately targets an unlikely UDP port), signaling that the trace is complete. In this way, Traceroute learns the IP address of every router along the path.
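The TTL mechanism behind this can be modeled without touching the network. The sketch below is a hypothetical in-memory simulation, with the router IPs made up for illustration:

```python
def traceroute_sim(path, max_probes=30):
    # 'path' is the ordered list of router IPs ending at the destination.
    # Each router decrements the TTL; at TTL=0 it "returns" an ICMP
    # time-exceeded message (port-unreachable, if it is the destination).
    discovered = []
    for ttl in range(1, max_probes + 1):
        remaining = ttl
        for hop in path:
            remaining -= 1
            if remaining == 0:
                discovered.append(hop)   # this hop reveals itself to the host
                break
        if discovered and discovered[-1] == path[-1]:
            break                        # destination reached; stop probing
    return discovered

hops = traceroute_sim(["10.0.0.1", "172.16.0.1", "93.184.216.34"])
assert hops == ["10.0.0.1", "172.16.0.1", "93.184.216.34"]
```

Each iteration mirrors one round of real Traceroute: raising the TTL by one exposes exactly one additional hop.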
6. TCP/UDP
Message-oriented
In message-oriented transmission, the application layer hands UDP a message of a given length, and UDP sends it as is, one complete message at a time. The application must therefore choose an appropriate message size: if the message is too long, the IP layer has to fragment it, reducing efficiency; if it is too short, the relative overhead of the IP and UDP headers becomes large.
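UDP's preservation of message boundaries is easy to observe over the loopback interface. A minimal sketch, assuming local UDP sockets are permitted:

```python
import socket

def udp_roundtrip(messages):
    # Message-oriented: each sendto() produces exactly one datagram, and
    # each recvfrom() returns exactly one whole message -- boundaries are
    # preserved, unlike TCP's byte stream.
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv.bind(("127.0.0.1", 0))          # let the OS pick a free port
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for msg in messages:
        send.sendto(msg, recv.getsockname())
    out = [recv.recvfrom(65535)[0] for _ in messages]
    send.close()
    recv.close()
    return out

# Two sends arrive as two distinct datagrams, never merged into one stream.
assert udp_roundtrip([b"first", b"second message"]) == [b"first", b"second message"]
```

With TCP, the same two writes could arrive concatenated or split arbitrarily, which is exactly the byte-stream behavior described next.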
Byte stream-oriented
When Should TCP Be Used?
When Should UDP Be Used?
7. DNS
8. Establishing and Terminating TCP Connections
1. Three-Way Handshake
Why Three-Way Handshake?
2. Four-Way Handshake
After the client and server establish a TCP connection through the three-way handshake, when data transmission is complete, the TCP connection must be terminated. This is where the mysterious “four-way handshake” comes into play.
First Handshake: Host 1 (which can be the client or server) sets the Sequence Number and sends a FIN segment to Host 2; at this point, Host 1 enters FIN_WAIT_1 state; this indicates that Host 1 has no data to send to Host 2;
Second Handshake: Host 2 receives the FIN segment sent by Host 1 and replies with an ACK segment, with the Acknowledgment Number equal to the Sequence Number plus 1; Host 1 enters FIN_WAIT_2 state; Host 2 informs Host 1 that it “agrees” to the shutdown request;
Third Handshake: Host 2 sends a FIN segment to Host 1, requesting to close the connection, while Host 2 enters LAST_ACK state;
Fourth Handshake: Host 1 receives the FIN segment sent by Host 2 and replies with an ACK segment, then enters the TIME_WAIT state; after Host 2 receives the ACK from Host 1, it closes the connection. Host 1 then waits for 2MSL; if no further segment arrives in that time, it concludes that Host 2 has closed normally and closes its side of the connection as well.
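The active closer's side of the four steps above can be summarized as a toy state table. This is a simplified sketch that ignores simultaneous close and other corner cases:

```python
# State transitions of the host that initiates the close (Host 1 above).
TRANSITIONS = {
    ("ESTABLISHED", "send FIN"):    "FIN_WAIT_1",
    ("FIN_WAIT_1",  "recv ACK"):    "FIN_WAIT_2",
    ("FIN_WAIT_2",  "recv FIN"):    "TIME_WAIT",   # it also sends the final ACK
    ("TIME_WAIT",   "2MSL elapse"): "CLOSED",
}

def run(events, state="ESTABLISHED"):
    # Replay a sequence of events and return the resulting state.
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

assert run(["send FIN", "recv ACK", "recv FIN", "2MSL elapse"]) == "CLOSED"
```

The table makes the 2MSL wait explicit: TIME_WAIT only transitions to CLOSED after the timer elapses, never directly on receiving the peer's FIN.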
Why Four-Way Handshake?
Why Wait 2MSL?
MSL: Maximum Segment Lifetime, which is the longest time any segment can exist in the network before being discarded. There are two reasons for this:
- To ensure that TCP's full-duplex connection can be closed reliably.
- To ensure that any delayed duplicate segments from this connection disappear from the network.
The first point: If Host 1 directly enters the CLOSED state, due to the unreliability of the IP protocol or other network reasons, if Host 2 does not receive Host 1’s last ACK response, Host 2 will continue to send FIN after timing out. Since Host 1 is already CLOSED, it will not find a corresponding connection for the retransmitted FIN. Therefore, Host 1 does not directly enter CLOSED; it must remain in TIME_WAIT, ensuring that it receives the FIN again to confirm the other party received the ACK, thereby correctly closing the connection.
The second point: If Host 1 directly enters CLOSED and then initiates a new connection to Host 2, we cannot guarantee that the new connection’s port number is different from that of the closed connection. This means that the new connection may have the same port number as the old connection. Generally, this will not cause any issues, but special cases can occur: if the new connection and the already closed old connection have the same port number, and some delayed data from the previous connection arrives after the new connection is established, TCP will mistakenly consider that delayed data as belonging to the new connection, causing confusion with the actual new connection’s packets. Therefore, the TCP connection must remain in TIME_WAIT for 2MSL to ensure that all data from this connection disappears from the network.
9. TCP Flow Control
If the sender sends data too quickly, the receiver may not be able to keep up, leading to data loss. Flow control is about ensuring that the sender’s transmission rate is not too fast, allowing the receiver enough time to receive.
Using the sliding window mechanism can conveniently implement flow control on a TCP connection.
Assuming A sends data to B. During connection establishment, B informs A: “My receiving window is rwnd = 400” (where rwnd indicates the receiver window). Therefore, the sender’s sending window cannot exceed the value of the receiving window provided by the receiver. Note that the TCP window is measured in bytes, not segments. Assume each segment is 100 bytes long, and the initial value of the data segment sequence number is set to 1. Uppercase ACK indicates the acknowledgment bit in the header, while lowercase ack indicates the acknowledgment field value.
From the image, we can see that B performed three flow control actions. The first reduced the window to rwnd = 300, the second to rwnd = 100, and finally to rwnd = 0, indicating that the sender is no longer allowed to send data. This state of pausing the sender will continue until Host B sends a new window value.
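The window-shrinking sequence described above can be replayed with a small simulation. This is a sketch assuming 100-byte segments and the advertised windows 400, 300, 100, and 0 from the example; real TCP tracks in-flight bytes per ACK rather than per announcement:

```python
def simulate(total, advertised, mss=100):
    # Each advertised rwnd caps how many new bytes the sender may transmit
    # before the next window announcement from the receiver arrives.
    sent = 0
    history = []
    for rwnd in advertised:
        burst = min(rwnd, total - sent)
        burst -= burst % mss          # send only whole 100-byte segments
        sent += burst
        history.append(sent)
    return history

# B announces rwnd = 400, then shrinks it to 300, 100, and finally 0.
assert simulate(total=1000, advertised=[400, 300, 100, 0]) == [400, 700, 800, 800]
```

The final two entries are equal: once rwnd reaches 0, the sender is frozen at 800 bytes sent until B advertises a nonzero window again.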
TCP sets a persistence timer for each connection. Whenever one side of the connection receives a zero-window notification from the other side, it starts the persistence timer. When the timer expires, that side sends a zero-window probe segment (carrying 1 byte of data); the receiver of the probe replies with its current window value. If the window is still zero, the probing side resets the persistence timer and waits again.
10. TCP Congestion Control
The sender maintains a congestion window cwnd (congestion window) as a state variable. The size of the congestion window depends on the degree of network congestion and changes dynamically. The sender sets its sending window equal to the congestion window.
The principle for controlling the congestion window is: as long as there is no congestion in the network, the congestion window increases, allowing more packets to be sent. However, if congestion occurs, the congestion window decreases to reduce the number of packets injected into the network.
Slow start algorithm:
When a host starts sending data, if it injects a large number of bytes into the network immediately, it may cause network congestion, as the load situation of the network is unknown. Therefore, a better method is to probe first, gradually increasing the sending window from small to large, meaning the congestion window is gradually increased.
Usually, when starting to send segments, the congestion window cwnd is set to a value of one maximum segment size (MSS). After receiving acknowledgment for a new segment, the congestion window is increased by at most one MSS. Using this method to gradually increase the sender’s congestion window cwnd allows packets to be injected into the network at a more reasonable rate.
With each transmission round, the congestion window cwnd doubles. The time for one transmission round is actually the round-trip time RTT. However, the term “transmission round” emphasizes that all segments allowed by the congestion window cwnd are sent out continuously, and acknowledgment for the last byte sent is received.
Also, the “slow” in slow start does not refer to the slow rate of increase of cwnd but rather to the fact that when TCP starts sending segments, it sets cwnd=1, causing the sender to send only one segment at the beginning (to probe the network’s congestion situation) before gradually increasing cwnd.
To prevent the congestion window cwnd from growing too large and causing network congestion, a slow start threshold ssthresh state variable is also set. The usage of the slow start threshold ssthresh is as follows:
- When cwnd < ssthresh, use the slow start algorithm described above.
- When cwnd > ssthresh, stop using slow start and switch to the congestion avoidance algorithm.
- When cwnd = ssthresh, either slow start or congestion avoidance may be used.

Congestion avoidance
Gradually increase the congestion window cwnd slowly, increasing the sender’s congestion window cwnd by 1 for each round-trip time RTT, rather than doubling it. This means that the congestion window cwnd increases at a much slower rate than the slow start algorithm’s congestion window growth rate.
Whether in the slow start phase or the congestion avoidance phase, as long as the sender detects network congestion (indicated by the lack of acknowledgment), it should set the slow start threshold ssthresh to half the sender’s window value at the time of congestion (but not less than 2). Then, the congestion window cwnd is reset to 1, executing the slow start algorithm.
The purpose of this is to quickly reduce the number of packets sent to the network, giving congested routers enough time to process the packets queued in their buffers.
The following image illustrates the above congestion control process with specific values. Now the sending window size is equal to the congestion window.
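The interplay of slow start, congestion avoidance, and timeout reaction can also be traced numerically. This is a simplified sketch that measures cwnd in segments and injects a loss in a chosen round; the initial ssthresh of 16 follows the common textbook example:

```python
def evolve_cwnd(rounds, ssthresh=16, loss_at=None):
    # Slow start doubles cwnd each transmission round until ssthresh;
    # congestion avoidance then adds 1 per round; on detecting a loss,
    # ssthresh is halved (floor 2) and cwnd restarts from 1.
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if loss_at is not None and r == loss_at:
            ssthresh = max(cwnd // 2, 2)   # multiplicative decrease
            cwnd = 1                       # back to slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh) # slow start (exponential)
        else:
            cwnd += 1                      # congestion avoidance (linear)
    return trace

# Exponential growth to 16, linear growth after, then a loss in round 6.
assert evolve_cwnd(9, ssthresh=16, loss_at=6) == [1, 2, 4, 8, 16, 17, 18, 1, 2]
```

After the loss, ssthresh becomes 9 (half of the window of 18 at the time of congestion), so the renewed slow start will hand over to congestion avoidance earlier than before.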
2. Fast Retransmission and Fast Recovery
Fast Retransmission
The fast retransmission algorithm requires that the receiver immediately sends duplicate acknowledgments for each out-of-order segment it receives (to inform the sender as early as possible that a segment has not reached the other party) rather than waiting until it sends data to accompany the acknowledgment.
After receiving M1 and M2, the receiver sends acknowledgments for both. Now suppose the receiver did not receive M3 but then received M4.
Clearly, the receiver cannot acknowledge M4 because it is an out-of-order segment. According to the reliable transmission principle, the receiver can either do nothing or send an acknowledgment for M2 at an appropriate time.
However, according to the fast retransmission algorithm, the receiver should promptly send a duplicate acknowledgment for M2; this allows the sender to know as early as possible that segment M3 has not reached the receiver. The sender then sends M5 and M6. After receiving these two segments, the receiver will again send a duplicate acknowledgment for M2. Thus, the sender receives four acknowledgments for M2 from the receiver, three of which are duplicate acknowledgments.
The fast retransmission algorithm also stipulates that as soon as the sender receives three duplicate acknowledgments, it should immediately retransmit the segment M3 that has not been acknowledged, without waiting for the retransmission timer for M3 to expire.
By retransmitting unacknowledged segments early, the overall network throughput can be increased by about 20%.
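The duplicate-acknowledgment logic can be sketched as follows. This is a simplified model in which segments are numbered 1, 2, 3, ... and M3 is lost, matching the M1–M6 scenario above:

```python
def receiver_acks(received):
    # The receiver always acks the highest in-order segment; an
    # out-of-order arrival triggers an immediate duplicate ack.
    expected, acks = 1, []
    for seg in received:
        if seg == expected:
            expected += 1
        acks.append(expected - 1)   # cumulative ack: last in-order segment
    return acks

def fast_retransmit(acks):
    # The sender retransmits as soon as it sees three duplicate acks,
    # without waiting for the retransmission timer to expire.
    dup, last = 0, None
    for ack in acks:
        if ack == last:
            dup += 1
            if dup == 3:
                return ack + 1      # the segment presumed lost
        else:
            dup, last = 0, ack
    return None

# M3 is lost: M1, M2 arrive in order, then M4, M5, M6 each trigger a
# duplicate ack for M2 -- three duplicates, so M3 is retransmitted.
acks = receiver_acks([1, 2, 4, 5, 6])
assert acks == [1, 2, 2, 2, 2]
assert fast_retransmit(acks) == 3
```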
Fast Recovery
- When the sender receives three duplicate acknowledgments in a row, it executes the "multiplicative decrease" algorithm, halving the slow start threshold ssthresh.
- Unlike slow start, the congestion window cwnd is not reset to 1; instead, it is set to the halved ssthresh value, and the congestion avoidance ("additive increase") algorithm then runs, letting the congestion window grow slowly and linearly.
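The fast-recovery adjustment amounts to a two-line rule; a minimal sketch:

```python
def on_three_dup_acks(cwnd):
    # Multiplicative decrease: ssthresh becomes half the current window
    # (but not less than 2); fast recovery then sets cwnd to the new
    # ssthresh rather than back to 1, and congestion avoidance resumes.
    ssthresh = max(cwnd // 2, 2)
    return ssthresh, ssthresh  # (new ssthresh, new cwnd)

assert on_three_dup_acks(24) == (12, 12)
```

Because three duplicate acks prove that segments are still getting through, this milder reaction avoids the full restart that a timeout would trigger.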