Understanding TCP/IP in One Article

Author: Ruheng | Source: Juejin


1. TCP/IP Model

The TCP/IP model (Transmission Control Protocol/Internet Protocol) comprises the suite of network protocols that forms the foundation, and the core, of the Internet.
The TCP/IP reference model divides the protocols into four layers: the Link Layer, Network Layer, Transport Layer, and Application Layer. The diagram below shows the correspondence between the TCP/IP model and the OSI model.

[Figure: Correspondence between the TCP/IP four-layer model and the OSI model]

The TCP/IP protocol suite is organized in layers, each layer encapsulating the data handed down by the layer above it. The top layer is the Application Layer, which includes familiar protocols such as HTTP and FTP. The second layer is the Transport Layer, home of the well-known TCP and UDP protocols. The third layer is the Network Layer, where the IP protocol resides; it adds IP addresses and related data to determine the transmission target. The fourth layer is the Data Link Layer, which adds an Ethernet header to the data to be transmitted and computes a CRC, preparing the frame for the final transmission.

[Figure: The role of each layer in the TCP/IP protocol suite]

The diagram above shows the role of each layer in the TCP/IP suite, and communication over TCP/IP resembles data being pushed onto and popped off a stack. On the way down, each layer on the sending side encapsulates the data with headers (and trailers), adding the transmission information needed to reach the destination. On the way up, each layer on the receiving side strips those headers and trailers to recover the original data.

[Figure: Encapsulation and decapsulation of an HTTP message through the layers]

The above diagram uses the HTTP protocol as an example to illustrate this.
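
To make the push/pop picture concrete, here is a toy Python sketch of encapsulation. The header layouts are deliberately simplified stand-ins (real TCP, IP, and Ethernet headers carry many more fields); only the order of wrapping matches the real stack.

```python
import ipaddress

# Toy illustration of layered encapsulation: each lower layer wraps the
# payload handed down by the layer above. Simplified headers, not the
# real wire formats.

http_request = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"

def tcp_encapsulate(payload, src_port, dst_port):
    # A real TCP header also has sequence/ack numbers, flags, window, checksum.
    return src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big") + payload

def ip_encapsulate(payload, src_ip, dst_ip):
    # A real IP header also has version, TTL, protocol, checksum, etc.
    return (ipaddress.ip_address(src_ip).packed +
            ipaddress.ip_address(dst_ip).packed + payload)

def ethernet_encapsulate(payload, src_mac, dst_mac):
    # A real frame also has an EtherType field and a trailing CRC.
    return dst_mac + src_mac + payload

segment  = tcp_encapsulate(http_request, 54321, 80)
datagram = ip_encapsulate(segment, "192.168.0.2", "93.184.216.34")
frame    = ethernet_encapsulate(datagram, b"\xaa" * 6, b"\xbb" * 6)
print(len(http_request), len(segment), len(datagram), len(frame))
```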

2. Data Link Layer

The Physical Layer converts between streams of 0/1 bits and physical signals such as voltage levels or the presence of light pulses. The Data Link Layer groups the bit sequence into data frames and transmits them from one node to the adjacent node; nodes are uniquely identified by MAC addresses (the physical address; every network interface has its own MAC address).


  • Frame Encapsulation: Adding headers and trailers to the network layer datagram to encapsulate it into a frame, with the frame header including the source MAC address and destination MAC address.

  • Transparent Transmission: Bit stuffing and escape characters ensure that payload data is never mistaken for frame delimiters.

  • Reliable Transmission: Rarely provided on links with low error rates; wireless links (WLAN), whose error rates are higher, do ensure reliable transmission.

  • Error Detection (CRC): The receiver detects errors and discards any frame that fails the check (a CRC sketch follows this list).
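
As a concrete illustration of CRC-based error detection, the following Python sketch appends a CRC-32 checksum to a payload (the same polynomial Ethernet uses; `zlib.crc32` implements it) and has the receiver silently discard any frame that fails the check. The framing itself is simplified.

```python
import zlib

# Sender appends a CRC-32 checksum; receiver recomputes the checksum
# and silently discards any frame that fails the check.

def send_frame(payload):
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def receive_frame(frame):
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received:
        return None            # error detected: discard the frame
    return payload

frame = send_frame(b"hello, link layer")
assert receive_frame(frame) == b"hello, link layer"

corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]    # flip one bit in transit
assert receive_frame(corrupted) is None             # receiver drops it
```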

3. Network Layer

1. IP Protocol

The IP protocol is the core of the TCP/IP suite; all TCP, UDP, ICMP, and IGMP data is transmitted encapsulated in IP datagrams. It is important to note that IP is not a reliable protocol: it provides no mechanism for handling data that fails to be delivered. That is considered the responsibility of the upper-layer protocols: TCP or UDP.

1.1 IP Address

In the Data Link Layer, we generally identify different nodes by MAC addresses, while at the IP layer, we also need a similar address identifier, which is the IP address.
A 32-bit IP address is divided into a network part and a host part. This split reduces the number of entries in routers' routing tables: since all terminals with the same network address sit in the same range, the routing table needs only one entry per network address to reach them.
Class A addresses: 0.0.0.0 ~ 127.255.255.255. Class B addresses: 128.0.0.0 ~ 191.255.255.255. Class C addresses: 192.0.0.0 ~ 223.255.255.255.
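
The classful split can be expressed directly in code. The sketch below classifies an IPv4 address by its leading bits under the historical classful scheme, using only the standard `ipaddress` module; the sample addresses are arbitrary.

```python
import ipaddress

# Classify an IPv4 address by its leading bits (historical classful scheme).

def classify(addr):
    first_octet = int(ipaddress.ip_address(addr)) >> 24
    if first_octet < 128: return "A"              # leading bit  0
    if first_octet < 192: return "B"              # leading bits 10
    if first_octet < 224: return "C"              # leading bits 110
    if first_octet < 240: return "D (multicast)"  # leading bits 1110
    return "E (reserved)"

for addr in ("10.1.2.3", "172.16.5.9", "200.100.50.25", "224.0.0.5"):
    print(addr, "-> Class", classify(addr))
```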

1.2 IP Protocol Header

[Figure: IPv4 header format]

Here we introduce only the 8-bit TTL field. It specifies how many routers a packet may pass through before being discarded: each router that forwards the packet decrements the TTL by 1, and when the TTL reaches zero the packet is discarded automatically.
The maximum value of this field is 255, so a packet can traverse at most 255 routers. The initial value actually set by the sending host depends on the operating system, typically 32 or 64.
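
For illustration, here is a minimal Python sketch that unpacks a few IPv4 header fields, including the 8-bit TTL, from raw bytes with the `struct` module. It assumes a 20-byte base header with no IP options, and the sample header is hand-built rather than captured from a real network.

```python
import struct

# Parse version, TTL, and protocol from the first 12 bytes of an IPv4
# header (assumes no IP options; the TTL is the 9th byte).

def parse_ipv4_header(raw):
    (version_ihl, tos, total_len, ident,
     flags_frag, ttl, proto, checksum) = struct.unpack("!BBHHHBBH", raw[:12])
    return {"version": version_ihl >> 4, "ttl": ttl, "protocol": proto}

sample = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 20, 0, 0, 64, 6, 0,     # TTL=64, proto=6 (TCP)
                     bytes([192, 168, 0, 2]), bytes([8, 8, 8, 8]))
print(parse_ipv4_header(sample))   # {'version': 4, 'ttl': 64, 'protocol': 6}
```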

2. ARP and RARP Protocols

ARP is a protocol for obtaining a MAC address from an IP address.
ARP (Address Resolution Protocol) is a resolution protocol: initially, a host has no idea which host or interface an IP address belongs to. When it wants to send an IP packet, it first checks its ARP cache (a cached table of IP-to-MAC address mappings).
If the queried IP-MAC pair does not exist, the host sends an ARP broadcast packet to the network, which contains the IP address to be queried. All hosts that receive this broadcast will check their own IP addresses, and if any host finds it matches, it will prepare an ARP packet containing its MAC address to send back to the host that sent the ARP broadcast.
Once the broadcasting host receives the ARP reply, it updates its ARP cache (the stored IP-to-MAC mapping table) and uses the new entry to construct its data link layer frames.
The RARP protocol works oppositely and will not be elaborated here.
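
The sender-side logic described above can be sketched in a few lines of Python. The LAN itself is faked with a plain dict of made-up addresses, so only the cache-miss / broadcast / cache-update flow is modeled.

```python
# Sender-side ARP logic: consult the cache, broadcast a query on a miss,
# then cache the reply. The dict `lan` stands in for hosts on the wire.

lan = {
    "192.168.0.1": "aa:bb:cc:00:00:01",
    "192.168.0.7": "aa:bb:cc:00:00:07",
}
arp_cache = {}

def resolve(ip):
    if ip in arp_cache:          # 1. cache hit: no network traffic needed
        return arp_cache[ip]
    mac = lan.get(ip)            # 2. cache miss: broadcast "who has <ip>?"
    if mac is None:
        raise LookupError(f"no host answered the ARP request for {ip}")
    arp_cache[ip] = mac          # 3. the owner replied: update the cache
    return mac

print(resolve("192.168.0.7"))    # triggers a (simulated) broadcast
print(resolve("192.168.0.7"))    # answered straight from the ARP cache
```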

3. ICMP Protocol

The IP protocol is not a reliable protocol; it does not guarantee that data will be delivered. Naturally, the task of ensuring data delivery should be handled by other modules, one of which is the ICMP (Internet Control Message Protocol). ICMP is not a high-level protocol but is a protocol at the IP layer.

When an error occurs during the transmission of IP packets, such as a host being unreachable or a route being unavailable, the ICMP protocol encapsulates the error information and sends it back to the host, giving the host a chance to handle the error. This is why it is said that protocols built on top of the IP layer can be made reliable.

4. Ping

Ping can be considered the most famous application of ICMP and is part of the TCP/IP protocol. Using the “ping” command can check network connectivity and help us analyze and diagnose network faults effectively.

For example, when we cannot access a certain website, we usually ping that website. Ping will echo some useful information. General information is as follows:

[Figure: Example ping output]

The word ping comes from sonar ranging, and the program works in much the same way: it uses ICMP packets to probe whether another host is reachable. The principle is to send an ICMP echo request (type 8); the host that receives it replies with an ICMP echo reply (type 0).
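
For illustration, the following sketch builds an ICMP echo request by hand: type 8, code 0, the standard Internet checksum, and an identifier/sequence pair. Actually sending it requires a raw socket and usually root privileges, so the example only constructs the packet bytes.

```python
import struct

# Build an ICMP echo request (type 8, code 0), as ping does. Sending it
# would need a raw socket, e.g.
#   socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)

def icmp_checksum(data):
    if len(data) % 2:
        data += b"\x00"
    s = sum(struct.unpack(f"!{len(data) // 2}H", data))
    s = (s >> 16) + (s & 0xFFFF)
    s += s >> 16
    return ~s & 0xFFFF

def echo_request(ident, seq, payload=b"ping"):
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum 0 first
    cksum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, cksum, ident, seq) + payload

print(echo_request(ident=0x1234, seq=1).hex())
```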

5. Traceroute

Traceroute is an important and convenient tool for detecting the routing situation between a host and the destination host.

The principle behind Traceroute is quite interesting. Given the destination host's IP, it first sends a UDP packet with TTL=1 toward the destination. The first router along the path decrements the TTL to 0, discards the packet, and returns an ICMP Time Exceeded message to the sending host. The host then sends a UDP packet with TTL=2, prompting the second router to return an ICMP Time Exceeded message, and so on, until a packet reaches the destination host itself. Because the UDP probe targets an improbable port, the destination replies with an ICMP Port Unreachable message, which signals that the trace is complete. In this way, Traceroute collects the IP addresses of every router along the path.

[Figure: Example traceroute output]
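
The core loop can be sketched as follows: send empty UDP probes with TTL growing from 1, and read each hop's ICMP reply off a raw socket. This is a simplification of the classic tool (one probe per hop, the conventional base port 33434, a 2-second timeout) and needs root privileges to run.

```python
import socket

# Traceroute core loop: UDP probes with increasing TTL; routers answer
# with ICMP Time Exceeded, the destination with ICMP Port Unreachable.

def traceroute(dest, max_hops=30):
    dest_ip = socket.gethostbyname(dest)
    for ttl in range(1, max_hops + 1):
        recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
        send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        recv.settimeout(2.0)
        send.sendto(b"", (dest_ip, 33434 + ttl))   # improbable destination port
        try:
            _, (hop_ip, _) = recv.recvfrom(512)
            print(ttl, hop_ip)
        except socket.timeout:
            hop_ip = None
            print(ttl, "*")
        finally:
            send.close(); recv.close()
        if hop_ip == dest_ip:                      # destination reached: done
            break

# traceroute("example.com")   # uncomment to run (needs root privileges)
```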

6. TCP/UDP

TCP and UDP are both transport layer protocols, but they have different characteristics and applications. Below is a comparative analysis in tabular form.

[Table: Comparison of TCP and UDP]

Message-oriented

Message-oriented transmission means the application layer hands UDP a message of whatever length, and UDP sends it as-is, so the application must choose an appropriate message size. If the message is too long, the IP layer must fragment it, reducing efficiency; if it is too short, the datagram is mostly IP header overhead, which is also inefficient.

Byte-stream oriented

In byte-stream orientation, although the application hands TCP data one block at a time (of varying sizes), TCP treats the application's data as an unstructured stream of bytes. TCP maintains a buffer: when a block handed over by the application is too long, TCP can split it into shorter segments for transmission; when blocks are too short, TCP can wait until enough bytes accumulate before forming a segment.
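
The boundary-preserving behavior of UDP is easy to observe over loopback. In the sketch below, three `sendto()` calls arrive as three separate datagrams; with TCP, the same bytes could be merged into one read or split across several.

```python
import socket

# UDP preserves message boundaries: one recvfrom() per sendto().

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
recv.settimeout(1.0)

send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for msg in (b"one", b"two", b"three"):
    send.sendto(msg, recv.getsockname())

for _ in range(3):
    data, _ = recv.recvfrom(4096)
    print(data)                      # b'one', b'two', b'three': one read each

send.close(); recv.close()
```
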
Regarding congestion control and flow control, these are key aspects of TCP, which will be explained later.
Some applications of TCP and UDP protocols

[Figure: Common applications of TCP and UDP]

When to Use TCP?
When there are requirements for network communication quality, such as when all data must be accurately transmitted to the other party. This is often used for reliable applications, such as HTTP, HTTPS, FTP for file transfer, and POP, SMTP for email transfer protocols.
When to Use UDP?
When there are no high requirements for network communication quality, and speed is prioritized, UDP can be used.

7. DNS

DNS (Domain Name System) is a distributed database on the Internet that maps domain names to IP addresses, making it easier for users to access the Internet without needing to remember machine-readable IP numbers. The process of obtaining the IP address corresponding to a hostname is called domain name resolution (or hostname resolution). The DNS protocol runs on top of the UDP protocol, using port number 53.
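
From application code, resolution is a single call into the system resolver, which ultimately performs the DNS lookup (over UDP port 53 for typical queries). A small example using only the standard library:

```python
import socket

# Name resolution: map a hostname to its IP address(es).

print(socket.gethostbyname("example.com"))      # a single IPv4 address

# getaddrinfo returns every usable (family, address) combination:
for *_, sockaddr in socket.getaddrinfo("example.com", 80,
                                       proto=socket.IPPROTO_TCP):
    print(sockaddr)
```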

8. Establishing and Terminating TCP Connections

1. Three-way Handshake

TCP is connection-oriented, meaning that before either party sends data, a connection must be established between both parties. In the TCP/IP protocol, the TCP protocol provides reliable connection services, and the connection is initialized through a three-way handshake. The purpose of the three-way handshake is to synchronize the sequence numbers and acknowledgment numbers of both parties and exchange TCP window size information.

[Figure: TCP three-way handshake]

First Handshake: Establishing the connection. The client sends a connection request segment, setting SYN to 1 and Sequence Number to x; then, the client enters the SYN_SEND state, waiting for the server’s acknowledgment;
Second Handshake: The server receives the SYN segment. The server acknowledges the client’s SYN segment by setting the Acknowledgment Number to x+1 (Sequence Number+1); at the same time, it sends a SYN request with SYN set to 1 and Sequence Number set to y; the server enters the SYN_RECV state while sending this SYN+ACK segment to the client.
Third Handshake: The client receives the server’s SYN+ACK segment. It then sets the Acknowledgment Number to y+1 and sends an ACK segment to the server. After this segment is sent, both the client and server enter the ESTABLISHED state, completing the TCP three-way handshake.
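
Application code never performs the handshake itself; the kernel does. In the loopback sketch below, the three-way handshake happens inside the client's `connect()` and is already complete when the server's `accept()` returns.

```python
import socket
import threading

# The kernel performs the three-way handshake during connect()/accept().

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

def serve():
    conn, peer = server.accept()     # returns once the handshake is done
    print("server: ESTABLISHED with", peer)
    conn.close()

t = threading.Thread(target=serve)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))         # SYN -> SYN+ACK -> ACK under the hood
print("client: ESTABLISHED")

client.close(); t.join(); server.close()
```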

Why Three-way Handshake?

To prevent a stale connection request segment from suddenly being sent to the server, causing errors.
Specific Example: The occurrence of a “stale connection request segment” happens in a situation where the first connection request segment sent by the client is not lost but lingers at a network node for a long time, arriving after the connection has been released. This segment is already invalid. However, when the server receives this stale connection request segment, it mistakenly believes it is a new connection request from the client.
The server then sends an acknowledgment segment to the client, agreeing to establish the connection. Without the third handshake, a new connection would be established as soon as the server sent that acknowledgment. But the client never actually requested this connection, so it ignores the server's acknowledgment and sends no data, while the server believes a new transport connection has been established and sits waiting for data that will never come, wasting its resources.

The three-way handshake prevents this: in the scenario above, the client simply does not send the final acknowledgment, and since the server receives no acknowledgment, it knows the client did not request a connection.

2. Four-way Handshake

After the TCP connection is established between the client and server via the three-way handshake, once the data transmission is complete, the TCP connection must be terminated. This is where the mysterious “four-way handshake” comes in.

[Figure: TCP four-way handshake (connection release)]

First Handshake: Host 1 (which can be either the client or the server) sets the Sequence Number and sends a FIN segment to Host 2; at this point, Host 1 enters the FIN_WAIT_1 state; this indicates that Host 1 has no data to send to Host 2;

Second Handshake: Host 2 receives the FIN segment from Host 1 and sends back an ACK segment whose Acknowledgment Number is the received Sequence Number plus 1; Host 1 enters the FIN_WAIT_2 state. With this, Host 2 signals that it "agrees" to the close request;

Third Handshake: Host 2 sends a FIN segment to Host 1, requesting that the connection be closed, and enters the LAST_ACK state;

Fourth Handshake: Host 1 receives the FIN segment from Host 2 and sends back an ACK segment, then enters the TIME_WAIT state; Host 2 closes the connection once it receives this ACK. If Host 1 receives no further FIN after waiting 2MSL, it concludes that Host 2 received the ACK and closed normally, so Host 1 can close the connection as well.

Why Four-way Handshake?

TCP is a connection-oriented, reliable, byte-stream-based transport layer protocol operating in full-duplex mode. When Host 1 sends a FIN segment, it only means Host 1 has no more data to send: Host 1 tells Host 2 that its sending is finished, but it can still receive data from Host 2. When Host 2 returns the ACK segment, it acknowledges that Host 1 has no data left to send, yet Host 2 may still send data to Host 1. Only when Host 2 sends its own FIN segment, telling Host 1 that it too has no data left, can both sides happily terminate the TCP connection.
Why Wait for 2MSL?

MSL: Maximum Segment Lifetime, which is the longest time any segment can exist in the network before being discarded. There are two reasons:

  • To ensure that the TCP protocol’s full-duplex connection can be reliably closed

  • To ensure that any duplicate segments from this connection disappear from the network

First Point: If Host 1 went directly to CLOSED, then due to the unreliability of IP or other network problems, Host 2 might never receive Host 1's final ACK. Host 2 would then retransmit the FIN after a timeout, but because Host 1 is already CLOSED, it would find no connection matching the retransmitted FIN. By staying in TIME_WAIT instead of going straight to CLOSED, Host 1 can ensure that the other side receives the ACK for its FIN, so the connection closes correctly.

Second Point: If Host 1 goes directly to CLOSED and then initiates a new connection to Host 2, we cannot guarantee that this new connection will have a different port number than the one that was just closed. This means it is possible for the new connection and the old connection to have the same port number. Generally, this will not cause any issues, but special circumstances may arise: if the new connection and the already closed old connection have the same port number, and some data from the previous connection is still lingering in the network, this delayed data may arrive at Host 2 after the new connection is established. Since the new connection and the old connection have the same port number, the TCP protocol will consider that delayed data to belong to the new connection, leading to confusion with the actual data from the new connection. Therefore, the TCP connection must remain in the TIME_WAIT state for 2MSL to ensure that all data from this connection has disappeared from the network.
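
A practical consequence of TIME_WAIT: a server restarted immediately after shutting down may fail to `bind()` its port because the old connection's address/port pair is still being held for 2MSL. The conventional remedy is the SO_REUSEADDR socket option, sketched below (port 8080 is an arbitrary example).

```python
import socket

# SO_REUSEADDR lets a listener rebind while old connections on the same
# (address, port) pair are still lingering in TIME_WAIT.

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("0.0.0.0", 8080))
sock.listen(5)
print("listening, even if old connections linger in TIME_WAIT")
sock.close()
```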

9. TCP Flow Control

If the sender sends data too quickly, the receiver may not be able to keep up, leading to data loss. Flow control ensures that the sender’s transmission rate is not too fast, allowing the receiver to keep up with the data.

Using the sliding window mechanism allows for easy implementation of flow control on a TCP connection.

Assuming A is sending data to B. When the connection is established, B informs A: “My receive window is rwnd = 400” (where rwnd indicates the receiver window). Therefore, the sender’s sending window cannot exceed the value of the receiver window provided by the receiver. Note that the TCP window is measured in bytes, not segments. Assuming each segment is 100 bytes long, and the initial value of the data segment sequence number is set to 1. Uppercase ACK indicates the acknowledgment bit in the header, while lowercase ack indicates the acknowledgment field value.

[Figure: Sliding-window flow control; rwnd reduced from 400 to 300, 100, then 0]

From the diagram, it can be seen that B performs flow control three times. The first time it reduces the window to rwnd = 300, the second time to rwnd = 100, and finally to rwnd = 0, which means the sender is no longer allowed to send data. This state of pausing the sender will continue until Host B sends a new window value.

TCP maintains a persistence timer for each connection. As soon as one side receives a zero-window notification from the other, it starts the persistence timer. When the timer expires, it sends a zero-window probe segment (carrying 1 byte of data); the side receiving the probe answers with its current window value, and if the window is still zero, the probing side resets the persistence timer.
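
The rwnd = 400 → 300 → 100 → 0 scenario above can be replayed with a toy simulation: the sender may keep at most rwnd unacknowledged bytes in flight, and each acknowledgment from B carries a new window grant. This sketch ignores timers and loss, and assumes each grant acknowledges everything sent so far.

```python
# Toy replay of the flow-control scenario: 100-byte segments, seq starts at 1.

SEGMENT = 100
rwnd_updates = iter([400, 300, 100, 0])   # B's successive window grants

rwnd, next_seq, unacked = next(rwnd_updates), 1, 0
while True:
    while unacked + SEGMENT <= rwnd:      # send while the window allows
        print(f"A sends seq={next_seq}..{next_seq + SEGMENT - 1}")
        next_seq += SEGMENT
        unacked += SEGMENT
    try:
        rwnd = next(rwnd_updates)
        print(f"B acknowledges everything, announces rwnd={rwnd}")
        unacked = 0
    except StopIteration:
        break
print("rwnd=0: A pauses until B announces a nonzero window")
```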

10. TCP Congestion Control

The sender maintains a congestion window state variable, cwnd. Its size depends on the degree of network congestion and changes dynamically. The sender sets its sending window equal to the congestion window.

The principle for controlling the congestion window is: as long as there is no congestion in the network, the congestion window increases to allow more packets to be sent. However, if network congestion occurs, the congestion window decreases to reduce the number of packets injected into the network.

Slow start algorithm:

When the host starts sending data, if a large amount of data is injected into the network immediately, it may cause network congestion, as the current load situation of the network is unknown. Therefore, a better approach is to probe the network first, gradually increasing the sending window from small to large, meaning the congestion window’s value is increased gradually.

Typically, when starting to send segments, the congestion window cwnd is set to the value of one Maximum Segment Size (MSS). After receiving an acknowledgment for each new segment, the congestion window is increased by at most one MSS. This method of gradually increasing the sender’s congestion window cwnd allows packets to be injected into the network at a more reasonable rate.

[Figure: Slow start: cwnd doubles each transmission round]

Each transmission round doubles the congestion window cwnd. The time taken for a transmission round is essentially the round-trip time RTT. However, the term “transmission round” emphasizes that all segments allowed to be sent by the congestion window cwnd are sent continuously until the acknowledgment for the last byte sent is received.

Additionally, the “slow” in slow start does not refer to the slow growth rate of cwnd but rather that when TCP starts sending segments, it first sets cwnd=1, allowing the sender to send only one segment initially (to probe the network congestion condition), and then gradually increases cwnd.

To prevent the congestion window cwnd from growing too large and causing network congestion, a slow start threshold ssthresh state variable needs to be set. The usage of the slow start threshold ssthresh is as follows:

  • When cwnd < ssthresh, use the above slow start algorithm.

  • When cwnd > ssthresh, stop using the slow start algorithm and switch to the congestion avoidance algorithm.

  • When cwnd = ssthresh, either the slow start algorithm or the congestion avoidance algorithm can be used.

Congestion Avoidance

Allow the congestion window cwnd to increase slowly, that is, increase the sender’s congestion window cwnd by 1 for each round-trip time RTT, rather than doubling it. This way, the congestion window cwnd grows slowly in a linear fashion, much slower than the congestion window growth rate in the slow start algorithm.

[Figure: Congestion avoidance: linear (additive) growth of cwnd]

Whether in the slow start phase or the congestion avoidance phase, as soon as the sender detects network congestion (the criterion being that an acknowledgment does not arrive in time), it must set the slow start threshold ssthresh to half the sender's window value at the moment congestion occurred (but not less than 2), reset the congestion window cwnd to 1, and execute the slow start algorithm.

The purpose of this is to quickly reduce the number of packets sent into the network, giving the congested router enough time to process the packets in its queue.

As shown in the following diagram, specific values illustrate the above congestion control process. The current size of the sending window is equal to the congestion window.

[Figure: Congestion control example with concrete cwnd and ssthresh values]
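
The trajectory in the diagram is easy to reproduce with a small simulation, sketched below: cwnd (in units of segments) doubles each RTT during slow start, grows by 1 per RTT during congestion avoidance, and after a timeout ssthresh is halved and cwnd restarts from 1. The round at which loss occurs is an arbitrary choice.

```python
# Simulate the classic cwnd trajectory (units: segments).

def simulate(rounds, loss_at, ssthresh=16):
    cwnd = 1
    for rtt in range(1, rounds + 1):
        phase = "slow start" if cwnd < ssthresh else "congestion avoidance"
        print(f"RTT {rtt:2d}: cwnd={cwnd:2d} ssthresh={ssthresh:2d} ({phase})")
        if rtt in loss_at:                    # timeout: congestion detected
            ssthresh = max(cwnd // 2, 2)      # multiplicative decrease
            cwnd = 1                          # back to slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)    # exponential growth
        else:
            cwnd += 1                         # additive increase

simulate(rounds=14, loss_at={9})
```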

Fast Retransmit and Fast Recovery

Fast Retransmit

The fast retransmit algorithm requires the receiver to immediately send a duplicate acknowledgment for any out-of-order segment received, rather than waiting until it sends data to send the acknowledgment.

[Figure: Fast retransmit: duplicate acknowledgments for M2]

After the receiver receives M1 and M2, it sends acknowledgments for them. Now, assume the receiver has not received M3 but has received M4.

Clearly, the receiver cannot acknowledge M4 because it is an out-of-order segment. According to reliable transmission principles, the receiver can either do nothing or send a duplicate acknowledgment for M2 at an appropriate time.

However, according to the fast retransmit algorithm, the receiver should promptly send a duplicate acknowledgment for M2, letting the sender know early that segment M3 has not arrived. The sender then sends M5 and M6, and on receiving each, the receiver again sends a duplicate acknowledgment for M2. The sender has now received four acknowledgments for M2 in total, the last three of which are duplicates.

The fast retransmit algorithm also stipulates that as soon as the sender receives three duplicate acknowledgments, it should immediately retransmit the segment M3 that has not been acknowledged, without waiting for the retransmission timer for M3 to expire.

By retransmitting unacknowledged segments early, the overall throughput of the network can be increased by about 20%.

Fast Recovery

Along with fast retransmit comes the fast recovery algorithm, which has two key points (both are illustrated in the sketch after this list):

  • When the sender receives three consecutive duplicate acknowledgments, it executes the "multiplicative decrease" algorithm, halving the slow start threshold ssthresh.

  • Unlike in slow start, the congestion window cwnd is not reset to 1; instead, it is set to the new (halved) ssthresh value, after which the congestion avoidance algorithm ("additive increase") runs, letting the congestion window grow slowly and linearly.
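
Both points can be seen in a small sender-side simulation, sketched below with made-up numbers: the receiver keeps acknowledging M2 because M3 is missing, the third duplicate ACK triggers an immediate retransmission of M3, ssthresh is halved, and cwnd restarts from the new ssthresh rather than from 1.

```python
# Sender-side fast retransmit + fast recovery (toy numbers).

def on_ack(state, acked):
    if acked == state["last_ack"]:
        state["dup_acks"] += 1
        if state["dup_acks"] == 3:                   # fast retransmit trigger
            print(f"3 duplicate ACKs for M{acked}: retransmit M{acked + 1} now")
            state["ssthresh"] = max(state["cwnd"] // 2, 2)  # multiplicative decrease
            state["cwnd"] = state["ssthresh"]        # fast recovery: skip slow start
            print(f"ssthresh={state['ssthresh']}, cwnd={state['cwnd']}")
    else:
        state["last_ack"], state["dup_acks"] = acked, 0

sender = {"last_ack": 0, "dup_acks": 0, "cwnd": 16, "ssthresh": 32}
for ack in (1, 2, 2, 2, 2):   # ACKs for M1, M2, then three duplicates for M2
    on_ack(sender, ack)
```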

From: Juejin, Author: Ruheng

Link: https://juejin.im/post/6844903490595061767

