Network Packet Loss and HTTP Performance

Understanding Packet Loss

Packet loss, as the name suggests, refers to data packets that are “lost” during network transmission and never reach their destination. HTTP runs on the TCP/IP stack, and packets can be dropped at the network layer (for example, when an overwhelmed router discards them) or mishandled at the transport layer (for example, when the TCP retransmission mechanism misbehaves). Common causes include network congestion, line faults, hardware performance limits, and misconfiguration. When a router’s queue is full, it is like a long line at a supermarket checkout: newly arriving customers are simply turned away. Likewise, over an unstable wireless link, data packets are like letters blown away by the wind, going missing along the way. In HTTP communication, packet loss triggers TCP retransmissions that slow down the entire exchange, which is why it earns the nickname “network killer.”

The “Destructive Power” of Packet Loss

Packet loss can significantly impact HTTP performance, primarily reflected in soaring latency, plummeting throughput, and deteriorating user experience. Let’s break it down:

  1. Soaring Latency: Packet loss forces TCP to retransmit, adding at least one extra round trip (or a full retransmission timeout) to each affected response. In scenarios like video conferencing, where real-time performance is critical, the added latency can make the experience lag like a slideshow, and user satisfaction drops sharply.
  2. Plummeting Throughput: Retransmissions and TCP’s congestion and flow control drastically reduce transmission efficiency; under high packet loss rates, throughput can hit “rock bottom,” effectively “collapsing” large file downloads or video streaming services.
  3. Ripple Effects: Packet loss can trigger a “domino effect”: web servers time out and disconnect, increasing load, while impatient clients retry excessively and congest the network further (jittered exponential backoff, sketched below, is the usual defense). In high-concurrency scenarios these feedback loops are “adding fuel to the fire” and can bring a service down completely.
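
To make the third point concrete, here is a minimal retry sketch in Python (the function name and parameters are illustrative, and the `requests` library is assumed to be installed): capped exponential backoff with jitter keeps a fleet of clients from hammering an already lossy network in lockstep.

```python
import random
import time

import requests  # any HTTP client works; requests is assumed installed


def get_with_backoff(url, max_retries=4, base_delay=0.2):
    """Fetch a URL, backing off exponentially (with jitter) between retries.

    Capped, jittered backoff keeps impatient clients from piling
    synchronized retries onto a network that is already dropping packets.
    """
    for attempt in range(max_retries + 1):
        try:
            return requests.get(url, timeout=5)
        except requests.RequestException:
            if attempt == max_retries:
                raise  # give up after the final attempt
            # 0.2s, 0.4s, 0.8s, ... capped at 5s, then randomized so
            # many clients do not retry in lockstep.
            delay = min(base_delay * 2 ** attempt, 5.0)
            time.sleep(delay * random.uniform(0.5, 1.5))
```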

HTTP’s Time Ledger

The total time taken for an HTTP request can be broken down into the following stages:

  1. DNS Resolution Time: The time taken to resolve a domain name to an IP address.
  2. TCP Connection Establishment Time: The time taken by the three-way handshake.
  3. TLS Handshake Time (HTTPS only).
  4. Request Sending Time: Including the time taken to write to the socket.
  5. Time to First Byte (TTFB): The time taken for the server to process and return the first byte.
  6. Response Receiving Time: The time from the first byte to the completion of the full response.

Among these stages, packet loss acts through TCP-layer retransmission and congestion control, so TTFB and response receiving time are hit hardest.
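
Each entry in this ledger can be measured directly. As a rough sketch, libcurl exposes cumulative per-stage timers that map onto the stages above; the example below uses the `pycurl` binding against a placeholder URL:

```python
import io

import pycurl  # pip install pycurl


def time_ledger(url):
    """Print libcurl's cumulative per-stage timers (seconds) for one request."""
    buf = io.BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEDATA, buf)  # collect the body into a buffer
    c.perform()
    print("DNS resolution :", c.getinfo(pycurl.NAMELOOKUP_TIME))
    print("TCP connect    :", c.getinfo(pycurl.CONNECT_TIME))
    print("TLS handshake  :", c.getinfo(pycurl.APPCONNECT_TIME))    # 0 for plain HTTP
    print("Pre-transfer   :", c.getinfo(pycurl.PRETRANSFER_TIME))   # just before the request goes out
    print("TTFB           :", c.getinfo(pycurl.STARTTRANSFER_TIME)) # first response byte
    print("Full response  :", c.getinfo(pycurl.TOTAL_TIME))
    c.close()


time_ledger("https://example.com/")  # placeholder URL
```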

How Packet Loss Slows Down HTTP Requests

TCP Retransmission: The Culprit

TCP relies on acknowledgments (ACKs) to ensure data is not lost: if a segment goes missing, the sender never receives its ACK and must retransmit it. Imagine mailing a book page by page, with the post office requiring a signature for each page; lose one page and you must resend it and wait for a fresh confirmation, stretching out the whole process. HTTP requests behave the same way: packet loss triggers retransmissions and delays “skyrocket.” For instance, on a network with a round-trip time (RTT) of 10ms, a 1% packet loss rate can push latency from 10ms to 30ms or even higher, so the cost of each retransmission is significant.
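
One practical way to confirm that retransmissions are behind a slowdown is to watch the kernel’s own counters. A minimal, Linux-only sketch that samples the cumulative `RetransSegs` counter from `/proc/net/snmp` around a test run:

```python
def tcp_retrans_segs():
    """Return the kernel's cumulative count of retransmitted TCP segments
    (Linux-only: parsed from the two "Tcp:" lines in /proc/net/snmp)."""
    with open("/proc/net/snmp") as f:
        header, values = [line.split() for line in f if line.startswith("Tcp:")]
    return int(dict(zip(header, values))["RetransSegs"])


before = tcp_retrans_segs()
# ... run an HTTP request or load test here ...
after = tcp_retrans_segs()
print(f"TCP segments retransmitted during the run: {after - before}")
```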

TCP’s retransmission mechanism has two main forms:

  1. Timeout Retransmission: If no ACK arrives within the retransmission timeout (RTO) after a segment is sent, the sender resends it. This handles heavy network fluctuation or sustained loss, but it is slow and drives latency up.
  2. Fast Retransmit: When out-of-order segments arrive, the receiver keeps re-acknowledging the last in-order byte; three such duplicate ACKs tell the sender a segment is missing, and it retransmits immediately without waiting for the timeout. This suits light loss where subsequent segments still get through, and it cuts retransmission delay dramatically.
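
The RTO itself is not a fixed number: the sender derives it from smoothed RTT measurements. A minimal sketch of the standard estimator (RFC 6298) illustrates why a single delayed sample inflates the timeout, and why timeout-based recovery is so much slower than fast retransmit:

```python
class RtoEstimator:
    """Retransmission timeout per RFC 6298 (SRTT/RTTVAR smoothing)."""

    ALPHA, BETA = 1 / 8, 1 / 4  # smoothing gains from the RFC
    MIN_RTO = 0.2               # Linux clamps RTO to roughly 200 ms

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def sample(self, rtt):
        """Feed one RTT measurement (seconds); return the resulting RTO."""
        if self.srtt is None:  # first sample initializes the estimator
            self.srtt, self.rttvar = rtt, rtt / 2
        else:  # RTTVAR must be updated with the *old* SRTT, per the RFC
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return max(self.srtt + 4 * self.rttvar, self.MIN_RTO)


est = RtoEstimator()
for rtt in (0.010, 0.011, 0.010, 0.050):  # one late sample inflates the RTO
    print(f"RTT={rtt * 1000:.0f}ms -> RTO={est.sample(rtt) * 1000:.0f}ms")
```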

How Packet Loss Chokes Throughput

Packet loss not only slows things down; it makes throughput “suffer” too. TCP’s congestion control treats loss as a signal of network congestion and immediately shrinks the congestion window, so the sending rate drops sharply. Retransmissions also consume bandwidth, encroaching on the “territory” of useful data. In high-load scenarios such as video streaming, a 1% packet loss rate can cut throughput by over 20%. Add the overhead of TCP flow control, such as receiver-window adjustments, and the situation worsens further under high loss rates until throughput “stagnates.”

Worse still, large transfers (like big HTTP file downloads) are more exposed to loss: the more packets a response spans, the higher the chance that at least one of them is dropped and stalls the transfer. And every extra round of retransmission during a large transfer adds CPU and memory pressure.
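
A classic back-of-the-envelope model for this effect is the Mathis et al. (1997) bound: throughput ≤ (MSS/RTT) · C/√p. It assumes random loss and a Reno-style sender, so its absolute numbers will not match a modern CUBIC or BBR stack (nor the measurements later in this article), but the 1/√p term captures why small increases in loss rate cause outsized throughput drops:

```python
from math import sqrt

MSS = 1460        # bytes, a typical Ethernet MSS
RTT = 0.010       # seconds, matching this article's 10 ms test network
C = sqrt(3 / 2)   # Reno-style constant from Mathis et al. (1997)


def mathis_limit_mbps(loss_rate):
    """Upper bound on steady-state TCP throughput under random loss:
    throughput <= (MSS / RTT) * C / sqrt(p)."""
    return (MSS * 8 / RTT) * C / sqrt(loss_rate) / 1e6


for p in (0.001, 0.01, 0.02, 0.05):
    print(f"loss={p:.1%}: bound ~ {mathis_limit_mbps(p):.1f} Mbps")
```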

How HTTP/2 and HTTP/3 Tackle Packet Loss

HTTP/2 reduces the number of TCP connections through multiplexing, letting many requests share one connection and “diluting” per-connection overhead. But it still rides on TCP, so a single lost segment holds up every stream multiplexed behind it, making all of them “stutter” (TCP head-of-line blocking).

HTTP/3, built on UDP via the QUIC protocol, takes a different approach: per-stream loss recovery and fast retransmission let it “fight back” against packet loss. QUIC retransmits only the affected stream without blocking the others, and 0-RTT connection establishment (for resumed sessions) trims initial latency, making it far more resilient to loss.
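
As a quick HTTP/2 sketch (Python has no standard-library HTTP/3 client; `aioquic` is the usual choice on the QUIC side), the `httpx` library can multiplex sequential requests over one HTTP/2 connection; `example.com` is a stand-in host:

```python
import httpx  # pip install "httpx[http2]" (the h2 extra enables HTTP/2)

# Many requests share a single HTTP/2 connection; under TCP packet loss,
# all of these streams still stall together (head-of-line blocking).
with httpx.Client(http2=True) as client:
    for path in ("/", "/robots.txt"):
        r = client.get(f"https://example.com{path}")  # stand-in host
        print(r.http_version, r.status_code, path)
```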

Experimental Validation

We used the ChaosMesh platform to simulate different packet loss rates and measured HTTP response latency and throughput. The test environment had 100Mbps of bandwidth and a 10ms RTT between client and server; the workload mixed small (996B), medium (25.1KB), and large (227.2KB) HTTP responses, driven by the test script `FunTester_HttpPerf`.
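
For readers without a Kubernetes cluster, the same fault can be injected on a single Linux host with `tc`/netem, which is essentially the mechanism ChaosMesh drives for you. A minimal sketch (the interface name `eth0` is an assumption, and root privileges are required):

```python
import subprocess

DEV = "eth0"  # assumed interface name; adjust for your host


def set_loss(percent):
    """Inject random packet loss on DEV with Linux tc/netem (needs root)."""
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", DEV, "root",
         "netem", "loss", f"{percent}%"],
        check=True,
    )


def clear_loss():
    """Remove the netem qdisc, restoring the interface's default."""
    subprocess.run(["tc", "qdisc", "del", "dev", DEV, "root"], check=False)


set_loss(1)    # e.g. 1% loss, matching one of the rows below
# ... run the HTTP benchmark here ...
clear_loss()
```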

Response Latency Results

| Packet Loss Rate | 996 B Response Time (ms) | 25.1 KB Response Time (ms) | 227.2 KB Response Time (ms) |
|---|---|---|---|
| 0% | 1 | 3 | 22 |
| 0.1% | ≈1 (no significant change) | ≈3 (no significant change) | ≈22 (no significant change) |
| 1% | 31 | 41 | 170 |
| 2% | 42 | 60 | 190 |

Analysis: At 0.1% loss, latency barely changed, showing that the TCP retransmission mechanism can “absorb” a small amount of loss. At 1% loss, latency “rocketed,” especially for the large response (227.2KB), which jumped from 22ms to 170ms, nearly an 8-fold increase. At 2% loss it worsened further, and large response transfers effectively “froze like a PowerPoint presentation.”

Throughput Results

| Throughput (Mbps) | 0% | 1% | 2% | 3% | 4% | 5% | 6% | 7% | 8% | 9% | 10% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 804.67 | 222.49 | 168.03 | 106.43 | 63.57 | 36.59 | 24.99 | 15.52 | 10.82 | 36.59 | 15.52 |
| STD | 13.02 | 13.79 | 34.91 | 44.62 | 34.81 | 24.44 | 16.93 | 11.58 | 8.26 | 24.44 | 11.58 |
| Min | 710 | 51.21 | 5.97 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 25% | 799.99 | 214.79 | 151.14 | 72.57 | 35.80 | 16.90 | 11.93 | 5.97 | 4.97 | 16.90 | 5.97 |
| 50% | 809.93 | 222.73 | 182.45 | 108.35 | 59.66 | 31.84 | 21.87 | 11.94 | 8.95 | 31.84 | 11.94 |
| 75% | 810.046 | 230.68 | 191.89 | 144.67 | 87.00 | 51.70 | 34.79 | 21.87 | 14.92 | 51.70 | 21.87 |
| Max | 830.419 | 280.88 | 212.79 | 188.91 | 163.07 | 148.64 | 118.81 | 82.03 | 63.64 | 148.64 | 82.03 |

Analysis: At 0% loss, average throughput was 804.67Mbps, close to the link’s capacity and rock-steady. At 1% loss it fell to 222.49Mbps, a staggering 72% drop. At 5% loss it was just 36.59Mbps, a 95% drop, effectively a “broken” service. The standard deviation (STD) also ballooned relative to the mean as loss increased, meaning throughput fluctuations grew more pronounced and stability “collapsed.”

  1. Latency: Once the packet loss rate exceeds 0.1%, HTTP response latency “takes off,” with large transfers slowing several-fold, as the `FunTester_HttpPerf` runs show.
  2. Throughput: Every additional percentage point of loss makes throughput “plummet,” and above 5% the service effectively “collapses.”
  3. Practical Significance: High-performance web applications must keep packet loss rates below 0.1%, or both latency and throughput will “suffer.”

Best Practices for Testing Packet Loss

In fault testing, setting packet loss rates should be “tailored” to the business scenario and the commitments of cloud vendors. Major cloud providers (like Alibaba Cloud, AWS) typically guarantee packet loss rates below 0.1%, with high-end services even lower at 0.01%. Based on this, the following practices are recommended:

  1. Low Packet Loss Testing (0.1%): Simulate normal network conditions and verify that the application meets cloud-service-level performance, with latency and throughput essentially “unchanged.”
  2. Medium Packet Loss Testing (1%): Simulate network congestion or minor faults to probe the application’s “stress resistance”; expect latency to rise 10–100x and throughput to drop 50%–80%.
  3. High Packet Loss Testing (5%): Simulate extreme faults such as line interruptions and observe behavior in “hell mode,” where the service effectively “collapses” and only degradation strategies or fault-tolerance mechanisms can save it.
  4. Dynamic Adjustment: Based on business needs (such as strict real-time requirements), focus medium-loss testing on high-load scenarios, pairing ChaosMesh with automated test scripts to expose stability weak points (a minimal automation sketch follows below).

It is also wise to limit packet loss injection to specific links or services, so that the fault does not spill across the whole path and muddy your observations.
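
Putting these practices together, a sweep across the recommended loss rates might look like the sketch below. The endpoint URL and interface name are hypothetical, the `tc` commands need root, and in Kubernetes a ChaosMesh NetworkChaos resource scoped to one service would replace them:

```python
import statistics
import subprocess
import time

import requests

URL = "http://test-service.local/api"  # hypothetical endpoint
DEV = "eth0"                           # assumed interface name

# Sweep the recommended loss rates and record client-side latency.
for loss in (0.1, 1, 5):
    subprocess.run(["tc", "qdisc", "replace", "dev", DEV, "root",
                    "netem", "loss", f"{loss}%"], check=True)
    samples = []
    for _ in range(50):
        t0 = time.perf_counter()
        requests.get(URL, timeout=10)
        samples.append((time.perf_counter() - t0) * 1000)
    print(f"loss={loss}%: p50={statistics.median(samples):.1f} ms, "
          f"max={max(samples):.1f} ms")

subprocess.run(["tc", "qdisc", "del", "dev", DEV, "root"], check=False)
```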
