HTTP Chunked Transfer Encoding

Introduction

In HTTP, uploading and downloading files has always been time-consuming, especially for large files. This is why HTTP/1.1 introduced transfer methods designed for such data, chunked transfer encoding among them.

Chunked transfer encoding splits a large body into separate chunks for transmission, and the client reassembles the complete data on receipt.

Chunked Transfer Encoding

Originally, I never needed to deal with this directly; I only stumbled into it while troubleshooting.

1 Obtaining the Size of Requests and Responses

When using Nginx, you may want to save requests and responses to logs for traffic replay or security scanning. Bodies and responses can be large, and since they count as sensitive data they also need to be stored securely. In Nginx's Lua, if you save them directly, uploads or downloads of files and videos can exhaust memory.

In such scenarios a virtual machine might cope, but a container can end up consuming three times the memory, and if its limit is not large enough the worker process gets killed by the OOM killer, affecting normal business requests. The error log then shows the worker process being killed, and with ps you can watch the worker's PID change again and again.

So how can we mitigate this? The first instinct is to judge the size of the response from the Content-Length header in the HTTP message and, if it exceeds 1 MB, truncate what we keep so the Nginx worker process is not OOM-killed. But it is not that simple: this only eased the problem slightly, it did not resolve it.
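As a rough sketch of that first attempt under OpenResty (the 1 MB threshold and the skip_body_log flag are made-up example names, not the actual production code):

header_filter_by_lua_block {
    -- read the response Content-Length, if the upstream sent one
    local len = tonumber(ngx.header["Content-Length"] or "")
    if len and len > 1024 * 1024 then
        -- mark the request so a later phase skips logging the body
        ngx.ctx.skip_body_log = true
    end
}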

Later, I discovered why: many responses use chunked transfer encoding, i.e. Transfer-Encoding: chunked, where data is transmitted in chunks and there is no Content-Length header at all, making it impossible to know the response size up front. The format is the size of each chunk (in hexadecimal), followed by its data, ending with a special zero-sized chunk.
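For illustration, a chunked response looks roughly like this on the wire (the payload is made up; each \r\n marks an explicit CRLF, and the chunk sizes 4 and 6 are hexadecimal byte counts):

HTTP/1.1 200 OK
Transfer-Encoding: chunked

4\r\n
Wiki\r\n
6\r\n
pedia \r\n
0\r\n
\r\n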

Therefore, we can only fall back on Nginx's $upstream_response_length variable, or count the bytes ourselves as they pass through, to decide where to truncate the data.
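Counting as the body streams through can be done in a body filter. The following is a minimal sketch under OpenResty with an arbitrary 1 MB cap, not production code:

body_filter_by_lua_block {
    local MAX = 1024 * 1024                 -- example cap: keep at most 1 MB
    local ctx = ngx.ctx
    ctx.resp_body = ctx.resp_body or ""
    if #ctx.resp_body < MAX then
        -- ngx.arg[1] holds the current chunk of the response body
        ctx.resp_body = ctx.resp_body .. (ngx.arg[1] or "")
    end
    if ngx.arg[2] then                      -- eof: the whole body has passed
        -- hand at most MAX bytes to whatever writes the log
        ctx.resp_body = string.sub(ctx.resp_body, 1, MAX)
    end
}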

The purpose of chunked transfer is to cut large payloads into pieces for transmission, especially when the total size cannot be determined in advance, as with video.

2 SSE

Streaming data is increasingly common today, SSE (Server-Sent Events) being one example. Using it behind Nginx requires special configuration: the proxy_buffering directive must be set to off. It is on by default, meaning Nginx buffers large responses (spilling to disk when needed) and then sends them to the client in one go.
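A minimal location block for an SSE endpoint might look like this; the path and upstream name are placeholders:

location /events {
    proxy_pass http://backend;       # placeholder upstream
    proxy_http_version 1.1;          # keep the upstream connection streaming-friendly
    proxy_buffering off;             # forward events to the client as they arrive
    proxy_read_timeout 1h;           # SSE connections are long-lived
}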

Buffering is on by default because Nginx and the upstream are generally on the same network segment, where data moves fast; draining the upstream quickly improves throughput and frees backend resources. Tomcat, for example, cannot sustain many concurrent connections, so Nginx buffers on its behalf. With buffering off, the client receives the response as soon as it is produced, reducing response time (RT). Both modes have their pros and cons.

To support streaming data effectively, it is advisable to turn proxy_buffering off, i.e. not cache upstream data at all. Buffering can burn disk IOPS and, if not managed carefully, cause RT fluctuations; it is also memory-intensive, so turning it off saves Nginx memory as well, especially in heavily loaded environments.

3 Curl Download File Error

When using curl to download files, you may encounter the following error:

curl: (18) transfer closed with outstanding read data remaining

A packet capture may look perfectly normal: data flows and the connection closes. But the curl error means the connection was closed while curl was still expecting data; typically the bytes received are almost, but not quite, the expected size, so the transfer counts as interrupted.

At this point, check the packet capture to see whether the sender ever emitted the final zero-sized chunk, which on the wire is:

0\r\n
\r\n
(bytes 30 0d 0a 0d 0a: the zero size line, then one final empty line)

Check the sending code for correctness; this is almost always a matter of the chunked framing. Also verify the last data chunk, making sure its size line matches its payload. In practice, the most common bug is simply forgetting to send the final terminator.
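As a sketch in plain Lua (the helper and socket names are made up), correct framing looks like this:

-- frame one chunk: hex size, CRLF, data, CRLF
local function encode_chunk(data)
    return string.format("%x\r\n%s\r\n", #data, data)
end

-- the terminator that is easy to forget: zero size line plus a blank line
local function encode_last_chunk()
    return "0\r\n\r\n"
end

-- usage with some socket `sock` (placeholder):
-- sock:send(encode_chunk("hello"))
-- sock:send(encode_chunk("world"))
-- sock:send(encode_last_chunk())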

4 Others

When transferring large files, chunked encoding is not the only case where the content length is hard to pin down. Video seeking relies on range requests (the Range and Content-Range headers), and there is also multipart data transfer. These are all driven by HTTP headers, and a quick test against a server makes the relevant headers easy to capture and examine.
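For example, a range request and its headers can be inspected with curl (the URL is a placeholder):

# ask for only the first 1024 bytes; -v prints the request and response headers
curl -v -r 0-1023 -o part.bin http://example.com/video.mp4
# a server that supports ranges answers 206 Partial Content with a Content-Range header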

Conclusion

There are various methods for transferring large files, such as enabling compression, segmenting transfers, and supporting resumable downloads, all achieved through different HTTP headers.

In the face of the protocol, all struggle is futile: chunked transfer has its established rules, and the moment they are not followed, errors appear immediately.

In different scenarios, some parameters need to change. For example, many assume that enabling proxy_buffering is always better, but for streaming, turning buffering off is the better choice, especially in scenarios sensitive to response time.
