Introduction
In the previous article, “Common Application Layer Protocols: Analyzing HTTP from a Traffic Perspective (Part 2),” we introduced the concept, principles, development history, traffic decoding, and simple analysis methods of the HTTP protocol from a traffic analysis perspective. We also discussed two important metrics: request methods and status codes. Today, we bring you the third article in this short series, discussing important HTTP header fields that will greatly assist us in traffic analysis.
We welcome everyone to discuss in the comments: What header fields should we pay attention to in application load balancing scenarios?
Concept of Header Fields
HTTP headers provide the information that clients and servers need to process requests and responses. Each header is generally formatted as “key: value” (e.g., Host: colasoft.com.cn). The key is the header field that the title of this article refers to, and the value is that field’s value. Field names are case-insensitive.
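As a minimal sketch of the “key: value” format, the header block of an HTTP/1.x message can be split into fields as shown below. The function name parse_headers and the sample request are illustrative assumptions, not part of any analysis tool; repeated fields are simply joined with a comma, which is a simplification.

```python
def parse_headers(raw):
    """Parse the header block of an HTTP/1.x message into a dict.

    Field names are case-insensitive, so they are lowercased here.
    Repeated fields are joined with ", " (a simplification).
    """
    headers = {}
    lines = raw.split("\r\n")
    for line in lines[1:]:          # skip the request line / status line
        if line == "":              # an empty line ends the header block
            break
        name, _, value = line.partition(":")
        name = name.strip().lower()
        value = value.strip()
        if name in headers:
            headers[name] += ", " + value
        else:
            headers[name] = value
    return headers


# Hypothetical sample request used only for illustration.
sample = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: colasoft.com.cn\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
print(parse_headers(sample))
```

Lowercasing the names up front makes later lookups such as headers["host"] independent of how the sender capitalized the field.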
To perform effective HTTP traffic analysis, it is essential to analyze the details of HTTP requests and responses, which are stored in the HTTP headers. The position of the HTTP headers is shown in the diagram below (we have discussed HTTP packet decoding in the first article, so we will not elaborate further here):
As shown in the diagram, header fields are categorized into four types based on their actual use:
(1) General Header Fields
Header fields used in both request and response messages.
(2) Request Header Fields
Header fields used when sending request messages from the client to the server. They provide additional content about the request, client information, and response content-related priorities.
(3) Response Header Fields
Header fields used when returning response messages from the server to the client. They provide additional content about the response and may request the client to include extra content information.
(4) Entity Header Fields
Header fields used for the entity part of request and response messages (the header and body parts mentioned above). They provide entity-related information, such as the time the resource content was last updated.
According to the HTTP/1.1 specification, 47 header fields are defined (many more fields have been added later based on different protocol scenarios). Below, we will explain the commonly used fields in daily traffic analysis in detail.
Request Header Fields
(1) Host Field
Before discussing the Host field, let’s introduce an important concept—URL, as shown in the diagram:
A URI (Uniform Resource Identifier) is a string used to identify a specific internet resource. It consists of two subsets: URL and URN.
- URL (Uniform Resource Locator): A subset of URI, it is the standard address of a resource on the internet.
- URN (Uniform Resource Name): Also a subset of URI, it aims to provide a persistent, location-independent way to identify resources and allows simple mapping of multiple namespaces to a single URN namespace.
URL and URN were originally considered twin brothers, but URN has largely been forgotten, leaving only URL.
Therefore, to access resources on a server via HTTP, a specific URL must be provided! The composition of a URL is as follows:
Returning to the HTTP protocol itself, the HTTP request divides the URL into two parts, where the request line includes the protocol type and the “resource path” part of the URL (see the above diagram). The server information’s host part is listed separately as a header field (see the host position in the above diagram).
The Host field can be a hostname (domain name) or an IP address, optionally followed by a port. Why not rely on the server’s IP address alone? Because one server can host multiple websites, and the hostname or port in the Host field is what distinguishes these websites from one another.
Clarifying the Host field is crucial; for example, if the server is compromised, identifying which website was attacked and led to the server’s breach is a prerequisite for subsequent emergency response and vulnerability remediation.
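To make the split between the request line and the Host field concrete, here is a small sketch that joins the two back into a URL and separates the optional port. The function names are hypothetical, and the scheme is an assumption supplied by the caller, because a plain HTTP/1.1 request does not carry it.

```python
def rebuild_url(request_line, host, scheme="http"):
    """Combine the request-line target with the Host field to recover the URL.

    The scheme is assumed (default "http"); it is not present in the request.
    """
    method, target, version = request_line.split(" ", 2)
    if target.startswith(("http://", "https://")):
        return target               # absolute-form target, e.g. sent to a proxy
    return f"{scheme}://{host}{target}"


def split_host(host):
    """Split a Host value into (hostname, port); port is None when absent."""
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        return name, int(port)
    return host, None


print(rebuild_url("GET /news/index.html HTTP/1.1", "colasoft.com.cn"))
print(split_host("192.168.1.10:8080"))
```

When tracing which site on a shared server was attacked, grouping requests by the value returned from split_host is usually the first step.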
(2) User-Agent Field
This represents the client’s user agent, which in most cases is the browser installed on the client.
In traffic analysis, this field can be used to collect internal network information. The User-Agent acts as a fingerprint, revealing the browser information and even system information of the HTTP request, such as:
With this information, it is possible to determine whether there are security risks due to outdated operating system or browser versions.
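A rough sketch of how browser and platform information might be pulled out of a User-Agent string follows. The patterns cover only a few common browsers, the sample string is illustrative, and the field is fully client-controlled, so the result should be treated as a hint rather than proof.

```python
import re

# Checked in order: Edge UAs also contain "Chrome", and Chrome UAs also contain "Safari".
BROWSER_PATTERNS = [
    ("Edge", re.compile(r"Edge?/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Internet Explorer", re.compile(r"MSIE (\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]


def summarize_user_agent(ua):
    """Return browser name, major version, and the first parenthesized platform token."""
    platform_match = re.search(r"\(([^)]*)\)", ua)
    platform = platform_match.group(1) if platform_match else None
    for name, pattern in BROWSER_PATTERNS:
        match = pattern.search(ua)
        if match:
            return {"browser": name, "major_version": int(match.group(1)), "platform": platform}
    return {"browser": None, "major_version": None, "platform": platform}


# Illustrative User-Agent value, not taken from a real capture.
ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36")
print(summarize_user_agent(ua))
```

Comparing major_version against a threshold chosen by the analyst is one simple way to flag outdated clients on the internal network.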
(3) Referer Field
This indicates the URI of the page from which the request originated; simply put, it shows which page the user came from to reach the current page. It is a significant help in understanding user access behavior.
This field is commonly used in traffic analysis for tracing origins.
(4) Accept Field
This indicates the media types that the user agent (browser) can handle. The subsequent value is in MIME (Multipurpose Internet Mail Extensions) standard format, which categorizes data into eight major types, each with subtypes, formatted as type/subtype. Common types in HTTP include text (text), image (image), audio/video (audio/video), application (application), etc., for example: application/json.
Sometimes you may see “*”, which is a wildcard representing any type. There may also be a weight value q, which indicates the expected priority; it ranges from 0 to 1, and values closer to 1 indicate a higher preference.
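The type/subtype and q-value rules can be illustrated with a short parser. This is a sketch under simple assumptions: parameters other than q are ignored, a missing q is treated as 1, and the function name parse_accept is made up for this example.

```python
def parse_accept(value):
    """Return (media_range, q) pairs sorted by descending q."""
    items = []
    for part in value.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0]
        if not media_range:
            continue
        q = 1.0                      # q defaults to 1 when omitted
        for param in pieces[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0          # treat an unparsable weight as "not wanted"
        items.append((media_range, q))
    # sorted() is stable, so entries with equal q keep their original order
    return sorted(items, key=lambda item: item[1], reverse=True)


print(parse_accept("text/html, application/xml;q=0.9, */*;q=0.8"))
```

The same approach applies to Accept-Charset, Accept-Encoding, and Accept-Language, which use the same comma-separated list with optional q weights.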
(5) Accept-Charset, Accept-Encoding, Accept-Language Fields
Accept-Charset: Informs the server of the character sets supported by the user agent and their relative priority.
Accept-Encoding: Informs the server of the content encodings supported by the user agent and their priority order.
Accept-Language: Informs the server of the natural language set supported by the user agent and their priority order.
All of the above fields can specify multiple different types at once, with relative priority indicated by weight value q.
(6) If-Match Field
This field is slightly more complex, as it compares the resource ETag value (ETag is also a header field). The request will only be executed if the If-Match field matches the ETag value; otherwise, it will return status code 412 Precondition Failed (see the previous article “HTTP Request Methods and Status Codes”). Additionally, “*” can be used as the value for the If-Match field, in which case the server will ignore the ETag value and process the request as long as the resource exists.
In traffic analysis, this field can assist us in diagnosing HTTP business faults.
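The If-Match decision can be sketched as follows. This is a simplified model and not any server’s actual code: the function name is hypothetical, the list of tags is split on commas (which would mis-handle a tag containing a comma), and weak tags are never treated as a match because If-Match uses strong comparison.

```python
def check_if_match(if_match, current_etag):
    """Evaluate the If-Match precondition.

    if_match:     raw If-Match value from the request, or None if absent.
    current_etag: ETag of the stored resource, or None if it does not exist.
    Returns 412 when the precondition fails, otherwise None (keep processing).
    """
    if if_match is None:
        return None                  # no precondition to check
    if if_match.strip() == "*":
        # "*" matches any existing representation of the resource
        return None if current_etag is not None else 412
    candidates = [tag.strip() for tag in if_match.split(",")]
    if (current_etag is not None
            and not current_etag.startswith("W/")
            and current_etag in candidates):
        return None
    return 412


print(check_if_match('"v1"', '"v2"'))
```

When a capture shows a burst of 412 responses, comparing the If-Match value in each request with the ETag in the preceding response is a quick way to see whether clients are working from stale copies.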
(7) Other Header Fields
Header Field Name | Field Description |
Authorization | Web authentication information |
Expect | Expecting specific behavior from the server |
From | User’s email address |
If-Modified-Since | Comparing resource update time |
If-None-Match | Comparing entity tags (opposite of If-Match) |
If-Range | Sending entity byte range request when the resource has not been updated |
If-Unmodified-Since | Comparing resource update time (opposite of If-Modified-Since) |
Max-Forwards | Maximum transmission hop count |
Proxy-Authorization | Client authentication information required by the proxy server |
Range | Entity byte range request |
TE | Priority of transfer encoding |
Response Header Fields
(1) Age Field
The value of this field is calculated, and the specific algorithm is given in RFC7234 (RFC2616 also describes one, but the method in RFC7234 is more rigorous). It can be simply understood as the time, in seconds, that has elapsed since the origin server generated the response, measured from the origin server rather than from the moment an intermediate proxy stored its copy, up to the time the response is received. This includes the time spent in intermediate network transmission.
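For readers who want to see the calculation, the sketch below follows the age formulas in RFC 7234, section 4.2.3. The function and argument names mirror the terms used there; all values are plain seconds, and the sample numbers are invented for illustration.

```python
def current_age(age_value, date_value, request_time, response_time, now):
    """Age calculation following RFC 7234, section 4.2.3.

    age_value:     value of the Age field in the received response (0 if absent)
    date_value:    Date field of the response, as seconds since the epoch
    request_time:  local clock when the request was sent
    response_time: local clock when the response was received
    now:           local clock at the moment the age is evaluated
    """
    apparent_age = max(0, response_time - date_value)
    response_delay = response_time - request_time
    corrected_age_value = age_value + response_delay
    corrected_initial_age = max(apparent_age, corrected_age_value)
    resident_time = now - response_time
    return corrected_initial_age + resident_time


# Invented timestamps: response dated at t=1000, received at t=1040, inspected at t=1100.
print(current_age(age_value=30, date_value=1000, request_time=1038,
                  response_time=1040, now=1100))
```

Taking the larger of the two estimates guards against a missing or understated Age value from upstream caches.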
(2) ETag Field
ETag stands for Entity Tag, which is a web cache validation mechanism provided by the HTTP protocol, making caching more efficient and saving bandwidth. Here is a brief description of this caching mechanism:
- In most scenarios, when a URL is requested, the web server returns the resource along with its corresponding ETag value, which is placed in the ETag field of the HTTP response header;
- The client can decide whether to cache this resource and its ETag;
- Later, if the client wants to request the same URL again, it sends a request that carries the saved ETag in the If-None-Match field;
- On receiving the request, the server compares the client’s ETag with the ETag of the current version of the resource. If the ETag values match, the resource has not changed, and the server sends back a very short response with the HTTP “304 Not Modified” status. The 304 status code tells the client that its cached version is the latest and can be used directly.
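The validation flow above can be sketched from the server’s side. This is a toy model with assumptions of its own: the ETag is derived by hashing the body (one common choice, not something HTTP mandates), and make_etag and handle_get are hypothetical names.

```python
import hashlib


def make_etag(body):
    """Derive a strong ETag from the body; hashing is an assumed strategy."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def strip_weak(tag):
    """If-None-Match uses weak comparison, so a leading W/ is ignored."""
    return tag[2:] if tag.startswith("W/") else tag


def handle_get(body, if_none_match=None):
    """Return (status, headers, payload) for a GET on a resource with this body."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates or strip_weak(etag) in [strip_weak(t) for t in candidates]:
            return 304, {"ETag": etag}, b""      # client copy is still current
    return 200, {"ETag": etag}, body


status, headers, payload = handle_get(b"hello")                   # first visit
status2, _, payload2 = handle_get(b"hello", headers["ETag"])      # revalidation
status3, _, _ = handle_get(b"hello, world", headers["ETag"])      # resource changed
print(status, status2, status3)
```

In a capture, this shows up as a full 200 response the first time and a body-less 304 on later visits, which is why a high share of 304 responses usually indicates healthy caching rather than a fault.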
(3) Location Field
The Location header field can guide the response recipient to a resource located at a different URI than the requested one. This field is often used in conjunction with 3xx: Redirection responses to provide the redirect URI. Almost all browsers will forcibly attempt to access the indicated redirect resource upon receiving a response containing the Location header field.
(4) Server Field
The Server field contains information about the HTTP server software installed on the server.
In traffic analysis, it allows information about the remote server or host to be collected passively from the traffic.
(5) Other Response Fields
Header Field Name | Field Description |
Accept-Ranges | Whether byte range requests are accepted |
Proxy-Authenticate | Client authentication information required by the proxy server |
Retry-After | Requirements for the timing of re-initiating requests |
Vary | Management information for proxy server caching |
WWW-Authenticate | Server authentication information for the client |
General Header Fields
(1) Cache-Control Field
This is a cache control field, which is a very important part of front-end development, displayed in the header as Cache-Control: XXXX. Its value can take many forms. From a traffic analysis perspective, understanding the meaning of the field is sufficient; we will not delve into caching in depth. Below is a simple list of directives in table form:
i. Cache request directives:
Value of Cache-Control | Description |
no-cache | Forces revalidation with the origin server |
no-store | Does not cache any content of the request or response |
max-age=[seconds] | Maximum age value of the response |
max-stale(=[seconds]) | Accepts expired responses |
min-fresh=[seconds] | Expects the response to remain valid within the specified time |
no-transform | Proxies cannot change the media type |
only-if-cached | Obtains the resource only from the cache |
cache-extension | New directive marker (token) |
ii. Cache response directives:
Directive | Description |
public | Response can be cached by any party |
private | Response returned only to specific users |
no-cache | Cached copy must be validated with the origin server before it is reused |
no-store | Does not cache any content of the request or response |
no-transform | Proxies cannot change the media type |
must-revalidate | Can be cached but must confirm with the origin server |
proxy-revalidate | Requires intermediate cache servers to confirm the validity of cached responses |
max-age=[seconds] | Maximum age value of the response |
s-maxage=[seconds] | Maximum age value of public cache server responses |
cache-extension | New directive marker (token) |
Note: no-cache does not mean “do not cache.” It means that a cached copy must not be reused without first confirming its validity with the origin server. no-store is the directive that truly forbids caching.
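The difference between no-cache and no-store is easy to check programmatically. The sketch below parses a Cache-Control value into a dictionary; the function names are hypothetical, and splitting on commas is a simplification that would mis-handle quoted arguments containing commas.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()
        if sep:
            arg = arg.strip().strip('"')
            directives[name] = int(arg) if arg.isdigit() else arg
        else:
            directives[name] = True
    return directives


def may_store(directives):
    """no-store is the directive that actually forbids caching."""
    return "no-store" not in directives


def must_revalidate_before_reuse(directives):
    """no-cache allows storing but requires validation before each reuse."""
    return "no-cache" in directives


parsed = parse_cache_control("no-cache, max-age=0")
print(parsed, may_store(parsed), must_revalidate_before_reuse(parsed))
```

During analysis, this kind of parsing makes it straightforward to count how many responses are storable at all versus storable but always revalidated.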
(2) Connection Field
The Connection header field serves two purposes:
- Controls header fields that are no longer forwarded by proxies
- Manages persistent connections
In HTTP/1.1, connections are persistent by default, so the client keeps sending requests over the same connection. When the server explicitly wants to close the connection, it sets the value of the Connection header field to close.
(3) Date Field
The Date header field indicates the date and time when the HTTP message was created.
(4) Pragma Field
Pragma is a legacy field from versions prior to HTTP/1.1, defined only for backward compatibility with HTTP/1.0. The only defined form is: Pragma: no-cache. This header field belongs to the general header fields but is only used in requests sent by the client, which asks all intermediate servers not to return cached resources.
Why not just use the previously mentioned Cache-Control: no-cache? Mainly because some intermediate servers may not support HTTP/1.1, so requests often carry both fields: Cache-Control: no-cache and Pragma: no-cache.
(5) Upgrade Field
The Upgrade header field is used to detect whether a higher version of HTTP or another protocol can be used for communication; its value can specify a completely different communication protocol. For example, WeChat traffic is captured as HTTP packets but specifies its own encrypted application layer protocol, mmtls:
When the Upgrade header field is used, Connection: Upgrade must also be specified, because Upgrade applies only to the connection between the two adjacent nodes.
(6) Other General Header Fields
Header Field Name | Description |
Trailer | Overview of headers at the end of the message |
Transfer-Encoding | Specifies the transfer encoding method of the message body |
Via | Information related to the proxy server |
Warning | Error notifications |
The request, response, and general header fields discussed so far are relatively easy to understand. What about entity header fields? Recall from the decoding discussion in the previous article that requests carry request entities and responses carry corresponding content (response entities). Entity header fields are used specifically to describe the content of these request and response bodies (entities).
Entity Header Fields
(1) Content-Encoding Field
The Content-Encoding field informs the client of the content encoding that the server applied to the entity body. Content encoding refers to compression applied without losing any of the entity’s information.
(2) Content-Length Field
The Content-Length field indicates the size of the entity body in bytes. When a transfer encoding such as chunked is applied to the body (that is, when the Transfer-Encoding field is present), the Content-Length header field cannot be used; when only a content encoding is applied, Content-Length gives the size of the encoded body.
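When reassembling bodies from traffic, the first question is how the body length is delimited. The sketch below assumes the HTTP/1.1 rule that Transfer-Encoding takes precedence over Content-Length; body_framing is a hypothetical helper that expects header names already lowercased.

```python
def body_framing(headers):
    """Decide how the length of the message body is determined.

    headers: dict with lowercased field names.
    Returns ("chunked", None), ("length", n) or ("unknown", None).
    """
    transfer_encoding = headers.get("transfer-encoding", "")
    codings = [c.strip().lower() for c in transfer_encoding.split(",") if c.strip()]
    if codings and codings[-1] == "chunked":
        return ("chunked", None)     # length comes from the chunk sizes
    content_length = headers.get("content-length")
    if content_length is not None and content_length.strip().isdigit():
        return ("length", int(content_length))
    return ("unknown", None)         # e.g. delimited by connection close


print(body_framing({"content-length": "348", "content-encoding": "gzip"}))
print(body_framing({"transfer-encoding": "gzip, chunked"}))
```

Note that with Content-Encoding: gzip the declared length refers to the compressed bytes on the wire, so the decoded body is normally larger than the Content-Length value.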
(3) Content-Type Field
The Content-Type field describes the media type of the object within the entity body. As with the Accept header field, the field value is given in type/subtype format (as seen in the earlier request header field section).
(4) Last-Modified Field
The Last-Modified header field indicates the time when the resource was last modified. Generally, this value is the time when the resource specified by the Request-URI was modified. However, when CGI scripts are used for dynamic data processing, this value may instead be the time when the data was last modified.
(5) Other Entity Header Fields
Header Field Name | Description |
Allow | Supported HTTP methods for the resource |
Content-Language | Natural language of the entity body |
Content-Location | URI that substitutes the corresponding resource |
Content-MD5 | Message digest of the entity body |
Content-Range | Position range of the entity body |
Expires | Date and time when the entity body expires |
Extended Header Fields
HTTP header fields, like the status codes discussed in the previous article, are extensible. Therefore, when conducting traffic analysis, we often encounter header fields that have not been covered above. Below are some extended fields that may be encountered during analysis:
(1) Header Fields Starting with X
During analysis, you will often see fields that start with X. These fields are either deprecated or are extensions added by certain devices, as shown in the diagram:
RFC4229 and RFC6648 mention that many such fields are “deprecated.” However, they still provide a lot of information, and the RFCs do not prohibit fields starting with X; they merely recommend against them. Some X-prefixed fields remain very convenient in practice, so whether to use them should depend on the specific scenario. We do not need to memorize their functions (it is sufficient that the server and client can recognize them and communicate). Below, we introduce the most commonly used X-prefixed fields: X-Forwarded-For and X-Real-IP.
(2) X-Forwarded-For, X-Real-IP, Remote-addr Fields
- X-Forwarded-For: Abbreviated as the XFF header, it carries the real IP of the client, i.e., the requesting end of the HTTP exchange. This field is added only when the request passes through an HTTP proxy or load balancing server (it generally requires manual configuration; devices do not add it automatically). It is crucial for tracing origins; without this field, the security device alert you see may indicate that an internal server is being attacked by the load balancing device! Generally, devices that perform SNAT (Source Network Address Translation) write the real IP from before the translation into the XFF field, allowing the server to know the true access IP. When several proxies are chained, each hop appends the address it received the request from, so the field can record the IP of each proxy hop.
- X-Real-IP: Generally set when nginx acts as a reverse proxy, using the $remote_addr variable to obtain the user’s real IP. It holds a single address. However, X-Real-IP has a problem: if the site is accessed through a CDN, the web server will obtain the CDN’s IP instead of the real user’s IP. Therefore, when tracing origins, XFF is prioritized over X-Real-IP for reliability.
- Remote-addr: Similar to X-Real-IP, but it is the source address of the connection as seen by the server rather than a field that the client or a proxy writes into the request. It records only the directly connected peer, which is the real client only when no proxy sits in between.
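A small sketch of how these three sources might be combined when tracing a client: take the leftmost valid address in X-Forwarded-For, fall back to X-Real-IP, and finally to the connection’s remote address. The function names are hypothetical, and the sketch assumes the proxies in the path are trusted, since a client can forge X-Forwarded-For itself.

```python
import ipaddress


def parse_xff(value):
    """Split X-Forwarded-For into a list of IP strings, skipping malformed entries."""
    hops = []
    for item in value.split(","):
        item = item.strip()
        try:
            ipaddress.ip_address(item)
        except ValueError:
            continue                 # e.g. "unknown" or an empty entry
        hops.append(item)
    return hops


def pick_client_ip(xff, x_real_ip, remote_addr):
    """Return (ip, source) using XFF first, then X-Real-IP, then remote_addr."""
    hops = parse_xff(xff) if xff else []
    if hops:
        return hops[0], "X-Forwarded-For"    # leftmost entry: original client by convention
    if x_real_ip:
        return x_real_ip.strip(), "X-Real-IP"
    return remote_addr, "remote_addr"


# Documentation-range and private addresses used purely as examples.
print(pick_client_ip("203.0.113.7, 10.0.0.5", "10.0.0.5", "10.0.0.1"))
```

Keeping the name of the source alongside the address helps later when weighing how much to trust each entry as evidence.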
Thus, when conducting traffic analysis (often for tracing and evidence collection), we can prioritize information as follows: X-Forwarded-For > X-Real-IP > Remote-addr.
(3) HTTP Security Fields
- Security Fields in Request Headers
Request headers sent by the client to the server contain detailed information about the request. Here are some key security-related request header fields:
Referer: This field indicates the source page of the current request (already introduced earlier, so we will not elaborate further).
User-Agent: This field identifies the type and version of the client’s browser (already introduced earlier, so we will not elaborate further).
Origin: This field is sent with cross-origin requests and indicates the origin of the request. By validating the Origin field, the server can help prevent Cross-Site Request Forgery (CSRF) attacks.
- Security Fields in Response Headers
Here are some key security-related response header fields:
Content-Security-Policy (CSP): This field defines a set of policies that restrict the types and sources of resources that can be loaded on the page. A properly configured CSP is an effective defense against XSS attacks and data injection.
Strict-Transport-Security (HSTS): This field tells the client to use HTTPS for future requests to the site, mitigating man-in-the-middle attacks. Once HSTS is in effect, even if a user attempts to access the site via HTTP, the browser will automatically switch to HTTPS.
X-Frame-Options: This field controls whether the page can be displayed in an iframe. By setting X-Frame-Options to DENY or SAMEORIGIN, clickjacking attacks can be prevented.
X-XSS-Protection: This field controls the browser’s built-in XSS filter, which was designed to detect and block reflected XSS attacks. Current versions of mainstream browsers such as Chrome and Edge have removed this filter, so the field mainly matters for older browsers, and CSP is the preferred protection.
By properly configuring these security protection header fields, the security of HTTP communication can be significantly improved, protecting user data from attackers’ threats. Other security-related HTTP header fields are listed below:
Header Field Name | Description |
X-Content-Type-Options | Used to prevent MIME type sniffing attacks |
X-Forwarded-Proto | Confirms the protocol type of the original request |
Content-Security-Policy-Report-Only | Used for testing and debugging CSP policies |
Referrer-Policy | Controls the value of the Referer field; values include no-referrer, no-referrer-when-downgrade, same-origin, strict-origin-when-cross-origin |
X-Permitted-Cross-Origin-Request-Headers | Used to control custom header fields allowed in cross-origin requests. |
X-Content-Security-Policy | A variant of CSP, mainly for compatibility with older browsers. |
Feature-Policy | Used to control various features and APIs in the browser. |
Expect-CT | Certificate Transparency, abbreviated as CT. It is a technology that ensures the authenticity of SSL/TLS certificates by recording certificate information in public logs, allowing anyone to verify the validity of the certificate. |
Report-To | Used to collect and report security events. Often used in conjunction with CSP and other security header fields to build a comprehensive security reporting mechanism. |
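As a quick illustration of how these fields can be checked in captured responses, the sketch below reports which of a small checklist of security header fields is absent. The checklist is an illustrative selection from the fields discussed in this section, not an authoritative baseline, and the function name is made up.

```python
# Illustrative checklist only; adjust to the policy actually in force.
CHECKLIST = [
    "content-security-policy",
    "strict-transport-security",
    "x-frame-options",
    "x-content-type-options",
    "referrer-policy",
]


def missing_security_headers(response_headers):
    """Return the checklist entries that do not appear in the response headers."""
    present = {name.lower() for name in response_headers}   # names are case-insensitive
    return [name for name in CHECKLIST if name not in present]


# Hypothetical response headers for demonstration.
response = {
    "Content-Type": "text/html",
    "X-Frame-Options": "DENY",
    "Strict-Transport-Security": "max-age=31536000",
}
print(missing_security_headers(response))
```

Running such a check across all responses from one Host value gives a fast overview of which sites on a server lack basic hardening.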
Note: When filtering fields, be aware that the field name “Referer” is a misspelling; the letter “r” was not doubled, yet it became the de facto standard in the RFC. In “Referrer-Policy,” however, the word is spelled correctly. Therefore, when searching for (filtering) fields, be mindful of the spelling.
(4) Set-Cookie Field
The Set-Cookie field is quite common, so it is listed separately; it is also an HTTP security header field. Set-Cookie is used by the server to set cookie information on the client in order to manage client state. The HttpOnly attribute is an extension feature of cookies that prevents JavaScript from accessing the cookie. Its main purpose is to prevent XSS from stealing cookie information.
Traffic Analysis Examples
Example 1. When studying the communication of the WebShell IceScorpion, the content of the communication is unknown because IceScorpion encrypts it. During the communication phase there are no strong features for identification, so we can combine multiple weak features from the header fields in IceScorpion’s HTTP traffic to form detection rules. When a data stream (traditional security devices can only perform single-packet detection) matches the combined rules, an alert can be triggered. (This example uses IceScorpion 3 to illustrate a traffic detection approach.)
Analysis: Through research on the IceScorpion 3 WebShell management tool and the captured traffic, it was found that its HTTP traffic has the following characteristics:
(1) Content-Type: application/octet-stream;
(2) User-Agent: IceScorpion 3 has 15 built-in User-Agent values, all of them relatively outdated, which can be used as a weak feature for judgment;
(3) Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2;
(4) Cache-Control: no-cache;
(5) Pragma: no-cache
When all five of the above weak features are satisfied and subsequent encrypted communication occurs, an alert is triggered. (More traffic features can be added, such as upload actions or access to script files; this is left for the reader to explore.)
Example 2. A user was deceived by a phishing website that prompted a browser upgrade and installed a trojan, resulting in losses. How can we determine whether the user’s browser version was indeed too low, and how the user reached the phishing website?
Analysis:
(1) By filtering HTTP traffic, the User-Agent field can be used to determine the user’s browser version;
It can be seen that it is indeed an outdated browser, so the user was misled by the phishing website’s prompt to upgrade the browser.
(2) The Referer field can be checked to determine whether the user was redirected to the phishing page from another page;
By examining the Referer field, it can be seen that the user was redirected from the link www.stmarybahrain.com/new/ to the phishing webpage.
Conclusion
This short series of tutorials on learning the HTTP protocol from a traffic perspective has come to an end. We hope that through reading these three articles, you have gained a deeper understanding of HTTP. In the future, we will bring you explanations of other application layer protocols to help you enhance your traffic analysis skills!
What new thoughts do you have after reading this article? Feel free to leave a message in the comments and discuss with us: What header fields should we pay attention to in application load balancing scenarios?