Common Application Layer Protocols: Analyzing HTTP from a Traffic Perspective (Part 3)

Introduction

In the previous article, “Common Application Layer Protocols: Analyzing HTTP from a Traffic Perspective (Part 2),” we introduced the concept, principles, development history, traffic decoding, and simple analysis methods of the HTTP protocol from a traffic analysis perspective. We also discussed two important metrics: request methods and status codes. Today, we bring you the third article in this short series, discussing important HTTP header fields that will greatly assist us in traffic analysis.

We welcome everyone to discuss in the comments: What header fields should we pay attention to in application load balancing scenarios?

Concept of Header Fields

HTTP header content provides the necessary information for clients and servers to process requests and responses, generally formatted as “key: value” (e.g., host: colasoft.com.cn). Each key mentioned in our title is a header field, and the value is the field’s value.
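The “key: value” layout can be illustrated with a minimal Python sketch that splits the header block of a raw HTTP/1.x message into a dictionary. The function name `parse_headers` and the sample request are assumptions made for illustration; a real decoder also has to deal with malformed lines and with fields such as Set-Cookie that must not be merged.

```python
def parse_headers(raw):
    """Parse the header block of a raw HTTP/1.x message into a dict.

    Field names are case-insensitive, so they are lower-cased here.
    """
    headers = {}
    lines = raw.split("\r\n")
    # lines[0] is the request line or status line, not a header field.
    for line in lines[1:]:
        if line == "":
            break  # a blank line ends the header block
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line; ignored in this sketch
        key = name.strip().lower()
        value = value.strip()
        # Repeated fields are joined with a comma (not suitable for Set-Cookie).
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return headers


# Hypothetical sample request used only for illustration.
SAMPLE_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: colasoft.com.cn\r\n"
    "User-Agent: demo-client/1.0\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
```

Calling `parse_headers(SAMPLE_REQUEST)` yields a dictionary whose `host` key holds `colasoft.com.cn`, which is the form the later sketches in this article assume.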

To perform effective HTTP traffic analysis, it is essential to analyze the details of HTTP requests and responses, which are stored in the HTTP headers. The position of the HTTP headers is shown in the diagram below (we have discussed HTTP packet decoding in the first article, so we will not elaborate further here):

As shown in the diagram, header fields are categorized into four types based on their actual use:

(1) General Header Fields

Header fields used in both request and response messages.

(2) Request Header Fields

Header fields used when sending request messages from the client to the server. They provide additional content about the request, client information, and response content-related priorities.

(3) Response Header Fields

Header fields used when returning response messages from the server to the client. They provide additional content about the response and may request the client to include extra content information.

(4) Entity Header Fields

Header fields used for the entity part of request and response messages. They supply entity-related information, such as the time the resource content was last updated.

According to the HTTP/1.1 specification, 47 header fields are defined (many more fields have been added later based on different protocol scenarios). Below, we will explain the commonly used fields in daily traffic analysis in detail.

Request Header Fields

(1) Host Field

Before discussing the Host field, let’s introduce an important concept—URL, as shown in the diagram:

A URI (Uniform Resource Identifier) is a string used to identify a specific internet resource. It consists of two subsets: URL and URN.

  • URL (Uniform Resource Locator): A subset of URI, it is the standard address of a resource on the internet.
  • URN (Uniform Resource Name): Also a subset of URI, it aims to provide a persistent, location-independent way to identify resources and allows simple mapping of multiple namespaces to a single URN namespace.

URL and URN were originally considered twin brothers, but URN has largely been forgotten, leaving only URL.

Therefore, to access resources on a server via HTTP, a specific URL must be provided! The composition of a URL is as follows:

Returning to the HTTP protocol itself, the HTTP request divides the URL into two parts, where the request line includes the protocol type and the “resource path” part of the URL (see the above diagram). The server information’s host part is listed separately as a header field (see the host position in the above diagram).

The Host field can be a hostname (domain name) or an IP address + port (why not just list the IP address? Because a server can host multiple websites, and using the host or different ports represents these different websites).

Clarifying the Host field is crucial; for example, if the server is compromised, identifying which website was attacked and led to the server’s breach is a prerequisite for subsequent emergency response and vulnerability remediation.
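As a rough illustration of using the Host field to tell sites apart, the sketch below splits Host values into a hostname and port and counts requests per site. The helper names and the default port of 80 are assumptions; IPv6 literals and malformed values are not handled.

```python
from collections import Counter


def split_host(host_value, default_port=80):
    """Split a Host header value into (hostname, port).

    Assumes a hostname or IPv4 address, optionally followed by ":port".
    """
    name, sep, port = host_value.strip().partition(":")
    return name.lower(), (int(port) if sep else default_port)


def requests_per_site(host_values):
    """Count requests per (hostname, port) pair seen in the Host field."""
    return Counter(split_host(value) for value in host_values)
```

Feeding the Host values extracted from a capture into `requests_per_site` gives a quick view of which virtual hosts on a shared server actually received traffic.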

(2) User-Agent Field

This represents the client agent, which is essentially the browser installed on the client.

In traffic analysis, this field can be used to collect internal network information. The User-Agent acts as a fingerprint, revealing the browser information and even system information of the HTTP request, such as:

With this information, it is possible to determine whether there are security risks due to outdated operating system or browser versions.
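One possible way to automate this check is sketched below: a few regular expressions pull the browser name and major version out of a User-Agent string, and the result is compared with a minimum version. The patterns cover only a handful of browsers, and the thresholds in `MIN_SUPPORTED` are hypothetical values chosen for illustration, not a statement about which versions are actually safe.

```python
import re

# Order matters: Edge user agents also contain "Chrome/", so Edge is tested first.
BROWSER_PATTERNS = [
    ("Edge", re.compile(r"Edge?/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Internet Explorer", re.compile(r"MSIE (\d+)")),
]

# Hypothetical minimum major versions, for illustration only.
MIN_SUPPORTED = {"Edge": 100, "Chrome": 100, "Firefox": 100, "Internet Explorer": 11}


def identify_browser(user_agent):
    """Return (browser_name, major_version), or None if nothing matches."""
    for name, pattern in BROWSER_PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return name, int(match.group(1))
    return None


def is_outdated(user_agent, minimums=MIN_SUPPORTED):
    """True only when the browser is recognized and below its threshold."""
    info = identify_browser(user_agent)
    if info is None:
        return False
    name, major = info
    return major < minimums.get(name, 0)
```

Keep in mind that the User-Agent is supplied by the client and can be forged, so a result like this is a hint rather than proof.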

(3) Referer Field

This indicates the URI of the page from which the request originated; simply put, it shows which page led the user to the one currently being requested. It is a great help in understanding user access behavior.

This field is commonly used in traffic analysis for tracing origins.
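Tracing by Referer can be sketched as walking backwards through the requests seen in a capture. The function below assumes the capture has already been reduced to (url, referer) pairs in capture order; that input format and the function name are assumptions for illustration.

```python
def referer_chain(requests, target_url):
    """Walk Referer values backwards from target_url.

    requests: iterable of (url, referer) pairs in capture order, where
    referer is None when the request carried no Referer field.
    Returns the chain of URLs from the earliest known page to target_url.
    """
    referer_of = {}
    for url, referer in requests:
        referer_of.setdefault(url, referer)  # keep the first occurrence

    chain = [target_url]
    seen = {target_url}
    current = referer_of.get(target_url)
    # Stop at a request without Referer, or when a loop is detected.
    while current and current not in seen:
        chain.append(current)
        seen.add(current)
        current = referer_of.get(current)
    return list(reversed(chain))
```

The first element of the returned list is the earliest page found in the capture, which is usually where the investigation of an access path starts.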

(4) Accept Field

This indicates the media types that the user agent (browser) can handle. The value uses the MIME (Multipurpose Internet Mail Extensions) standard format, which categorizes data into eight major types, each with subtypes, written as type/subtype. Common types in HTTP include text, image, audio, video, and application; for example: application/json.

Sometimes you may see “*”, a wildcard representing any type. There may also be a weight value q, which indicates the preferred priority; its maximum is 1, and values closer to 1 indicate a stronger preference.
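To make the role of q concrete, here is a small sketch that parses an Accept value into (media range, weight) pairs ordered by preference. The function name `parse_accept` is an assumption, and parameters other than q are simply ignored.

```python
def parse_accept(value):
    """Return (media_range, q) pairs sorted by descending weight."""
    items = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        media_range, *params = [piece.strip() for piece in part.split(";")]
        q = 1.0  # default weight when no q parameter is present
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # unparsable weight: treat as not acceptable
        items.append((media_range, q))
    # sorted() is stable, so entries with equal weight keep their order.
    return sorted(items, key=lambda item: item[1], reverse=True)
```

The same approach applies to other fields that carry q weights, since they use the same comma-separated list with optional `;q=` parameters.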

(5) Accept-Charset, Accept-Encoding, Accept-Language Fields

Accept-Charset: Informs the server of the character sets supported by the user agent and their relative priority.

Accept-Encoding: Informs the server of the content encodings supported by the user agent and their priority order.

Accept-Language: Informs the server of the natural language set supported by the user agent and their priority order.

All of the above fields can specify multiple different types at once, with relative priority indicated by weight value q.

(6) If-Match Field

This field is slightly more complex, as it compares the resource ETag value (ETag is also a header field). The request will only be executed if the If-Match field matches the ETag value; otherwise, it will return status code 412 Precondition Failed (see the previous article “HTTP Request Methods and Status Codes”). Additionally, “*” can be used as the value for the If-Match field, in which case the server will ignore the ETag value and process the request as long as the resource exists.

In traffic analysis, this field can assist us in diagnosing HTTP business faults.
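The If-Match decision described above can be sketched as a small function. It is a simplified illustration, not a full implementation of the specification: the function name is an assumption, ETag lists are split naively on commas, and weak tags (prefixed with W/) are treated as never matching because If-Match uses strong comparison.

```python
def if_match_passes(if_match, current_etag):
    """Decide whether the If-Match precondition holds.

    if_match: value of the If-Match field, or None if the field is absent.
    current_etag: ETag of the current resource, or None if it does not exist.
    Returns True when the request may proceed; False means the server
    would answer 412 Precondition Failed.
    """
    if if_match is None:
        return True  # no precondition to evaluate
    if current_etag is None:
        return False  # nothing exists to match, even for "*"
    if if_match.strip() == "*":
        return True  # any existing representation satisfies "*"
    if current_etag.startswith("W/"):
        return False  # strong comparison: weak validators never match
    candidates = [tag.strip() for tag in if_match.split(",")]
    return current_etag in candidates
```

When a capture shows an unexpected 412, comparing the If-Match value of the request with the ETag of the most recent response for the same URL usually explains the failure.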

(7) Other Header Fields

Header Field Name       Field Description
Authorization           Web authentication information
Expect                  Expecting specific behavior from the server
From                    User’s email address
If-Modified-Since       Comparing resource update time
If-None-Match           Comparing entity tags (opposite of If-Match)
If-Range                Sending entity byte range request when the resource has not been updated
If-Unmodified-Since     Comparing resource update time (opposite of If-Modified-Since)
Max-Forwards            Maximum transmission hop count
Proxy-Authorization     Client authentication information required by the proxy server
Range                   Entity byte range request
TE                      Priority of transfer encoding

Response Header Fields

(1) Age Field

The value of this field is calculated; the algorithm is specified in RFC 7234 (RFC 2616 also defines one, but the RFC 7234 method is more rigorous). Put simply, it is the estimated time, in seconds, since the response was generated or last validated at the origin server. This estimate covers both the time the response has been held by intermediate caches and the time spent in network transmission.

(2) ETag Field

ETag stands for Entity Tag, which is a web cache validation mechanism provided by the HTTP protocol, making caching more efficient and saving bandwidth. Here is a brief description of this caching mechanism:

  • In most scenarios, when a URL is requested, the web server returns the resource along with its corresponding ETag value, which is placed in the ETag field of the HTTP response header;

  • The client can decide whether to cache this resource and its ETag;

  • Later, if the client requests the same URL again, it sends the saved ETag in the If-None-Match field of the request;

  • After the client request, the server may compare the client’s ETag with the current version resource’s ETag. If the ETag values match, it means the resource has not changed, and the server will send back a very short response containing the HTTP “304 Not Modified” status. The 304 status code tells the client that its cached version is the latest and can be used directly.
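The validation exchange in the list above can be sketched from the server side as follows. Deriving the ETag from a SHA-256 digest of the body is an assumption made for this example (servers are free to generate ETags however they like), and the function names are hypothetical.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the body bytes (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a GET of a resource.

    If the client's If-None-Match value contains the current ETag, the
    server answers 304 Not Modified with an empty payload.
    """
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        normalized = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in normalized:
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A first call without If-None-Match returns 200 with the full body; replaying the returned ETag in a second call yields a 304 with no payload, which is the bandwidth saving the mechanism is designed for.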

(3) Location Field

The Location header field can guide the response recipient to a resource located at a different URI than the requested one. This field is often used in conjunction with 3xx: Redirection responses to provide the redirect URI. Almost all browsers will forcibly attempt to access the indicated redirect resource upon receiving a response containing the Location header field.

(4) Server Field

The Server field contains information about the HTTP server installation.

Information about the peer server or host can thus be collected passively from the traffic.

(5) Other Response Fields

Header Field Name       Field Description
Accept-Ranges           Whether byte range requests are accepted
Proxy-Authenticate      Client authentication information required by the proxy server
Retry-After             Requirements for the timing of re-initiating requests
Vary                    Management information for proxy server caching
WWW-Authenticate        Server authentication information for the client

General Header Fields

(1) Cache-Control Field

This is a cache control field, which is a very important part of front-end development, displayed in the header as Cache-Control: XXXX. Its value can take many forms. From a traffic analysis perspective, understanding the meaning of the field is sufficient; we will not delve into caching in depth. Below is a simple list of directives in table form:

i. Cache request directives:

Value of Cache-Control      Description
no-cache                    Forces revalidation with the origin server
no-store                    Does not cache any content of the request or response
max-age=[seconds]           Maximum age value of the response
max-stale(=[seconds])       Accepts expired responses
min-fresh=[seconds]         Expects the response to remain valid within the specified time
no-transform                Proxies cannot change the media type
only-if-cached              Obtains the resource only from the cache
cache-extension             New directive marker (token)

ii. Cache response directives:

Directive                   Description
public                      Response can be cached by any party
private                     Response returned only to specific users
no-cache                    Must confirm validity before caching
no-store                    Does not cache any content of the request or response
no-transform                Proxies cannot change the media type
must-revalidate             Can be cached but must confirm with the origin server
proxy-revalidate            Requires intermediate cache servers to confirm the validity of cached responses
max-age=[seconds]           Maximum age value of the response
s-maxage=[seconds]          Maximum age value of public cache server responses
cache-extension             New directive marker (token)

Note: no-cache does not mean “do not cache”; it means a cached copy must not be reused without first confirming its validity with the origin server. no-store is the directive that truly forbids caching.
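For analysis purposes it is often enough to split a Cache-Control value into its directives. The sketch below does that; the function name is an assumption, and splitting on commas does not handle the rare quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into a dict of directives.

    Directives without an argument map to True; numeric arguments
    (such as max-age) are converted to int.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()
        if not sep:
            directives[name] = True
            continue
        arg = arg.strip().strip('"')
        directives[name] = int(arg) if arg.isdigit() else arg
    return directives


def may_be_stored(directives):
    """Only no-store forbids storing; no-cache merely requires revalidation."""
    return "no-store" not in directives
```

Note how `may_be_stored` returns True for `no-cache`, mirroring the distinction made in the note above.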

(2) Connection Field

The Connection header field serves two purposes:

  1. Controls header fields that are not forwarded to proxies
  2. Manages persistent connections

In HTTP/1.1, connections are persistent by default, so the client keeps sending requests over the same connection. When the server explicitly wants to close the connection, it sets the Connection header field to close.
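The persistence rule can be expressed as a short sketch. The function name is an assumption, and the HTTP/1.0 branch reflects the common keep-alive convention rather than anything stated in this article.

```python
def is_persistent(http_version, connection_header=None):
    """Decide whether the connection stays open after this message."""
    tokens = {
        token.strip().lower()
        for token in (connection_header or "").split(",")
        if token.strip()
    }
    if "close" in tokens:
        return False  # either side may ask to close explicitly
    if http_version == "HTTP/1.1":
        return True  # persistent by default
    # HTTP/1.0 connections persist only when keep-alive is requested.
    return http_version == "HTTP/1.0" and "keep-alive" in tokens
```

In a capture, a response with `Connection: close` followed by a TCP FIN from the server is therefore expected behavior and not a fault.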

(3) Date Field

The Date header field indicates the date and time when the HTTP message was created.

(4) Pragma Field

Pragma is a legacy field from versions prior to HTTP/1.1, defined only for backward compatibility with HTTP/1.0. The only defined form is: Pragma: no-cache. This header field belongs to the general header fields but is only used in requests sent by the client. It asks all intermediate servers not to return cached resources.

Why not just use the previously mentioned Cache-Control: no-cache? Mainly because some intermediate servers may not support HTTP/1.1, so both fields are often present in requests: Cache-Control: no-cache and Pragma: no-cache.

(5) Upgrade Field

The Upgrade header field is used to detect whether a higher version of HTTP or another protocol can be used for communication; its value can specify a completely different communication protocol. For example, captured WeChat traffic decodes as HTTP, but the Upgrade field specifies WeChat’s own encrypted application-layer protocol, mmtls.

When the Upgrade header field is used, Connection: Upgrade is usually specified as well.

(6) Other General Header Fields

Header Field Name       Description
Trailer                 Overview of headers at the end of the message
Transfer-Encoding       Specifies the transfer encoding method of the message body
Via                     Information related to the proxy server
Warning                 Error notifications

The request, response, and general header fields discussed so far are relatively easy to understand. What about entity header fields? Recall that in the previous article we discussed decoding: requests carry request entities, and responses carry corresponding content (response entities). Entity header fields are used specifically to describe the content of these request and response bodies (entities).

Entity Header Fields

(1) Content-Encoding Field

The Content-Encoding field informs the client of the content encoding method the server applied to the entity body. Content encoding refers to compression applied without losing any entity information.

(2) Content-Length Field

The Content-Length field indicates the size of the entity body in bytes. When a transfer encoding (for example, chunked) is applied to the message body, the Content-Length header field must not be used.

(3) Content-Type Field

The Content-Type field describes the media type of the object within the entity body. As with the Accept header field, the value is written in type/subtype format (see the request header field section above).

(4) Last-Modified Field

The Last-Modified header field indicates the time when the resource was last modified. Generally, this value is the time when the resource specified by the Request-URI was modified. However, when CGI scripts are used for dynamic data processing, this value may become the time when the data was last modified.

(5) Other Entity Header Fields

Header Field Name       Description
Allow                   Supported HTTP methods for the resource
Content-Language        Natural language of the entity body
Content-Location        URI that substitutes the corresponding resource
Content-MD5             Message digest of the entity body
Content-Range           Position range of the entity body
Expires                 Date and time when the entity body expires

Extended Header Fields

HTTP header fields, like the status codes discussed in the previous article, are extensible. Therefore, when conducting traffic analysis, we often encounter header fields that have not been discussed so far. Below are some extended fields that may be encountered during analysis:

(1) Header Fields Starting with X

During analysis you will often see fields that start with X. These fields are either deprecated or are extensions added by certain devices, as shown in the diagram.

RFC 4229 and RFC 6648 describe many such fields as “deprecated.” However, they still carry a lot of information, and the RFCs do not forbid fields starting with X; they merely discourage them. In some cases X-prefixed fields remain very convenient, so whether to use them should be decided per scenario. We do not need to memorize their functions (it is sufficient that the server and client recognize them and can communicate). Below, we introduce the most commonly used X-prefixed fields: X-Forwarded-For and X-Real-IP.

(2) X-Forwarded-For, X-Real-IP, Remote-addr Fields

  • X-Forwarded-For: Abbreviated as the XFF header, it represents the real IP of the client, i.e., the end that issued the HTTP request. This field is added only when the request passes through an HTTP proxy or load balancer (it generally requires manual configuration; devices do not add it automatically). It is crucial for tracing origins; without it, a security device alert may make it look as if an internal server is being attacked by the load balancer! Generally, devices that perform SNAT (Source Network Address Translation) write the real IP from before the translation into the XFF field, allowing the server to know the true source IP.

  • X-Real-IP: Generally used when nginx acts as a reverse proxy, using the $remote_addr variable to obtain the user’s real IP. However, X-Real-IP has a problem: if accessed through a CDN, the web server will obtain the CDN’s IP instead of the real user’s IP. Therefore, when analyzing origins, XFF is prioritized over X-Real-IP for reliability.

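  • Remote-addr: Similar to X-Real-IP in that it holds a single address, whereas X-Forwarded-For can record the IP at each proxy hop. Remote-addr is the address of the peer that connected directly to the server, so behind a proxy it is the proxy’s IP rather than the original client’s.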
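A minimal sketch of choosing a client address from these three sources, in the priority order used in this section, might look as follows. The function names and the lower-cased header dictionary are assumptions; taking the leftmost XFF entry follows the usual convention that each proxy appends to the list. Because a client can set these headers itself, the result should be treated as a lead to verify, not as proof.

```python
import ipaddress


def _is_ip(text):
    """True if text is a syntactically valid IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def client_ip(headers, remote_addr):
    """Pick a client IP: X-Forwarded-For, then X-Real-IP, then remote_addr.

    headers: dict with lower-cased field names.
    remote_addr: source address of the TCP connection.
    Returns (ip, source_name).
    """
    # By convention the leftmost XFF entry is the original client.
    for candidate in headers.get("x-forwarded-for", "").split(","):
        candidate = candidate.strip()
        if _is_ip(candidate):
            return candidate, "x-forwarded-for"
    real_ip = headers.get("x-real-ip", "").strip()
    if _is_ip(real_ip):
        return real_ip, "x-real-ip"
    return remote_addr, "remote-addr"
```

Returning the source name alongside the address makes it easy to record in a report which field the conclusion was based on.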

Thus, when conducting traffic analysis (often for tracing and evidence collection), we can prioritize information as follows: X-Forwarded-For > X-Real-IP > Remote-addr.

(3) HTTP Security Fields

  • Security Fields in Request Headers

Request headers sent by the client to the server contain detailed information about the request. Here are some key security protection request header fields:

Referer: This field indicates the source page of the current request (already introduced earlier, so we will not elaborate further).

User-Agent: This field identifies the type and version of the client’s browser (already introduced earlier, so we will not elaborate further).

Origin: This field is used for cross-origin requests, indicating the source of the request. By validating the Origin field, the server can prevent Cross-Site Request Forgery (CSRF) attacks.
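Origin validation is typically an allowlist check. The sketch below shows one way to do it; the allowlist contents and the function name are hypothetical, and a real deployment would combine this with other CSRF defenses.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist used only for illustration.
ALLOWED_ORIGINS = {"https://www.example.com"}


def origin_allowed(origin, allowed=ALLOWED_ORIGINS):
    """Return True if the Origin header value is on the allowlist."""
    if not origin or origin == "null":
        return False  # missing or opaque origins are rejected in this sketch
    parts = urlsplit(origin)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # Compare scheme://host[:port] with the host part lower-cased.
    normalized = f"{parts.scheme}://{parts.netloc.lower()}"
    return normalized in allowed
```

From a traffic perspective, state-changing requests whose Origin is not one of the site's own origins are worth a closer look.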

  • Security Fields in Response Headers

Here are some key security protection response header fields:

Content-Security-Policy (CSP): This field defines a set of policies that restrict the types and sources of resources that can be loaded on the page. By configuring CSP, XSS attacks and data injection can be effectively mitigated.

Strict-Transport-Security (HSTS): This field forces the client to use HTTPS in future requests, preventing man-in-the-middle attacks. Once HSTS is enabled, even if a user attempts to access the site via HTTP, the browser will automatically switch to HTTPS.

X-Frame-Options: This field controls whether the page can be displayed in an iframe. By setting X-Frame-Options to DENY or SAMEORIGIN, clickjacking attacks can be prevented.

X-XSS-Protection: This field controls the XSS filter built into some browsers, which can help detect and block XSS attacks. Current versions of the major browsers no longer implement this filter, so the field is mostly seen in traffic from older deployments.

By properly configuring these security protection header fields, the security of HTTP communication can be significantly improved, protecting user data from attackers. Other security-related HTTP header fields are listed below:

Header Field Name                           Description
X-Content-Type-Options                      Used to prevent MIME type sniffing attacks
X-Forwarded-Proto                           Confirms the protocol type of the original request
Content-Security-Policy-Report-Only         Used for testing and debugging CSP policies
Referrer-Policy                             Controls the value of the Referer field; values include no-referrer, no-referrer-when-downgrade, same-origin, strict-origin-when-cross-origin
X-Permitted-Cross-Origin-Request-Headers    Used to control custom header fields allowed in cross-origin requests
X-Content-Security-Policy                   A variant of CSP, mainly for compatibility with older browsers
Feature-Policy                              Used to control various features and APIs in the browser
Expect-CT                                   Relates to Certificate Transparency (CT), a technology that ensures the authenticity of SSL/TLS certificates by recording certificate information in public logs, allowing anyone to verify the validity of the certificate
Report-To                                   Used to collect and report security events; often used in conjunction with CSP and other security header fields to build a comprehensive security reporting mechanism
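When reviewing responses in a capture, a quick check for missing security fields can be sketched as below. The list in `RECOMMENDED` is an assumption chosen from the fields discussed in this section, not an authoritative baseline.

```python
# Hypothetical baseline drawn from the fields discussed in this section.
RECOMMENDED = [
    "content-security-policy",
    "strict-transport-security",
    "x-frame-options",
    "x-content-type-options",
    "referrer-policy",
]


def missing_security_headers(response_headers, recommended=RECOMMENDED):
    """Return the recommended fields that are absent from a response.

    response_headers: dict of header fields; names are compared
    case-insensitively.
    """
    present = {name.lower() for name in response_headers}
    return [name for name in recommended if name not in present]
```

An empty list means every field in the baseline was present; it says nothing about whether the configured values are actually effective.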

Note: When filtering fields, keep in mind that the name “Referer” is a misspelling: the letter “r” was not doubled, yet it became the de facto standard in the RFC. The spelling was corrected in later fields such as “Referrer-Policy.” Therefore, when searching for (filtering) fields, be mindful of the spelling.

(4) Set-Cookie Field

The Set-Cookie field is quite common, so it is listed separately; it is also an HTTP security header field. Set-Cookie is used by the server to set client cookie information in order to manage client state. The HttpOnly attribute is a cookie extension that prevents JavaScript from accessing the cookie. Its main purpose is to prevent XSS from stealing cookie information.

Traffic Analysis Examples

Example 1. When studying the communication of the WebShell IceScorpion, the content of the communication is unknown because IceScorpion encrypts it. During the communication phase there are no strong features for identification, but we can combine multiple weak features from the header fields in IceScorpion HTTP traffic to form detection rules. When a data stream (traditional security devices can only perform single-packet detection) matches the combined rules, an alert can be triggered. (This example uses IceScorpion 3 to illustrate a traffic detection approach.)

Analysis: Through research on the IceScorpion 3 WebShell management tool and the captured traffic, it was found that its HTTP traffic has the following characteristics:

(1) Content-Type: application/octet-stream;

(2) User-Agent, IceScorpion 3 has 15 built-in types, but they are all relatively outdated and can be used as weak features for judgment;

(3) Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2;

(4) Cache-Control: no-cache;

(5) Pragma: no-cache

When all five of the above weak header-field features are satisfied and subsequent encrypted communication occurs, an alert is triggered. (More traffic features can be added, such as upload actions or access to script files; these are left for the reader to explore.)

Example 2. A user was deceived by a phishing website’s browser-upgrade prompt and installed a trojan, resulting in losses. How can we determine whether the user’s browser version was indeed too old, and how the user reached the phishing website?

Analysis:

(1) By filtering HTTP traffic, the User-Agent field can be used to determine the user’s browser version. It can be seen that the browser is indeed outdated, so the user was misled by the phishing website’s prompt to upgrade the browser.

(2) The Referer field can be checked to determine whether the user reached the phishing page from another page. By examining the Referer field, it can be seen that the user was led to the phishing webpage from the link www.stmarybahrain.com/new/.

Conclusion

This short series of tutorials on learning the HTTP protocol from a traffic perspective has come to an end. We hope that by reading these three articles you have gained a deeper understanding of HTTP. In the future, we will bring you explanations of other application layer protocols to help you enhance your traffic analysis skills!

What new thoughts do you have after reading this article?
Feel free to leave a message in the comments and discuss with us: What header fields should we pay attention to in application load balancing scenarios?