Detailed Explanation of HTTP Protocol

HTTP is an object-oriented protocol belonging to the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been improved and extended continuously since then. HTTP/1.0 was the first version in wide use on the WWW, HTTP/1.1 has since been standardized (RFC 2616), and proposals for HTTP-NG (Next Generation of HTTP) have also been put forward.

The main features of the HTTP protocol can be summarized as follows:

1. Client/server model: HTTP supports the client/server model.

2. Simple and fast: When a client requests a service from the server, it only needs to send the request method and path. The commonly used request methods are GET, HEAD, and POST. Each method specifies a different kind of interaction between the client and the server. Because the protocol is simple, HTTP server programs are small, which makes communication fast.

3. Flexible: HTTP allows any type of data object to be transmitted. The type of the data being transmitted is marked by Content-Type.

4. Connectionless: Connectionless means that each connection handles only one request. After the server has processed the client's request and received the client's acknowledgment, the connection is closed. This saves transmission time.

5. Stateless: The HTTP protocol is stateless, meaning that the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent on each connection. On the other hand, when the server does not need earlier information, it responds faster.

1. Detailed Explanation of HTTP Protocol: URL Section

HTTP (Hypertext Transfer Protocol) is a stateless, application-layer protocol based on a request-response model, usually carried over TCP connections. HTTP/1.1 provides a mechanism for persistent connections, and most web development consists of building web applications on top of HTTP.

The format of an HTTP URL (a URL is a special type of URI that contains enough information to locate a resource) is as follows:

http://host[":"port][abs_path]

Here http indicates that the resource is to be located using the HTTP protocol; host is a valid Internet hostname or IP address; port specifies a port number, and if it is omitted, the default port 80 is used; abs_path specifies the URI of the requested resource. If the URL does not provide abs_path, "/" must be given when it is used as a request URI; the browser typically does this automatically. For example:

1. Entering www.guet.edu.cn results in the browser automatically converting it to http://www.guet.edu.cn/

2. http://192.168.0.116:8080/index.jsp
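The defaults described above (port 80 when no port is given, "/" when no abs_path is given) can be sketched in a few lines of Python. This is only an illustration, not how any particular browser works: the function name split_http_url is made up for this example, and prepending "http://" to a bare host name is an assumption that imitates the browser behavior mentioned in example 1.

```python
from urllib.parse import urlsplit


def split_http_url(url):
    """Return (host, port, abs_path) for an HTTP URL, applying the defaults."""
    # Assumption: a bare host name such as "www.guet.edu.cn" is treated
    # as if the user had typed "http://www.guet.edu.cn".
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port if parts.port is not None else 80  # default port
    abs_path = parts.path or "/"                         # default path
    return host, port, abs_path


print(split_http_url("www.guet.edu.cn"))
print(split_http_url("http://192.168.0.116:8080/index.jsp"))
```

The two calls at the end use the two URLs from the examples above.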

2. Detailed Explanation of HTTP Protocol: Request Section

An HTTP request consists of three parts: the request line, message headers, and request body.

1. The request line begins with a method token, followed by the Request-URI and the protocol version, each separated by a space, in the following format:

Method Request-URI HTTP-Version CRLF

Method indicates the request method; Request-URI is a Uniform Resource Identifier; HTTP-Version indicates the HTTP version of the request; CRLF indicates a carriage return followed by a line feed (apart from the terminating CRLF, no individual CR or LF characters are allowed).

The request methods (all method names are uppercase) are explained as follows:

GET Requests the resource identified by Request-URI
POST Appends new data to the resource identified by Request-URI
HEAD Requests the response message headers for the resource identified by Request-URI
PUT Requests that the server store a resource and use Request-URI as its identifier
DELETE Requests that the server delete the resource identified by Request-URI
TRACE Requests that the server send back the request information it received, mainly used for testing or diagnostics
CONNECT Reserved for use with proxies that can switch to acting as a tunnel
OPTIONS Queries the capabilities of the server, or the options and requirements associated with a resource

Application examples: GET method: When a webpage is accessed by entering a URL in the browser's address bar, the browser uses the GET method to request the resource from the server, e.g., GET /form.html HTTP/1.1 (CRLF)

The POST method asks the server to accept the data attached to the request and is commonly used for submitting forms, e.g.:

POST /reg.jsp HTTP/ (CRLF)
Accept:image/gif,image/x-xbit,… (CRLF)
…
HOST:www.guet.edu.cn (CRLF)
Content-Length:21 (CRLF)
Connection:Keep-Alive (CRLF)
Cache-Control:no-cache (CRLF)
(CRLF) // This empty line marks the end of the message headers; everything before it is the message header
user=jeffrey&pwd=1234 // The submitted data (the request body)

The HEAD method is almost the same as the GET method: the response to a HEAD request carries the same HTTP header information as the response to the corresponding GET request, but no message body. This method is often used to test whether hyperlinks are valid, whether they are accessible, and whether they have been updated recently.

2. Request headers will be described later.

3. Request body (omitted).
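The request-line format and the POST example above can be tried out with a small Python sketch. It is only an illustration under stated assumptions: the function names are made up for this example, the version defaults to HTTP/1.1 (the example above leaves it incomplete), and the Content-Type header is an addition that form submissions normally carry but that the example above does not show. Note that Content-Length is computed from the encoded body rather than typed by hand.

```python
from urllib.parse import urlencode

CRLF = "\r\n"


def build_post_request(path, host, form, version="HTTP/1.1"):
    """Build a form-submitting POST request as bytes."""
    body = urlencode(form).encode("ascii")
    head = CRLF.join([
        f"POST {path} {version}",                            # request line
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",   # assumption, see text
        f"Content-Length: {len(body)}",                      # length of the body in bytes
    ])
    # An empty line (CRLF CRLF) separates the headers from the body.
    return head.encode("ascii") + (CRLF + CRLF).encode("ascii") + body


def parse_request_line(line):
    """Split 'Method Request-URI HTTP-Version CRLF' into its three parts."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError("a request line has exactly three space-separated parts")
    method, request_uri, version = parts
    if not method.isupper():
        raise ValueError("method names are uppercase")
    return method, request_uri, version


request = build_post_request("/reg.jsp", "www.guet.edu.cn",
                             {"user": "jeffrey", "pwd": "1234"})
print(request.decode("ascii"))
print(parse_request_line("GET /form.html HTTP/1.1\r\n"))
```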

3. Detailed Explanation of HTTP Protocol: Response Section

After receiving and interpreting the request message, the server returns an HTTP response message.

The HTTP response also consists of three parts: the status line, message headers, and the response body.

1. The format of the status line is as follows:

HTTP-Version Status-Code Reason-Phrase CRLF

HTTP-Version indicates the HTTP version used by the server; Status-Code is the response status code returned by the server; Reason-Phrase is the textual description of the status code. The status code consists of three digits. The first digit defines the category of the response and has five possible values:

1xx: Informational - the request has been received and is being processed
2xx: Success - the request has been successfully received, understood, and accepted
3xx: Redirection - further action must be taken to complete the request
4xx: Client Error - the request contains a syntax error or cannot be fulfilled
5xx: Server Error - the server failed to fulfill a valid request

Common status codes, status descriptions, and explanations:

200 OK // Client request successful
400 Bad Request // Client request has syntax errors and cannot be understood by the server
401 Unauthorized // Request not authorized; this status code must be used with the WWW-Authenticate header field
403 Forbidden // Server received the request but refused to provide the service
404 Not Found // Requested resource does not exist, e.g., an incorrect URL was entered
500 Internal Server Error // The server encountered an unexpected error
503 Service Unavailable // The server cannot currently handle the client's request; it may recover after some time

e.g.: HTTP/1.1 200 OK (CRLF)

2. Response headers will be described later

3. The response body is the content of the resource returned by the server
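To make the status-line format and the five status classes concrete, here is a minimal Python sketch. The function name parse_status_line and the STATUS_CLASSES table are made up for this illustration; the class names are the ones listed above.

```python
STATUS_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}


def parse_status_line(line):
    """Split 'HTTP-Version Status-Code Reason-Phrase CRLF' into its parts."""
    parts = line.rstrip("\r\n").split(" ", 2)  # the reason phrase may contain spaces
    if len(parts) < 2:
        raise ValueError("status line needs at least a version and a status code")
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    if len(code) != 3 or not code.isdigit():
        raise ValueError("status code must consist of three digits")
    # The first digit of the status code selects the response class.
    status_class = STATUS_CLASSES.get(code[0], "Unknown")
    return version, int(code), reason, status_class


print(parse_status_line("HTTP/1.1 200 OK\r\n"))
print(parse_status_line("HTTP/1.0 404 Not Found\r\n"))
```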

4. Detailed Explanation of HTTP Protocol: Message Header Section

HTTP messages consist of requests from the client to the server and responses from the server to the client. Both request messages and response messages consist of a start line (for request messages, the start line is the request line; for response messages, the start line is the status line), message headers (optional), an empty line (a line with only CRLF), and a message body (optional).

HTTP message headers include general headers, request headers, response headers, and entity headers. Each header field consists of name + ":" + space + value, and the names of message header fields are case-insensitive.
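The message layout described above (start line, header fields, an empty line, then an optional body) can be sketched as a small Python parser. This is a simplified illustration: parse_message is a made-up name, header names are lowercased to reflect their case-insensitivity, and a repeated header simply overwrites the earlier value, which a real implementation would handle more carefully.

```python
def parse_message(raw):
    """Split a raw HTTP message (bytes) into start line, headers, and body."""
    # The first empty line (CRLF CRLF) ends the header section.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]  # request line or status line
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("header field without ':' separator: %r" % line)
        # Field names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = (b"GET /form.html HTTP/1.1\r\n"
       b"Accept-Language:zh-cn\r\n"
       b"HOST:www.guet.edu.cn\r\n"
       b"Connection:Keep-Alive\r\n"
       b"\r\n")
start_line, headers, body = parse_message(raw)
print(start_line)
print(headers["host"])  # found even though the message spelled it "HOST"
```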

1. General headers: A few header fields apply to request and response messages in general. They describe the message being transmitted, not the entity it carries. e.g.:

The Cache-Control general header field specifies caching directives. Caching directives are one-way (a directive that appears in a response need not appear in the request) and independent (the directives of one message do not affect the caching of another message). The similar header field used in HTTP/1.0 is Pragma. Caching directives for requests include: no-cache (a cached copy must not be used without first revalidating it with the origin server), no-store, max-age, max-stale, min-fresh, only-if-cached; caching directives for responses include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, s-maxage.

e.g.: To instruct the IE browser (client) not to cache a page, the server-side JSP program can be written as follows:

response.setHeader("Cache-Control","no-cache");
//response.setHeader("Pragma","no-cache"); has the same effect as the line above; the two are usually used together

This line of code sets the following general header field in the response message that is sent: Cache-Control:no-cache

The Date general header field indicates the date and time at which the message was generated.

The Connection general header field allows options to be specified for the connection, for example that the connection is persistent, or the "close" option, which announces that the connection is to be closed after the response is completed.
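The Cache-Control directives listed above form a comma-separated list in which some directives carry a value (such as max-age) and others do not (such as no-cache). A minimal Python sketch of reading such a header value follows. The function name parse_cache_control is made up for this illustration, and the parser is deliberately simplified: it would mis-split a quoted argument that itself contains a comma.

```python
def parse_cache_control(value):
    """Turn 'no-cache, max-age=60' into {'no-cache': None, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        name, sep, arg = part.partition("=")
        # Directive names are compared in lowercase; directives without
        # an argument are stored with the value None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("no-cache"))
print(parse_cache_control("private, max-age=60"))
```

With the result in a dictionary, a check such as "no-store" in directives is enough to decide whether a response may be stored at all.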

2. Request headers: Request headers allow the client to pass additional information about the request, and about the client itself, to the server. Common request headers include:

Accept: The Accept request header field specifies which types of information the client accepts. e.g.: Accept:image/gif indicates that the client wishes to receive resources in GIF image format; Accept:text/html indicates that the client wishes to receive HTML text.

Accept-Charset: The Accept-Charset request header field specifies the character sets the client accepts. e.g.: Accept-Charset:iso-8859-1,gb2312. If this field is not set in the request message, the default is that any character set is acceptable.

Accept-Encoding: The Accept-Encoding request header field is similar to Accept but specifies the acceptable content encodings. e.g.: Accept-Encoding:gzip,deflate. If this field is not set in the request message, the server assumes that the client accepts any content encoding.

Accept-Language: The Accept-Language request header field is similar to Accept but specifies a natural language. e.g.: Accept-Language:zh-cn. If this field is not set in the request message, the server assumes that the client accepts any language.

Authorization: The Authorization request header field is mainly used to prove that the client is authorized to view a resource. When a browser accesses a page and receives a 401 (Unauthorized) response from the server, it can send a request containing the Authorization request header field to ask the server to verify it.

Host (this header field is required in HTTP/1.1 requests): The Host request header field specifies the Internet host and port number of the requested resource, and is usually extracted from the HTTP URL. e.g.: When we enter http://www.guet.edu.cn/index.html in the browser, the request message sent by the browser includes the Host request header field as follows: Host:www.guet.edu.cn. Here the default port number 80 is used; if a port number is specified, the field becomes Host:www.guet.edu.cn:port.

User-Agent: When we log into online forums, we often see welcome messages that list the name and version of our operating system and browser, which surprises many people. In fact, the server application obtains this information from the User-Agent request header field, which allows the client to tell the server about its operating system, browser, and other attributes. This header field is not required, however; if we write our own browser and do not send the User-Agent request header field, the server has no way of knowing this information.

Example of request headers:

GET /form.html HTTP/1.1 (CRLF)
Accept:image/gif,image/x-xbitmap,image/jpeg,application/x-shockwave-flash,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/* (CRLF)
Accept-Language:zh-cn (CRLF)
Accept-Encoding:gzip,deflate (CRLF)
If-Modified-Since:Wed,05 Jan 2007 11:21:25 GMT (CRLF)
If-None-Match:W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent:Mozilla/4.0(compatible;MSIE6.0;Windows NT 5.0) (CRLF)
Host:www.guet.edu.cn (CRLF)
Connection:Keep-Alive (CRLF)
(CRLF)
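As described above, the Host value is taken from the URL, and the port only appears when it is not the default port 80. A rough Python sketch of that rule is shown below; host_header is a made-up name, and the sketch covers plain http URLs only.

```python
from urllib.parse import urlsplit


def host_header(url):
    """Build the Host request header line for an http:// URL."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host: %r" % url)
    # The default port 80 is left out; any other port is appended.
    if parts.port is None or parts.port == 80:
        return "Host: " + parts.hostname
    return "Host: %s:%d" % (parts.hostname, parts.port)


print(host_header("http://www.guet.edu.cn/index.html"))
print(host_header("http://192.168.0.116:8080/index.jsp"))
```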

3. Response headers: Response headers allow the server to pass additional response information that cannot be placed in the status line, as well as information about the server and about further access to the resource identified by the Request-URI. Common response headers include:

Location: The Location response header field redirects the recipient to a new location. It is commonly used when a domain name changes.

Server: The Server response header field contains information about the software the server uses to process the request. It is the counterpart of the User-Agent request header field. e.g.: Server: Apache-Coyote/1.1

WWW-Authenticate: The WWW-Authenticate response header field must be included in 401 (Unauthorized) response messages. When the client receives a 401 response message and sends a request containing the Authorization header field to ask the server to verify it, the server's response headers include this header field. e.g.: WWW-Authenticate:Basic realm="Basic Auth Test!" // This shows that the server uses basic authentication for the requested resource.
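The exchange between WWW-Authenticate and Authorization can be illustrated for the Basic scheme, in which the credentials are the Base64 encoding of "user:password". The following Python sketch is an illustration only: both function names are made up, the challenge parser only understands the simple scheme-plus-realm form shown above, and the user name and password are placeholders borrowed from the earlier POST example.

```python
import base64


def parse_challenge(value):
    """Split 'Basic realm="..."' into (scheme, realm)."""
    scheme, _, rest = value.strip().partition(" ")
    rest = rest.strip()
    realm = None
    if rest.lower().startswith("realm="):
        realm = rest[len("realm="):].strip('"')
    return scheme, realm


def basic_authorization(user, password):
    """Build the Authorization header line for the Basic scheme."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")


print(parse_challenge('Basic realm="Basic Auth Test!"'))
print(basic_authorization("jeffrey", "1234"))  # placeholder credentials
```

Base64 is an encoding, not encryption, so Basic credentials are readable by anyone who can observe the request.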

4. Entity headers: Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but this does not mean that the two must be sent together; entity header fields can be sent on their own. Entity headers define metadata about the entity body (e.g., whether there is an entity body) and about the resource identified by the request. Common entity headers include:

Content-Encoding: The Content-Encoding entity header field is used as a modifier of the media type. Its value indicates which additional content encoding has been applied to the entity body, and therefore which decoding mechanism must be applied to obtain the media type referenced by the Content-Type header field. Content-Encoding is typically used to record how a document was compressed, e.g.: Content-Encoding:gzip.

Content-Language: The Content-Language entity header field describes the natural language used by the resource. If this field is not set, the entity content is assumed to be intended for readers of all languages. e.g.: Content-Language:da.

Content-Length: The Content-Length entity header field indicates the length of the entity body as a decimal number of bytes.

Content-Type: The Content-Type entity header field specifies the media type of the entity body sent to the recipient. e.g.: Content-Type:text/html;charset=ISO-8859-1 or Content-Type:text/html;charset=GB2312.

Last-Modified: The Last-Modified entity header field indicates the date and time at which the resource was last modified.

Expires: The Expires entity header field gives the date and time at which the response expires. Proxy servers and browsers load a previously visited page directly from the cache to shorten response time and reduce server load; to make them refresh the cached page after a period of time, we can use the Expires entity header field to specify when the page expires. e.g.: Expires:Thu,15 Sep 2006 16:23:12 GMT. HTTP/1.1 clients and caches must treat other invalid date formats (including 0) as already expired. e.g.: To prevent the browser from caching a page, we can also set the Expires entity header field to 0; the JSP program is as follows:

response.setDateHeader("Expires",0);
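The rule that an invalid Expires value (including 0) counts as already expired can be sketched with Python's standard date parser. The function name is_expired is made up for this illustration, and the sketch ignores other freshness information such as Cache-Control: max-age, which a real cache would also consider.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_expired(expires_value, now=None):
    """Return True if the Expires value lies in the past or cannot be parsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # Invalid dates such as "0" are treated as already expired.
        return True
    if expires.tzinfo is None:
        # Assumption: a date without a zone is interpreted as GMT.
        expires = expires.replace(tzinfo=timezone.utc)
    return expires <= now


now = datetime(2007, 3, 8, 7, 17, 51, tzinfo=timezone.utc)
print(is_expired("Thu, 08 Mar 2007 07:16:51 GMT", now))
print(is_expired("0", now))
```

Passing now explicitly keeps the example reproducible; without it the current time is used.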

5. Observing the Communication Process of HTTP Protocol Using Telnet

Experiment Objective and Principle: Using Microsoft's telnet tool, we type the HTTP request by hand and send it to the server. After the server receives, interprets, and accepts the request, it returns a response, which is displayed in the telnet window. This deepens our understanding of how HTTP communication works.

Experiment Steps:

1. Open telnet

1.1 Open telnet: Run -> cmd -> telnet

1.2 Turn on telnet's local echo: set localecho

2. Connect to the server and send a request

2.1 open www.guet.edu.cn 80 // Note: the port number cannot be omitted

HEAD /index.asp HTTP/1.0
Host:www.guet.edu.cn

/* We can change the request method to request the content of the Guilin University of Electronic Technology homepage, entering the message as follows */

open www.guet.edu.cn 80
GET /index.asp HTTP/1.0 // Request the content of the resource
Host:www.guet.edu.cn

2.2 open www.sina.com.cn 80 // Or enter telnet www.sina.com.cn 80 directly at the command prompt

HEAD /index.asp HTTP/1.0
Host:www.sina.com.cn
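For readers without a telnet client, roughly the same experiment can be done with a plain TCP socket in Python. This is a sketch, not part of the original experiment: the function names are made up, the host and path in the commented-out usage lines are simply the ones used in step 2.1 and may no longer respond as shown in this article, and the read loop relies on the server closing the connection after its reply, which is the usual behavior for an HTTP/1.0 request.

```python
import socket


def build_head_request(host, path="/", version="HTTP/1.0"):
    """Build the bytes that step 2.1 types into telnet by hand."""
    # The trailing empty line (CRLF CRLF) tells the server the headers are done.
    return ("HEAD %s %s\r\nHost: %s\r\n\r\n" % (path, version, host)).encode("ascii")


def send_raw(host, port, request, timeout=10.0):
    """Send raw bytes over TCP and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Usage (requires network access):
# reply = send_raw("www.guet.edu.cn", 80,
#                  build_head_request("www.guet.edu.cn", "/index.asp"))
# print(reply.decode("iso-8859-1"))
```

If the server keeps the connection open instead of closing it, recv raises a timeout after the given number of seconds.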

3 Experiment Results:

3.1 The response obtained for request 2.1 is:

HTTP/1.1 200 OK // Request successful
Server: Microsoft-IIS/5.0 // Web server
Date: Thu, 08 Mar 2007 07:17:51 GMT
Connection: Keep-Alive
Content-Length: 23330
Content-Type: text/html
Expires: Thu, 08 Mar 2007 07:16:51 GMT
Set-Cookie: ASPSESSIONIDQAQBQQQB=BEJCDGKADEDJKLKKAJEOIMMH; path=/
Cache-control: private

// Resource content omitted

3.2 The response obtained for request 2.2 is:

HTTP/1.0 404 Not Found // Request failed
Date: Thu, 08 Mar 2007 07:50:50 GMT
Server: Apache/2.0.54 <Unix>
Last-Modified: Thu, 30 Nov 2006 11:35:41 GMT
ETag: "6277a-415-e7c76980"
Accept-Ranges: bytes
X-Powered-By: mod_xlayout_jh/0.0.1 vhs.markII.remix
Vary: Accept-Encoding
Content-Type: text/html
X-Cache: MISS from zjm152-78.sina.com.cn
Via: 1.0 zjm152-78.sina.com.cn:80<squid/2.6.STABLES-20061207>
X-Cache: MISS from th-143.sina.com.cn
Connection: close

The connection to the host has been lost

Press any key to continue…

4. Notes:

1. If the input contains errors, the request will not succeed.

2. Header field names are case-insensitive.

3. To understand the HTTP protocol further, refer to RFC 2616, which can be found at http://www.ietf.org/rfc.

4. Developers of backend programs must master the HTTP protocol.

6. Related Technical Supplement to HTTP Protocol

1. Basics: High-level protocols include the File Transfer Protocol (FTP), the Simple Mail Transfer Protocol (SMTP), the Domain Name System (DNS), the Network News Transfer Protocol (NNTP), and HTTP. There are three kinds of intermediaries: proxies, gateways, and tunnels. A proxy accepts requests whose URI is in absolute form, rewrites all or part of the message, and forwards the reformatted request to the server identified by the URI. A gateway is a receiving agent that acts as a layer above other servers and, if necessary, translates requests into the protocol of the underlying server. A tunnel acts as a relay point between two connections without altering the messages. Tunnels are often used when communication needs to pass through an intermediary (e.g., a firewall) or when the intermediary cannot recognize the content of the messages.

Proxy: An intermediary program that acts as both a server and a client in order to make requests on behalf of other clients. Requests are serviced internally or passed on, possibly after translation, to other servers. A proxy must interpret a request and, if necessary, rewrite it before forwarding it. Proxies often serve as client-side portals through firewalls and can also act as helper applications that handle requests over protocols the user agent does not implement.

Gateway: A server that acts as an intermediary for other servers. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is talking to a gateway. Gateways often serve as server-side portals through firewalls and can also act as protocol translators for access to resources stored in non-HTTP systems.

Tunnel: An intermediary program that acts as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, even though it may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connections are closed. Tunnels are often used when a portal is necessary or when an intermediary cannot interpret the relayed communication.

2. Advantages of protocol analysis: HTTP analyzers detect network attacks, and analyzing high-level protocols module by module will be the future direction of intrusion detection. Common ports for HTTP and its proxies are 80, 3128, and 8080, which are specified with the port label in the network section.

3. The HTTP protocol Content-Length limit vulnerability leads to denial-of-service attacks. When the POST method is used, Content-Length can be set to define the length of the data to be transmitted, e.g., Content-Length:999999999. The memory is not released until the transmission is complete, and attackers can exploit this flaw by continuously sending garbage data to the WEB server until the server's memory is exhausted. This type of attack typically leaves no trace. http://www.cnpaf.net/Class/HTTP/0532918532667330.html

4. Some ideas for denial-of-service attacks using HTTP protocol features: When a server is so busy handling forged TCP connection requests from an attacker that it has no time for normal client requests (which make up only a very small share of the traffic), the server appears unresponsive to normal clients. This situation is called a SYN flood attack. Smurf, TearDrop, and similar attacks flood the target with ICMP packets or use IP fragments. This article instead considers producing a denial of service through "normal connections". Port 19 has been used in the past for Chargen attacks, i.e., Chargen_Denial_of_Service. The method used there, however, was to set up UDP connections between two Chargen servers so that the servers had to process excessive traffic and crashed. Therefore, to take down a WEB server, two conditions must be met: 1. There must be a Chargen service. 2. There must be an HTTP service. Method: The attacker forges the source IP address and sends connection requests (Connect) to N Chargen servers. Once a Chargen server accepts the connection, it returns a character stream of 72 bytes per second (in practice faster, depending on network conditions) to the server.

5. HTTP fingerprinting technology: The principle of HTTP fingerprinting is much the same as that of other fingerprinting: record the slight differences in how different servers implement the HTTP protocol and use them for identification. HTTP fingerprinting is much more complex than TCP/IP stack fingerprinting, because customizing an HTTP server's configuration files or adding plugins and components easily changes the HTTP response information and makes identification harder, whereas customizing the behavior of a TCP/IP stack requires changes at the kernel level, so the stack is easier to identify. Making a server return different Banner information is very easy. For open-source HTTP servers such as Apache, users can modify the Banner information in the source code and restart the HTTP service for it to take effect; for HTTP servers without public source code, such as Microsoft's IIS or Netscape, the modification can be made in the DLL files that store the Banner information. There are articles that discuss this, so it is not elaborated on here. Another way to obscure Banner information is to use plugins. Common test requests:

1: HEAD / HTTP/1.0 sends a basic HTTP request
2: DELETE / HTTP/1.0 sends a request that is not allowed, such as a DELETE request
3: GET / HTTP/3.0 sends a request with an illegal HTTP protocol version
4: GET / JUNK/1.0 sends an incorrectly formatted HTTP protocol request

The HTTP fingerprinting tool Httprint determines the type of an HTTP server effectively by applying statistical principles combined with fuzzy logic techniques. It can be used to collect and analyze the signatures generated by different HTTP servers.

6. Others: To improve performance for users, modern browsers also support concurrent access, opening several connections while a webpage is being browsed so that the many icons on the page are fetched quickly and the whole page is transmitted sooner. HTTP/1.1 provides persistent connections, while the next-generation HTTP protocol, HTTP-NG, adds further support for session control, rich content negotiation, and other methods in order to provide more efficient connections.
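The fingerprinting idea from item 5 above (send a few unusual requests and record how the server answers) can be sketched in Python as follows. This is only a toy illustration of the recording step and is not how Httprint works internally: the names PROBES, probe_bytes, and summarize are made up, and sending the probes over the network is left out. Such probes should only be sent to servers you are permitted to test.

```python
# The four test requests listed under item 5.
PROBES = [
    "HEAD / HTTP/1.0",    # a basic, well-formed request
    "DELETE / HTTP/1.0",  # a method that is normally not allowed
    "GET / HTTP/3.0",     # a protocol version the text treats as illegal
    "GET / JUNK/1.0",     # a malformed protocol name
]


def probe_bytes(request_line):
    """Turn a probe request line into the bytes to send."""
    return (request_line + "\r\n\r\n").encode("ascii")


def summarize(response):
    """Extract (status line, Server banner) from a raw response."""
    head = response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    banner = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":
            banner = value.strip()
            break
    return lines[0], banner


sample = (b"HTTP/1.1 200 OK\r\nServer: Microsoft-IIS/5.0\r\n"
          b"Content-Type: text/html\r\n\r\n")
print(summarize(sample))
```

Because the banner is easy to change, as noted above, the status lines returned for the unusual probes are usually the more telling part of such a record.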
