Understanding the HTTP Protocol
1. Concept of HTTP Protocol:
Concept: HTTP (Hypertext Transfer Protocol) specifies the rules for data transmission between browsers and servers.
2. Characteristics of HTTP Protocol
-
Based on TCP protocol: connection-oriented and reliable (TCP itself does not encrypt data; encryption is provided by HTTPS).
-
Based on request-response model: one request corresponds to one response.
-
HTTP is a stateless protocol: it has no memory of previous transactions, each request-response is independent.
-
Disadvantage: The protocol itself cannot share data between multiple requests.
-
Advantage: Fast, because the server does not have to keep per-client state between requests.
3. HTTP Request Process
The HTTP protocol uses a request/response model, where the client sends a request message to the server, and then the server responds to the request. Below is an introduction to the process of a single HTTP request:
-
Domain Name Resolution: Using DNS protocol for domain name resolution.
1.1 The browser searches its own DNS cache.
1.2 If the browser’s cache does not contain the information, it will search the operating system’s DNS cache.
1.3 If neither of the above two sources yields results, it will attempt to find the location in the operating system’s hosts file (on Windows, usually located at C:\Windows\System32\drivers\etc\hosts).
1.4 If none of the above three steps yields results, it will recursively query the domain name server.
-
Establishing Connection: Initiating TCP three-way handshake.
-
Initiating HTTP Request: After successfully establishing a TCP connection, the browser initiates the HTTP request.
-
HTTP is a standard for requests and responses between clients and servers, carried over TCP. The client is the end user's program, and the server is the website. Using a web browser, web crawler, or other tool, the client initiates an HTTP request to a specified port on the server (the default port is 80). In simple terms, HTTP is a stateless, request-response, application-layer protocol that defines how computers communicate over the network, and it commonly transmits data over TCP/IP. Any terminal (mobile phone, laptop, etc.) that wants to communicate with a web server must follow the HTTP protocol; otherwise the two sides cannot understand each other.
Responding to HTTP Request: The server responds to the HTTP request, and the browser receives the returned response.
-
After the browser sends a request, the server responds with: a status line containing the protocol version and the response status code (e.g. HTTP/1.1 200 OK); response headers that describe the server and the returned content; a blank line that marks the end of the headers; and finally the actual data requested by the user, in the format described by the Content-Type response header.
Parsing Response: The browser parses the response and requests other resources (such as js, css, etc.).
-
The browser receives the HTML returned by the server, then requests and processes the CSS and JS that the HTML references, in order to render the page or perform other operations on the received response files.
Browser Rendering Display Page: The browser renders and displays the page using its rendering engine (the browser kernel).
-
Rendering the content based on the parsed information for the user.
Disconnecting: The TCP connection is closed with the four-way handshake (a FIN and an ACK in each direction).
-
Refer to Java basic network programming.
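The steps above can be sketched with a raw socket. This is only an illustration under stated assumptions: `HelloHandler` and `fetch` are hypothetical names, and the tiny local server exists solely so the example runs without Internet access. A real browser does far more (caching, TLS, rendering).

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical local server used only to make the sketch self-contained."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in the demo


def fetch(host, port, path="/"):
    # Step 1: name resolution (the OS resolver consults its caches, hosts file and DNS).
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    # Step 2: connect() performs the TCP three-way handshake.
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(address)
        # Step 3: send the HTTP request; the blank line ends the headers.
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # Step 4: read the response until the server closes the connection.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # Leaving the with-block closes the socket (the TCP teardown).
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("ascii")
    return status_line, body


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
status_line, body = fetch("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()
print(status_line)
print(body.decode("utf-8"))
```

Parsing and rendering the returned HTML is left out; the sketch stops once the raw response has been split into its status line and body.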
4. Format of Request Data
1. Request data is divided into 3 parts:
-
Request Line: The first line of the request data. For example, in GET / HTTP/1.1, GET is the request method, / is the resource path being requested, and HTTP/1.1 is the protocol version.
-
Request Header: Starting from the second line, formatted as key-value pairs.
-
Request Body: The last part of a POST request, containing the request parameters.
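As a rough illustration of the three parts, the sketch below splits a raw request into request line, headers, and body. The function name `parse_request` and the sample request (host, path, form fields) are made up for the example; real servers handle many more edge cases.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into (method, target, version, headers, body)."""
    # The first blank line separates the head (request line + headers) from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, resource path, protocol version.
    method, target, version = lines[0].split(" ", 2)
    # Request headers: one "key: value" pair per line, names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


form = b"username=alice&password=123"  # hypothetical form data
raw_request = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: " + str(len(form)).encode("ascii") + b"\r\n"
    b"\r\n" + form
)

method, target, version, headers, body = parse_request(raw_request)
print(method, target, version)
print(headers["content-type"])
print(body)
```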
2. Common HTTP Request Headers:
-
Host: Indicates the hostname being requested.
-
User-Agent: Identifies the client software and its version, for example the identification string sent by the Chrome browser.
-
Accept: Indicates the resource types the requester wishes to receive or can parse and recognize.
-
Accept-Language: Indicates the language of the browser, allowing the server to return web pages in different languages.
-
Accept-Encoding: Indicates the compression types supported by the browser.
-
Content-Type: Indicates the media type of the request body being sent.
-
Referer: The address of the page from which the request was made. Servers can check it to implement hotlink protection, and removing it can work around some hotlinking issues.
3. Hotlink Protection: When the image server applies hotlink protection, an image embedded in another site's web page fails to load and the request returns a 403 error, even though the same image URL opens correctly when entered directly in the browser.
-
Solution:
-
Add the following tag to the head of the page to suppress the Referer request header. It applies to link and resource requests made by the page and to Ajax requests initiated by JavaScript code.
<meta name="referrer" content="no-referrer" />
Or, to control the referrer for a single image, set the referrerpolicy attribute to one of the listed values:
<img referrerpolicy="no-referrer|origin|unsafe-url" src="image link"/>
<!-- For example -->
<img src="https://gitee.com/alanway/resources/raw/master/files/iis-reverse-proxy/site-access-with-proxy.png" referrerpolicy="no-referrer">
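To see why removing the Referer helps, here is a minimal sketch of the kind of check a hotlink-protected server might perform. The function `hotlink_status` and the host names are hypothetical, and the rule that an empty Referer is allowed is an assumption, chosen because it matches the behavior described above (direct access in the browser works).

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical site names


def hotlink_status(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return the status code a hotlink-protected image server might send."""
    if not referer:
        # No Referer: a direct visit, or a page that uses the no-referrer policy.
        return 200
    host = urlsplit(referer).hostname
    # Requests coming from pages on other sites are refused.
    return 200 if host in allowed_hosts else 403


print(hotlink_status("https://www.example.com/gallery"))
print(hotlink_status("https://other.site/page.html"))
print(hotlink_status(None))
```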
4. Request Methods
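Seven commonly used HTTP/1.1 request methods are described here: 1. GET; 2. POST; 3. PUT; 4. DELETE; 5. HEAD; 6. TRACE; 7. OPTIONS. The HTTP/1.1 specification also defines CONNECT, and PATCH was added later in RFC 5789.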
Method | Description |
---|---|
GET | Operation to retrieve this resource. |
DELETE | Operation to delete this resource (Note: The client cannot be guaranteed that the delete operation has been carried out, even if the returned status code indicates success, because the HTTP specification allows the operation to be overridden on the server). |
HEAD | Similar behavior to the GET method, but the server only returns the header portion of the entity in the response. It can quickly obtain resource information, such as resource type; by checking the status code in the response, it can determine whether the resource exists; by checking the headers, it can test whether the resource has been modified. |
POST | Submits data to the specified resource for processing (e.g., submitting a form or uploading a file). The data is included in the request body. A POST request may result in the creation of new resources and/or modification of existing resources (by convention, most often used for add operations). |
PUT | Transmits data from the client to the server to create or completely replace the content at the specified URI (by convention, most often used for update operations). |
OPTIONS | Used to obtain the methods supported by the current URL. If the request is successful, the response includes a header named “Allow” whose value lists the supported methods, such as “GET, POST”. |
TRACE | Initiates a “loopback” diagnostic on the destination server, as the client’s request may pass through firewalls, proxies, gateways, or other applications. Each of these nodes may modify the original HTTP request, and the TRACE method allows the client to see what the request looks like when it finally reaches the server. Since there is a “loopback” diagnostic, the server will respond with a TRACE response, carrying the original request message it received. |
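
The differences between several of these methods can be illustrated with a toy in-memory resource store. Everything here (`handle`, `store`, the paths) is hypothetical and only mirrors the descriptions in the table; it is not how any particular server is implemented.

```python
ALLOWED_METHODS = ("GET", "HEAD", "PUT", "DELETE", "OPTIONS")
store = {"/notes/1": b"first note"}  # hypothetical in-memory resources


def handle(method, path, body=b""):
    """Return (status, headers, body) for a request against the toy store."""
    allow = {"Allow": ", ".join(ALLOWED_METHODS)}
    if method not in ALLOWED_METHODS:
        return 405, allow, b""  # method not allowed
    if method == "OPTIONS":
        return 200, allow, b""  # report the supported methods
    if method == "PUT":
        status = 200 if path in store else 201  # replaced vs. newly created
        store[path] = body
        return status, {}, b""
    if path not in store:
        return 404, {}, b""
    if method == "DELETE":
        del store[path]
        return 204, {}, b""
    data = store[path]
    headers = {"Content-Length": str(len(data))}
    # HEAD returns the same headers as GET but no body.
    return 200, headers, (data if method == "GET" else b"")


print(handle("GET", "/notes/1"))
print(handle("HEAD", "/notes/1"))
print(handle("OPTIONS", "/notes/1"))
```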
5. Differences Between GET and POST Requests (Interview Question)
-
GET request parameters are placed in the query string of the request line, and the request normally has no request body. POST request parameters are in the request body.
-
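GET request parameters are limited in size, because browsers and servers cap the URL length. The POST request body has no limit in the protocol, although servers may impose one.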
-
GET requests are often faster in practice than POST requests, mainly because their responses can be cached by browsers and proxies.
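The placement of parameters can be seen by building (without sending) the two kinds of request with Python's standard library. The URL and parameter names below are placeholders chosen for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "http", "page": 1}  # hypothetical parameters
query = urlencode(params)

# GET: parameters travel in the URL, so they end up in the request line.
get_req = Request("http://example.com/search?" + query)

# POST: the same parameters travel in the request body instead.
post_req = Request(
    "http://example.com/search",
    data=query.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(get_req.get_method(), get_req.selector, get_req.data)
print(post_req.get_method(), post_req.selector, post_req.data)
```

Nothing is transmitted here; the `Request` objects only describe what would be sent.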
6. Introduction to Response Data Format
1. Response data is divided into 3 parts:
-
Response Line: The first line of the response data. For example, in HTTP/1.1 200 OK, HTTP/1.1 is the protocol version, 200 is the response status code, and OK is the status code description.
-
Response Header: Starting from the second line, formatted as key:value pairs.
-
Response Body: The last part, containing the response data.
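A minimal sketch of how the three parts fit together on the wire is shown below. The helper `build_response` and the sample body are hypothetical; a real server would add further headers such as Date.

```python
def build_response(status, reason, headers, body):
    """Assemble response line, headers, blank line and body into raw bytes."""
    lines = [f"HTTP/1.1 {status} {reason}"]                           # response line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # response headers
    head = "\r\n".join(lines) + "\r\n\r\n"                            # blank line ends the headers
    return head.encode("ascii") + body                                # response body


page = b"<h1>It works</h1>"  # hypothetical document
raw_response = build_response(
    200,
    "OK",
    {"Content-Type": "text/html", "Content-Length": len(page)},
    page,
)
print(raw_response.decode("ascii"))
```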
2. Common HTTP Response Headers
-
Content-Type: Indicates the actual type of resource being sent.
-
In the HTTP protocol message header, Content-Type is used to indicate media type information. It tells the server how to handle the request data and informs the client (usually the browser) how to parse the response data, such as displaying images, parsing HTML, or simply displaying text.
-
For POST requests, the content is placed in the request body, and Content-Type defines the encoding format of the request body. After the data is sent, the receiving end must parse it. The receiving end relies on the Content-Type field in the request header to know the encoding format of the request body and then parses it accordingly.
-
Content-Length: Indicates the length of the response content (in bytes).
-
Content-Encoding: Indicates the compression algorithm of the response, such as gzip.
-
Cache-Control: Indicates how the client should cache, e.g., max-age=300, indicating it can be cached for a maximum of 300 seconds.
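The interplay of these headers can be sketched as follows. The helper names (`encode_body`, `decode_body`, `max_age`) are invented for the example, and the UTF-8 charset is an assumption of the sketch rather than something the headers above require.

```python
import gzip


def encode_body(text, seconds=300):
    """Server side: compress the body and describe it in the response headers."""
    payload = gzip.compress(text.encode("utf-8"))
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(payload)),  # length of the bytes actually sent
        "Cache-Control": f"max-age={seconds}",
    }
    return headers, payload


def decode_body(headers, payload):
    """Client side: undo the compression named by Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        payload = gzip.decompress(payload)
    return payload.decode("utf-8")  # charset assumed to be UTF-8 in this sketch


def max_age(cache_control):
    """Extract the max-age value in seconds, or None if it is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


text = "hello HTTP " * 200
headers, payload = encode_body(text)
print(headers["Content-Length"], "bytes sent for", len(text), "characters")
print(max_age(headers["Cache-Control"]))
```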
3. Major Categories of Status Codes
-
1xx: Informational – Temporary status codes indicating that the request has been received, telling the client to continue the request or ignore it if it has already completed.
-
2xx: Success – Indicates that the request has been successfully received and processed.
-
3xx: Redirection – Redirects to another location, prompting the client to make another request to complete the process.
-
4xx: Client Error – Indicates an error occurred in processing, the responsibility lies with the client, e.g., the client requests a non-existent resource, unauthorized access, etc.
-
5xx: Server Error – Indicates an error occurred in processing, the responsibility lies with the server, e.g., the server throws an exception, routing error, unsupported HTTP version, etc.
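Because the category is determined by the first digit, it can be derived with integer division. The sketch below uses Python's standard `http.HTTPStatus` enum for the reason phrase; the `describe` function and `CATEGORIES` table are illustrative names only.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return 'code phrase (category)' for an HTTP status code."""
    category = CATEGORIES.get(code // 100, "Unknown")  # the first digit picks the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # the code is not registered in the standard library
        phrase = "Unknown"
    return f"{code} {phrase} ({category})"


for code in (200, 404, 503):
    print(describe(code))
```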
4. Common Response Status Codes:
Status Code | Name | Description |
---|---|---|
100 | Continue | Continue. The client should continue its request. |
101 | Switching Protocols | Switching Protocols. The server switches protocols based on the client’s request. The switch should only be made when it is advantageous, such as switching to a newer version of HTTP. |
200 | OK | Request successful. Generally used for GET and POST requests. |
201 | Created | Created. The request was successful and a new resource was created. |
202 | Accepted | Accepted. The request has been accepted but not yet processed. |
203 | Non-Authoritative Information | Non-Authoritative Information. The request was successful, but the returned meta-information is not from the original server, but a copy. |
204 | No Content | No Content. The server successfully processed the request but did not return any content. This ensures that the browser continues to display the current document without updating the webpage. |
205 | Reset Content | Reset Content. The server successfully processed the request, and the user terminal (e.g., browser) should reset the document view. This return code can clear the browser’s form fields. |
206 | Partial Content | Partial Content. The server successfully processed a GET request for part of the resource and returns only the range requested by the client. |
300 | Multiple Choices | Multiple Choices. The requested resource may include multiple locations, and the response may return a list of resource characteristics and addresses for the user terminal (e.g., browser) to choose from. |
301 | Moved Permanently | Moved Permanently. The requested resource has been permanently moved to a new URI, and the returned information will include the new URI. The browser will automatically redirect to the new URI. Any future requests should use the new URI instead. |
302 | Found | Found. Similar to 301, but the resource is only temporarily moved. The client should continue to use the original URI. |
303 | See Other | See Other. The response to the request can be found at another URI, which the client should retrieve with a GET request. |
304 | Not Modified | Not Modified. The requested resource has not been modified, so the server returns no resource body. The client usually caches accessed resources and sends a header (such as If-Modified-Since) indicating that it only wants the resource if it was modified after a specified date. |
305 | Use Proxy | Use Proxy. The requested resource must be accessed through a proxy. |
306 | Unused | Deprecated HTTP status code. |
307 | Temporary Redirect | Temporary Redirect. Similar to 302, but the client must repeat the request to the new URI with the same method and body as the original request. |
400 | Bad Request | Client request syntax error, server cannot understand. |
401 | Unauthorized | The request requires user authentication. |
402 | Payment Required | Reserved for future use. |
403 | Forbidden | The server understands the client’s request but refuses to execute it. |
404 | Not Found | The server cannot find the resource (webpage) based on the client’s request. With this code, web designers can set up a personalized page stating “The resource you requested cannot be found.” |
405 | Method Not Allowed | The method in the client’s request is prohibited. |
406 | Not Acceptable | The server cannot complete the request based on the content characteristics requested by the client. |
407 | Proxy Authentication Required | The request requires proxy authentication, similar to 401, but the requester should use the proxy for authorization. |
408 | Request Time-out | The server waited too long for the client to send the request, timing out. |
409 | Conflict | The server may return this code when completing the client’s PUT request, indicating a conflict occurred while processing the request. |
410 | Gone | The resource requested by the client no longer exists. 410 differs from 404; if the resource previously existed and has now been permanently deleted, the 410 code can be used. Web designers can specify the new location of the resource using the 301 code. |
411 | Length Required | The server cannot process the request information sent by the client without a Content-Length. |
412 | Precondition Failed | The precondition specified in the client’s request information is incorrect. |
413 | Request Entity Too Large | The server cannot process the request because the entity is too large, thus rejecting the request. To prevent continuous requests from the client, the server may close the connection. If the server is only temporarily unable to process the request, it will include a Retry-After response header. |
414 | Request-URI Too Large | The requested URI is too long (the URI is usually the URL), and the server cannot process it. |
415 | Unsupported Media Type | The server cannot process the media format attached to the request. |
416 | Requested range not satisfiable | The range requested by the client is invalid. |
417 | Expectation Failed | The server cannot meet the expectations specified in the Expect field of the request header. |
418 | I’m a teapot | An April Fool’s joke. The code is defined in RFC 2324, a joke document about the Hypertext Coffee Pot Control Protocol (HTCPCP), rather than in the HTTP specification itself. |
500 | Internal Server Error | Internal server error, unable to complete the request. |
501 | Not Implemented | The server does not support the requested functionality and cannot complete the request. |
502 | Bad Gateway | The server acting as a gateway or proxy received an invalid response from the remote server while attempting to execute the request. |
503 | Service Unavailable | The server is temporarily unable to process the client’s request due to overload or system maintenance. The length of the delay may be included in the server’s Retry-After header information. |
504 | Gateway Time-out | The server acting as a gateway or proxy did not receive a timely response from the remote server. |
505 | HTTP Version not supported | The server does not support the version of the HTTP protocol requested and cannot complete the processing. |
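
The redirect codes differ mainly in which method the client uses for the follow-up request. The sketch below encodes the common behavior as an assumption-laden illustration: `next_method` is a hypothetical helper, and the 301/302 branch reflects what many user agents do in practice (the specification permits, but does not require, changing POST to GET).

```python
def next_method(status, method):
    """Return the method a client would typically use after a redirect."""
    if status == 303:
        # See Other: fetch the new URI with GET (HEAD stays HEAD).
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Many user agents turn a redirected POST into a GET.
        return "GET" if method == "POST" else method
    if status == 307:
        # Temporary Redirect: the method must not change.
        return method
    raise ValueError(f"{status} is not a redirect handled by this sketch")


for status in (301, 302, 303, 307):
    print(status, "POST ->", next_method(status, "POST"))
```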
5. Common Data Transmission Formats
-
application/x-www-form-urlencoded:
-
This is the most common way to submit POST data. If a native browser form does not set the enctype attribute, the data is submitted as application/x-www-form-urlencoded, which is the default value. The data is serialized into key-value pairs such as key1=value1&key2=value2: pairs are separated by ‘&’, keys and values are separated by ‘=’, and reserved or non-ASCII characters are percent-encoded (a space is sent as ‘+’). In axios, passing qs.stringify(data) as the request data submits it in this format. On the backend, a framework that binds request parameters to an object can populate the object’s fields from these pairs automatically.
-
Advantage: Compatible with all browsers.
-
Problem: When the data structure is complex, server-side data parsing becomes difficult.
application/json:
-
With the increasing popularity of the JSON specification and better browser support, many developers set Content-Type: application/json in the request header. It tells the server that the request body is a JSON-formatted string, which the server then parses. JSON supports much more complex structured data than flat key-value pairs, which makes it especially suitable for RESTful interfaces. In axios, when the request data is a plain object, a POST request is sent as application/json by default. On a Spring MVC backend, annotating the handler parameter with @RequestBody binds the JSON body to an object.
-
Advantage: The frontend does not need to worry about the complexity of the data structure, and the backend parsing is convenient.
-
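Problem: Native HTML forms cannot submit JSON, so the request has to be sent from JavaScript, and some very old browsers lack built-in JSON support.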
multipart/form-data:
-
Mainly used for file uploads. Each form field or file is sent as a separate part delimited by a boundary string, and file content is transmitted as raw binary data without percent-encoding.
text/plain:
-
Used for transmitting plain text, rarely used in practice.
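The difference between the first two formats can be seen by encoding data with Python's standard library. The field names and values are made up for the example, and multipart/form-data is left out to keep the sketch short.

```python
import json
from urllib.parse import parse_qs, urlencode

# Flat key-value data suits application/x-www-form-urlencoded.
flat = {"name": "Alice Smith", "note": "a&b=c"}  # hypothetical form fields
form_body = urlencode(flat)
print(form_body)  # reserved characters are percent-encoded, spaces become '+'
print(parse_qs(form_body))

# Nested data survives a JSON round trip unchanged.
nested = {"user": {"name": "Alice", "tags": ["a", "b"]}}
json_body = json.dumps(nested)
print(json_body)
print(json.loads(json_body) == nested)

# The same nested data in a form body is flattened to one opaque string,
# which is why complex structures are hard to parse on the server.
lossy = parse_qs(urlencode(nested))
print(type(lossy["user"][0]).__name__)
```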