Introduction to the HTTP Protocol
HTTP is a text-based (in its HTTP/1.x form) transfer protocol that operates at the application layer of the OSI network model.
HTTP communicates through request-response exchanges between clients and servers. Its specification, previously the single RFC 2616, was split in 2014 into six separate documents (RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235), which were in turn superseded in 2022 by RFC 9110, RFC 9111 and RFC 9112. A typical request and response look like this:
Request:
POST http://www.baidu.com HTTP/1.1
Host: www.baidu.com
Connection: keep-alive
Content-Length: 7
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36

wd=HTTP

Response:
HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/html;charset=utf-8
Date: Thu, 14 Feb 2019 07:23:49 GMT
Transfer-Encoding: chunked

<html>…</html>
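Because HTTP/1.1 is text-based, a message is nothing more than a start line, header lines and a body, separated by CRLF line breaks and a blank line. Here is a minimal Python sketch that builds the example request above by hand and splits it back apart. `build_request` and `parse_message` are hypothetical helpers. The parser is deliberately simplified: it does no header-name case folding and does not handle repeated headers or chunked bodies.

```python
CRLF = "\r\n"

def build_request(method, target, headers, body=""):
    """Serialize an HTTP/1.1 request: request line, header lines, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return (CRLF.join(lines) + CRLF + CRLF + body).encode("ascii")

def parse_message(raw):
    """Split a raw HTTP/1.1 message into start line, header dict and body (simplified)."""
    head, _, body = raw.decode("ascii").partition(CRLF + CRLF)
    start_line, *header_lines = head.split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip()] = value.strip()
    return start_line, headers, body

body = "wd=HTTP"
raw = build_request(
    "POST",
    "http://www.baidu.com",
    {
        "Host": "www.baidu.com",
        "Connection": "keep-alive",
        "Content-Length": str(len(body.encode("ascii"))),  # 7 bytes
    },
    body,
)
start, headers, parsed_body = parse_message(raw)
print(start)        # POST http://www.baidu.com HTTP/1.1
print(parsed_body)  # wd=HTTP
```

Nothing in these bytes is encoded or protected, so anything on the path can read them just as easily as this parser does.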
Most websites that ask for user input now use HTTPS, which is HTTP carried over an encrypted SSL/TLS connection. Even sites that still use plain HTTP may protect user input with hash functions or other encryption before sending it. Why is HTTP considered insecure? We can see for ourselves by capturing packets: the username and password we type can be read directly!
Experimental Steps
1. Start Wireshark and select the network interface you are connected through.
2. Double-click the interface to start capturing, then click the red square in the upper-left corner to stop the capture.
3. Enter the filter http and ip.addr==your_ip, replacing your_ip with the IP address of the website you want to visit, and click Apply; then click Start to capture again. Setting the filter before capturing keeps the packet list much cleaner and makes the later analysis easier.
4. Open the website’s login page and log in.
5. After you log in, Wireshark captures the login request; select the HTTP POST packet to see the submitted form data.
At this point, we have captured the username and password!
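What Wireshark shows is simply the request body exactly as it was sent. The sketch below assumes the login form is submitted as `application/x-www-form-urlencoded` with fields named `username` and `password`. Those field names, the `/login` path, `example.com` and the credentials are all hypothetical, and real sites differ. Under those assumptions, anyone holding the captured bytes can recover the credentials with the standard library:

```python
from urllib.parse import parse_qs

# A captured plaintext login request (hypothetical host, path, fields and values).
captured = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 35\r\n"
    b"\r\n"
    b"username=xiaoming&password=s3cr%21t"
)

def extract_credentials(raw_request):
    """Return (username, password) from a plaintext url-encoded POST body."""
    _, _, body = raw_request.partition(b"\r\n\r\n")  # body follows the blank line
    fields = parse_qs(body.decode("ascii"))          # also undoes %-encoding
    return fields["username"][0], fields["password"][0]

user, password = extract_credentials(captured)
print(user, password)  # xiaoming s3cr!t
```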
HTTP Man-in-the-Middle Attack
The HTTP protocol is indeed very convenient, but it has a fatal flaw: it is insecure.
We know that messages in the HTTP protocol are transmitted in plaintext without any encryption. What problems can this lead to?
Let’s illustrate with an example:
① Xiao Ming posts “I love Java” on a Java forum:
② A man-in-the-middle intercepts the post and changes the content to “I love PHP”:
③ Xiao Ming gets ridiculed (tongue firmly in cheek).
It can be seen that during the HTTP transmission process, the man-in-the-middle can see and modify all request and response content in the HTTP communication, making the use of HTTP very insecure.
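The tampering in step ② needs nothing more than byte substitution, because the request is plaintext. The sketch below assumes a hypothetical forum endpoint (`POST /post` on `forum.example`) and a hypothetical `tamper` helper. The attacker only has to fix `Content-Length` so the server still accepts the shorter body:

```python
def tamper(raw, old, new):
    """Rewrite a plaintext HTTP request body and fix Content-Length to match."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    body = body.replace(old, new)
    lines = head.split(b"\r\n")
    for i, line in enumerate(lines):
        if line.lower().startswith(b"content-length:"):
            lines[i] = b"Content-Length: " + str(len(body)).encode("ascii")
    return b"\r\n".join(lines) + sep + body

original = (
    b"POST /post HTTP/1.1\r\n"
    b"Host: forum.example\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"I love Java"
)
forged = tamper(original, b"Java", b"PHP")
print(forged.decode("ascii"))
```

The server receives a well-formed request and has no way to tell it was modified in transit.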
Preventing Man-in-the-Middle Attacks
At this point, some may think: since the content is plaintext, why not encrypt the messages with symmetric encryption so that the man-in-the-middle cannot read them? The modified scheme is as follows:
① Both parties agree on an encryption method, as shown in the following diagram:
② Messages are encrypted with AES, as shown in the following diagram:
While this seems to keep the plaintext from the man-in-the-middle, the encryption method and key are themselves exchanged in plaintext. If that first exchange is intercepted, the key leaks to the man-in-the-middle, who can then decrypt all subsequent messages, as shown in the following diagram:
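The weakness of sending the key in the clear can be sketched in a few lines. Python’s standard library has no AES, so this sketch swaps in a toy stream cipher (XOR with a SHA-256 keystream) as a stand-in. It is NOT AES and NOT secure (for one thing, it reuses the same keystream for every message). It only plays the role of a symmetric cipher. `keystream` and `xor_cipher` are hypothetical helpers:

```python
import hashlib
import secrets

def keystream(key, length):
    """Toy keystream: SHA-256(key || counter) blocks. Stand-in for AES, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key, data):
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# 1. The client picks the key and sends it unprotected while agreeing on encryption.
aes_key = secrets.token_bytes(16)
seen_by_mitm = aes_key  # the agreement message is plaintext, so the MITM records it

# 2. Later messages are encrypted with the shared key.
ciphertext = xor_cipher(aes_key, b"I love Java")

# 3. The MITM decrypts with the key it saw in step 1.
print(xor_cipher(seen_by_mitm, ciphertext))  # b'I love Java'
```

The cipher’s strength is irrelevant here: whoever sees the key exchange can decrypt everything that follows.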
Naturally, we then ask whether the key itself can be encrypted so that the man-in-the-middle cannot see it. It can, using asymmetric encryption; here we use the RSA algorithm.
While agreeing on the encryption method, the server generates a public/private key pair and returns the public key to the client. The client generates a local secret key for symmetric encryption (AES_KEY), encrypts it with the server’s public key to obtain AES_KEY_SECRET, and sends that back to the server.
The server decrypts AES_KEY_SECRET with its private key to obtain AES_KEY, and from then on the client and server encrypt their messages with AES_KEY.
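The RSA-protected key exchange can be sketched with textbook RSA and Python’s built-in modular arithmetic (Python 3.8+ for `pow(e, -1, m)`). This sketch makes several assumptions. The primes are the publicly known Mersenne primes 2**127 - 1 and 2**89 - 1. There is no padding. AES_KEY is a random 128-bit integer. The sketch shows the mechanism only and must not be used for real security; real TLS uses padded RSA or Diffie-Hellman key agreement:

```python
import secrets

# Toy RSA key pair for the server. Both primes are public knowledge,
# so this key is for illustration only.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def rsa_encrypt(m, public_key):
    """Textbook RSA encryption (no padding)."""
    e, n = public_key
    return pow(m, e, n)

def rsa_decrypt(c, private_key):
    """Textbook RSA decryption."""
    d, n = private_key
    return pow(c, d, n)

server_public = (E, N)   # sent to the client in plaintext; safe to reveal
server_private = (D, N)  # never leaves the server

# Client: generate AES_KEY locally and encrypt it with the server's public key.
aes_key = secrets.randbits(128)
aes_key_secret = rsa_encrypt(aes_key, server_public)

# Server: recover AES_KEY with its private key.
recovered = rsa_decrypt(aes_key_secret, server_private)
print(recovered == aes_key)  # True
```

An eavesdropper sees `server_public` and `aes_key_secret`, but without `D` it cannot recover `aes_key`.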
The modification is as follows:
In this case, the man-in-the-middle cannot steal the key used for AES encryption, so subsequent communications cannot be decrypted. But does this make it absolutely secure?
As the saying goes, “where there is a will, there is a way”: the man-in-the-middle devises a new scheme to defeat this encryption. Since it cannot obtain AES_KEY directly, it impersonates both sides at once.
Toward the user, the man-in-the-middle acts as the server to obtain the plaintext of the user’s request; toward the server, it acts as the client to obtain the plaintext of the server’s response, thus carrying out a man-in-the-middle attack:
This time, the man-in-the-middle forges its own public/private key pair and sends its public key to the user in place of the server’s, so the client encrypts AES_KEY with a key the attacker controls. The attacker decrypts AES_KEY, re-encrypts it with the server’s real public key and forwards it, so neither side notices. With AES_KEY in hand, it can easily decrypt the communication.
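The key-substitution attack can be sketched with the same kind of toy textbook RSA (known Mersenne primes, no padding, illustration only). `make_keypair`, `encrypt` and `decrypt` are hypothetical helpers. The point is that nothing in this scheme lets the client check whose public key it received:

```python
import secrets

E = 65537

def make_keypair(p, q):
    """Toy RSA key pair from two known primes (illustration only)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (E, n), (d, n)

def encrypt(m, public_key):
    e, n = public_key
    return pow(m, e, n)

def decrypt(c, private_key):
    d, n = private_key
    return pow(c, d, n)

server_pub, server_priv = make_keypair(2**127 - 1, 2**89 - 1)
mitm_pub, mitm_priv = make_keypair(2**107 - 1, 2**61 - 1)

# 1. The server sends its public key; the MITM swallows it and forwards its own.
key_received_by_client = mitm_pub

# 2. The client cannot tell the difference and encrypts AES_KEY with the forged key.
aes_key = secrets.randbits(128)
to_server = encrypt(aes_key, key_received_by_client)

# 3. The MITM decrypts AES_KEY, re-encrypts it with the real server key, forwards it.
stolen_key = decrypt(to_server, mitm_priv)
forwarded = encrypt(stolen_key, server_pub)

# 4. The server decrypts normally: client, server and MITM now share AES_KEY.
server_key = decrypt(forwarded, server_priv)
print(stolen_key == aes_key == server_key)  # True
```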
Is there really no way to stop the man-in-the-middle? Of course there is. Next, let’s see how HTTPS solves this communication security problem.
HTTPS Protocol
HTTPS is essentially HTTP running over SSL (HTTP + SSL). Although SSL has largely been replaced by its successor, TLS, we will keep calling it SSL here for consistency.
SSL is not limited to HTTP; it also secures other application-layer protocols, such as FTP (as FTPS) and WebSocket (as wss://).
SSL works much like the asymmetric-encryption scheme from the previous section: the handshake mainly serves to exchange keys, after which the two sides communicate using symmetric encryption.
The general process is as follows:
This is only a schematic; the actual SSL handshake is much more complex, but the principle is the same. What matters here is how HTTPS prevents man-in-the-middle attacks.
From the diagram, it can be observed that the server transmits the public key through the SSL certificate, and the client verifies the SSL certificate. The certificate authentication system is the key to ensuring SSL security. Next, we will explain the CA authentication system and how it prevents man-in-the-middle attacks.
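Python’s standard `ssl` module shows what this client-side check looks like in practice. This is a minimal sketch, assuming Python 3.7+. `ssl.create_default_context()` loads the platform’s trusted root CA certificates and, by default, requires the server certificate to verify and to match the host name. `open_https_connection` is a hypothetical helper, and actually calling it needs network access:

```python
import socket
import ssl

def open_https_connection(host, port=443):
    """Connect over TLS; the handshake fails unless the server certificate verifies."""
    context = ssl.create_default_context()  # loads the platform's trusted root CAs
    sock = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname enables SNI and the host-name check against the certificate
        return context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()  # e.g. ssl.SSLCertVerificationError for an untrusted certificate
        raise

# The defaults that make this a verifying client:
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True: a valid certificate chain is mandatory
print(context.check_hostname)                    # True: the certificate must match the host name
```

For example, `open_https_connection("www.baidu.com")` returns a socket whose `version()` and `cipher()` report the negotiated protocol and cipher suite. A certificate that does not chain to a trusted root, or does not match the host, makes the handshake raise `ssl.SSLCertVerificationError` instead.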
CA Authentication System
In the previous section, we saw that the client needs to verify the SSL certificate returned by the server. How does the client verify the security of the server’s SSL certificate?
In the CA authentication system, certificates are issued by authoritative institutions called certificate authorities (CAs), and the CAs’ own certificates are preinstalled in the operating system (or browser). We refer to these certificates as CA root certificates:
To get a certificate, our application server must have one issued by an authoritative certificate authority. We send the public key generated by the server, together with site information such as the domain name, to the CA, which then signs this information.
The result is our application server’s certificate. The CA computes a digest (hash) of the certificate content and encrypts that digest with its own private key to produce the certificate’s signature. This signature links our certificate to the issuer’s (upper-level) certificate, forming a chain.
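Issuing a certificate can be sketched with a toy textbook RSA signature: the CA hashes the certificate content with SHA-256 and raises the hash to its private exponent. This sketch makes several assumptions. The keys come from publicly known Mersenne primes, there is no padding, and the certificate content uses a made-up text format instead of real X.509. All names (`Toy CA`, `www.example.com`, `make_keypair`, `digest`) are hypothetical:

```python
import hashlib

E = 65537

def make_keypair(p, q):
    """Toy RSA key pair from two known Mersenne primes (illustration only)."""
    n = p * q
    return (E, n), (pow(E, -1, (p - 1) * (q - 1)), n)

def digest(data):
    """SHA-256 digest of the certificate content, as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# The issuing CA's key pair; its public key sits in the CA's own certificate.
ca_public, ca_private = make_keypair(2**521 - 1, 2**127 - 1)

# 1. The server generates its key pair and sends the public key plus site information.
server_public, server_private = make_keypair(2**607 - 1, 2**89 - 1)
request = {"subject": "www.example.com", "public_key": server_public}

# 2. The CA builds the certificate content, hashes it and signs the digest with its
#    private key (textbook RSA, no padding: not how real CAs do it).
tbs = f"subject={request['subject']};issuer=Toy CA;key={request['public_key']}".encode()
d, n = ca_private
certificate = {"content": tbs, "signature": pow(digest(tbs), d, n)}
```

Anyone holding `ca_public` can check the signature, but only the holder of `ca_private` can produce one. The server keeps `server_private` to itself.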
Here, let’s download the certificate from Baidu to take a look:
It can be seen that Baidu’s certificate is issued by GlobalSign G2, whose own certificate is in turn issued by GlobalSign R1.
When the client (browser) verifies a certificate, it walks up this hierarchy until it reaches a root certificate in its trusted store. If every certificate along the way checks out, the server’s certificate can be trusted.
How does the client verify each certificate? Following the chain, the client (browser) finds the upper-level (issuer’s) certificate and uses the public key in it to decrypt the server certificate’s signature, recovering a digest (sign1). It then computes the digest of the server certificate’s content itself (sign2), using the hash algorithm specified in the certificate.
If sign1 and sign2 are equal, the certificate has not been tampered with and was genuinely issued by that upper-level authority rather than forged.
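The chain walk and the sign1/sign2 comparison can be sketched as follows. The sketch assumes textbook RSA (no padding) with publicly known Mersenne primes, plus a made-up certificate format with `name`, `issuer`, `public_key` and `signature` fields. `TRUSTED_ROOTS` stands in for the built-in root store, and all names (`Toy Root R1`, `Toy CA G2`, `www.example.com`, the helper functions) are hypothetical. Real X.509 verification also checks, among other things, validity periods and the host name:

```python
import hashlib

E = 65537

def make_keypair(p, q):
    """Toy RSA key pair from two known Mersenne primes (illustration only)."""
    n = p * q
    return (E, n), (pow(E, -1, (p - 1) * (q - 1)), n)

def digest(data):
    """SHA-256 digest as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def content(cert):
    """The signed part of a certificate: everything except the signature."""
    return f"{cert['name']}|{cert['issuer']}|{cert['public_key']}".encode()

def issue(name, issuer_name, public_key, issuer_private):
    """Create a certificate signed with the issuer's private key (textbook RSA)."""
    cert = {"name": name, "issuer": issuer_name, "public_key": public_key}
    d, n = issuer_private
    cert["signature"] = pow(digest(content(cert)), d, n)
    return cert

root_pub, root_priv = make_keypair(2**521 - 1, 2**127 - 1)
inter_pub, inter_priv = make_keypair(2**607 - 1, 2**89 - 1)
leaf_pub, _leaf_priv = make_keypair(2**1279 - 1, 2**61 - 1)

root = issue("Toy Root R1", "Toy Root R1", root_pub, root_priv)  # self-signed
inter = issue("Toy CA G2", "Toy Root R1", inter_pub, root_priv)
leaf = issue("www.example.com", "Toy CA G2", leaf_pub, inter_priv)

TRUSTED_ROOTS = {"Toy Root R1": root}  # stands in for the built-in root store

def verify_chain(chain):
    """chain = [server cert, intermediate, ...]; walk up until a trusted root."""
    for i, cert in enumerate(chain):
        if i + 1 < len(chain):
            parent = chain[i + 1]
        else:
            parent = TRUSTED_ROOTS.get(cert["issuer"])
            if parent is None:
                return False  # chain does not end at a trusted root
        if cert["issuer"] != parent["name"]:
            return False
        e, n = parent["public_key"]
        sign1 = pow(cert["signature"], e, n)  # digest recovered with issuer's public key
        sign2 = digest(content(cert))         # digest computed locally
        if sign1 != sign2:
            return False
    return True

mitm_pub, mitm_priv = make_keypair(2**607 - 1, 2**127 - 1)
swapped_key = dict(leaf, public_key=mitm_pub)                             # tampered content
self_issued = issue("www.example.com", "Toy CA G2", mitm_pub, mitm_priv)  # forged signature

print(verify_chain([leaf, inter]))         # True
print(verify_chain([swapped_key, inter]))  # False
print(verify_chain([self_issued, inter]))  # False
```

Both forgeries fail for the reason given above: swapping the key changes the content, so sign2 no longer matches, and a self-made signature does not decrypt correctly under the real issuer’s public key.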
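Interestingly, RSA is used here in the reverse direction: the signature is created with the CA’s private key and checked with the corresponding public key, so anyone can verify a certificate but only the CA can produce a valid one.
This certificate system is what stops the man-in-the-middle. It cannot obtain a CA-signed certificate for the server’s domain that contains its own public key, so if it substitutes a forged key, the client’s check fails. The man-in-the-middle therefore cannot steal the AES_KEY, and so cannot intercept and modify HTTP communication messages.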
Conclusion
We first saw why HTTP is insecure through the man-in-the-middle attack, then traced how attack and defense techniques evolved toward the principles behind HTTPS. Hopefully this gives a deeper understanding of HTTPS.