In-Depth Analysis of HTTPS Security: The Secret Weapon for Safe Web Transmission!

1. What is HTTPS?

HTTPS is HTTP with encryption and decryption added.

HTTPS is also an application-layer protocol. It builds on HTTP by introducing an encryption layer (SSL/TLS) between HTTP and the transport layer.

So why does the HTTP protocol need encryption? Because HTTP itself is very insecure!

HTTP content is transmitted in plaintext, so it can be read or tampered with while in transit.

2. Concept Preparation

2.1. What are Encryption, Decryption, and Keys

Encryption is the process of transforming plaintext (the information to be transmitted) into ciphertext through a series of transformations.

Decryption is the process of transforming ciphertext back into plaintext through a series of transformations.

Encryption and decryption usually need one or more extra pieces of data to drive the transformation; this data is called a key.

2.2. Why Encrypt

The notorious “Carrier Hijacking”:

Take downloading a music app as an example.

If not hijacked, clicking the download button will pop up the direct download link.

If hijacked, clicking the download button will pop up the download link for QQ Browser.

The client and server do not communicate directly; instead, they communicate through a third-party carrier for information relay.

As an intermediary, the carrier can see the information being communicated between the client and server, which can lead to user privacy leakage!

Transmitting plaintext over the internet is quite dangerous!!!

HTTPS encrypts HTTP to further ensure the security of user information.

2.3. Common Encryption Methods

1. Symmetric Encryption

Symmetric encryption, also known as single-key encryption, uses a single-key cryptosystem: the same key is used for both encryption and decryption.

Common symmetric encryption algorithms (for reference): DES, 3DES (also called TDEA), AES, Blowfish, RC2, etc.

Characteristics: The algorithm is public, the computational load is small, the encryption speed is fast, and the encryption efficiency is high.

Symmetric encryption is essentially using the same “key” to encrypt plaintext into ciphertext and decrypt ciphertext back into plaintext.

A simple example of symmetric encryption is bitwise XOR.

Assuming plaintext a = 1234, key = 8888,

then the ciphertext b obtained by encrypting a ^ key is 9834.

Then, by performing the operation b ^ key again on ciphertext 9834, you get back the original plaintext 1234.

(The same applies to symmetric encryption of strings; each character can be represented as a number.)

Of course, bitwise XOR is just the simplest form of symmetric encryption. HTTPS does not use bitwise XOR.
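The XOR example above can be checked with a few lines of Python. This is only a sketch of the idea; the function names xor_crypt and xor_bytes are made up for illustration.

```python
def xor_crypt(value: int, key: int) -> int:
    """Encrypt or decrypt an integer by XOR-ing it with the key."""
    return value ^ key


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Same idea for strings: XOR each byte with the (repeating) key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


plaintext = 1234
key = 8888

ciphertext = xor_crypt(plaintext, key)
print(ciphertext)                   # 9834
print(xor_crypt(ciphertext, key))   # 1234

# Applying the same key twice restores the original bytes.
secret = xor_bytes(b"hello", b"k3y")
print(xor_bytes(secret, b"k3y"))    # b'hello'
```

The same function serves for both directions, which is exactly what "symmetric" means here.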

2. Asymmetric Encryption

Concept:

This method requires two keys for encryption and decryption: a public key and a private key.

Common asymmetric algorithms (for reference): RSA, DSA, ECDSA. (Strictly speaking, DSA and ECDSA are signature algorithms; of the three, only RSA is also used for encryption.)

Characteristics: The algorithm is complex, and security depends on both the algorithm and the keys. Because of this complexity, encryption and decryption are much slower than with symmetric encryption, which is the biggest drawback.

The two keys are paired: one is called the “public key” and the other the “private key.”

Usage:

Encrypt plaintext using the public key to obtain ciphertext.

Decrypt ciphertext using the private key to obtain plaintext.

It can also be reversed:

Encrypt plaintext using the private key to obtain ciphertext.

Decrypt ciphertext using the public key to obtain plaintext.
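Both directions can be tried out with a toy RSA key pair. The sketch below assumes two tiny primes (61 and 53) and a helper named apply_key, all chosen for illustration only; real keys are thousands of bits long and real RSA adds padding. It needs Python 3.8+ for the modular inverse.

```python
# Toy RSA key pair built from two tiny primes (insecure, illustration only).
p, q = 61, 53
n = p * q                     # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                        # public exponent  -> public key is (e, n)
d = pow(e, -1, phi)           # private exponent -> private key is (d, n)


def apply_key(message: int, exponent: int, modulus: int) -> int:
    """RSA's core operation: modular exponentiation with one of the keys."""
    return pow(message, exponent, modulus)


message = 65                  # must be smaller than n

# Public key encrypts, private key decrypts.
c1 = apply_key(message, e, n)
print(apply_key(c1, d, n))    # 65

# Reversed: private key "encrypts", public key decrypts.
c2 = apply_key(message, d, n)
print(apply_key(c2, e, n))    # 65
```

The reversed direction is the basis of the digital signatures discussed later: only the holder of the private key could have produced c2.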

2.4. Data Summary & Data Fingerprint

A data summary (more commonly called a message digest) is also called a data fingerprint because, like a human fingerprint, it is effectively unique to the data it was computed from.

A digital fingerprint (data summary) operates using a one-way hash function (Hash function) to generate a fixed-length digital summary. A digital fingerprint is not an encryption mechanism, but can be used to determine whether data has been tampered with.

Common summary algorithms: MD5, SHA1, SHA256, SHA512, etc. Because these algorithms map inputs of unlimited size to a fixed-size output, collisions are possible (two different pieces of information yielding the same summary), but the probability is very low.

Summary characteristics: Unlike an encryption algorithm, a summary algorithm is not, strictly speaking, encryption, because there is no decryption: it is very difficult to reverse-engineer the original information from the summary. Summaries are usually used for data comparison.

For example, saving plaintext passwords in a database is not recommended; instead, we can save a summary of each password.
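A short sketch with Python's standard hashlib module shows the three points above: fixed length, tamper detection, and password comparison. The helper names summary and check_password and the sample strings are invented for the example.

```python
import hashlib


def summary(data: bytes) -> str:
    """Return the SHA-256 summary of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Fixed length: 2 bytes or 20,000 bytes both give 64 hex characters.
print(len(summary(b"hi")), len(summary(b"hi" * 10_000)))   # 64 64

# Tamper detection: a one-character change gives a different summary.
fingerprint = summary(b"transfer 100 to Alice")
print(summary(b"transfer 900 to Alice") == fingerprint)    # False

# Password storage: keep only the summary and compare summaries at login.
stored = summary(b"correct horse")


def check_password(attempt: bytes) -> bool:
    return summary(attempt) == stored


print(check_password(b"correct horse"), check_password(b"wrong"))  # True False
```

Production systems usually go further and use a salted, deliberately slow function for passwords, but the comparison principle is the same.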

2.5. Digital Signature

A summary that is encrypted with the signer’s private key becomes a digital signature.

3. Exploring the Working Process of HTTPS

To ensure data security, “encryption” is necessary.

In network transmission, what is sent is no longer the plaintext itself but the “ciphertext” produced by encrypting it.

There are many ways to encrypt, but overall they can be divided into two main categories: symmetric encryption and asymmetric encryption.

Plan 1 – Using Only Symmetric Encryption

If both parties of communication hold the same key X, and no one else knows it, the security of communication between these two parties can be guaranteed (unless the key is compromised).

However, how do the client and the server both learn what this key is? If it is transmitted in plaintext, won’t the key itself be leaked?

So with symmetric encryption alone, there is no safe way to share key X between the two parties in the first place.

Plan 2 – Using Only Asymmetric Encryption

With asymmetric encryption, the server first sends its public key to the browser in plaintext. Before the browser sends data to the server, it encrypts the data with this public key. The channel from the client to the server then seems secure (it still has security issues), because only the server holds the private key that can decrypt data encrypted with the public key.

But how can the route from the server to the browser ensure security?

If the server encrypts data with its private key to send to the browser, then the browser can decrypt it with the public key, but this public key was initially transmitted in plaintext to the browser. If this public key is intercepted by a man-in-the-middle, he can also use this public key to decrypt the information sent from the server!

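Plan 2 protects data in one direction only (and even that protection will turn out to be incomplete), and the computation is very slow, so it is not adopted!

Security Issues of Plan 2:

The server-to-browser direction is not protected: the public key travels in plaintext, so anyone who intercepts it can decrypt whatever the server encrypts with its private key.

So there are also security risks!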

Plan 3 – Both Parties Use Asymmetric Encryption

The server possesses public key S and corresponding private key S’, while the client possesses public key C and corresponding private key C’.

The client and server exchange public keys.

The client sends a message to the server: first encrypts the data with S, then sends it, which can only be decrypted by the server because only the server has private key S’.

The server sends a message to the client: first encrypts the data with C, then sends it, which can only be decrypted by the client because only the client has private key C’.

Existing Issues:

Actually, it is not safe! (Same as Plan 2)

The communication speed will be relatively slow!

Plan 4 – Asymmetric Encryption + Symmetric Encryption

Communication Process:

The server possesses asymmetric public key S and private key S’.

The client initiates an HTTPS request and obtains the server’s public key S.

The client locally generates a symmetric key C, encrypts it with public key S, and sends it to the server.

Since the intermediate network devices do not have the private key, even if they intercept the data, they cannot restore the original plaintext, and they cannot obtain the symmetric key (is that true?).

The server decrypts it with private key S’ and restores the symmetric key C sent by the client. It uses this symmetric key to encrypt the response data sent back to the client.

Subsequent communication between the client and server uses only symmetric encryption. Since this key is only known to the client and server, other hosts/devices do not know the key, making intercepted data meaningless.

Since symmetric encryption is much more efficient than asymmetric encryption, asymmetric encryption is only used at the beginning to negotiate the key, while subsequent transmissions continue to use symmetric encryption, thus solving the efficiency issue!
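The Plan 4 flow can be simulated in a few lines. Everything here is an assumption made for illustration: the server's key pair is a toy RSA key built from two known Mersenne primes, and keystream_xor is a made-up toy symmetric cipher, not what HTTPS actually uses. Python 3.8+ is assumed.

```python
import hashlib
import secrets

# Server's toy RSA key pair (assumption: Mersenne primes stand in for
# properly generated primes; this is NOT secure).
P, Q = 2**107 - 1, 2**127 - 1
N = P * Q
E = 65537                              # public key S  = (E, N)
D = pow(E, -1, (P - 1) * (Q - 1))      # private key S' = (D, N)


def keystream_xor(data: bytes, key: int) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256-derived keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(f"{key}:{offset // 32}".encode()).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, pad))
    return bytes(out)


# Client: generate symmetric key C and send it encrypted with S.
client_key = secrets.randbelow(N - 2) + 2
wrapped_key = pow(client_key, E, N)

# Server: recover C with S'.
server_key = pow(wrapped_key, D, N)
print(server_key == client_key)                    # True

# From here on, both sides use only the fast symmetric cipher.
request = keystream_xor(b"GET /account HTTP/1.1", client_key)
print(keystream_xor(request, server_key))          # b'GET /account HTTP/1.1'
```

The slow asymmetric step runs once per connection; every later message only pays the cost of the symmetric cipher.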

Although the above is quite close to the answer, there are still security issues.

Plans 2, 3, and 4 all have a problem: what if, at the very beginning, the man-in-the-middle has already started attacking?

4. Man-in-the-Middle Attack – Addressing the Above Scenarios

Man-in-the-Middle Attack, abbreviated as “MITM Attack”

Indeed, in Plans 2/3/4, once the client has obtained public key S, anything it encrypts with S, such as the symmetric key it generates in Plan 4 (called C there and X below), cannot be decrypted by a man-in-the-middle who merely intercepts the data, because only the server has private key S’.

However, if the man-in-the-middle attack occurs during the initial handshake negotiation, it may not be the case. What if the hacker has successfully become a man-in-the-middle?!

Attack Process:

The server has a public key S of the asymmetric encryption algorithm and a private key S’.

The man-in-the-middle has a public key M of the asymmetric encryption algorithm and a private key M’.

The client sends a request to the server, and the server sends its public key S to the client in plaintext.

The man-in-the-middle intercepts the data packet, extracts public key S, saves it, then replaces the public key S in the intercepted packet with his own public key M, and sends the forged packet to the client.

The client receives the packet, extracts public key M (of course, it doesn’t know the public key has been replaced), generates the symmetric key X, encrypts X with public key M, and sends the packet to the server.

The man-in-the-middle intercepts it, decrypts it with his private key M’, obtaining the communication key X, then encrypts it with the server’s saved public key S and pushes the packet to the server.

The server receives the packet, decrypts it with its private key S’, obtaining the communication key X.

Both parties start using X for symmetric encryption and communication. However, everything is under the control of the man-in-the-middle, and data interception, eavesdropping, or even modification are possible.

The attack scheme above also applies to Plans 2 and 3.
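The attack process above can be replayed step by step. This sketch assumes toy RSA key pairs built from known Mersenne primes, and the helper names make_keypair and rsa are invented for the example; it needs Python 3.8+.

```python
import secrets


def make_keypair(p: int, q: int, e: int = 65537):
    """Build a toy RSA key pair (public, private) from two primes."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)


def rsa(value: int, key) -> int:
    """Apply a public or private key to a number."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


# Assumption: Mersenne primes used as stand-ins for real key material.
S_pub, S_priv = make_keypair(2**107 - 1, 2**127 - 1)   # server: S and S'
M_pub, M_priv = make_keypair(2**61 - 1, 2**89 - 1)     # attacker: M and M'

# The server sends S in plaintext; the attacker swaps in M on the way.
key_seen_by_client = M_pub

# The client generates X and encrypts it with the key it received.
X = secrets.randbelow(2**64) + 2
to_server = rsa(X, key_seen_by_client)

# The attacker decrypts with M', learns X, and re-encrypts with S.
stolen_X = rsa(to_server, M_priv)
forwarded = rsa(stolen_X, S_pub)

# The server decrypts with S' and sees nothing unusual.
server_X = rsa(forwarded, S_priv)
print(stolen_X == X == server_X)   # True
```

Neither endpoint can detect the swap, because nothing ties public key S to the server's identity.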

What is the root of the problem? The client cannot determine whether the public key it received is legitimate! So how can the client know that the “public key” it received is legitimate?

5. The Introduction of Certificates

CA Certification

Before using HTTPS, the server needs to apply for a digital certificate from a CA organization. The digital certificate contains information about the certificate applicant, the applicant’s public key, and so on. The server transmits the certificate to the browser, and the browser retrieves the public key from the certificate. The certificate serves as an identity card, proving the authenticity of the server’s public key.

This certificate can be understood as a structured string containing the following information:

• Certificate Issuer

• Certificate Validity Period

• Public Key

• Certificate Owner

• Signature

6. Understanding Data Signatures

A data signature is the data summary (fingerprint) encrypted with the signer’s private key.

When the server applies for a CA certificate, the CA organization reviews the server and generates a digital signature specifically for that website, as follows.

The CA organization has a public key A and a private key A’ of asymmetric encryption.

The CA organization hashes the plaintext data of the server’s certificate application to form a data summary.

Then, it encrypts the data summary with CA’s private key A’ to obtain the digital signature S.

Verification Process:

When receiving data, first separate the original data from the signature.

Then, use the same hashing algorithm to compute a new hash value from the original data.

Decrypt the signature using the signer’s public key to recover the hash value the signer computed, and finally compare the two hash values. If they match, the data has not been tampered with!
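Signing and verification can be sketched as follows. The CA key pair is a toy RSA key built from two Mersenne primes, and the names summary_int, sign, and verify are invented for the example; real signatures use much larger keys and a padding scheme. Python 3.8+ is assumed.

```python
import hashlib

# Toy CA key pair (assumption: Mersenne primes, insecure, illustration only).
P, Q = 2**107 - 1, 2**127 - 1
N = P * Q
E = 65537                              # CA public key  A  = (E, N)
D = pow(E, -1, (P - 1) * (Q - 1))      # CA private key A' = (D, N)


def summary_int(data: bytes) -> int:
    """Hash the data and reduce the result so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Signer: hash the data, then encrypt the summary with the private key."""
    return pow(summary_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Verifier: decrypt the signature with the public key, rehash, compare."""
    return pow(signature, E, N) == summary_int(data)


application = b"owner=example.com;public_key=S"
signature = sign(application)

print(verify(application, signature))                                # True
print(verify(b"owner=attacker.test;public_key=M", signature))        # False
```

Anyone holding the public key can run verify, but only the holder of the private key can produce a signature that passes it.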

So where does the client’s signer public key come from?

All browsers (clients) generally have built-in public keys from trusted CA organizations or authorized sub-organizations!
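Non-browser clients follow the same idea. As a small illustration, Python's standard ssl module builds its default client settings on the CA certificates trusted by the operating system. The number printed on the last line depends on the machine, so no expected value is given for it.

```python
import ssl

# Build a client-side context that trusts the platform's built-in CA store.
context = ssl.create_default_context()

# The server's certificate must verify against a trusted CA.
print(context.verify_mode == ssl.CERT_REQUIRED)   # True

# The name in the certificate must match the host being contacted.
print(context.check_hostname)                     # True

# CA certificates loaded so far (may be 0 on systems that load them lazily
# from a directory).
print(len(context.get_ca_certs()))
```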

7. The Final Solution of HTTPS: Asymmetric Encryption + Symmetric Encryption + Certificate Authentication

When the client and server first establish a connection, the server returns a certificate to the client, which contains the server’s public key and the identity information of the website.

Client-Side Certificate Verification

When the client sends its first request, it not only receives the public key but actually receives a “certificate,” which is issued by the CA organization.

After obtaining this certificate, the client will verify the certificate (to prevent it from being forged).

It checks whether the certificate has expired.

It checks whether the certificate issuer is trusted (the trusted certificate issuing organizations built into the operating system).

It verifies whether the certificate has been tampered with: it takes the certificate issuer’s public key from its built-in trusted store, decrypts the signature, and obtains a hash value (the data summary), denoted as hash1. Then it calculates the hash value of the certificate contents (everything except the signature), denoted as hash2. It compares hash1 and hash2. If they are equal, the certificate has not been tampered with.
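The three checks can be put together in a simplified model. This is a sketch under stated assumptions: the Certificate class, its fields, the "Toy CA" name, and the toy RSA key are all invented for illustration, and real certificates are X.509 structures verified by the TLS library. Python 3.8+ is assumed.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Toy CA key pair (assumption: Mersenne primes, insecure).
P, Q = 2**107 - 1, 2**127 - 1
CA_N, CA_E = P * Q, 65537
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))
TRUSTED_ISSUERS = {"Toy CA": (CA_E, CA_N)}   # hypothetical built-in trust store


@dataclass
class Certificate:
    issuer: str
    owner: str
    public_key: str
    not_after: datetime
    signature: int = 0

    def body(self) -> bytes:
        """Everything that gets signed: all fields except the signature."""
        fields = (self.issuer, self.owner, self.public_key,
                  self.not_after.isoformat())
        return "|".join(fields).encode()


def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def verify_certificate(cert: Certificate, now: datetime) -> bool:
    if now > cert.not_after:                     # 1. has it expired?
        return False
    ca_key = TRUSTED_ISSUERS.get(cert.issuer)    # 2. is the issuer trusted?
    if ca_key is None:
        return False
    e, n = ca_key
    hash1 = pow(cert.signature, e, n)            # 3. decrypt the signature
    hash2 = digest(cert.body())                  #    and rehash the contents
    return hash1 == hash2


now = datetime.now(timezone.utc)
cert = Certificate("Toy CA", "example.com", "server-public-key-S",
                   now + timedelta(days=90))
cert.signature = pow(digest(cert.body()), CA_D, CA_N)   # done by the CA

print(verify_certificate(cert, now))                         # True
print(verify_certificate(cert, now + timedelta(days=365)))   # False (expired)

cert.public_key = "attacker-public-key-M"                    # tampering
print(verify_certificate(cert, now))                         # False
```

Swapping the public key inside the certificate changes hash2 but not hash1, which is why the man-in-the-middle substitution from the previous section no longer goes unnoticed.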

Common Questions:

Can a man-in-the-middle modify this certificate?

Since the man-in-the-middle does not have the CA organization’s private key, he cannot hash and then encrypt to form a signature, so he cannot create a matching signature for a tampered certificate. Only the CA organization has the private key, which means only the CA organization has the ability to sign the data!

Can the man-in-the-middle swap out the certificate entirely?

Since the man-in-the-middle does not have the CA private key, he cannot create a fake certificate.

So the man-in-the-middle can only apply for a real certificate from the CA and swap it out with his own.

This can indeed accomplish a complete swap of the certificate, but don’t forget that the plaintext of the certificate contains domain name and other server authentication information. If a complete swap occurs, the client can still identify it.

Always remember: the man-in-the-middle does not have the CA private key, so he cannot validly modify any certificate, including his own.

Why hash before encrypting the signature?

To reduce the length of the signature ciphertext and speed up the verification process of the digital signature.

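Complete Process of HTTPS Communication: the client initiates a request; the server returns its CA-signed certificate; the client verifies the certificate and extracts the server’s public key; the client generates a symmetric key, encrypts it with that public key, and sends it to the server; the server decrypts it with its private key; both sides then communicate using the symmetric key.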

Summary

There are three sets of keys involved in the HTTPS working process:

First Set (Asymmetric Encryption): Used to verify whether the certificate has been tampered with. The CA organization holds the private key and uses it to sign the certificate when the server submits its CSR file and applies for the certificate. The client holds the corresponding public key (the operating system records which CA certification organizations are trusted and stores their public keys). The server returns the signed certificate upon client request. The client verifies the certificate using this public key to ensure its legitimacy, which in turn guarantees the authenticity of the server’s public key contained in the certificate.

Second Set (Asymmetric Encryption): Used to negotiate and generate the symmetric encryption key. The client takes the server’s public key from the CA-signed certificate it received (which is now trusted), uses it to encrypt the randomly generated symmetric encryption key, and transmits the result to the server, which decrypts it with its private key.

Third Set (Symmetric Encryption): The data transmitted subsequently between the client and server is encrypted and decrypted using this symmetric key.

In fact, everything revolves around this symmetric encryption key. Other mechanisms are all auxiliary to the work of this key!
