1. A rough understanding of several basic terms (HTTPS, SSL, TLS)
2. A rough understanding of the relationship between HTTP and TCP (especially “short connection” vs. “long connection”)
3. A rough understanding of the concept of encryption algorithms (especially the difference between “symmetric encryption” and “asymmetric encryption”)
4. A rough understanding of the purpose of CA certificates
Considering that many technical beginners may not understand the above background, I’ll first describe it in the simplest terms. If you believe you are not a beginner, please skip this chapter and go directly to “The Requirements of the HTTPS Protocol”.
Let’s Clarify a Few Terms — HTTPS, SSL, TLS
1. What is “HTTP” Used For?
First of all, HTTP is a network protocol designed specifically for transmitting web content. Even if you don’t understand this protocol, you have surely heard of it. For example, when you visit the homepage of Baidu, the browser’s address bar displays the following URL: http://www.baidu.com/
The “http” at the start of that URL refers to the HTTP protocol. Most websites transmit web pages and the various elements contained in them (images, CSS styles, JS scripts) through the HTTP protocol.
2. What is “SSL/TLS” Used For?
SSL is the abbreviation for “Secure Sockets Layer”. It was designed by Netscape in the mid-1990s. (By the way, Netscape not only invented SSL but also other foundational pieces of the web, such as JavaScript and HTTP cookies.)
Why was the SSL protocol invented? Because the original HTTP protocol used on the internet was in plain text, which had many drawbacks — for example, the transmitted content could be eavesdropped on (sniffed) and tampered with. The SSL protocol was invented to solve these problems.
By 1999, SSL had become a de facto standard on the internet due to its widespread application. The IETF standardized SSL that year. After standardization, its name was changed to TLS (which stands for “Transport Layer Security”).
Many related articles refer to both as SSL/TLS because they can be seen as different stages of the same thing.
3. What Does “HTTPS” Mean?
Having explained HTTP and SSL/TLS, we can now explain HTTPS. The HTTPS protocol we usually refer to is essentially a combination of the “HTTP protocol” and the “SSL/TLS protocol”. You can roughly understand HTTPS as — “HTTP over SSL” or “HTTP over TLS” (after all, SSL and TLS are quite similar).
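To make “HTTP over TLS” concrete, here is a minimal sketch using Python’s standard `socket` and `ssl` modules. The helper names `build_request` and `fetch_status_line` are my own inventions for illustration, and the host you pass in is up to you. The point it tries to show is that the HTTP request bytes are identical in both cases; the only difference is whether the TCP socket gets wrapped in a TLS layer first.

```python
import socket
import ssl


def build_request(host, path="/"):
    """Build a plain HTTP/1.1 GET request (hypothetical helper).

    The bytes are the same whether they travel over HTTP or HTTPS.
    """
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_status_line(host, use_tls=True, timeout=10):
    """Send the request over raw TCP (HTTP) or over TCP wrapped in TLS (HTTPS)."""
    port = 443 if use_tls else 80
    sock = socket.create_connection((host, port), timeout=timeout)
    if use_tls:
        # The only HTTPS-specific step: put a TLS layer around the TCP socket.
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=host)
    with sock:
        sock.sendall(build_request(host))
        reply = b""
        while b"\r\n" not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    # Return only the first line of the response, e.g. the status line.
    return reply.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling `fetch_status_line("www.example.com")` needs network access, so treat it as something to try by hand rather than a guaranteed result.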
Let’s Talk About the Characteristics of the HTTP Protocol
As background knowledge, we also need to briefly discuss the characteristics of the HTTP protocol itself. HTTP has many characteristics, but considering the limited space, I will only discuss those related to HTTPS.
1. The Versions and History of HTTP
The HTTP protocol we are using today is version 1.1 (that is, HTTP 1.1). Drafting of version 1.1 began in late 1995; it was first published as RFC 2068 in January 1997, and the revised specification was officially released in 1999 as RFC 2616.
Before 1.1, there were two versions, 0.9 and 1.0, where HTTP 0.9 was not widely used, while HTTP 1.0 was widely used.
Additionally, it is said that next year (2015) the IETF will release the standard for HTTP 2.0. I am looking forward to it.
2. The Relationship Between HTTP and TCP
Simply put, the TCP protocol is the cornerstone of the HTTP protocol — the HTTP protocol relies on the TCP protocol to transmit data.
In the network layering model, TCP is referred to as the “transport layer protocol”, while HTTP is referred to as the “application layer protocol”.
Many common application layer protocols are based on TCP, such as “FTP, SMTP, POP, IMAP”, etc.
TCP is known as a “connection-oriented” transport layer protocol. I won’t elaborate on its specific details (otherwise, the length will get out of control again).
You just need to know: there are mainly two transport layer protocols, TCP and UDP. TCP is more reliable than UDP.
You can think of the TCP protocol as a water pipe, where water flows in from one end, and comes out from the other end. Moreover, the TCP protocol ensures that the data sent first arrives first (in contrast, UDP does not guarantee this).
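The water pipe picture can be tried out on your own machine. The following is a small sketch, assuming a loopback connection on 127.0.0.1 and a function name (`send_chunks_over_tcp`) that I made up for this example. It pushes several chunks into one end of a TCP connection and collects whatever comes out of the other end.

```python
import socket
import threading


def send_chunks_over_tcp(chunks):
    """Send chunks through a loopback TCP connection; return the bytes received."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick one
    port = server.getsockname()[1]
    received = bytearray()

    def receive_all():
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:  # the sender closed its end of the pipe
                    break
                received.extend(data)

    receiver = threading.Thread(target=receive_all)
    receiver.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        for chunk in chunks:
            client.sendall(chunk)
    receiver.join()
    server.close()
    return bytes(received)
```

Note that TCP is a byte stream: the receiver may see the chunks merged or split differently, but the bytes always come out in the order they went in.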
3. How Does HTTP Use TCP Connections?
HTTP’s use of TCP connections can be divided into two modes, commonly referred to as “short connection” and “long connection” (the “long connection” is also called a “persistent connection” or “Keep-Alive”).
Suppose there is a webpage that contains many images and several external CSS and JS files.
In the “short connection” mode, the browser first initiates a TCP connection to get the HTML source code of the webpage (after obtaining the HTML, this TCP connection is closed).
Then, the browser begins to analyze the source code of the webpage and finds that the page references many external resources (images, CSS, JS).
Then, for each external resource, it initiates a separate TCP connection to download that file (similarly, after fetching each external resource, the corresponding TCP connection is closed).
In contrast, in the “long connection” mode, the browser also starts by initiating a TCP connection to fetch the page.
However, after fetching the page, that TCP connection is not closed immediately but is temporarily kept open (the so-called “Keep-Alive”).
Then, after the browser analyzes the HTML source code and finds many external resources, it reuses that same TCP connection to fetch those external resources.
In HTTP 1.0, “short connection” was used by default (in those early days of the web, pages were relatively simple, so “short connection” caused few problems).
By the time the draft for HTTP 1.1 began in late 1995, web pages had started to become more complex (with more images and scripts). Using short connections at this point would be inefficient (because establishing a TCP connection has a “time cost” and a “CPU cost”). Therefore, in HTTP 1.1, “Keep-Alive” was adopted by default.
For more information about “Keep-Alive”, you can refer to the Wikipedia entry on HTTP persistent connections.
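The difference between the two modes can be counted directly. Below is a rough sketch that starts a throwaway local server with Python’s `http.server` and fetches a few paths with `http.client`, once with a fresh connection per request and once reusing a single connection. `CountingHandler` and `count_connections` are names I invented for this demonstration, and the paths are placeholders (the server answers every path with the same tiny body).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    connections = 0                # TCP connections accepted so far

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(paths, reuse):
    """Fetch every path and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:  # "long connection": one TCP connection for all requests
            conn = http.client.HTTPConnection(host, port)
            for path in paths:
                conn.request("GET", path)
                conn.getresponse().read()
            conn.close()
        else:      # "short connection": a new TCP connection per request
            for path in paths:
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", path, headers={"Connection": "close"})
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections
```

With three paths, the short-connection run should report three connections and the long-connection run only one, which is exactly the saving that Keep-Alive is meant to provide.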
Let’s Discuss the Concepts of “Symmetric Encryption” and “Asymmetric Encryption”
1. What Are “Encryption” and “Decryption”?
In simple terms, you can think of “encryption” and “decryption” as a pair of inverse mathematical operations, just as “addition and subtraction” are inverse operations, and “multiplication and division” are inverse operations.
The process of “encryption” is the process of transforming “plaintext” into “ciphertext”; conversely, the process of “decryption” is the process of transforming “ciphertext” back into “plaintext”. In both processes, a key element — called a “key” — is required to participate in the mathematical operations.
2. What Is “Symmetric Encryption”?
“Symmetric encryption technology” means that the same key is used for both “encryption” and “decryption”. This is relatively easy to understand. It’s like creating a password-protected (encrypted) archive with 7-Zip or WinRAR.
When you want to extract this archive next time, you need to enter the same password. In this example, the password plays the role of the “key” mentioned earlier.
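For readers who prefer code to archives, here is a toy sketch of the “same key in both directions” idea. Please note the assumptions: this is NOT a real cipher and must not be used to protect anything. It merely XORs the data with a keystream derived from the key via SHA-256, and the names `keystream` and `xor_cipher` are hypothetical. Real protocols use vetted algorithms such as AES.

```python
import hashlib


def keystream(key, length):
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_cipher(data, key):
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# The same function and the same key are used in both directions.
ciphertext = xor_cipher(b"hello world", b"my password")
plaintext = xor_cipher(ciphertext, b"my password")
```

Decrypting with any other key yields garbage instead of the original text, which is the whole point of the key.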
3. What Is “Asymmetric Encryption”?
“Asymmetric encryption technology” means that different keys are used for “encryption” and “decryption”. This concept is harder to grasp. The invention of “asymmetric encryption” was hailed as a revolution in the history of cryptography.
Due to space limitations, I won’t elaborate further on the topic of “asymmetric encryption”. If I have time, I’ll write a separate article to explain it.
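Still, a tiny numeric illustration may help the idea of “different keys” sink in. The sketch below uses textbook RSA with deliberately tiny primes (61 and 53), which is an assumption made purely for readability: real keys are thousands of bits long and use padding, so this is insecure by design. The function names `make_toy_keypair` and `apply_key` are made up for this example, and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
def make_toy_keypair(p=61, q=53, e=17):
    """Build a textbook RSA key pair from two small primes (insecure toy)."""
    n = p * q                  # the modulus, shared by both keys
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)        # private exponent: modular inverse of e
    return (e, n), (d, n)      # (public key, private key)


def apply_key(number, key):
    """Encrypt with the public key, or decrypt with the private key."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


public_key, private_key = make_toy_keypair()
secret = 65
ciphertext = apply_key(secret, public_key)       # anyone can do this step
recovered = apply_key(ciphertext, private_key)   # only the key owner can undo it
```

The key used to encrypt cannot be used to decrypt: applying `public_key` a second time does not give back 65.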
4. What Are Their Advantages and Disadvantages?
After seeing the definitions, it is clear that (from a functional perspective) “asymmetric encryption” can do more than “symmetric encryption”. This is the advantage of “asymmetric encryption”. However, the implementation of “asymmetric encryption” often involves “complex mathematical problems”.
Therefore, the performance of “asymmetric encryption” is usually much worse (compared to “symmetric encryption”). The advantages and disadvantages of both also affect the design of the SSL protocol.
CA Certificates: Principles and Uses
For more on this topic, please see my article from four years ago titled “Introduction to Digital Certificates and CA”. I won’t repeat it here to avoid making the length too long.
What Are the Requirements for the HTTPS Protocol?
After spending a lot of time explaining the background knowledge, let’s officially get to the main topic. First, let’s discuss what requirements were intended to be met when designing HTTPS.
Many articles introducing HTTPS start right away with implementation details. Personally, I think this is not a good approach.
Back in 2009 when I started my blog, I wrote an article titled “The Three Stages of Learning Technology: WHAT, HOW, WHY”, which discusses the importance of “WHY questions”.
If you start by discussing protocol details, you can at best understand WHAT and HOW, but you cannot grasp WHY. In the previous chapter, I discussed “background knowledge”, and in this chapter, I discussed “requirements”, which helps you understand: why was it designed this way? — that is the WHY question.
Compatibility
Since HTTP came before HTTPS, the designers of HTTPS had to consider compatibility with the existing HTTP.
This compatibility includes many aspects. For example, existing web applications should migrate to HTTPS as seamlessly as possible; for browser vendors, the changes should be minimal; …
Based on compatibility considerations, several conclusions can be easily drawn:
1. HTTPS must still be based on TCP for transmission (if it were to switch to UDP for the transport layer, both the web server and browser client would require significant changes, which would be too disruptive)
2. A new protocol should encapsulate the HTTP protocol (the so-called “HTTP over SSL” essentially adds a layer of SSL encapsulation around the original HTTP data. The original mechanisms of the HTTP protocol, such as GET and POST, remain largely unchanged).
For example: if the original HTTP was a plastic water pipe that could easily be punctured, then the newly designed HTTPS is like wrapping a metal pipe around the original plastic water pipe. This way, the original plastic pipe still functions, and with the added metal layer, it is not easily punctured.
Scalability
As mentioned earlier, HTTPS is essentially “HTTP over SSL”.
If the SSL protocol is designed well in terms of scalability, it can not only work with HTTP but also with other application layer protocols. Wouldn’t that be great?
It seems that the designers of SSL were indeed quite brilliant. Nowadays, SSL/TLS can be paired with many commonly used application layer protocols (such as: FTP, SMTP, POP, Telnet) to enhance the security of these protocols.
Continuing the analogy I made earlier: If we consider SSL/TLS as a metal pipe used for reinforcement, it can be used to reinforce not only water pipes but also gas pipes.
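You can see this “one metal pipe, many kinds of plumbing” design reflected in Python’s standard library, where several protocol clients ship with a TLS-protected variant built on top of the plaintext one. The mapping below is only an illustration that I put together (the name `TLS_VARIANTS` is mine); it assumes a Python build with SSL support, which is the normal case.

```python
import ftplib
import imaplib
import poplib
import smtplib

# Plaintext client class -> its TLS-protected counterpart in the standard library.
TLS_VARIANTS = {
    ftplib.FTP: ftplib.FTP_TLS,
    smtplib.SMTP: smtplib.SMTP_SSL,
    poplib.POP3: poplib.POP3_SSL,
    imaplib.IMAP4: imaplib.IMAP4_SSL,
}

for plain, secured in TLS_VARIANTS.items():
    # Each secured class is a subclass: same protocol logic, plus a TLS layer.
    print(f"{plain.__name__:6} -> {secured.__name__}, "
          f"subclass: {issubclass(secured, plain)}")
```

In every case the application protocol itself is unchanged and the TLS layer is added underneath it, which is the same relationship HTTPS has to HTTP.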
Confidentiality (Preventing Leakage)
HTTPS needs to achieve sufficient confidentiality.
When it comes to confidentiality, it must first be able to resist sniffing (the tool used for this is called a “sniffer”). The term “sniffing” refers to monitoring your network traffic. If you browse the web using plain HTTP, an observer can see which websites and pages you are accessing through sniffing.
Sniffing is the most basic form of attack. In addition to sniffing, HTTPS must also be able to resist other slightly more advanced attack methods — such as “replay attacks” (which will be discussed when explaining the protocol principles later).
Integrity (Preventing Tampering)
Aside from “confidentiality”, another equally important goal is to “ensure integrity”. I briefly mentioned the concept of “integrity” in my previous blog post titled “Understanding File Integrity Check — About Hash Values and Digital Signatures”. Students who are forgetful should review that again.
Before HTTPS was invented, since HTTP was in plain text, it was not only easy to sniff but also easy to tamper with.
For example:
Many internet service providers (ISPs) in our country are quite unscrupulous, and many users complain that when they visit a certain website (which originally had no ads), they suddenly see many advertisements from China Telecom.
Why does this happen? Because your network traffic has to pass through the ISP’s lines to reach the public network.
If you are using plain HTTP, it is easy for the ISP to inject advertisements into the pages you are visiting.
Therefore, when designing HTTPS, another requirement was to “ensure that the content of the HTTP protocol is not tampered with”.
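To give a feel for how tampering can be detected at all, here is a small sketch using an HMAC from Python’s standard `hmac` module. It is only an illustration of the general idea under my own assumptions: the shared key, the page content, and the function names `sign` and `is_untampered` are all made up, and it is not a description of how SSL/TLS actually formats its records.

```python
import hashlib
import hmac


def sign(message, key):
    """Compute an authentication tag over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_untampered(message, tag, key):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(message, key), tag)


shared_key = b"known only to browser and server"   # hypothetical key
page = b"<html><body>Original page</body></html>"
tag = sign(page, shared_key)

# Someone in the middle injects an advertisement into the page.
modified = page.replace(b"</body>", b"<div>AD</div></body>")
```

The receiver accepts `page` with its tag but rejects `modified`, because whoever altered the page does not know the key and cannot produce a matching tag.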
Authenticity (Preventing Forgery)
When discussing the requirements of HTTPS, “authenticity” is often overlooked. In fact, the importance of “authenticity” is no less than that of “confidentiality” and “integrity”.
For example:
If you need to access a bank’s website for online banking, how can you be sure that the website you are visiting really is the one you intended to visit? (This sounds a bit convoluted.)
Some naive students might say: by looking at the domain name in the URL. Why do I say such students are “naive”? Because the DNS system itself is unreliable (especially in the era when SSL was designed, even DNSSEC had not been invented). Due to the unreliability of DNS (which includes “domain spoofing” and “domain hijacking”), the domain name you see in the URL may not be real!
(Students who do not understand “domain spoofing” and “domain hijacking” can refer to my previous article titled “Understanding DNS Principles, Along with Discussions on Domain Hijacking and Domain Spoofing/Domains Pollution”)
Therefore, the HTTPS protocol must have some mechanism to ensure the requirement of “authenticity” (how to ensure it will be discussed in detail later).
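As a small preview of what this looks like from a client’s point of view, the sketch below inspects the default settings of Python’s `ssl` module. The function name `default_client_checks` is hypothetical. It only reads two settings of a default client context; it does not connect to anything.

```python
import ssl


def default_client_checks():
    """Report which authenticity checks a default client context performs."""
    context = ssl.create_default_context()  # also loads the system's trusted CAs
    return {
        # The server must present a certificate that chains to a trusted CA.
        "certificate_required": context.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must match the host name the client asked for.
        "hostname_checked": context.check_hostname,
    }


checks = default_client_checks()
```

Both checks are switched on by default, so a server that merely answers for the right domain name, without a matching certificate, is rejected during the handshake.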
Performance
Finally, let’s discuss the last requirement — performance. The introduction of HTTPS must not lead to significantly degraded performance. Otherwise, who would want to use it? To ensure performance, the designers of SSL must consider at least the following points:
1. How to choose encryption algorithms (“symmetric” or “asymmetric”)?
2. How to accommodate the “short connection” TCP mode used by HTTP? (SSL was designed before 1995; at that time the HTTP version was still 1.0, which used the “short connection” mode by default, and Keep-Alive was not enabled by default)