An Introduction to TCP/IP and HTTP

Source: sowhat1412

Author: sowhat1412

1 TCP/IP

1.1 TCP/IP Definition

The TCP/IP protocol suite, also known as the Internet Protocol Suite, is a set of protocols; computers can communicate with each other only by following these rules. TCP and IP are just two of its most important protocols, which is why the suite is named TCP/IP, but it actually spans four layers of protocols.

1.2 TCP/IP Functions

As mentioned above, TCP/IP is broadly divided into four layers. Next, let’s discuss the specific functions of these four layers.

1.2.1 Application Layer

The application layer directly provides users with network service protocols such as HTTP, SMTP (email), and FTP, each developed to meet different real-world needs. Most of the time, developers work at this layer: they assemble application data and hand it to the layers below, which in practice means socket programming. How that data actually travels across the network is handled by the three layers underneath.
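To make the socket point concrete, here is a minimal sketch of application code talking to TCP through Python's standard `socket` module. It runs a throwaway server on the loopback interface; the upper-casing reply and the function names are illustrative assumptions, not part of any protocol.

```python
import socket
import threading

def serve_once(server_sock: socket.socket) -> None:
    """Accept one TCP connection and reply with the received bytes upper-cased."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def echo_roundtrip(message: bytes) -> bytes:
    # Listen on an OS-chosen free port on the loopback interface
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        # The application only sees the socket API; TCP/IP underneath handles delivery
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply

print(echo_roundtrip(b"hello"))  # b'HELLO'
```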

1.2.2 Transport Layer

The transport layer provides communication services to the application layer. It is the highest of the communication-oriented layers and the lowest of the user-facing layers, and it provides logical communication between application processes running on different hosts. Its two main protocols are TCP and UDP.

  1. TCP provides a connection-oriented byte-stream service with reliable delivery, flow control, multiplexing, and other services.

  2. UDP is connectionless and provides none of these complex control mechanisms; it simply sends individual datagrams.
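By contrast with the TCP sketch, UDP has no connection setup. This loopback-only sketch (function name is illustrative) sends one datagram with `sendto` and reads it with `recvfrom`; there is no `connect`/`accept` handshake, and on a real network nothing would retransmit a lost datagram.

```python
import socket

def udp_roundtrip(payload: bytes) -> bytes:
    """Send one UDP datagram to a local socket and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # OS picks a free port
        receiver.settimeout(2.0)         # UDP guarantees nothing, so never wait forever
        # No handshake: the datagram is simply addressed and sent
        sender.sendto(payload, receiver.getsockname())
        data, _sender_addr = receiver.recvfrom(2048)
    return data
```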

Functions of the Transport Layer:

  1. Segmenting and encapsulating data sent from the application layer.

  2. Providing end-to-end transport services.

  3. Establishing logical communication between the sending host and the receiving host.

1.2.3 Network Layer

The network layer’s function is to achieve routing and forwarding of data packets. Wide area networks typically connect dispersed hosts or local area networks using many hierarchical routers. Therefore, the two communicating hosts are usually connected through multiple intermediate node routers. The network layer’s task is to select these intermediate nodes to determine the communication path between the two hosts. It also hides the details of the network topology connection from the upper layer protocols, making it appear as if the two communicating parties are directly connected from the perspective of the transport layer and network applications.

The IP protocol operates at this layer. It provides addressing and, together with routing, forwards packets hop by hop so that two end systems can reach each other along the best available path. IP itself is a best-effort protocol: reliability, flow control, and congestion control are left to upper-layer protocols such as TCP.
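At each router, choosing the next node comes down to a forwarding-table lookup with longest-prefix match: among all routes whose network contains the destination address, the most specific one wins. The sketch below uses Python's `ipaddress` module; the table entries and next-hop names are hypothetical.

```python
import ipaddress

# Hypothetical forwarding table: (destination network, next hop)
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway"),
    (ipaddress.ip_network("10.0.0.0/8"), "router A"),
    (ipaddress.ip_network("10.1.0.0/16"), "router B"),
]

def next_hop(destination: str) -> str:
    """Pick the next hop using longest-prefix match."""
    address = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in ROUTES if address in net]
    # The most specific network (longest prefix) wins; 0.0.0.0/0 matches everything
    _net, hop = max(candidates, key=lambda route: route[0].prefixlen)
    return hop
```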

1.2.4 Link Layer

The link layer implements the network driver for the network interface card and handles data transmission over the physical medium. Two common protocols associated with it are ARP (Address Resolution Protocol), which maps an IP address to a MAC address, and RARP (Reverse Address Resolution Protocol), which maps a MAC address back to an IP address.
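As an illustration of ARP, this sketch builds the 28-byte payload of an ARP request for IPv4 over Ethernet ("who has `target_ip`? tell `sender_ip`"). It only constructs the bytes; actually broadcasting the frame would need a raw socket and administrator privileges. The function name and addresses are made up.

```python
import ipaddress
import struct

def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build the 28-byte ARP request payload asking for target_ip's MAC address."""
    return struct.pack(
        ">HHBBH6s4s6s4s",
        1,                                           # hardware type: Ethernet
        0x0800,                                      # protocol type: IPv4
        6,                                           # hardware (MAC) address length
        4,                                           # protocol (IPv4) address length
        1,                                           # operation: 1 = request, 2 = reply
        bytes.fromhex(sender_mac.replace(":", "")),  # sender MAC
        ipaddress.IPv4Address(sender_ip).packed,     # sender IP
        bytes(6),                                    # target MAC: unknown, so all zeros
        ipaddress.IPv4Address(target_ip).packed,     # target IP being resolved
    )
```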

1.2.5 Data Transmission
  1. When hosts communicate using the TCP/IP protocol suite, data passes through the layers in order: on the sender it moves down from the application layer, and on the receiver it moves up from the link layer.

  2. As data passes down through the sender's layers, each layer adds its own header. Conversely, as data passes up through the receiver's layers, each layer removes the corresponding header.

  3. This practice of wrapping data with headers is called encapsulation.
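A toy model of encapsulation: each layer on the sending side prepends its own header, and the receiving side strips them in reverse order. The bracketed text tags are stand-ins for real binary headers (and Ethernet's trailer is ignored); only the wrapping order is meant to be accurate.

```python
# Sender-side order: the application layer wraps first, the link layer last
LAYERS = ["HTTP", "TCP", "IP", "ETH"]

def header_for(layer: str) -> bytes:
    return f"[{layer}]".encode()  # toy stand-in for a real binary header

def encapsulate(payload: bytes) -> bytes:
    data = payload
    for layer in LAYERS:              # moving down the stack: each layer prepends its header
        data = header_for(layer) + data
    return data

def decapsulate(frame: bytes) -> bytes:
    data = frame
    for layer in reversed(LAYERS):    # moving up the stack: outermost header comes off first
        header = header_for(layer)
        if not data.startswith(header):
            raise ValueError(f"missing {layer} header")
        data = data[len(header):]
    return data

frame = encapsulate(b"hello")
print(frame)               # b'[ETH][IP][TCP][HTTP]hello'
print(decapsulate(frame))  # b'hello'
```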

However, note that each link limits the size of the IP packets it can carry, the Maximum Transmission Unit (MTU), and TCP correspondingly limits the data in each segment to the Maximum Segment Size (MSS).

The Ethernet MTU is 1500 bytes, the basic IP header is 20 bytes, and the basic TCP header is 20 bytes, so the MSS can be at most 1500 - 20 - 20 = 1460 bytes (the MSS counts only application data, not protocol headers).
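The same arithmetic, plus how a sender could cut a larger message into MSS-sized pieces, as a short sketch. It assumes minimal 20-byte IP and TCP headers with no options; real TCP stacks announce their MSS during the handshake and may use smaller values.

```python
ETHERNET_MTU = 1500  # bytes
IP_HEADER_LEN = 20   # bytes, basic IPv4 header without options
TCP_HEADER_LEN = 20  # bytes, basic TCP header without options

MSS = ETHERNET_MTU - IP_HEADER_LEN - TCP_HEADER_LEN  # 1460 bytes of application data

def split_into_segments(data: bytes, mss: int = MSS) -> list:
    """Cut application data into chunks of at most `mss` bytes."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]

segments = split_into_segments(b"x" * 4000)
print(MSS, [len(s) for s in segments])  # 1460 [1460, 1460, 1080]
```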

Therefore, a large application layer message may be split into several segments that are transmitted one by one. The receiver reassembles the application layer data from the packets it receives and can treat a request as complete only after all of it has arrived; this is why HTTP's Content-Length header, which states the length of the body in bytes, matters.
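A sketch of how a receiver can use Content-Length to know when a message is complete, however the bytes were split in transit. The list of chunks stands in for successive `recv()` results; the function name is hypothetical, and chunked transfer encoding is not handled.

```python
def read_http_message(chunks):
    """Reassemble one HTTP/1.1 message from arbitrarily split chunks using Content-Length.

    Returns (start_line, headers, body); header names are lower-cased.
    """
    it = iter(chunks)
    buffer = b""

    def more() -> bytes:
        chunk = next(it, None)
        if chunk is None:
            raise ConnectionError("peer closed before the message was complete")
        return chunk

    # 1. Read until the empty line (CRLF CRLF) that ends the header section
    while b"\r\n\r\n" not in buffer:
        buffer += more()
    head, _, body = buffer.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # 2. Keep reading until Content-Length bytes of body have arrived
    length = int(headers.get("content-length", "0"))
    while len(body) < length:
        body += more()
    return start_line, headers, body[:length]
```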

[Figure: Data Packet Sending]

1.3 OSI and TCP/IP

OSI

  • OSI, the Open Systems Interconnection Reference Model, is a conceptual model proposed by the International Organization for Standardization (ISO). It was an attempt to create a worldwide standard framework for interconnecting all kinds of computers into networks, and it focuses on which functions communication protocols must provide.

TCP/IP

  • The protocol suite actually used for network communication in the real world; it focuses on what programs must be developed to implement the protocols on computers.

Differences between OSI and TCP/IP

  1. OSI explicitly defines the concepts of service, interface, protocol, and layering; descriptions of the TCP/IP model commonly borrow these concepts from OSI.

  2. OSI came up with the model before the protocols, and the standards before practice.

  3. TCP/IP had working protocols and applications first, and the model was described afterwards.

  4. OSI is a theoretical model, while TCP/IP has been widely used and has become the de facto standard for network interconnection.

After introducing the macro TCP/IP protocol suite, let’s dive into the world of networks.

2 Application Layer HTTP

2.1 A Brief Introduction to HTTP

2.1.1 Definition of HTTP

HyperText Transfer Protocol (HTTP) is a convention and specification for transmitting text, images, audio, video, and other hypertext data between any two points in the computer world.

2.1.2 URI, URN, URL

URI: Uniform Resource Identifier, which identifies any available resource on the web. URI is an abstract concept; how the identification is implemented does not matter, only that the resource is identified.

URN: Uniform Resource Name, which identifies a resource by a unique name or ID within a specific namespace.

URL: Uniform Resource Locator, a subset of URI that not only identifies a resource but also tells you how to access it. A typical URL consists of a protocol, a host, a port (which may be omitted when it is the default for the protocol), a path, and an optional query.

[Figure: URL Template]

  1. Protocol: The protocol the two parties use to communicate, such as http, ftp, or file.

  2. Host: The domain name or IP address of the server.

  3. Port: The port on the server machine through which the service is exposed.

  4. Path: The location of the resource on the server, usually a file or a directory.

  5. Query: Optional parameters, introduced by ? and written as key=value pairs separated by &.
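Python's standard `urllib.parse` module splits a URL into these components. The example URL is made up, and the default-port table is this sketch's own addition to cover URLs that omit the port.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def describe_url(url: str) -> dict:
    """Break a URL into protocol, host, port, path, and query."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # Fall back to the scheme's well-known port when none is written explicitly
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parse_qs(parts.query),  # "a=1&b=2" -> {"a": ["1"], "b": ["2"]}
    }
```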

Example of the relationship among the three:

  1. Suppose you want to find a person. The person is the resource, and anything that identifies them is a URI.

  2. If you identify them by ID number + name, that is a URN: it uniquely identifies the resource but says nothing about where to find it.

  3. If you identify them by address (XX province, XX city, XX district, XX unit, XX room), that is a URL: it identifies the resource and also tells you where it is.

2.2 HTTP Message Format

Both request and response messages consist of a start line, headers, an empty line, and a body; only the start line differs between them.

2.2.1 Request

[Figure: Request Message Format]

2.2.1.1 Request Line

The request line contains three parts: request method, URL, protocol version. They are separated by spaces, and the request line ends with a carriage return + a line feed.

Request Method: Indicates what operation to perform on the target resource. HTTP/1.1 defines eight request methods (GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, and TRACE), listed in the table below; the most commonly used are GET and POST.

[Table: HTTP/1.1 request methods]

URL: Specifies the target address for this access.

Protocol Version: Specifies the HTTP version the client uses. The common versions today are HTTP/1.1, HTTP/2, and HTTP/3. If the requester specifies HTTP/1.1, the responder replies using HTTP/1.1 as well.

2.2.1.2 Request Header

The request headers give the server additional information about the request and about the client itself. Each header is a name-value pair separated by a colon, occupies its own line, and ends with a carriage return and line feed. In HTTP/1.1 only Host is required; all other request headers are optional. Some common request headers:

[Table: Common request headers]

2.2.1.3 Empty Line

Contains only a carriage return and a line feed, with no other content. This empty line marks the end of the request header and is mandatory.

2.2.1.4 Request Body

Carries the data the client is sending, such as form fields or JSON; its format is declared by the Content-Type header.

2.2.1.5 Request Example
[Figure: Request Example]
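A minimal sketch that assembles a raw HTTP/1.1 request from the parts described above: request line, headers, the mandatory empty line, and an optional body. The helper name and example values are hypothetical.

```python
def build_request(method: str, path: str, host: str, headers=None, body: bytes = b"") -> bytes:
    """Assemble a raw HTTP/1.1 request message."""
    lines = [f"{method} {path} HTTP/1.1",  # request line: method, URL, version
             f"Host: {host}"]              # the one header HTTP/1.1 requires
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Every line ends with CRLF, and one more CRLF forms the mandatory empty line
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

print(build_request("GET", "/index.html", "www.example.com"))
# b'GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
```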

2.2.2 Response

[Figure: Response Message Format]

2.2.2.1 Response Line

Specifies the HTTP version, the status code, and a short reason phrase, for example HTTP/1.1 200 OK.

2.2.2.2 Response Header

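Response headers use the same name: value format as request headers and give the client additional information about the response and the server.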

2.2.2.3 Empty Line and Response Body

The empty line and the message body work the same way as in the request; the type of the response body is specified by Content-Type.

2.2.2.4 Response Example
[Figure: Response Example]
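The reverse direction as a sketch: splitting a raw response into its status line, headers, and body. It assumes the whole message is already in memory and that a reason phrase is present; the function name is hypothetical.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")           # the empty line ends the header section
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)  # e.g. "HTTP/1.1", "200", "OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()    # header names are case-insensitive
    return version, int(status), reason, headers, body
```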

2.3 HTTP Header Fields

The HTTP protocol specifies a large number of header fields that can achieve various functions, but they can generally be divided into the following four categories:

  1. General Fields: Can appear in both request and response headers.

  2. Request Fields: Can only appear in request headers, further explaining request information or additional conditions.

  3. Response Fields: Can only appear in response headers, providing additional information about the response message.

  4. Entity Fields: Strictly speaking a kind of general field, dedicated to describing the body.

By setting HTTP header fields, HTTP provides the following important functions:

  1. Content Negotiation: The client and server agree on the form of the response resource, such as language, character set, encoding, and compression type.

  2. Cache Management: The Cache-Control header determines whether and for how long the client may cache a resource; note the differences among max-age, no-cache, no-store, and must-revalidate.

  3. Entity Type: Parsing Content-Type reveals the MIME type of the request or response body.

  4. Connection Management: The Connection header (for example keep-alive or close) determines whether the TCP connection is kept open for further requests (a long connection) or closed after the exchange (a short connection).
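A simplified sketch of how a client might interpret Cache-Control. In brief: no-store forbids storing the response at all, no-cache allows storing but requires revalidation with the server before every reuse, max-age=N keeps the response fresh for N seconds, and must-revalidate forbids serving it once stale without revalidating. The sketch handles only the first three (real caches also consider Expires, Age, s-maxage, and more), and the function names are hypothetical.

```python
def parse_cache_control(value: str) -> dict:
    """Parse e.g. 'max-age=3600, must-revalidate' into a directive dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Flag directives (no-store, no-cache, ...) map to True; others keep their argument
        directives[name.strip().lower()] = arg.strip().strip('"') or True
    return directives

def can_reuse_without_asking(directives: dict, age_seconds: int) -> bool:
    """True if a cached copy of this age may be used without contacting the server."""
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    return max_age is not None and age_seconds < int(max_age)
```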

2.4 HTTPS and HTTP

HTTP transmits data in plaintext, which poses several risks:

  1. Eavesdropping Risk (confidentiality): anyone on the communication path can read the content.

  2. Tampering Risk (integrity): content can be modified in transit, for example by forcibly inserting spam ads.

  3. Impersonation Risk (authentication): a counterfeit website can pose as a real one, such as a shopping site like Taobao.

2.4.1 Overview of SSL/TLS
[Figure: SSL/TLS]

To ensure security, HTTPS was born. HTTPS adds the SSL/TLS encryption protocol between the HTTP and TCP layers, which can solve the three aforementioned problems.

  1. Achieves information confidentiality through hybrid encryption.

  2. Ensures integrity through hash algorithms, which produce a fixed-length digest (fingerprint) of the data.

  3. Places the server’s public key into a digital certificate to mitigate the risk of impersonation.

It should be noted that HTTP defaults to port 80, while HTTPS defaults to port 443.

2.4.2 Encryption Algorithms

Encryption algorithms are divided into symmetric encryption and asymmetric encryption.

  1. Symmetric Encryption: Uses the same key for encryption and decryption. It is fast, but the key must be kept secret, which makes exchanging it securely difficult. Common algorithms include AES, DES, RC4, and Blowfish.

  2. Asymmetric Encryption: Uses a key pair: a public key that can be distributed freely and a private key that is kept secret. This solves the key exchange problem but is slower. Deriving the private key from the public key is computationally infeasible, which keeps the private key safe. Common asymmetric algorithms include RSA, DSA, and Diffie-Hellman.

HTTPS employs a symmetric encryption + asymmetric encryption = hybrid encryption approach:

  1. Before establishing communication, asymmetric encryption is used to exchange keys, and thereafter, asymmetric encryption is no longer used.

  2. During communication, all plaintext data is encrypted using the session key through symmetric encryption.
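A deliberately tiny sketch of the hybrid scheme. It uses textbook RSA with the classic example primes 61 and 53 for the key exchange and a toy XOR keystream derived from SHA-256 for the bulk data. Both are stand-ins chosen so the code fits in a few lines of standard-library Python; they are insecure and are not what HTTPS actually uses (real deployments use, for example, ECDHE key exchange with AES-GCM).

```python
import hashlib
import secrets

# Toy RSA key pair from the textbook primes 61 and 53 (real keys are 2048+ bits)
P, Q = 61, 53
N = P * Q                          # public modulus: 3233
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent: 2753 (needs Python 3.8+)

def rsa_encrypt(m: int) -> int:
    return pow(m, E, N)            # anyone can do this with the public key (N, E)

def rsa_decrypt(c: int) -> int:
    return pow(c, D, N)            # only the holder of the private key D can do this

def xor_cipher(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 keystream; the same call encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key.to_bytes(2, "big") + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))

# 1. Key exchange (asymmetric): the client picks a session key and encrypts it with the server's public key
session_key = secrets.randbelow(N - 2) + 2
wrapped_key = rsa_encrypt(session_key)
server_session_key = rsa_decrypt(wrapped_key)  # server recovers it with its private key

# 2. Data transfer (symmetric): both sides now use the shared session key
ciphertext = xor_cipher(session_key, b"GET /cart HTTP/1.1")
plaintext = xor_cipher(server_session_key, ciphertext)
```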

2.4.3 Hash Algorithms

The main features of a hash algorithm are that it needs no key and that its output cannot be reversed to recover the input. Non-cryptographic checksums such as CRC32 are not suitable for this purpose, because it is easy to construct different data with the same CRC32 value. With a good hash algorithm, identical input always yields the same digest, while finding two different inputs with the same digest is computationally infeasible.

Message digest algorithms are mainly used in digital signatures to summarize the plaintext. Well-known hash algorithms include MD5 (designed by Ron Rivest of RSA) and SHA-1 (published by NIST), along with their many variants; both are now considered broken, and SHA-256 is the common choice today.

[Figure: Checksum Integrity]

  1. The client generates a digest by processing plaintext data through the specified hash algorithm.

  2. The plaintext data + digest are encrypted with the public key and transmitted.

  3. The server uses the private key to decrypt the message to obtain the plaintext + digest.

  4. The server generates a digest for the plaintext using the same hash algorithm.

  5. By comparing the two digests generated by the client and server, it can be determined if the data is intact.
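The digest comparison in steps 1, 4, and 5 can be sketched with Python's `hashlib`. The sketch skips the public-key encryption of step 2. In real TLS, integrity is protected with a keyed MAC (HMAC) or an AEAD cipher rather than a bare hash, because otherwise an attacker could replace both the data and its digest; the `mac` helper shows that keyed variant. Messages and keys are made up.

```python
import hashlib
import hmac

def digest(data: bytes) -> bytes:
    """Steps 1 and 4: both sides hash the plaintext with the same algorithm (SHA-256 here)."""
    return hashlib.sha256(data).digest()

def is_intact(data: bytes, received_digest: bytes) -> bool:
    """Step 5: compare the two digests (compare_digest avoids timing leaks)."""
    return hmac.compare_digest(digest(data), received_digest)

def mac(key: bytes, data: bytes) -> bytes:
    """Keyed variant (HMAC-SHA256): without the key, an attacker cannot forge a matching tag."""
    return hmac.new(key, data, hashlib.sha256).digest()

sent = b"transfer 10 yuan"
sent_digest = digest(sent)
print(is_intact(sent, sent_digest))                 # True
print(is_intact(b"transfer 99 yuan", sent_digest))  # False
```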

2.4.4 CA Certificates

With asymmetric encryption, the client holds the server's public key, and making sure that key is genuine is a challenge. If an attacker replaces the server's public key with their own, the client would encrypt data for the attacker without noticing the third party, and the information would already be leaked!

[Figure: Asymmetric Encryption Information Leakage]

The key issue is making sure the client receives the server's real public key. This is where the digital certificate comes in. It relies on using a key pair in reverse: data encrypted (signed) with a private key can be decrypted (verified) with the matching public key, which proves who produced it.

[Figure: CA Ensures Correct Transmission of Public Key]

  1. The CA is a certification authority that issues certificates; there are only a few authoritative companies worldwide. This authority generates a pair of public and private keys using RSA.

  2. Contents of the server’s public key + issuer ID + certificate issued to whom (Subject) + validity period + other information = plaintext content P.

  3. The plaintext content P is processed through a hash algorithm to generate H1, which is then encrypted with the CA’s private key to obtain S.

  4. P + S = digital certificate.

  5. The client receives the digital certificate and applies the same hash algorithm to P to generate H2.

  6. Using the CA’s public key, we decrypt S to obtain H3.

  7. By comparing H2 and H3, we can determine if the certificate is valid. If they match, it indicates that the certificate is OK. If they do not match, it means that P has been modified or the certificate was not issued by the CA.

  8. If they match, the server’s public key can be correctly extracted. Done!
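A toy sketch of steps 2 to 7, using textbook RSA numbers (n = 3233, e = 17, d = 2753) as the CA key pair. Real certificates use X.509 encoding, 2048-bit or larger keys, and padded signature schemes; the function names and the certificate contents are hypothetical.

```python
import hashlib

# Toy CA key pair (textbook RSA numbers; real CA keys are far larger and use padded signatures)
CA_N, CA_E, CA_D = 3233, 17, 2753  # public modulus, public exponent, private exponent

def toy_hash(data: bytes) -> int:
    # SHA-256 reduced modulo CA_N so it fits the toy key
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N

def issue_certificate(p: bytes) -> tuple:
    """CA side (steps 2-4): hash P to H1 and sign it with the CA private key to get S."""
    h1 = toy_hash(p)
    s = pow(h1, CA_D, CA_N)
    return p, s                    # P + S = certificate

def verify_certificate(p: bytes, s: int) -> bool:
    """Client side (steps 5-7): recompute H2 from P, recover H3 from S with the CA public key."""
    h2 = toy_hash(p)
    h3 = pow(s, CA_E, CA_N)
    return h2 == h3
```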

2.4.5 SSL/TLS Establishment Process

First, the TCP three-way handshake takes place; then the two sides prepare for encrypted communication. Before encrypted communication can start, the client and server must agree on parameters. This process is called the handshake, and it is performed by the SSL/TLS layer mentioned earlier. In outline, it is ClientHello, ServerHello, Finished.

[Figure: SSL/TLS Establishment Process]

  1. Client Request (ClientHello)

The client initiates an encrypted communication request: it sends the SSL/TLS protocol version it supports, a client-generated random number Random1, and the encryption methods (cipher suites) it supports.

  2. Server Response (ServerHello)

The server confirms that the SSL/TLS version is supported, chooses the encryption algorithm to use, generates a random number Random2 (used later to derive the session key), and sends its digital certificate to the client.

  3. Client Certificate Verification

  1. The client verifies the authenticity of the server's digital certificate with the CA's public key and extracts the server's public key.

  2. The client generates a random number Random3 (the pre-master secret), encrypts it with the server's public key, and sends it to the server, together with notice of the agreed encryption algorithm.

  3. The server decrypts the encrypted pre-master secret with its private key to obtain Random3. Both sides now hold Random1, Random2, and Random3, and each derives the same Session Key from them using the agreed algorithm; this key is used for all further communication.

  4. The client generates a digest of all previous handshake messages and encrypts it with the negotiated key; this is the first encrypted message the client sends. If the server can decrypt and verify it, the negotiated key is consistent.

  4. Server Final Response

  1. The server receives Random3 and the final encryption algorithm, and finalizes the Session Key.

  2. The server notifies the client that it is switching ciphers and will encrypt everything from now on with the Session Key.

  3. The server also generates a digest of the handshake messages and encrypts it; this is the first encrypted message the server sends. When the client decrypts and verifies it, the negotiated key is confirmed to be consistent.

  5. Normal Data Transmission

  • At this point, both parties have securely negotiated the same key, and the SSL/TLS handshake is complete. All application layer data is encrypted with this key and carried reliably over TCP.
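For the curious, this is how TLS 1.2 actually combines the three random values: the pre-master secret (Random3) and the two hello randoms (Random1, Random2) go through the PRF defined in RFC 5246 to produce a 48-byte master secret, from which the session keys are then expanded. The sketch implements the HMAC-SHA256 PRF; it omits the later key-expansion step, and the random values are generated locally for illustration (a real RSA pre-master secret also starts with two version bytes).

```python
import hashlib
import hmac
import secrets

def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_SHA256 expansion from RFC 5246, section 5."""
    output = b""
    a = seed                                              # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]

def tls12_prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    return p_sha256(secret, label + seed, length)

def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    # master_secret = PRF(pre_master_secret, "master secret", ClientHello.random + ServerHello.random)[0..47]
    return tls12_prf(pre_master, b"master secret", client_random + server_random, 48)

# Both sides know all three values once the handshake messages above have been exchanged
random1 = secrets.token_bytes(32)  # ClientHello.random
random2 = secrets.token_bytes(32)  # ServerHello.random
random3 = secrets.token_bytes(48)  # pre-master secret
client_key = master_secret(random3, random1, random2)
server_key = master_secret(random3, random1, random2)
```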

2.5 HTTP Development History

Currently, HTTP versions are divided into HTTP/1.1, HTTP/2, and HTTP/3, with the first two being the mainstream.

[Figure: Comparison of HTTP Versions]

2.5.1 HTTP/1.1

Compared to HTTP/1.0, HTTP/1.1 has the following advantages and disadvantages:

Advantages:

  1. Persistent (long) TCP connections are used by default instead of short connections, avoiding the overhead of setting up a new TCP connection for every request.

  2. Pipelining: requests can be sent back to back. For example, when sending requests A, B, and C, the client does not need to wait for the response to A before sending B.

Disadvantages:

  1. Request and response headers are sent uncompressed; only the body can be compressed.

  2. Verbose, largely identical headers are sent back and forth with every request.

  3. Head-of-line blocking: responses must come back in request order, so one slow response blocks all those behind it.

  4. Requests are handled first in, first out, with no concept of priority.

  5. Communication is strictly client request and server response; the server cannot push data on its own.
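HTTP/1.1's persistent connections can be observed with Python's standard `http.server` and `http.client` modules. This sketch starts a local server that speaks HTTP/1.1, sends two requests over one `HTTPConnection`, and records the client port seen for each request; the same port means the same TCP connection was reused. The handler class and its bookkeeping list are illustrative.

```python
import http.client
import http.server
import threading

class KeepAliveHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection open by default
    seen_client_ports = []         # client-side port observed for each request

    def do_GET(self):
        KeepAliveHandler.seen_client_ports.append(self.client_address[1])
        body = self.path.encode()
        self.send_response(200)
        # Content-Length tells the client where this response ends, so the connection can be reused
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

def fetch_twice():
    server = http.server.HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        bodies = []
        for path in ("/a", "/b"):
            conn.request("GET", path)
            bodies.append(conn.getresponse().read())
    finally:
        conn.close()  # lets the server's keep-alive loop see EOF and finish
        server.shutdown()
        server.server_close()
    return bodies, KeepAliveHandler.seen_client_ports
```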

2.5.2 HTTP/2

In practice, HTTP/2 is deployed over HTTPS (browsers only support it over TLS). It keeps backward compatibility with HTTP/1.1 semantics while adding the following optimizations:

  1. Header Compression: Introduces the HPACK algorithm, where the client and server maintain a header information table, storing all fields in this table, so repeated header information does not need to be sent, only the index number is sent.

  2. Binary Transmission: The new version uses a binary mode of transmission that is more friendly to computers, transmitting data in frames.

  3. Stream Priority Transmission: Different request-response data packets are distinguished by Stream, with each Stream having an independent number. Priority can also be specified.

  4. Multiplexing: Multiple streams can send and receive request and response frames simultaneously over one connection. Frames within a stream are delivered and reassembled in order, and streams are independent of each other, so whichever request is processed first can have its response sent first.

  5. Server Push: The server proactively pushes static resources the client is likely to need, such as JS and CSS files.
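A sketch of HTTP/2's binary framing and multiplexing. Every frame starts with the 9-byte header defined in the HTTP/2 specification (RFC 7540, since revised as RFC 9113): a 24-bit length, an 8-bit type, 8-bit flags, and a reserved bit plus a 31-bit stream identifier. That is what lets frames from different streams be interleaved on one connection and sorted back out by stream ID. Only DATA frames are handled; HEADERS frames, HPACK, flow control, and priorities are left out.

```python
import struct
from collections import defaultdict

DATA = 0x0        # frame type for body data
END_STREAM = 0x1  # flag: last frame of this stream

def encode_frame(stream_id: int, payload: bytes, flags: int = 0, frame_type: int = DATA) -> bytes:
    """Prefix a payload with the 9-byte HTTP/2 frame header."""
    length = len(payload).to_bytes(3, "big")                               # 24-bit length
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)  # type, flags, R bit + stream ID
    return length + rest + payload

def decode_frames(buffer: bytes):
    """Yield (type, flags, stream_id, payload) for each frame in the buffer."""
    pos = 0
    while pos < len(buffer):
        length = int.from_bytes(buffer[pos:pos + 3], "big")
        frame_type, flags, stream_id = struct.unpack(">BBI", buffer[pos + 3:pos + 9])
        yield frame_type, flags, stream_id & 0x7FFFFFFF, buffer[pos + 9:pos + 9 + length]
        pos += 9 + length

def demultiplex(buffer: bytes) -> dict:
    """Reassemble DATA payloads per stream, in arrival order."""
    streams = defaultdict(bytes)
    for frame_type, _flags, stream_id, payload in decode_frames(buffer):
        if frame_type == DATA:
            streams[stream_id] += payload
    return dict(streams)

# Two responses interleaved on one connection (client-initiated streams use odd IDs)
wire = (encode_frame(1, b"Hel") + encode_frame(3, b"Wor")
        + encode_frame(1, b"lo", END_STREAM) + encode_frame(3, b"ld", END_STREAM))
print(demultiplex(wire))  # {1: b'Hello', 3: b'World'}
```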

Disadvantages:

  1. Blocking Issues: HTTP/2 frames exist only at the application layer; underneath, all data still travels over a single TCP connection, which guarantees in-order delivery through retransmission. If one TCP packet is lost, every stream on that connection must wait for it to be retransmitted, even streams whose own data was not lost.

2.5.3 HTTP/3

HTTP/3 replaces TCP with UDP, which by itself does not guarantee ordering or delivery. On top of UDP, the QUIC protocol, originally developed by Google, implements TCP-like connection management, congestion control, flow control, and so on. Overall, HTTP/3 brings the following optimizations:

  1. QUIC has a unique mechanism to ensure transmission reliability. When a stream experiences packet loss, only that stream will be blocked, and other streams will not be affected.

  2. QUIC uses TLS 1.3 instead of TLS 1.2, and header compression is upgraded from HPACK to QPACK.

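  3. Before HTTP/3, setting up a secure connection required the TCP three-way handshake followed by a separate TLS handshake. QUIC merges the transport and TLS 1.3 handshakes into a single exchange, so a new connection needs only one round trip before data can be sent.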

  4. QUIC is a protocol that combines TCP + TLS + HTTP/2 multiplexing over UDP.

2.6 HTTP Features

  1. Flexible Extension

The brilliance of HTTP lies in the fact that it only specifies the basic framework of header + body; what to fill in is up to the user, and its underlying components are modular, such as adding SSL/TLS, binary frame transmission, replacing TCP with UDP, etc.

  2. Reliable Transmission

Whether using TCP or QUIC, the reliability of data transmission is guaranteed.

  3. Request-Response Model

HTTP implements data transmission based on the request-response model.

  4. Stateless

Each HTTP request-response exchange is stateless: every message is handled independently of the ones before it. When state has to carry over between requests (for example, keeping a user logged in), mechanisms such as cookies and sessions must be used.

  5. Application Layer Protocol

HTTP is merely a transmission protocol defined at the application layer; the transport underneath it is TCP, or QUIC over UDP in the case of HTTP/3.
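Returning to the Stateless point, here is a sketch of how cookies add state on top of stateless HTTP, using Python's `http.cookies` module: the server sends a Set-Cookie header carrying a session ID, and the client returns it in the Cookie header of later requests so the server can look up that session. The session store, cookie name, and ID value are hypothetical.

```python
from http.cookies import SimpleCookie

SESSIONS = {"abc123": {"user": "alice"}}  # hypothetical server-side session store

def login_response_header(session_id: str) -> str:
    """Server: build the Set-Cookie header sent after a successful login."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True  # hide the cookie from page scripts
    return cookie.output(header="Set-Cookie:")

def user_for_request(cookie_header: str):
    """Server: map the Cookie header of a later request back to its session's user."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    session = SESSIONS.get(morsel.value) if morsel else None
    return session["user"] if session else None
```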

2.7 Common HTTP Status Codes

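Common HTTP status codes fall into five classes, identified by the first digit: 1xx informational, 2xx success, 3xx redirection, 4xx client error, and 5xx server error.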

[Table: Common HTTP status codes]
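Python's `http.HTTPStatus` enum knows the standard reason phrases, so the five classes can be sketched as a lookup on the first digit. The category labels follow the usual 1xx to 5xx naming; the helper name is hypothetical.

```python
from http import HTTPStatus

# The first digit of a status code determines its class
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe_status(code: int) -> str:
    """Return e.g. '404 Not Found (Client Error)'; raises ValueError for unregistered codes."""
    return f"{code} {HTTPStatus(code).phrase} ({CLASSES[code // 100]})"

for code in (200, 301, 404, 500):
    print(describe_status(code))
```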

3 Appendix

This article only briefly covers the application layer and transport layer of the TCP/IP protocol suite; the next article will cover the network layer in more detail. A more detailed overview of the TCP/IP protocol suite is shown in the figure below.

[Figure: TCP/IP Protocol]
