Key Summary of URL and HTTP Protocols

URL:

A URL is a type of URI that identifies an Internet resource and specifies how to operate on or retrieve that resource. It identifies the resource by describing its primary access mechanism, that is, its “location” on the network.

Most URLs follow a standard format that consists of three parts:

First part: scheme, which tells the web client how to access the resource.

Second part: server location, which informs the web client where the resource is located.

Third part: resource path, which indicates which specific resource on the server is being requested.

URL Format

The syntax of most URL schemes is based on a general format composed of the following 9 parts:

<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

The three most important parts are: scheme, host, and path.

Introduction to URL Components
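
The nine components can be pulled apart with Python's standard urllib.parse module. This is a minimal sketch: the URL below is a made-up example chosen to exercise every part, and the short names in the dictionary simply mirror the general format rather than any official terminology.

```python
from urllib.parse import urlparse

# Hypothetical URL that contains all nine parts of the general format:
# <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
url = "http://joe:secret@www.example.com:8080/tools/hammers;type=d?item=12731#drills"

parts = urlparse(url)

# Map each part of the general format to the attribute urlparse exposes.
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,        # returned as an int, or None if absent
    "path": parts.path,        # includes the leading "/"
    "params": parts.params,    # the ";type=d" part, without the ";"
    "query": parts.query,      # the "?item=12731" part, without the "?"
    "frag": parts.fragment,    # the "#drills" part, without the "#"
}

for name, value in components.items():
    print(f"{name:<9}{value}")
```

Most real URLs carry only a few of these parts; the missing ones come back as empty strings or None.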

URL Shortcuts

Web clients understand several URL shortcuts. Relative URLs are a convenient shorthand for specifying a resource within another resource. Many browsers also support auto-expansion, where the user types the key part of a URL and the browser fills in the rest.

Relative URL

There are two types of URLs: absolute URLs and relative URLs.

Absolute URLs contain all the information needed to access a resource; relative URLs are incomplete and require resolution against another URL known as the base URL to obtain the full information required to access the resource. Relative URLs are a convenient shorthand for URLs.

Relative URLs are merely fragments or small parts of a URL, and applications processing URLs need to convert between relative and absolute URLs.

Relative URLs also make it easier to maintain a set of resources (such as HTML pages): when a group of documents is moved, the links between them remain valid because relative URLs are interpreted relative to the new base. This makes things such as mirroring content on other servers possible.

Base URL

The first step in the conversion process is to find the base URL, which serves as the reference point for the relative URL. The base URL can come from several places:

(1) Explicitly provided in the resource: some resources explicitly specify the base URL. An HTML document, for example, can declare it with a <base> tag.

(2) Base URL of the encapsulating resource: if a relative URL is found in a resource that does not explicitly specify a base URL, the URL of the resource it belongs to can be used as the base.

(3) No base URL: if there is no base URL, it indicates that the relative URL is incomplete or broken.
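
The three cases above can be sketched with Python's standard html.parser module. The helper names BaseTagFinder and find_base_url are hypothetical, and the sketch assumes the resource is HTML and that any <base> href is already absolute.

```python
from html.parser import HTMLParser
from typing import Optional


class BaseTagFinder(HTMLParser):
    """Records the href of the first <base> tag seen in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <base> counts.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def find_base_url(html_text: str, document_url: Optional[str]) -> Optional[str]:
    """Return the base URL for relative links found in html_text."""
    finder = BaseTagFinder()
    finder.feed(html_text)
    if finder.base_href:
        return finder.base_href  # case (1): explicitly provided
    # case (2): fall back to the URL of the enclosing resource;
    # case (3): if that is unknown too, there is no base URL (None).
    return document_url


page = '<html><head><base href="http://www.example.com/tools/"></head></html>'
print(find_base_url(page, "http://mirror.example.org/index.html"))
print(find_base_url("<html></html>", "http://mirror.example.org/index.html"))
print(find_base_url("<html></html>", None))
```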

Resolving Relative References

Resolution: to convert a relative URL into an absolute URL, both the relative URL and the base URL must first be broken down into their component parts. This step is usually called parsing (or decomposing) the URL. The components missing from the relative URL are then taken from the base URL to form the absolute URL.
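
As an illustration, Python's urllib.parse.urljoin performs this resolution. The URLs below are made-up examples; the second base shows why relative links keep working after a set of documents is moved or mirrored.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URL and relative reference.
base = "http://www.example.com/tools/index.html"
relative = "./hammers.html"

# Parsing shows what the relative URL is missing: it has no scheme or host.
print(urlsplit(base))
print(urlsplit(relative))

# Resolution fills the missing components in from the base URL.
absolute = urljoin(base, relative)
print(absolute)

# Other common relative forms, resolved against the same base.
parent_dir = urljoin(base, "../images/logo.png")
from_root = urljoin(base, "/about.html")
print(parent_dir)
print(from_root)

# The same relative reference resolved against a new base after the
# documents have been copied to another (hypothetical) server.
mirrored = urljoin("http://mirror.example.org/copy/tools/index.html", relative)
print(mirrored)
```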

Automatic URL Expansion

Many browsers attempt to expand a URL automatically as the user types or submits it, so the user does not need to enter the complete URL.

There are two types of automatic expansion features:

(1) Hostname expansion: given only a small hint, the browser can often expand the hostname you typed into a complete hostname without further help.

(2) History expansion: the browser stores previously visited URLs. As the user types, the input is matched against the prefixes of the URLs in the history, and the complete matches are offered for the user to choose from. Note: the behavior of URL automatic expansion may differ when used with proxies, which will be explained in detail later.
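
Both features can be approximated in a few lines. This is only a toy sketch: real browsers use their own, more elaborate heuristics, and the function names, the "www." and ".com" rule, and the history entries below are all assumptions made for illustration.

```python
from typing import List


def expand_hostname(typed: str) -> str:
    """Toy hostname expansion: turn a bare word into http://www.<word>.com/."""
    if "://" in typed or "." in typed:
        return typed  # already looks like a URL or a full hostname
    return f"http://www.{typed}.com/"


def expand_from_history(typed: str, history: List[str]) -> List[str]:
    """Toy history expansion: return visited URLs that start with the input."""
    return [url for url in history if url.startswith(typed)]


# Hypothetical browsing history.
history = [
    "http://www.example.com/tools/",
    "http://www.example.com/about.html",
    "http://www.example.org/",
]

print(expand_hostname("example"))
print(expand_from_history("http://www.example.com", history))
```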

URL Character Set

URLs are portable: URLs must name every resource on the Internet uniformly, and different protocols use different mechanisms to transmit data, so it is important that a URL can be transmitted safely through any Internet protocol.

URLs are readable: invisible or unprintable characters are therefore not allowed in URLs, even though such characters may pass through email programs and be otherwise portable.

URLs are complete: sometimes people want URLs to contain binary data or characters outside the commonly safe alphabet. An escape mechanism is therefore needed to encode unsafe characters as safe characters before transmission.

URL Character Set

Many computer applications use the US-ASCII character set, which uses 7 bits per character to represent most of the keys on an English keyboard, plus a few unprintable control characters used for text formatting and hardware signaling.

US-ASCII is highly portable, but it cannot represent the characters used by the hundreds of non-Roman languages.

Escape sequences were therefore introduced, allowing arbitrary characters or data to be encoded using a restricted subset of US-ASCII, which achieves both portability and completeness.

Encoding Mechanism

To get around the limitations of the safe character set, an “escape” notation was designed to represent unsafe characters. An escape consists of a percent sign (%) followed by two hexadecimal digits that give the ASCII code of the character. For example, the ~ symbol is escaped as %7E, the % symbol is escaped as %25, and the = symbol is escaped as %3D.
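
The rule is simple enough to write out directly. In this sketch, escape_char is a hypothetical helper that applies the rule to a single US-ASCII character; decoding uses the standard library's urllib.parse.unquote. The helper is hand-written because the library's own quote function leaves some characters, such as ~, unescaped.

```python
from urllib.parse import unquote


def escape_char(ch: str) -> str:
    """Escape one US-ASCII character as '%' plus two hexadecimal digits."""
    if len(ch) != 1 or ord(ch) > 0x7F:
        raise ValueError("expected a single US-ASCII character")
    return f"%{ord(ch):02X}"


# The three examples from the text.
for ch in "~%=":
    print(ch, "->", escape_char(ch))

# Decoding reverses the escape.
print(unquote("%7E"))
print(unquote("a%3Db%25"))
```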

Character Limitations

Several characters are reserved because they have special meaning inside a URL. To use one of them for anything other than its reserved purpose, encode it first.
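
For instance, a query value that itself contains &, = or % has to be escaped, or it would be mistaken for URL syntax. A minimal sketch using Python's urllib.parse follows; the host, path, and value are made up.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Hypothetical value containing characters that are reserved in a query string.
value = "fish&chips=50%"

# safe="" tells quote() to escape every reserved character, including "/".
escaped = quote(value, safe="")
print(escaped)

# By default quote() treats "/" as safe, since it normally separates path segments.
print(quote("a/b"))
print(quote("a/b", safe=""))

# urlencode() escapes each value while building a query string.
url = "http://www.example.com/search?" + urlencode({"q": value})
print(url)

# Parsing the URL again recovers the original value.
recovered = parse_qs(urlsplit(url).query)["q"][0]
print(recovered)
```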
