Understanding HTTP Proxies: Theory and Practice

/ Author’s Profile /
This article was submitted by Lu Yecong and covers HTTP proxies in both theory and practice. Thanks to the author for the contribution.
Original link:
https://mp.weixin.qq.com/s/9H7aWD-nXdpnh-etIb2o0Q
/ Introduction /
This article has three parts, moving from networking theory to practical development:
  • Part One: Discussing the application scenarios and basic theoretical knowledge of HTTP proxies.
  • Part Two: Introducing a case from the author’s project where a local proxy service is used to proxy WebView traffic, enabling access to internal applications from the external network.
  • Part Three: Introducing some source code implementations of the proxy module in Chromium.
/ Knowledge of HTTP Proxies /
Application Scenarios of Proxy Servers
There are many use cases for proxy servers, such as:
  1. In company or school networks, access to the Internet may require a proxy server.
  2. To protect privacy, users may use proxy servers to hide their IP addresses.
  3. To bypass geographical restrictions, users may use proxy servers located in specific countries/regions to access certain websites or services.
  4. Developers may use proxy servers to debug HTTP requests and responses.
Normal Proxies and Tunnel Proxies
Normal proxies and tunnel proxies both relay traffic between a client and a target server, but they differ in how they handle client requests and transmit data. The two types are described below.
Normal Proxy
Definition
A normal proxy, also known as a forward proxy, sits between the client and the target server. The client sends requests to the proxy server, which forwards the requests to the target server. The responses returned by the target server are also routed back to the client through the proxy server.
“HTTP: The Definitive Guide” defines it as follows:
HTTP clients send request messages to proxies, which need to handle requests and connections correctly (e.g., correctly handle Connection: keep-alive), send requests to the server, and forward the received responses back to the client.
Characteristics
The main characteristics of a normal proxy are:
  1. The proxy server can modify the client’s requests and the target server’s responses, such as adding, deleting, or modifying HTTP headers.
  2. The proxy server can cache responses from the target server to improve access speed and reduce network bandwidth consumption.
  3. The proxy server can filter and audit HTTP requests, implementing access control and security policies.
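To make the first characteristic concrete, here is a minimal C++ sketch of the rewriting a normal proxy might perform before forwarding a request. It assumes the client sent an absolute-form target (http://host/path), as clients do when talking to an HTTP proxy, and it drops two headers that only concern the client-to-proxy hop. The Request struct and rewrite_for_origin are hypothetical names used for illustration; they are not part of any library.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal request model used only for this sketch.
struct Request {
    std::string method;
    std::string target;  // absolute-form when sent to a proxy
    std::vector<std::pair<std::string, std::string>> headers;
};

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Converts "http://host[:port]/path" into origin-form "/path" and removes
// headers that are meant only for the proxy itself.
Request rewrite_for_origin(const Request& in) {
    Request out;
    out.method = in.method;
    const std::string scheme = "http://";
    if (to_lower(in.target.substr(0, scheme.size())) == scheme) {
        const std::string rest = in.target.substr(scheme.size());
        const std::size_t slash = rest.find('/');
        out.target = (slash == std::string::npos) ? "/" : rest.substr(slash);
    } else {
        out.target = in.target;  // already origin-form
    }
    for (const auto& h : in.headers) {
        const std::string name = to_lower(h.first);
        if (name == "proxy-connection" || name == "proxy-authorization") continue;
        out.headers.push_back(h);
    }
    return out;
}
```

For a request with target http://example.com/a?b=1 and the headers Host and Proxy-Connection, the rewritten request has target /a?b=1 and keeps only Host.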
Tunnel Proxy
Definition
A tunnel proxy is a special type of proxy server that establishes a transparent TCP tunnel between the client and the target server. The client asks the proxy to open a TCP connection to the target server, after which the client and server exchange data as if they were directly connected; the proxy server does not modify or parse the data being transmitted.
“HTTP: The Definitive Guide” defines it as follows:
HTTP clients request a tunnel proxy to create a TCP connection to any destination server and port using the CONNECT method, forwarding subsequent data between the client and server blindly.
Characteristics
The main characteristics of a tunnel proxy are:
  1. The proxy server does not modify or parse the data transmitted through the tunnel; it only relays the bytes in both directions.
  2. Tunnel proxies are often used to establish secure connections (such as SSL/TLS), in which case the proxy server cannot view or modify the encrypted data.
  3. Tunnel proxies can traverse firewalls and NAT devices, accessing internal or restricted network resources.
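As an illustration of how a tunnel starts, the sketch below parses the request line of a CONNECT request and extracts the host and port the proxy should connect to. It is a simplified C++ example under stated assumptions: ConnectTarget and parse_connect_line are hypothetical names, and bracketed IPv6 literals are not handled.

```cpp
#include <sstream>
#include <string>

struct ConnectTarget {
    std::string host;
    int port;
};

// Parses a line such as "CONNECT example.com:443 HTTP/1.1".
// Returns false if the line is not a well-formed CONNECT request.
bool parse_connect_line(const std::string& line, ConnectTarget& out) {
    std::istringstream in(line);
    std::string method, authority, version;
    if (!(in >> method >> authority >> version)) return false;
    if (method != "CONNECT") return false;

    const std::size_t colon = authority.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == authority.size())
        return false;

    int port = 0;
    for (std::size_t i = colon + 1; i < authority.size(); ++i) {
        const char c = authority[i];
        if (c < '0' || c > '9') return false;
        port = port * 10 + (c - '0');
        if (port > 65535) return false;
    }
    if (port == 0) return false;

    out.host = authority.substr(0, colon);
    out.port = port;
    return true;
}

// Reply sent to the client once the proxy's own TCP connection to the
// target is up; after this, bytes are relayed without being parsed.
const char* const kTunnelEstablished = "HTTP/1.1 200 Connection Established\r\n\r\n";
```

After the reply in kTunnelEstablished has been sent, the proxy treats the connection as an opaque byte stream, which is why it cannot inspect TLS traffic carried inside the tunnel.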
Similarities and Differences between Normal Proxies and Tunnel Proxies
[Figure: comparison of normal proxies and tunnel proxies]
Proxy Server Authentication Process
When Chromium initiates a request through a proxy server that requires authentication, the following process occurs:
[Figure: proxy server authentication flow]
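The original diagram is not reproduced here, but the general HTTP mechanism is well defined: the proxy answers an unauthenticated request with 407 Proxy Authentication Required and a Proxy-Authenticate header, and the client retries with a Proxy-Authorization header. The C++ sketch below assumes the Basic scheme; the function names are hypothetical, and Chromium's own implementation supports more schemes than this.

```cpp
#include <string>

// Standard Base64 encoding (RFC 4648) with '=' padding.
std::string base64_encode(const std::string& in) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    while (i + 2 < in.size()) {  // full 3-byte groups
        const unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                           (static_cast<unsigned char>(in[i + 1]) << 8) |
                           static_cast<unsigned char>(in[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
        i += 3;
    }
    const std::size_t rest = in.size() - i;
    if (rest == 1) {
        const unsigned n = static_cast<unsigned char>(in[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        const unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                           (static_cast<unsigned char>(in[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// True if the status line carries "407 Proxy Authentication Required".
bool needs_proxy_auth(const std::string& status_line) {
    const std::size_t sp = status_line.find(' ');
    return sp != std::string::npos && status_line.compare(sp + 1, 3, "407") == 0;
}

// Header line the client adds when retrying the request (Basic scheme).
std::string basic_proxy_authorization(const std::string& user,
                                      const std::string& password) {
    return "Proxy-Authorization: Basic " + base64_encode(user + ":" + password);
}
```

Basic credentials are only encoded, not encrypted, so this scheme is reasonable only when the hop to the proxy is itself trusted or protected.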
Differences between Proxy Connection and Direct Connection
There are some key differences in the process of sending traffic to a proxy server versus sending it directly to the target server:
[Figure: proxy connection compared with direct connection]
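The original comparison figure is not available, but two generally applicable differences can be sketched from the client's point of view: which machine the TCP connection is opened to, and what the first request line looks like. The C++ example below is a simplified illustration; Url, Plan, plan_direct and plan_via_proxy are hypothetical names, and only GET and CONNECT are modeled.

```cpp
#include <string>

struct Url {
    std::string scheme;  // "http" or "https"
    std::string host;
    int port;
    std::string path;
};

struct Plan {
    std::string connect_host;  // TCP peer the client connects to
    int connect_port;
    std::string first_line;    // first line written on that connection
};

// Direct connection: connect to the origin and use the origin-form target.
// (For https, the TLS handshake would precede this line.)
Plan plan_direct(const Url& u) {
    return Plan{u.host, u.port, "GET " + u.path + " HTTP/1.1"};
}

// Via a proxy: connect to the proxy. Plain HTTP uses the absolute-form
// target; HTTPS first asks the proxy for a tunnel with CONNECT.
Plan plan_via_proxy(const Url& u, const std::string& proxy_host, int proxy_port) {
    const std::string authority = u.host + ":" + std::to_string(u.port);
    if (u.scheme == "https") {
        return Plan{proxy_host, proxy_port, "CONNECT " + authority + " HTTP/1.1"};
    }
    return Plan{proxy_host, proxy_port,
                "GET http://" + authority + u.path + " HTTP/1.1"};
}
```

In both proxied cases the socket peer is the proxy, not the origin, so DNS resolution of the target host is typically left to the proxy.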
/ How to Establish a Local Proxy for WebView in Android /
Case Background
In the project I work on, one scenario for web proxies is as follows: some pages belong to internal applications that cannot be reached from mobile networks, so requests for those internal applications must be forwarded to the internal proxy gateway, while all other requests can go directly to the external network.
Approach
Our solution is to establish a local proxy service on the App side, forwarding all WebView traffic to the local proxy service for processing, which will decide whether to send requests through the proxy connection or directly.
Why establish a local proxy service on the client side? Couldn't the WebView proxy simply be set to the actual gateway address?
In theory that is possible, but a local proxy service on the client side has several advantages:
  • The local proxy service contains a lot of business logic (such as choosing the nearest access point, deciding whether to go through a VPN tunnel or connect directly, and deciding whether to reach resources through specific access nodes). Without the local proxy service, each client would need to implement this logic itself.
  • In addition to WebView traffic, the local proxy service also proxies CGI traffic (by setting a proxy for libcurl). These two kinds of traffic are quite different, and the local proxy service handles both in one place.
  • If each client (Android, iOS, PC) set the gateway as the browser's proxy directly, the business logic could not be embedded in the browser kernel. Implemented in C++, the local proxy service can be reused across Android, iOS, and PC.
Based on the above reasons, we established a local proxy service within the client, which is essentially a cross-platform HTTP server within the client.
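The core routing decision of such a local proxy can be sketched in a few lines of C++. This is only an illustration under assumptions: the article does not show how internal applications are identified, so the domain-suffix list, the Route enum and the function names here are hypothetical.

```cpp
#include <string>
#include <vector>

enum class Route { kGateway, kDirect };

// True if host equals suffix or is a subdomain of it
// ("wiki.corp.example" matches "corp.example"; "notcorp.example" does not).
bool host_matches_suffix(const std::string& host, const std::string& suffix) {
    if (host == suffix) return true;
    if (host.size() <= suffix.size()) return false;
    const std::size_t start = host.size() - suffix.size();
    return host[start - 1] == '.' &&
           host.compare(start, suffix.size(), suffix) == 0;
}

// Internal hosts go through the gateway; everything else is sent directly.
Route choose_route(const std::string& host,
                   const std::vector<std::string>& internal_suffixes) {
    for (const auto& suffix : internal_suffixes) {
        if (host_matches_suffix(host, suffix)) return Route::kGateway;
    }
    return Route::kDirect;
}
```

Keeping this decision in one shared C++ function is what allows the same policy to apply to WebView and libcurl traffic on every platform.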
Overview of the Solution
Thus, we set the WebView proxy address to the local address 127.0.0.1 and initialized a local HTTP server to proxy the WebView requests. For the local proxy service, we used a C++ implementation based on libevent, allowing reuse of this proxy service across Android, iOS, and PC.
Below is the sequence diagram of the overall solution implementation:
[Figure: sequence diagram of the overall solution]
Setting the WebView Proxy Address and Port
First, set the WebView proxy address and port. Since we are using a local proxy service, the host is set to 127.0.0.1, and the port is a randomly chosen available port.
The following implementation mainly does the following:
  1. Check if the WebView supports proxy override functionality.
  2. Create a ProxyConfig.Builder instance and add proxy rules.
  3. Check if the WebView supports reverse bypass (PROXY_OVERRIDE_REVERSE_BYPASS) and add bypass rules. When setReverseBypassEnabled is true, the meaning of the bypass list is inverted: only the addresses added through addBypassRule are accessed through the proxy server, while all other addresses are accessed directly.
  4. Add direct connections to the proxy rules.
  5. Apply the proxy configuration and set the callback function.
// Check if the WebView supports proxy override functionality
if (ReflecterHelper.invokeStaticMethod("androidx.webkit.WebViewFeature", "isFeatureSupported", arrayOf(WebViewFeature.PROXY_OVERRIDE)) as? Boolean != true) {
    return
}

// Create a ProxyConfig.Builder instance and add proxy rules
// ('$' is escaped so that Kotlin does not treat "$Builder" as a string template)
val builder = ReflecterHelper.newInstance("androidx.webkit.ProxyConfig\$Builder")
ReflecterHelper.invokeMethod(builder, "addProxyRule", arrayOf(proxy))

// Check if the WebView supports reverse bypass and, if so, add the URLs that must use the proxy
if (ReflecterHelper.invokeStaticMethod("androidx.webkit.WebViewFeature", "isFeatureSupported", arrayOf(WebViewFeature.PROXY_OVERRIDE_REVERSE_BYPASS)) as? Boolean == true) {
    val rules = urlsToProxy?.filter { !TextUtils.isEmpty(it) }.orEmpty()
    rules.forEach {
        ReflecterHelper.invokeMethod(builder, "addBypassRule", arrayOf(it))
    }
    // Enable reverse bypass once, and only when at least one rule was added
    if (rules.isNotEmpty()) {
        ReflecterHelper.invokeMethod(builder, "setReverseBypassEnabled", arrayOf(true))
    }
}

// Add direct connections to the proxy rules
ReflecterHelper.invokeMethod(builder, "addDirect")

// Apply the proxy configuration and set the callback function
val controller = ReflecterHelper.invokeStaticMethod("androidx.webkit.ProxyController", "getInstance")
val config = ReflecterHelper.invokeMethod(builder, "build")
ReflecterHelper.invokeMethod(controller, "setProxyOverride", arrayOf(config.javaClass, Executor::class.java, Runnable::class.java), arrayOf(config, Executor { command -> command.run() }, callback))
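The effect of setReverseBypassEnabled is easy to get backwards, so here is a small model of the decision in C++. It is a deliberately simplified sketch, not the matcher WebView actually uses: rule_matches supports only exact hosts and a leading "*." wildcard, and both function names are hypothetical.

```cpp
#include <string>
#include <vector>

// Simplified rule matching: "*.corp.example" matches any subdomain of
// corp.example; any other rule must equal the host exactly.
bool rule_matches(const std::string& rule, const std::string& host) {
    if (rule.compare(0, 2, "*.") == 0) {
        const std::string suffix = rule.substr(1);  // ".corp.example"
        return host.size() > suffix.size() &&
               host.compare(host.size() - suffix.size(), suffix.size(), suffix) == 0;
    }
    return rule == host;
}

// Normal mode: hosts on the bypass list skip the proxy.
// Reverse mode: only hosts on the bypass list use the proxy.
bool goes_through_proxy(const std::string& host,
                        const std::vector<std::string>& bypass_rules,
                        bool reverse_bypass) {
    bool listed = false;
    for (const auto& rule : bypass_rules) {
        if (rule_matches(rule, host)) {
            listed = true;
            break;
        }
    }
    return reverse_bypass ? listed : !listed;
}
```

With the single rule *.corp.example and reverse bypass enabled, wiki.corp.example is proxied and example.org is not; with reverse bypass disabled, the two results swap.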
Establishing a Local Proxy Service on the App Side
Implementation Approach
The local proxy service listens for traffic on the local address. If a request's URL needs to go through the proxy gateway, the service forwards the request to the gateway; otherwise, it sends the request directly to the target. The complete implementation is quite complex, so this article shows only a small part of it.
Code Implementation
Below is how to initialize an HTTP server using libevent on the App side:
  • Create a new evhttp instance, set the allowed HTTP methods, and attempt to bind a port on the specified IP address.
  • If binding fails, pick a new random port and retry, for up to 10 attempts in total.
  • After successfully binding the port, the function will display the listening socket information and return success.
*http_server = evhttp_new(base);
evhttp_set_allowed_methods(*http_server, /*...*/);

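// Try up to 10 times; a failed bind resets port to 0 so that the next
// attempt picks a new random port
for (int i = 0; i < 10; i++) {
    if (port == 0) {
        port = (rand() % 20000) + 10000;  // random port in [10000, 29999]
    }

    handle = evhttp_bind_socket_with_handle(*http_server, PROXY_IP_ADDRESS, port);
    if (!handle) {
        port = 0;
        continue;
    }
    break;
}

if (port == 0) {
    // All attempts failed: return error
}

// Bound successfully: display the listening socket information and return success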
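The retry loop can also be written with the C++ random library, which avoids the repeated sequence that an unseeded rand() produces on every run. The sketch below is an assumption-laden variant rather than the author's code: try_bind is a hypothetical callback standing in for the call to evhttp_bind_socket_with_handle, so the example compiles without libevent.

```cpp
#include <functional>
#include <random>

// Tries up to max_attempts random ports in [10000, 29999].
// Returns the port that try_bind accepted, or 0 if every attempt failed.
int bind_random_port(const std::function<bool(int)>& try_bind,
                     int max_attempts = 10) {
    std::random_device seed;
    std::mt19937 generator(seed());
    std::uniform_int_distribution<int> port_range(10000, 29999);

    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        const int port = port_range(generator);
        if (try_bind(port)) {
            return port;  // bound successfully
        }
    }
    return 0;  // caller treats 0 as "no port could be bound"
}
```

Another common option is to bind port 0 and let the operating system pick a free port, then read the assigned port back with getsockname; that removes the need for retries altogether.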
How the Local Proxy Service Forwards Traffic
Implementation Approach
For an HTTPS request, the local proxy service first forwards the CONNECT request to the gateway access machine. Once the gateway returns a successful response, the tunnel is established, and the local proxy service transparently relays traffic between the WebView and the gateway access machine. The subsequent traffic is TLS-encrypted, so the local proxy service cannot decrypt it.
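The gateway access machine uses Nginx to establish the proxy tunnels. Note that handling CONNECT in Nginx generally requires an additional module such as ngx_http_proxy_connect_module, since the stock HTTP proxy module does not support that method.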
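Before switching to blind relaying, the local proxy has to confirm that the gateway accepted the CONNECT request. In HTTP, any 2xx response to CONNECT means the tunnel is established. The C++ sketch below shows that check on the status line alone; the function names are hypothetical, and a real implementation would also consume the response headers up to the blank line.

```cpp
#include <string>

// Extracts the status code from a line such as
// "HTTP/1.1 200 Connection Established". Returns -1 if malformed.
int parse_status_code(const std::string& status_line) {
    if (status_line.compare(0, 5, "HTTP/") != 0) return -1;
    const std::size_t sp = status_line.find(' ');
    if (sp == std::string::npos || sp + 3 >= status_line.size()) return -1;

    int code = 0;
    for (std::size_t i = sp + 1; i <= sp + 3; ++i) {
        const char c = status_line[i];
        if (c < '0' || c > '9') return -1;
        code = code * 10 + (c - '0');
    }
    return code;
}

// Only a 2xx answer to CONNECT switches the connection into tunnel mode;
// anything else (407, 502, ...) must be reported back to the client.
bool tunnel_established(const std::string& status_line) {
    const int code = parse_status_code(status_line);
    return code >= 200 && code < 300;
}
```

Only after tunnel_established returns true should the proxy start copying bytes between the two connections without interpreting them.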
Code Implementation
The following implementation is used to forward CONNECT requests to the gateway access machine, and the main logic is:
  • Get the host of the connection and the client connection.
  • Get the client’s bufferevent and proxy connection, and create a new proxy request.
  • Copy the request headers and body, and create a proxy request. If it is a CGI thread, for performance optimization, reply to the client with 200 OK first.
// Get the host of the connection
string host = ctx->connect_host;

// Get the client connection and bufferevent
ctx->client_conn = evhttp_request_get_connection(ctx->client_req);
ctx->client_bufev = evhttp_connection_get_bufferevent(ctx->client_conn);

// Get the proxy connection
ctx->proxy_conn = get_proxy_connection(ctx, ctx->scheme, ctx->ip, ctx->port);

// Create a new proxy request and set the error callback
ctx->proxy_req = evhttp_request_new(connect_request_done, ctx);
evhttp_request_set_error_cb(ctx->proxy_req, http_request_error);

// Copy the request headers and body
http_header_copy(ctx, ctx->client_req, ctx->proxy_req, CLIENT_TO_GATEWAY);
evbuffer_add_buffer_reference(evhttp_request_get_output_buffer(ctx->proxy_req),
                              evhttp_request_get_input_buffer(ctx->client_req));

// Create the proxy request
evhttp_make_request(ctx->proxy_conn, ctx->proxy_req,
                    evhttp_request_get_command(ctx->client_req), host.c_str());

// Add connections to global context
ctx->gCtx->AddConn(ctx->client_bufev, ctx->client_conn);
ctx->gCtx->AddConn(ctx->proxy_bufev, ctx->proxy_conn);

// If it is a CGI thread, reply to the client with 200 OK, set the callback function, and disable read operations
if (ctx->gCtx->IsCgiThread()) {
    evhttp_send_reply(ctx->client_req, 200, "Connection Established", NULL);
    bufferevent_setcb(ctx->client_bufev, readcb, NULL, eventcb, ctx);
    bufferevent_disable(ctx->client_bufev, EV_READ);
}
Explanation of libevent Interfaces
The above code implementation involves many libevent interfaces, and below is a categorized list of these interfaces along with brief explanations:
Connection Management:
  • evhttp_request_get_connection: Gets the evhttp_connection object associated with the specified request.
  • evhttp_connection_get_bufferevent: Gets the bufferevent object associated with the specified evhttp_connection.
Request Creation and Sending:
  • evhttp_request_new: Creates a new evhttp_request object and sets the callback function when the request is completed.
  • evhttp_request_set_error_cb: Sets the error callback function for the specified evhttp_request.
  • evbuffer_add_buffer_reference: Makes one evbuffer reference the contents of another without copying the data or removing it from the source buffer.
  • evhttp_make_request: Associates the specified evhttp_request with the evhttp_connection and sends the request.
Response Handling:
  • evhttp_send_reply: Sends an HTTP response to the client.
Event Callback and Control:
  • bufferevent_setcb: Sets the callback function for the specified bufferevent.
  • bufferevent_disable: Disables certain events on the specified bufferevent (in this case, disables read operations).
These interfaces cover the main functionalities of creating and managing HTTP requests, connections, buffers, and event callbacks. By using these interfaces, the core functions of the proxy server can be implemented, such as forwarding requests, handling responses, and managing connections.
/ How Chromium Implements Proxy Connections /
In the second part, we set the proxy for the WebView kernel by calling the androidx.webkit proxy override interfaces through reflection. How does Chromium then route web traffic to the proxy server address we set? This section addresses that question.
The Process of Redirecting Traffic to the Proxy Server in Chromium
When an HTTP request is initiated, Chromium first needs to determine whether to use a proxy server. The following are the main steps Chromium takes to redirect traffic to the proxy server:
  1. Get Proxy Configuration: Chromium obtains proxy configuration through ProxyConfigService. These configurations may come from user settings or operating system settings. ProxyConfigService returns a ProxyConfig instance containing proxy rules and exception lists.
  2. Parse Proxy Rules: ProxyService selects the appropriate proxy server for the HTTP request based on the proxy rules in the ProxyConfig. This process may involve parsing PAC files (via ProxyResolverV8) or using fixed proxy rules (via ProxyResolverFixed).
  3. Select Proxy Server: ProxyService selects a suitable proxy server for the request based on the URL and proxy rules. If no suitable proxy server is found, or if a direct connection (DIRECT) is configured, the request will be sent directly to the target server.
  4. Establish Connection: Chromium uses ClientSocketPoolManager to manage network connections. When a proxy server is needed, ClientSocketPoolManager creates a new ClientSocketHandle for the proxy server. This ClientSocketHandle contains the IP address and port of the proxy server.
  5. Send Request: Chromium sends the HTTP request to the proxy server. If the proxy server requires authentication, Chromium handles the authentication process. For HTTP proxies, Chromium adds the Proxy-Connection field to the HTTP request header. For SOCKS proxies, Chromium follows the SOCKS protocol to send requests.
  6. Receive Response: The proxy server forwards the request to the target server and returns the target server’s response to Chromium. Chromium processes the response, parses the page content, and presents it to the user.
Through these steps, Chromium can redirect traffic to the proxy server, enabling access control, privacy protection, and other functions in different network environments.
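Steps 2 and 3 are easier to picture with the format that PAC scripts use for their answer: FindProxyForURL returns a string such as "PROXY 127.0.0.1:8080; DIRECT", an ordered list of options to try. The C++ sketch below parses that format into a list; it is a simplified illustration with hypothetical names (ProxyEntry, parse_pac_result), not Chromium's own parser.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct ProxyEntry {
    std::string scheme;     // "DIRECT", "PROXY", "SOCKS", ...
    std::string host_port;  // empty for DIRECT
};

// Splits a PAC result on ';' and reads "<scheme> [host:port]" from each item.
// Entries are kept in order, since earlier entries are tried first.
std::vector<ProxyEntry> parse_pac_result(const std::string& result) {
    std::vector<ProxyEntry> entries;
    std::istringstream items(result);
    std::string item;
    while (std::getline(items, item, ';')) {
        std::istringstream fields(item);
        ProxyEntry entry;
        if (!(fields >> entry.scheme)) continue;  // skip empty items
        fields >> entry.host_port;                // absent for DIRECT
        entries.push_back(entry);
    }
    return entries;
}
```

For the input "PROXY 127.0.0.1:8080; DIRECT" this yields two entries: the local proxy first, then a direct connection as the fallback.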
Source Code Files for Proxy Servers in Chromium
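The net/proxy directory in Chromium contains the source files related to proxy servers (in newer Chromium versions this directory is named net/proxy_resolution, and some classes have been renamed; for example, ProxyService became ProxyResolutionService). Below are the major files and their functions: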
  1. proxy_config.cc / proxy_config.h: The ProxyConfig class represents proxy configuration, including proxy rules and exception lists. These configurations can come from user settings or operating system settings.
  2. proxy_config_service.cc / proxy_config_service.h: The ProxyConfigService class is an abstract class used to obtain the current ProxyConfig. The specific implementation may depend on the operating system or user settings.
  3. proxy_info.cc / proxy_info.h: The ProxyInfo class contains information about the proxy server selected for a specific URL. When initiating an HTTP request, ProxyService uses ProxyInfo to determine which proxy server to use.
  4. proxy_list.cc / proxy_list.h: The ProxyList class represents a set of alternative proxy servers. In some cases, there may be multiple proxy servers to choose from, and ProxyList provides functionality to select an available proxy from them.
  5. proxy_service.cc / proxy_service.h: The ProxyService class is responsible for selecting the appropriate proxy server for HTTP requests based on proxy configuration. It uses ProxyConfigService to obtain proxy configurations and applies them to HTTP requests.
  6. proxy_server.cc / proxy_server.h: The ProxyServer class represents a specific proxy server, including the proxy protocol (such as HTTP, SOCKS4, SOCKS5, etc.), hostname, and port.
  7. proxy_resolver.cc / proxy_resolver.h: The ProxyResolver class is an abstract class used to resolve proxy rules. Specific implementations may include PAC file parsing (proxy_resolver_v8.cc / proxy_resolver_v8.h) or fixed proxy rules (proxy_resolver_fixed.cc / proxy_resolver_fixed.h).
/ Conclusion /
This article covered network proxies in three steps: the theoretical foundations, a concrete WebView proxy case from the author’s project, and the proxy implementation in Chromium’s source code. I hope it helps readers make better use of proxy servers in practice.