Tuning HttpClient for High Concurrency: How We Significantly Increased QPS

One memorable incident happened a week before the Double Eleven shopping festival, when one of our APIs buckled under load. Its business logic was simple: call a few external services, fetch the data, and assemble the results. CPU and memory usage were both low, so neither looked like the bottleneck. Yet during the stress test the QPS stayed stuck at 800 and wouldn't budge. The culprit turned out to be HttpClient.

Many people use Apache HttpClient by simply creating a new instance wherever they need one because it is convenient, but under high concurrency the problems surface: connections are not reused, timeouts are not set, and pool limits are left at their defaults, and together these become the bottleneck.

1. Connection Pooling, Don’t Create New Instances Every Time

This is probably the most common pitfall.

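Each HttpClient instance owns its own connection pool, so creating a new client for every request throws that pool away: every request opens a new socket and closes it when the request completes. That means a TCP three-way handshake and a four-way teardown each time (plus a TLS handshake for HTTPS), which is significant overhead. Even a single shared client built with the defaults is tight for high concurrency, because the default pool in HttpClient 4.x allows only 2 connections per route and 20 in total.

The correct approach is to share one client backed by a PoolingHttpClientConnectionManager whose limits are sized for your traffic. The code looks like this: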

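import java.util.concurrent.TimeUnit;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

// One shared, pooled client for the whole application (Apache HttpClient 4.4+).
public class HttpClientPool {
    private static final int MAX_TOTAL = 500;      // connections across all routes
    private static final int MAX_PER_ROUTE = 100;  // connections per target host

    private static final CloseableHttpClient httpClient;

    static {
        PoolingHttpClientConnectionManager manager = new PoolingHttpClientConnectionManager();
        manager.setMaxTotal(MAX_TOTAL);
        manager.setDefaultMaxPerRoute(MAX_PER_ROUTE);

        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(2000)            // ms to establish the TCP connection
                .setConnectionRequestTimeout(2000)  // ms to wait for a connection from the pool
                .setSocketTimeout(3000)             // ms of read inactivity before giving up
                .build();

        httpClient = HttpClients.custom()
                .setConnectionManager(manager)
                .setDefaultRequestConfig(config)
                .evictIdleConnections(30, TimeUnit.SECONDS) // close connections idle for 30 s
                .build();
    }

    private HttpClientPool() {
        // Static holder; not meant to be instantiated.
    }

    // HttpClient is thread-safe, so every caller can share this instance.
    public static CloseableHttpClient getHttpClient() {
        return httpClient;
    }
}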

Here are a few key points:

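setMaxTotal: The maximum number of connections in the entire pool, across all target hosts.
setDefaultMaxPerRoute: The maximum number of connections per route (in practice, per target host).
evictIdleConnections: Starts a background thread that closes connections that have sat idle longer than the given time, so stale connections do not pile up in the pool.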

With this setup, the QPS jumped from 800 to over 3000. Later, after tuning the thread pool as well, we reached 5000, which surprised even our DBA.
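As a rough way to choose values like MAX_TOTAL and MAX_PER_ROUTE, Little's law says the number of connections busy at any moment is about throughput multiplied by average latency. The sketch below is only a back-of-the-envelope helper: the class name, the 50 ms latency, and the 1.5x headroom factor are assumptions for illustration, not measurements from the incident described here.

```java
/** Back-of-the-envelope pool sizing using Little's law (in-flight = rate * latency). */
final class PoolSizing {
    private PoolSizing() {}

    /**
     * Estimates how many pooled connections one route needs.
     *
     * @param targetQps        requests per second sent to the route
     * @param avgLatencyMillis average round-trip time per request (assumed; measure your own)
     * @param headroom         safety multiplier for bursts, e.g. 1.5 (assumed)
     * @return suggested connection limit, rounded up
     */
    static int requiredConnections(int targetQps, double avgLatencyMillis, double headroom) {
        // Multiply first and divide last to keep the arithmetic exact for round inputs.
        double needed = targetQps * avgLatencyMillis * headroom / 1000.0;
        return (int) Math.ceil(needed);
    }

    public static void main(String[] args) {
        // 1000 QPS at 50 ms per call keeps about 50 connections busy;
        // with 1.5x headroom that suggests a per-route limit of 75.
        System.out.println(requiredConnections(1000, 50.0, 1.5)); // prints 75
    }
}
```

If the estimate comes out above the configured per-route limit, requests will queue for a connection no matter how many worker threads are available.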

2. Avoid Blocking, Enable Connection Request Timeout

Another often overlooked setting is ConnectionRequestTimeout, the time a thread is willing to wait for a connection from the pool.

Many people configure only socketTimeout and then find their threads stuck before a request is even sent. This happens when every connection in the pool is in use: new requests queue up waiting for a free one, and if no connection request timeout is set, that wait has no upper bound.

The solution is simple: set a connection request timeout:

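RequestConfig config = RequestConfig.custom()
        .setConnectionRequestTimeout(1000) // ms to wait for a pooled connection
        .setConnectTimeout(2000)           // ms to establish the TCP connection
        .setSocketTimeout(3000)            // ms of read inactivity before giving up
        .build();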

If a connection is not obtained within 1 second, HttpClient throws a ConnectionPoolTimeoutException right away instead of letting waiting threads pile up. Combined with a fallback mechanism such as Hystrix or Resilience4j, this keeps a slow downstream service from exhausting your own threads and noticeably improves overall stability.
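The fail-fast idea can be illustrated without HttpClient at all. The following is a toy model, not how HttpClient implements its pool: a Semaphore stands in for the pool, and the hypothetical BoundedLease class returns a fallback when no permit is free within the lease timeout.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Toy model of a bounded pool whose lease step has a timeout (illustration only). */
final class BoundedLease {
    private final Semaphore permits;

    BoundedLease(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    /**
     * Runs the call if a permit becomes free within the timeout; otherwise
     * returns the fallback instead of letting the caller queue indefinitely.
     */
    <T> T callOrFallback(long leaseTimeoutMillis, Supplier<T> call, Supplier<T> fallback) {
        boolean leased;
        try {
            leased = permits.tryAcquire(leaseTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            leased = false;
        }
        if (!leased) {
            return fallback.get(); // pool exhausted: fail fast
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }
}
```

With a real client, the same shape applies: catch the pool timeout exception at the call site and route it to whatever degraded response the circuit-breaker library is configured to return.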

3. KeepAlive Strategy, Don’t Let Connections Drop Unnecessarily

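By default, HttpClient takes the keep-alive duration from the Keep-Alive response header. When the server does not send that header, the default strategy treats the connection as reusable indefinitely, even though the server or a load balancer in between may silently close it after its own idle timeout. The next request that picks up such a connection then fails.

To make reuse stable, customize the ConnectionKeepAliveStrategy so that connections without a server hint expire after a fixed time: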

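import org.apache.http.HeaderElement;
import org.apache.http.HeaderElementIterator;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicHeaderElementIterator;
import org.apache.http.protocol.HTTP;

ConnectionKeepAliveStrategy keepAliveStrategy = (response, context) -> {
    // Honor the server's "Keep-Alive: timeout=N" hint when it is present.
    HeaderElementIterator it = new BasicHeaderElementIterator(
            response.headerIterator(HTTP.CONN_KEEP_ALIVE));
    while (it.hasNext()) {
        HeaderElement he = it.nextElement();
        String param = he.getName();
        String value = he.getValue();
        if (value != null && param.equalsIgnoreCase("timeout")) {
            try {
                return Long.parseLong(value) * 1000; // seconds to milliseconds
            } catch (NumberFormatException ignored) {
                // Malformed value: fall through to the default below.
            }
        }
    }
    return 20 * 1000; // Default 20 seconds
};

// manager is the PoolingHttpClientConnectionManager configured in section 1.
CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(manager)
        .setKeepAliveStrategy(keepAliveStrategy)
        .build();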

Once, when we were calling an internal service, the logs frequently showed Connection reset by peer, which looked like network jitter. On investigation, it turned out that the server was closing idle connections while they still sat in our pool, so reusing them produced errors. After we applied this strategy, the issue was completely resolved.
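To make it concrete what the strategy computes, here is a dependency-free sketch of the same parsing rule applied to a raw header value. The KeepAliveHeader class and its method name are hypothetical helpers for illustration; the real strategy above works on HttpClient's header objects instead of strings.

```java
/** Parses the timeout parameter of a Keep-Alive response header value (illustration only). */
final class KeepAliveHeader {
    private KeepAliveHeader() {}

    /**
     * @param headerValue   raw header value such as "timeout=5, max=100"; may be null
     * @param defaultMillis duration to use when no usable timeout is present
     * @return keep-alive duration in milliseconds
     */
    static long timeoutMillis(String headerValue, long defaultMillis) {
        if (headerValue == null) {
            return defaultMillis;
        }
        for (String element : headerValue.split(",")) {
            String[] pair = element.trim().split("=", 2);
            if (pair.length == 2 && pair[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    return Long.parseLong(pair[1].trim()) * 1000; // seconds to milliseconds
                } catch (NumberFormatException ignored) {
                    return defaultMillis; // malformed value: use the default
                }
            }
        }
        return defaultMillis; // header present but without a timeout parameter
    }
}
```

For example, a response carrying `Keep-Alive: timeout=5, max=100` yields 5000 ms, while a response with no Keep-Alive header falls back to the 20-second default used above.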

4. Asynchronous HttpClient, Truly Beneficial!

No matter how much you tune the synchronous HttpClient, the bottleneck remains the thread model. Each in-flight request occupies a thread, and with too many threads the CPU spends its time on context switching instead of useful work.

Later, I switched to Apache HttpAsyncClient (the CloseableHttpAsyncClient class), where a single NIO reactor thread can handle thousands of connections simultaneously.

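import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
import org.apache.http.impl.nio.client.HttpAsyncClients;
import org.apache.http.util.EntityUtils;

public class AsyncHttpDemo {
    public static void main(String[] args) throws Exception {
        // try-with-resources shuts down the I/O reactor threads on exit.
        try (CloseableHttpAsyncClient asyncClient = HttpAsyncClients.custom()
                .setMaxConnTotal(500)
                .setMaxConnPerRoute(100)
                .build()) {
            asyncClient.start();

            HttpGet get = new HttpGet("https://api.example.com/data");
            // Passing null skips the callback; real code would supply a FutureCallback.
            Future<HttpResponse> future = asyncClient.execute(get, null);
            // Blocking on the future is for the demo only; it gives up the async benefit.
            HttpResponse response = future.get(3, TimeUnit.SECONDS);

            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}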

During stress testing on the same machine, the QPS increased from 5000 to 18000+. Combined with CompletableFuture or Reactor reactive streams, both performance and resource utilization improved further.
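One way to combine callback-style async calls with CompletableFuture is a small bridge. The sketch below uses only the JDK: the nested Callback interface is a hypothetical stand-in with the same three methods (completed, failed, cancelled) as HttpAsyncClient's FutureCallback, and the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

/** Bridges completed/failed/cancelled callbacks to CompletableFuture (illustration only). */
final class AsyncFanOut {
    private AsyncFanOut() {}

    /** Hypothetical callback shape; stands in for the client library's callback type. */
    interface Callback<T> {
        void completed(T result);
        void failed(Exception ex);
        void cancelled();
    }

    /** Returns a callback that completes the given future when the call finishes. */
    static <T> Callback<T> bridge(CompletableFuture<T> target) {
        return new Callback<T>() {
            @Override public void completed(T result) { target.complete(result); }
            @Override public void failed(Exception ex) { target.completeExceptionally(ex); }
            @Override public void cancelled() { target.cancel(false); }
        };
    }

    /** Completes when every input has completed, keeping the results in input order. */
    static <T> CompletableFuture<List<T>> collect(List<CompletableFuture<T>> futures) {
        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join) // safe: all inputs are already done
                        .collect(Collectors.toList()));
    }
}
```

An API that calls several external services could create one future per call, pass `bridge(future)` as the callback, and assemble the response in a `thenApply` on the collected result, so no thread blocks while the calls are in flight.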

5. DNS Caching and Connection Reuse Traps

Another hidden pitfall is DNS caching. Once, in a cloud deployment, multiple machines sat behind a single load-balanced domain name, yet HttpClient kept hitting only a few fixed machines.

After a long investigation, it turned out that the JVM was caching resolved addresses and, in our environment, effectively never expiring them. (Caching forever is the JVM default when a security manager is installed; otherwise the default TTL is implementation-specific, 30 seconds in OpenJDK.) This caused severely uneven load under high concurrency.

The solution is:

# JVM parameter
-Dsun.net.inetaddr.ttl=30

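This makes cached DNS entries expire after 30 seconds, so new connections resolve the name again and pick up new IPs, which balances the load better. Keep in mind that a pooled connection stays attached to the IP it was opened against, so the keep-alive and idle-eviction settings from the earlier sections also determine how quickly a DNS change takes effect.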
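When changing the JVM startup flags is not an option, the same TTL can be set in code through the documented `networkaddress.cache.ttl` security property. This is a minimal sketch with a hypothetical helper class; it assumes the call runs at startup before the first name lookup, since the JVM reads the policy once.

```java
import java.security.Security;

/** Sets the JVM's positive DNS cache TTL programmatically (illustration only). */
final class DnsCacheConfig {
    private DnsCacheConfig() {}

    /**
     * Caps how long successful DNS lookups are cached.
     * Call this early in startup, before any hostname has been resolved.
     *
     * @param seconds cache lifetime in seconds; 0 disables caching
     */
    static void setPositiveTtlSeconds(int seconds) {
        if (seconds < 0) {
            throw new IllegalArgumentException("seconds must be >= 0");
        }
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }
}
```

Calling `DnsCacheConfig.setPositiveTtlSeconds(30)` as the first line of `main` has the same intent as the JVM parameter above.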

6. Avoid Excessive Logging, Especially DEBUG

The final performance killer is logging.

Previously, for debugging convenience, we printed the complete response body in an interceptor, and this overwhelmed the production API. Each response was about 1KB, and at a QPS of 5000 that amounted to roughly 5MB/s of log IO, which strained both the disk and the CPU.

Recommendations:

Log only the request URL, duration, and status code;
Record the response body length and traceId instead of the body itself;
Hand large log payloads to an asynchronous queue rather than writing them on the request thread.
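A summary line along those lines might look like the sketch below. The field names and layout are assumptions for illustration, and the small helper only reproduces the bandwidth arithmetic from the incident above.

```java
import java.util.Locale;

/** Builds a compact per-request log line instead of dumping the response body (illustration only). */
final class AccessLogLine {
    private AccessLogLine() {}

    /** Formats the fields recommended above: traceId, URL, status, duration, body length. */
    static String format(String traceId, String method, String url,
                         int status, long durationMillis, long bodyBytes) {
        return String.format(Locale.ROOT,
                "traceId=%s %s %s status=%d cost=%dms bodyLen=%d",
                traceId, method, url, status, durationMillis, bodyBytes);
    }

    /** Log volume in bytes per second if every request logs the given number of bytes. */
    static long bytesPerSecond(long bytesPerRequest, long qps) {
        return bytesPerRequest * qps;
    }
}
```

A line produced this way stays around a hundred bytes regardless of response size, whereas `bytesPerSecond(1024, 5000)` gives 5,120,000 bytes per second for full 1KB bodies.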

Sometimes performance bottlenecks are not in the business logic but in these easily overlooked details. Once HttpClient is tuned, the QPS can double even if the interface logic stays unchanged.

In the end, these "small tricks" took our service from an initial 800 QPS to over 20,000. It's not magic, just engineering experience.

In my opinion, anyone can use HttpClient, but using it well, stably, and quickly takes gradual refinement. Don't dismiss that work as a hassle; it can prevent half of your production incidents.
