Practical Guide to Handling One Million Concurrent Connections! 5 Key Linux Network Tuning Parameters to Boost Throughput by 300%

1. Background: When Concurrent Connections Encounter Performance Bottlenecks

1.1 Case Environment

Server Configuration:

vCPU: 8 cores | Memory: 16GB | Network Bandwidth: 4Gbps | PPS: 800,000

Observed Anomalies:

TIME_WAIT connections piled up (2,464), a few CLOSE_WAIT connections lingered (4), and new connections occasionally timed out during establishment.

1.2 Initial Parameter Analysis

Original configuration viewed through sysctl:

net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = 131072
net.ipv4.ip_local_port_range = 1024 61999

Key Defects: Small half-connection queue, narrow port range, and strict buffer limits.
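
To put the "narrow port range" in numbers, here is a minimal sketch. The helper name port_count is hypothetical and not part of the original setup; it simply counts the ports in an inclusive range such as the one in ip_local_port_range.

```shell
#!/bin/bash
# port_count LOW HIGH: number of local ports in the inclusive range LOW..HIGH.
# This bounds the outgoing connections one source IP can hold to a single
# destination IP:port.
port_count() {
  echo $(( $2 - $1 + 1 ))
}

port_count 1024 61999   # original range      -> 60976
port_count 1024 65000   # range used in 3.1   -> 63977
```

The widened range adds roughly 3,000 ports, so it helps mostly on hosts that open many outgoing connections to the same backend.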

2. In-Depth Diagnosis: Connection Status and Kernel Parameters

2.1 Connection Status Monitoring Techniques

Real-time TCP state statistics

watch -n 1 "netstat -ant | awk '/^tcp/ {++S[\$NF]} END {for (a in S) print a, S[a]}'"

Output Example:

ESTABLISHED 790
TIME_WAIT 2464
SYN_RECV 32  # Focus on half-connections!

Special Check for Half-Connections

# View SYN_RECV connection details
ss -ntp state syn-recv
# Monitor queue overflow (accept-queue overflows and dropped SYNs)
netstat -s | grep -iE 'listen queue|SYNs to LISTEN'
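
The same counters can also be read straight from /proc/net/netstat, which avoids depending on the wording of netstat -s output. The sketch below is one possible approach, not the method used in the original case; listen_counters is a hypothetical helper name, while ListenOverflows and ListenDrops are the kernel's own counter names.

```shell
#!/bin/bash
# listen_counters [FILE]: print the ListenOverflows and ListenDrops counters
# from a file in /proc/net/netstat format (default: /proc/net/netstat).
# The TcpExt section is two lines: a header line of names, then a line of values.
listen_counters() {
  awk '
    $1 == "TcpExt:" && !seen {
      for (i = 2; i <= NF; i++) name[i] = $i   # remember the column names
      seen = 1
      next
    }
    $1 == "TcpExt:" {
      for (i = 2; i <= NF; i++)
        if (name[i] == "ListenOverflows" || name[i] == "ListenDrops")
          print name[i], $i
    }
  ' "${1:-/proc/net/netstat}"
}

# Usage: listen_counters
```

Both counters are cumulative since boot, so what matters is whether they keep growing between two readings.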

2.2 Key Parameter Interpretation

3. Tuning Solutions: From Parameters to Practice

3.1 Connection Management Optimization

Resolving TIME_WAIT Accumulation

echo "net.ipv4.tcp_tw_reuse = 1" >> /etc/sysctl.conf
echo "net.ipv4.tcp_max_tw_buckets = 262144" >> /etc/sysctl.conf
echo "net.ipv4.ip_local_port_range = 1024 65000" >> /etc/sysctl.conf

Shortening FIN_WAIT_2 Recycle Time (tcp_fin_timeout does not shorten TIME_WAIT)

echo "net.ipv4.tcp_fin_timeout = 30" >> /etc/sysctl.conf

3.2 Queue and Buffer Optimization

Expanding Connection Queue

echo "net.ipv4.tcp_max_syn_backlog = 65535" >> /etc/sysctl.conf
echo "net.core.somaxconn = 65535" >> /etc/sysctl.conf
echo "net.core.netdev_max_backlog = 10000" >> /etc/sysctl.conf

Adjusting Memory Buffers

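cat >> /etc/sysctl.conf <<EOF
# tcp_mem is counted in pages (4 KB each), not bytes: about 2.7 / 4 / 5.3 GB on this 16 GB host
net.ipv4.tcp_mem = 699050 1048576 1398101
# tcp_rmem / tcp_wmem are per-socket min, default, max in bytes
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
EOF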
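
One way to sanity-check the 16 MB per-socket ceiling is the bandwidth-delay product: the number of bytes that can be in flight on one connection. The sketch below assumes a round-trip time of 32 ms, which is an illustrative figure and not a measurement from the case environment; bdp_bytes is a hypothetical helper name.

```shell
#!/bin/bash
# bdp_bytes MBPS RTT_MS: bandwidth-delay product in bytes.
# bytes = (MBPS * 10^6 / 8) bytes/s * (RTT_MS / 1000) s = MBPS * 125 * RTT_MS
bdp_bytes() {
  echo $(( $1 * 125 * $2 ))
}

bdp_bytes 4000 32   # 4 Gbps link, assumed 32 ms RTT -> 16000000
```

Under that assumption a single connection would need about 16 MB of buffer to fill the 4 Gbps link, which is in line with the maximum of 16777216 set above. With a shorter RTT, a smaller maximum is enough.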

3.3 Keepalive and Timeout Optimization

echo "net.ipv4.tcp_keepalive_time = 600" >> /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_intvl = 30" >> /etc/sysctl.conf
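
Appending with echo >> adds a new line on every run, so rerunning the commands in this section leaves duplicate entries in /etc/sysctl.conf, and nothing takes effect until the file is reloaded with sysctl -p. A minimal sketch of an idempotent alternative follows; set_sysctl_conf is a hypothetical helper, not part of the original procedure.

```shell
#!/bin/bash
# set_sysctl_conf KEY VALUE [FILE]: make FILE contain "KEY = VALUE" exactly once.
# Existing assignments of KEY are dropped first, so reruns do not pile up duplicates.
set_sysctl_conf() {
  local key=$1 value=$2 file=${3:-/etc/sysctl.conf} tmp
  tmp=$(mktemp) || return 1
  if [ -f "$file" ]; then
    # Keep every line whose left-hand side (whitespace removed) is not KEY.
    awk -F= -v k="$key" '{ name = $1; gsub(/[[:space:]]/, "", name) } name != k' \
      "$file" > "$tmp"
  fi
  echo "$key = $value" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root):
#   set_sysctl_conf net.ipv4.tcp_fin_timeout 30
#   sysctl -p    # reload /etc/sysctl.conf so the new value takes effect
```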

4. Verification and Monitoring

4.1 Real-time Monitoring Script

Connection Status Dashboard

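#!/bin/bash
# Refresh a small TCP dashboard every 5 seconds; stop with Ctrl-C.
while true; do
  clear
  date
  echo "---- TCP Status ----"
  netstat -ant | awk '/^tcp/ {++S[$NF]} END {for (a in S) print a, S[a]}'
  echo "---- Accept Queue (listening sockets) ----"
  # For LISTEN sockets, Recv-Q is the current accept-queue length and Send-Q is its limit
  ss -ltn | awk 'NR>1 {print $4 ": Recv-Q=" $2 ", Send-Q=" $3}'
  echo "---- Port Usage Rate ----"
  # Count distinct local ports on non-listening sockets; split on the last ":" so IPv6 addresses work
  used=$(netstat -ant | awk '/^tcp/ && $6 != "LISTEN" {n = split($4, p, ":"); print p[n]}' | sort -u | wc -l)
  echo "Used Ports: ${used}/$((65000 - 1024 + 1))"
  sleep 5
done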

Kernel Alert Rules (Prometheus Example)

alert: TCP_SYN_Dropped
expr: increase(node_netstat_TcpExt_ListenDrops{job="node"}[1m]) > 0
for: 5m
labels:
  severity: critical
annotations:
  summary: "SYN Queue Overflow (Instance {{ $labels.instance }})"

4.2 Load Testing Recommendations

Use wrk to simulate high concurrency:

wrk -t16 -c10000 -d60s http://service:8080

Key Monitoring Metrics:

  • Fluctuations in SYN_RECV Count

  • Packet Loss Count in netstat -s

  • Memory Usage Rate (free -m)

5. Pitfall Guide

5.1 Common Misconceptions

Blindly enabling tcp_tw_recycle

It breaks connections from clients behind NAT, and the option was removed entirely in Linux 4.12.

Excessively large buffers causing OOM

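Size tcp_mem according to physical memory:

# Calculate safe values (in pages, 1 page = 4KB). tcp_mem takes three values: low, pressure, high.
# high is one third of RAM; low and pressure keep the same 2:3:4 ratio as the values in section 3.2 (an illustrative split).
pages=$(( $(free -m | awk '/Mem:/ {print $2}') * 1024 / 4 ))
sysctl -w net.ipv4.tcp_mem="$((pages / 6)) $((pages / 4)) $((pages / 3))"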

5.2 Parameter Dependencies

somaxconn must be ≥ the application-layer backlog, because the kernel silently caps the backlog passed to listen() at somaxconn

For example, Nginx needs to be adjusted synchronously:

listen 80 backlog=65535;
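
The dependency can be sketched as a simple minimum. The helper name effective_backlog and the sample values are illustrative assumptions, not measurements from the case environment.

```shell
#!/bin/bash
# effective_backlog APP_BACKLOG SOMAXCONN: the accept-queue limit the kernel
# actually applies to a listening socket, i.e. the smaller of the two values.
effective_backlog() {
  if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

effective_backlog 65535 128     # app asks for 65535, somaxconn still 128 -> 128
effective_backlog 511 65535     # somaxconn raised, app still passes 511  -> 511
```

Either side left at a low value becomes the real limit. On a running system, the Send-Q column of ss -ltn shows the limit in effect for each listening socket.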

6. Conclusion

Through the tuning practices in this article, we achieved:

  • 70% reduction in TIME_WAIT connections

  • Maximum concurrent connections increased to over 30,000

  • Network throughput doubled

The final recommended configuration has been summarized in a GitHub Gist. Remember: tuning is not a one-time task; it requires continuous monitoring and iteration!
