An Overview of TC Traffic Control in Linux Kernel Networking

Last night, I dreamt of falling flowers by a quiet pond, how pitiful that spring is halfway gone and I haven’t returned home

Traffic Control (TC) in the Linux kernel is the queuing and scheduling mechanism that network devices use when receiving and sending packets. It provides basic functions such as shaping, scheduling, policing (on ingress), and dropping. It also provides actions such as traffic mirroring, NAT, skbedit, csum, mark, and BPF, and it can be extended with custom actions.


TC is a more complex framework than netfilter and can implement most of netfilter's functionality, but the two have different focuses. Netfilter works on individual packets: it is a packet-filtering system centered on packet tracking and filtering. TC works on traffic: it is a traffic-control system centered on traffic scheduling.

Subsystems in Linux generally consist of two parts: control logic in user space and core processing in kernel space. For the netfilter framework, the user-space configuration tool is iptables, and the kernel-space part is built around the five hook points on the protocol stack's data path.

In user space, netfilter consists of iptables plus its libraries. In kernel space, the nfnetlink submodule serves those user-space tools, and the five core hook points do the actual processing.


All Netfilter functionality is built on the five hook points along the protocol stack's data path. Note that not every Linux deployment needs all five: routing and switching devices, for example, only need pre-routing, forward, and post-routing, while hosts running specific services such as HTTP, FTP, DHCP, storage services, and management platforms may only need the local-in and local-out hook points.


The TC framework consists of the user-space configuration tool tc (part of the iproute2 suite) and the traffic control system in the kernel. TC operates on network interfaces, sitting below the network protocol stack and above the device drivers.

The four basic components of the TC framework are qdisc, class, filter, and action:

  • queueing discipline (qdisc): the queuing policy that implements rate limiting, shaping, and similar functions using a particular algorithm;

  • class: a user-defined traffic category;

  • classifier (also called filter): defines the rules used to classify packets;

  • action: the operation performed on matched packets.

Configuring traffic control on a network interface typically involves the following steps:

1. Create a qdisc for the network device;

A qdisc is a scheduler/shaper. It must be attached to a network interface and a traffic direction (ingress or egress). Qdiscs are either classless (fifo/pfifo_fast/tbf/sfq/esfq) or classful (htb/cbq/prio); only classful qdiscs can contain multiple classes.

2. Create traffic categories (class) and attach them to the qdisc;

Depending on the scenario, child qdiscs and child classes can be created underneath to build hierarchical scheduling.

3. Create a filter (classifier) and attach it to the qdisc;

Filters classify the traffic on the network device and dispatch packets to the classes defined earlier.

4. Add actions to the filter;

For example, drop the matched packets or mirror the traffic to another network device.
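The four steps can be sketched as a small shell script. This is a minimal illustration, not the article's own configuration: the interface name, the HTB rates, TCP port 22 as the limited traffic, and 10.0.0.99 as the dropped host are all assumptions. The `run` wrapper and its `DRY_RUN` switch are hypothetical conveniences that print each command by default instead of executing it.

```shell
#!/bin/sh
# Sketch of the four TC configuration steps on one interface.
# Assumptions (illustrative only): eth0, HTB rates, TCP port 22, host 10.0.0.99.

# Hypothetical helper: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_tc() {
    dev=${1:-eth0}
    # 1. qdisc: classful HTB at the egress root; unclassified traffic goes to class 1:10
    run tc qdisc add dev "$dev" root handle 1: htb default 10
    # 2. classes: one parent class and two leaf classes that share its bandwidth
    run tc class add dev "$dev" parent 1: classid 1:1 htb rate 100mbit
    run tc class add dev "$dev" parent 1:1 classid 1:10 htb rate 90mbit ceil 100mbit
    run tc class add dev "$dev" parent 1:1 classid 1:20 htb rate 10mbit ceil 20mbit
    # 3. filter: classify TCP (protocol 6) packets to destination port 22 into class 1:20
    run tc filter add dev "$dev" parent 1: protocol ip prio 1 u32 \
        match ip protocol 6 0xff match ip dport 22 0xffff flowid 1:20
    # 4. action: a second filter carrying an action that drops packets sent to 10.0.0.99
    run tc filter add dev "$dev" parent 1: protocol ip prio 2 u32 \
        match ip dst 10.0.0.99/32 action drop
}

setup_tc "$@"
```

Running it as root with `DRY_RUN=0` applies the commands; `tc qdisc del dev eth0 root` removes the whole hierarchy again, including its classes and filters.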

classifier+action mode

A classifier can match packets, but the classid it returns only tells the system which class (queue) the packet goes to next. An action returns a verdict that tells the system what to do next with the packet (drop, pass, mirror, etc.), but it cannot classify packets (that is, match rules).

Therefore, achieving “match + execute action” (for example, dropping every packet whose source IP is 10.1.1.1) takes two parts: a classifier to match the packet and an action to drop it. This is the classifier+action mode.
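The 10.1.1.1 example becomes a single tc filter that carries both parts: the u32 classifier does the matching and the gact `drop` action does the dropping. This is a minimal sketch assuming the packets arrive on eth0, so the rule is attached to the ingress qdisc; `run`/`DRY_RUN` is again a hypothetical print-or-execute wrapper.

```shell
#!/bin/sh
# classifier + action: u32 matches the source address, gact "drop" discards the packet.
# Assumption: the traffic arrives on eth0, so the filter hangs off the ingress qdisc.

# Hypothetical helper: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

drop_source() {
    dev=$1
    src=$2
    # The ingress qdisc (handle ffff:) gives filters an attach point on the receive path
    run tc qdisc add dev "$dev" handle ffff: ingress
    # Classifier (u32 match on the source address) + action (drop)
    run tc filter add dev "$dev" parent ffff: protocol ip prio 1 u32 \
        match ip src "$src"/32 action drop
}

drop_source "${1:-eth0}" "${2:-10.1.1.1}"
```

After applying it, `tc -s filter show dev eth0 parent ffff:` lists the filter together with its action statistics.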

Ingress TC

The ingress qdisc cannot queue packets (it only supports filtering and policing), so ingress traffic control is usually implemented with an intermediate device such as ifb, dummy, or IMQ. Traffic is redirected into this device and then scheduled on the device's egress path. This not only solves the ingress scheduling problem but also lets traffic from multiple interfaces be aggregated into one virtual device and scheduled as a whole. More generally, adding an intermediate virtual device is a useful technique for solving many networking problems.
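Aggregating several interfaces into one ifb device can be sketched as a loop that redirects each interface's ingress traffic to ifb0, followed by a single shaper on ifb0. The interface names, the 50mbit aggregate rate, and the choice of a classless tbf qdisc on ifb0 are assumptions made for illustration; `run`/`DRY_RUN` is a hypothetical wrapper that prints commands unless `DRY_RUN=0`.

```shell
#!/bin/sh
# Redirect the ingress traffic of several interfaces into one ifb device
# and shape it there as a whole. Names and rates are illustrative assumptions.

# Hypothetical helper: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

aggregate_ingress() {
    ifb=$1
    rate=$2
    shift 2
    run modprobe ifb numifbs=1
    run ip link set "$ifb" up
    for dev in "$@"; do
        run tc qdisc add dev "$dev" handle ffff: ingress
        # Match every IPv4 packet and redirect it to the egress path of the ifb device
        run tc filter add dev "$dev" parent ffff: protocol ip prio 1 u32 \
            match u32 0 0 action mirred egress redirect dev "$ifb"
    done
    # One shared token-bucket limit for the combined ingress traffic
    run tc qdisc add dev "$ifb" root tbf rate "$rate" burst 128kb latency 50ms
}

aggregate_ingress ifb0 50mbit eth0 eth1
```

Because every redirected packet leaves through ifb0's egress, the single tbf limit applies to the combined ingress traffic of eth0 and eth1 rather than to each interface separately.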

Common Example Analysis

Commonly used user-space tools include tc (with its qdisc, class, filter, and action objects, the netem qdisc, the u32 classifier, and ematch), ip route, iptables, and iperf3.

The kernel implementation lives in the net/sched/ directory: sch_*.c (qdiscs), cls_*.c (classifiers), and act_*.c (actions).


1. HTB Example

# add qdisc
tc qdisc add dev eth0 root handle 1: htb default 2 r2q 100
# add default class
tc class add dev eth0 parent 1:0 classid 1:1 htb rate 1000mbit ceil 1000mbit
tc class add dev eth0 parent 1:1 classid 1:2 htb prio 5 rate 1000mbit ceil 1000mbit
tc qdisc add dev eth0 parent 1:2 handle 2: pfifo limit 500
# add default filter
tc filter add dev eth0 parent 1:0 prio 5 protocol ip u32
tc filter add dev eth0 parent 1:0 prio 5 handle 3: protocol ip u32 divisor 256
tc filter add dev eth0 parent 1:0 prio 5 protocol ip u32 ht 800:: match ip src 192.168.0.0/16 hashkey mask 0x000000ff at 12 link 3:
# add egress rules for 192.168.0.9
tc class add dev eth0 parent 1:1 classid 1:9 htb prio 5 rate 3mbit ceil 3mbit
tc qdisc add dev eth0 parent 1:9 handle 9: pfifo limit 500
tc filter add dev eth0 parent 1: protocol ip prio 5 u32 ht 3:9: match ip src "192.168.0.9" flowid 1:9
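One detail hidden in this example is that tc parses class minor numbers, qdisc handles, and u32 hash-bucket numbers as hexadecimal, and the `hashkey mask 0x000000ff at 12` rule selects the bucket from the last octet of the source address. Host 192.168.0.9 maps to bucket `9` in either base, but 192.168.0.10 lands in bucket `a`. The hypothetical helper below derives all three identifiers from the last octet. It assumes the HTB tree and hash table `3:` from the example already exist, and it inherits the example's one-class-per-last-octet scheme, so two hosts sharing a last octet, or octets 1 and 2 (which clash with classes 1:1 and 1:2), are not handled. `run`/`DRY_RUN` is a hypothetical print-or-execute wrapper.

```shell
#!/bin/sh
# Hypothetical helper: per-host egress rules for any 192.168.x.y address.
# tc reads class minors, qdisc handles and u32 buckets as hex, so the last
# octet is converted first (192.168.0.10 gives class 1:a, qdisc a:, bucket 3:a:).

# Hypothetical helper: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_host_egress() {
    dev=$1
    ip=$2
    rate=$3
    # Last octet of the address, printed in hex
    id=$(printf '%x' "${ip##*.}")
    run tc class add dev "$dev" parent 1:1 classid "1:$id" htb prio 5 rate "$rate" ceil "$rate"
    run tc qdisc add dev "$dev" parent "1:$id" handle "$id:" pfifo limit 500
    # Bucket $id of hash table 3: holds the exact-match rule for this host
    run tc filter add dev "$dev" parent 1: protocol ip prio 5 u32 \
        ht "3:$id:" match ip src "$ip" flowid "1:$id"
}

add_host_egress eth0 192.168.0.10 3mbit
```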

2. TC Ingress Example Using Ifb

# init ifb
modprobe ifb numifbs=1
ip link set ifb0 up
# redirect ingress to ifb0
tc qdisc add dev eth0 ingress handle ffff:
tc filter add dev eth0 parent ffff: protocol ip prio 0 u32 match u32 0 0 flowid ffff: action mirred egress redirect dev ifb0
# add qdisc
tc qdisc add dev ifb0 root handle 1: htb default 2 r2q 100
# add default class
tc class add dev ifb0 parent 1:0 classid 1:1 htb rate 1000mbit ceil 1000mbit
tc class add dev ifb0 parent 1:1 classid 1:2 htb prio 5 rate 1000mbit ceil 1000mbit
tc qdisc add dev ifb0 parent 1:2 handle 2: pfifo limit 500
# add default filter
tc filter add dev ifb0 parent 1:0 prio 5 protocol ip u32
tc filter add dev ifb0 parent 1:0 prio 5 handle 4: protocol ip u32 divisor 256
tc filter add dev ifb0 parent 1:0 prio 5 protocol ip u32 ht 800:: match ip dst 192.168.0.0/16 hashkey mask 0x000000ff at 16 link 4:
# add ingress rules for 192.168.0.9
tc class add dev ifb0 parent 1:1 classid 1:9 htb prio 5 rate 3mbit ceil 3mbit
tc qdisc add dev ifb0 parent 1:9 handle 9: pfifo limit 500
tc filter add dev ifb0 parent 1: protocol ip prio 5 u32 ht 4:9: match ip dst "192.168.0.9" flowid 1:9
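Undoing this setup only requires deleting the qdiscs, since deleting a qdisc also removes the classes and filters attached to it. A minimal sketch assuming the eth0/ifb0 names used above; `run`/`DRY_RUN` is a hypothetical print-or-execute wrapper.

```shell
#!/bin/sh
# Tear down the ifb-based ingress shaping: removing a qdisc also removes
# its classes and filters. Interface names follow the example above.

# Hypothetical helper: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown_ifb() {
    dev=$1
    ifb=$2
    # Drops the redirect filter together with the ingress qdisc
    run tc qdisc del dev "$dev" ingress
    # Drops the HTB tree, its pfifo leaves and the u32 hash tables on the ifb device
    run tc qdisc del dev "$ifb" root
    run ip link set "$ifb" down
}

teardown_ifb "${1:-eth0}" "${2:-ifb0}"
```

Before tearing down, `tc -s class show dev ifb0` shows per-class byte and packet counters, a quick way to confirm that traffic for 192.168.0.9 is actually hitting class 1:9.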

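3. Simulating Packet Loss, Corruption, Reordering, and Duplication

# Packet loss ("replace" creates the root netem qdisc, or swaps out an existing one)
tc qdisc replace dev eth0 root netem loss 1%
# Packet corruption
tc qdisc replace dev eth0 root netem corrupt 0.2%
# Out of order (reordering only takes effect together with a delay)
tc qdisc replace dev eth0 root netem delay 10ms reorder 25% 50%
# Duplication
tc qdisc replace dev eth0 root netem duplicate 1%
# Remove netem and restore the default root qdisc
tc qdisc del dev eth0 root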
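netem options can also be combined in one qdisc (for example `delay 10ms reorder 25% 50% loss 1%`). When impairments are used for testing, it helps to scope them to a single command and always remove them afterwards. The sketch below does that with a hypothetical `with_netem` function. It assumes nothing else is installed as the interface's root qdisc, because the root qdisc is deleted at the end. `run`/`DRY_RUN` is a hypothetical wrapper that only affects the tc calls; the wrapped command itself always runs.

```shell
#!/bin/sh
# Run one command under a temporary netem impairment, then remove the qdisc.
# Assumption: the interface's root qdisc may be deleted afterwards.

# Hypothetical helper: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

with_netem() {
    dev=$1
    params=$2
    shift 2
    # $params is deliberately unquoted so "delay 10ms loss 1%" splits into separate tc arguments
    run tc qdisc replace dev "$dev" root netem $params
    "$@"
    status=$?
    # Always restore the interface, whatever the wrapped command returned
    run tc qdisc del dev "$dev" root
    return "$status"
}
```

After sourcing the script, `DRY_RUN=0 with_netem eth0 "delay 10ms loss 1%" ping -c 5 192.0.2.1` runs the ping under the impairment (192.0.2.1 is a documentation address; substitute a real target). The function returns the wrapped command's exit status, so it can be used directly in test scripts.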

4. tc + iptables

tc qdisc add dev eth0 root handle 1: htb default 2
# Parent class 1:1 that the two leaf classes attach to
tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit ceil 100mbit
tc class add dev eth0 parent 1:1 classid 1:2 htb rate 98mbit ceil 100mbit prio 2
tc class add dev eth0 parent 1:1 classid 1:3 htb rate 1mbit ceil 2mbit prio 2
tc qdisc add dev eth0 parent 1:2 handle 2: sfq perturb 10
tc qdisc add dev eth0 parent 1:3 handle 3: sfq perturb 10
# Option A: classify with tc's own u32 classifier
tc filter add dev eth0 protocol ip parent 1:0 u32 match ip src 192.168.0.2 flowid 1:2
tc filter add dev eth0 protocol ip parent 1:0 u32 match ip src 192.168.0.1 flowid 1:3
# Option B: let iptables set a firewall mark and classify on it with the fw classifier;
# each fw filter handle must equal the mark value set by iptables
tc filter add dev eth0 parent 1: protocol ip prio 1 handle 10 fw flowid 1:2
tc filter add dev eth0 parent 1: protocol ip prio 1 handle 20 fw flowid 1:3
iptables -t mangle -A POSTROUTING -d 192.168.0.2 -j MARK --set-mark 10
iptables -t mangle -A POSTROUTING -d 192.168.0.3 -j MARK --set-mark 20
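The fw classifier looks up the packet's firewall mark (skb->mark) among its filter handles, so every iptables MARK value needs a fw filter with the same handle. A hypothetical helper that creates both halves of each pair keeps them in sync. Unlike the example, it also restricts the iptables rule to the shaped interface with `-o`, which is an assumption about the intended setup; `run`/`DRY_RUN` is a hypothetical print-or-execute wrapper.

```shell
#!/bin/sh
# Hypothetical helper: pair one iptables MARK rule with the fw filter that
# matches the same mark, so the two values can never drift apart.

# Hypothetical helper: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mark_to_class() {
    dev=$1
    dst=$2
    mark=$3
    classid=$4
    # Mark packets leaving $dev for $dst
    run iptables -t mangle -A POSTROUTING -o "$dev" -d "$dst" -j MARK --set-mark "$mark"
    # fw filter whose handle equals the mark, steering those packets into $classid
    run tc filter add dev "$dev" parent 1: protocol ip prio 1 handle "$mark" fw flowid "$classid"
}

mark_to_class eth0 192.168.0.2 10 1:2
mark_to_class eth0 192.168.0.3 20 1:3
```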

Conclusion:

This article gave a brief introduction to the basic logic of Linux traffic control. In a follow-up series of articles, I will analyze TC's implementation at the kernel source code level.

<End>
