A previous article analyzed why packets are dropped on bonded network interfaces and how to fix it. This article digs into the root cause of that packet loss and uses the bpftrace tool to trace the kernel dynamically.
First, we use dropwatch to observe where in the kernel the packets are dropped:

The drops occur in the kernel function __netif_receive_skb_core. (Other functions also drop packets, but they are not the root cause of this issue; if you have time, you can check which packets they drop.)
Next, we use bpftrace to find out which protocol the dropped packets belong to:
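#!/usr/bin/bpftrace
// Print every LLDP packet (EtherType 0x88CC = 35020) entering __netif_receive_skb_core.
// The first argument is a struct sk_buff **, so it is dereferenced once.
kprobe:__netif_receive_skb_core
{
	$skb = (struct sk_buff *)*arg0;
	// skb->protocol is big-endian; swap its two bytes on a little-endian host
	$protocol = (($skb->protocol & 0xFF00) >> 8) | (($skb->protocol & 0x00FF) << 8);
	if ($protocol == 0x88CC) {
		printf("protocol=%d, protocol=0x%04X\n", $protocol, $protocol);
	}
}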
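The byte swap in the script is needed because skb->protocol stores the EtherType in network (big-endian) byte order, while x86_64 is little-endian. The Python sketch below only illustrates that arithmetic, assuming a little-endian host; be16_to_host is a hypothetical helper name, not part of bpftrace.

```python
import struct

LLDP_ETHERTYPE = 0x88CC  # EtherType for LLDP (35020 in decimal)


def be16_to_host(raw: int) -> int:
    """Swap the two bytes of a 16-bit value, like the bpftrace expression."""
    return ((raw & 0xFF00) >> 8) | ((raw & 0x00FF) << 8)


# The wire carries the bytes 0x88 0xCC; a little-endian CPU reading them as a
# 16-bit integer (as the probe reads skb->protocol) sees 0xCC88.
raw_on_le_host = struct.unpack("<H", bytes([0x88, 0xCC]))[0]

print(hex(raw_on_le_host))                            # 0xcc88
print(be16_to_host(raw_on_le_host))                   # 35020
print(be16_to_host(raw_on_le_host) == LLDP_ETHERTYPE) # True
```

On a big-endian host no swap would be needed, so the manual expression in the script is specific to little-endian machines such as x86_64.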
After executing the bpftrace script, the captured information is as follows:

The output shows that the dropped packets carry EtherType 0x88CC, which is LLDP (Link Layer Discovery Protocol). This points to LLDP packets arriving at a host that has no LLDP service running.
To confirm this, install lldpad on the machine with yum install lldpad and start it. After that, the dropped count of the bond0 interface no longer increases.
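To check whether the drop count is still growing before and after starting lldpad, it helps to sample the interface statistics exposed in sysfs. The sketch below assumes the standard /sys/class/net/<dev>/statistics/rx_dropped file; read_stat and counter_delta are hypothetical helpers, and the reader is passed in as a callable so the delta logic can be exercised without real hardware.

```python
import time
from pathlib import Path
from typing import Callable


def read_stat(dev: str, name: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read one interface counter, e.g. rx_dropped, from sysfs."""
    path = Path(sysfs_root) / dev / "statistics" / name
    return int(path.read_text().strip())


def counter_delta(read: Callable[[], int], interval: float) -> int:
    """Sample a counter twice, `interval` seconds apart, and return its growth."""
    before = read()
    time.sleep(interval)
    return read() - before
```

For example, counter_delta(lambda: read_stat("bond0", "rx_dropped"), 10) returns the number of drops recorded on bond0 over ten seconds; a result of 0 after lldpad is started matches the observation above.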

For comparison, we picked a machine with bonding from the resource pool. Its bond0 showed no packet loss, even though LLDP packets were also arriving on bond0.


Why are LLDP packets dropped on bond0 on some machines but not on others? The only difference between the online machines and the resource pool machines is that the online machines have a bridge. Could creating a bridge be what causes the packet loss on the bond?
We tested this on a test machine: without a bridge, the drop count of bond0 did not increase.

After creating a bridge, the packet loss count began to increase.

The root cause is related to the bridge. First, let’s outline the network topology before and after the bridge:
The network topology before the bridge is as follows:

The network topology after the bridge is as follows:

Next, we will focus on analyzing the kernel code regarding packet forwarding on the bridge.
First, let's look at the kernel call stack leading to the function where the packets are dropped. Simply add printf("%s\n", kstack); to the existing dropwatch.bt script:
#!/usr/bin/bpftrace
// Same probe as before, plus the kernel stack of each LLDP packet
kprobe:__netif_receive_skb_core
{
	$skb = (struct sk_buff *)*arg0;
	$protocol = (($skb->protocol & 0xFF00) >> 8) | (($skb->protocol & 0x00FF) << 8);
	if ($protocol == 0x88CC) {
		printf("%s\n", kstack);
		printf("protocol=%d, protocol=0x%04X\n", $protocol, $protocol);
	}
}

The output above is the kernel stack of the 0x88CC packets captured after creating the bridge. Now let's see what happens without a bridge:

Without a bridge, bpftrace does not capture any 0x88CC packets.
Comparing the two experiments shows that 0x88CC packets take different processing paths depending on whether a bridge has been created.
The packet receive path is net_rx_action -> napi_poll -> poll().
For Mellanox mlx5 network cards, poll() is mlx5e_napi_poll, and the chain continues mlx5e_napi_poll -> mlx5e_poll_rx_cq() -> handle_rx_cqe() -> mlx5e_handle_rx_cqe() -> napi_gro_receive().
Next, trace napi_gro_receive():
#!/usr/bin/bpftrace
// napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb): the skb is arg1
kprobe:napi_gro_receive
{
	$skb = (struct sk_buff *)arg1;
	$protocol = (($skb->protocol & 0xFF00) >> 8) | (($skb->protocol & 0x00FF) << 8);
	if ($protocol == 0x88CC) {
		printf("skb->protocol=%d, protocol=%d, protocol=0x%04X\n", $skb->protocol, $protocol, $protocol);
	}
}
No information was output.

However, tcpdump does capture these packets. This is because running tcpdump puts bond0 into promiscuous mode, in which the network card accepts every frame that reaches it. In non-promiscuous mode it only accepts frames addressed to the host's own MAC address, plus broadcast frames and the multicast groups it has subscribed to.
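The filtering decision described here can be modeled in a few lines. This is a simplified sketch, not driver code: real network cards implement the filter in hardware and the exact rules vary by driver. It assumes the LLDP frames use the standard IEEE 802.1AB nearest-bridge destination address 01:80:C2:00:00:0E; nic_accepts, is_multicast and OWN_MAC are hypothetical names.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
LLDP_NEAREST_BRIDGE = "01:80:c2:00:00:0e"  # standard LLDP destination MAC


def is_multicast(mac: str) -> bool:
    # The I/G bit (lowest bit of the first octet) marks group addresses.
    return (int(mac.split(":")[0], 16) & 0x01) == 1


def nic_accepts(dst_mac: str, own_mac: str, promisc: bool,
                mcast_filter: frozenset = frozenset(),
                allmulti: bool = False) -> bool:
    """Simplified model of a NIC receive address filter (not driver code)."""
    dst = dst_mac.lower()
    if promisc:
        return True  # promiscuous mode: accept every frame on the wire
    if dst in (own_mac.lower(), BROADCAST):
        return True  # addressed to this NIC, or broadcast
    if is_multicast(dst):
        # Multicast is accepted only in all-multicast mode or if subscribed.
        return allmulti or dst in {m.lower() for m in mcast_filter}
    return False  # unicast for some other host


OWN_MAC = "52:54:00:12:34:56"  # hypothetical bond0 MAC address
print(nic_accepts(LLDP_NEAREST_BRIDGE, OWN_MAC, promisc=False))  # False
print(nic_accepts(LLDP_NEAREST_BRIDGE, OWN_MAC, promisc=True))   # True
```

In this model an LLDP frame is rejected unless the card is promiscuous (bridge port or tcpdump), in all-multicast mode, or has that multicast address in its filter list.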


This indicates that, without a bridge, the 0x88CC packets never reach the kernel's protocol processing: the network card discards them on arrival.
So why does bond0 process LLDP protocol packets after creating a bridge?
After creating a bridge, the 0x88CC protocol packets can be captured.

This is because creating the bridge, just like running tcpdump, puts the network card (bond0) into promiscuous mode. bond0 then receives all packets, which is also why its drop count increases.
How can we prove this?
We can use bpftrace to read the flags of the receiving net_device (skb->dev) in the bonding receive function bond_3ad_lacpdu_recv. The promiscuous flag IFF_PROMISC is bit 8 (1<<8 = 256).
#!/usr/bin/bpftrace
kprobe:bond_3ad_lacpdu_recv
{
	$skb = (struct sk_buff *)arg0;
	$dev = $skb->dev;
	// IFF_PROMISC is bit 8: $flags is 256 in promiscuous mode, 0 otherwise
	$flags = $dev->flags & (1 << 8);
	$protocol = (($skb->protocol & 0xFF00) >> 8) | (($skb->protocol & 0x00FF) << 8);
	if ($protocol == 0x88CC) {
		printf("%s\n", kstack);
		printf("flags=%d, protocol=%d, protocol=0x%04X\n", $flags, $protocol, $protocol);
	}
}

The flags value is 256, meaning IFF_PROMISC is set and the device is in promiscuous mode.
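The same bit can also be checked from user space without bpftrace: sysfs exposes dev->flags as a hex string in /sys/class/net/<dev>/flags. The sketch below decodes it using the IFF_* values from <linux/if.h>. The helper names are hypothetical, and 0x1503 is only an example value for a bond master in promiscuous mode, not output captured in this investigation.

```python
from pathlib import Path

# Interface flag bits from <linux/if.h>, in ascending order.
IFF_NAMES = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x400: "MASTER",
    0x800: "SLAVE",
    0x1000: "MULTICAST",
}
IFF_PROMISC = 0x100  # 1 << 8 == 256, the bit tested by the bpftrace script


def parse_flags(text: str) -> int:
    """Parse the hex string found in /sys/class/net/<dev>/flags, e.g. '0x1503'."""
    return int(text.strip(), 16)


def read_flags(dev: str, sysfs_root: str = "/sys/class/net") -> int:
    return parse_flags((Path(sysfs_root) / dev / "flags").read_text())


def is_promisc(flags: int) -> bool:
    return bool(flags & IFF_PROMISC)


def describe(flags: int) -> list:
    """Names of the known flag bits that are set, lowest bit first."""
    return [name for bit, name in IFF_NAMES.items() if flags & bit]


example = parse_flags("0x1503\n")  # illustrative value only
print(describe(example))    # ['UP', 'BROADCAST', 'PROMISC', 'MASTER', 'MULTICAST']
print(is_promisc(example))  # True
```

`ip link show bond0` reports the same state as PROMISC in its flag list, and `ip -d link show bond0` additionally prints the promiscuity counter.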
When the network card is not in promiscuous mode, flags=0, as the trace taken without a bridge shows:

Tracing bond_change_rx_flags confirms that creating the bridge does put the bond network card into promiscuous mode.
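#!/usr/bin/bpftrace
// bond_change_rx_flags(struct net_device *bond_dev, int change)
kprobe:bond_change_rx_flags
{
	$net = (struct net_device *)arg0;
	$change = arg1;
	// $flags is 256 when the change involves IFF_PROMISC
	$flags = $change & (1 << 8);
	printf("%s\n", kstack);
	printf("name=%s, flags=%d, change=%d\n", $net->name, $flags, $change);
}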
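Why both the bridge and tcpdump have this effect can be illustrated with a toy model of the kernel's promiscuity reference count: each user (a bridge port, a packet capture) increments dev->promiscuity, IFF_PROMISC stays set while the count is above zero, and the driver's rx-flags callback (bond_change_rx_flags for a bond) runs only when the flag actually toggles. The NetDev class is hypothetical, and pushing the change to every slave is an assumption that matches bonding modes without a primary slave, such as 802.3ad.

```python
IFF_PROMISC = 0x100


class NetDev:
    """Toy model (not kernel code) of dev->promiscuity reference counting."""

    def __init__(self, name, slaves=()):
        self.name = name
        self.flags = 0
        self.promiscuity = 0
        self.slaves = list(slaves)
        self.rx_flag_changes = []  # one entry per change_rx_flags callback

    def set_promiscuity(self, inc):
        # Adjust the counter and keep IFF_PROMISC set while any user
        # still holds a reference.
        old_flags = self.flags
        self.promiscuity += inc
        if self.promiscuity > 0:
            self.flags |= IFF_PROMISC
        else:
            self.flags &= ~IFF_PROMISC
        if self.flags != old_flags:
            # The driver callback runs only when the flag actually toggles.
            self.change_rx_flags(IFF_PROMISC)

    def change_rx_flags(self, change):
        # For a bond this stands in for bond_change_rx_flags(), which pushes
        # the new promiscuous state down to the slave devices.
        self.rx_flag_changes.append(change)
        if change & IFF_PROMISC:
            inc = 1 if self.flags & IFF_PROMISC else -1
            for slave in self.slaves:
                slave.set_promiscuity(inc)


eth0, eth1 = NetDev("eth0"), NetDev("eth1")
bond0 = NetDev("bond0", slaves=[eth0, eth1])
bond0.set_promiscuity(1)   # bond0 is added as a port of the bridge
bond0.set_promiscuity(1)   # tcpdump -i bond0 starts
bond0.set_promiscuity(-1)  # tcpdump exits
print(bond0.promiscuity, bool(bond0.flags & IFF_PROMISC), bond0.rx_flag_changes)
# 1 True [256]
```

In the model the callback fires only once even though two users requested promiscuous mode, and bond0 stays promiscuous after tcpdump exits because the bridge still holds its reference.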

Finally, why does installing lldpad stop the packet loss on the bond? Checking the lldpad source code shows that it does contain logic for receiving and processing LLDP packets.
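The effect of lldpad can be illustrated with a toy model of the last step of __netif_receive_skb_core: a packet is handed to the protocol handlers registered for its EtherType, and if there are none it is counted in rx_dropped. This assumes lldpad receives LLDP through a packet socket bound to EtherType 0x88CC, which registers such a handler in the kernel. RxPath is hypothetical and ignores details like ETH_P_ALL taps (tcpdump) and rx_handler hooks (bond, bridge).

```python
from collections import defaultdict
from typing import Callable, Dict, List

ETH_P_IP = 0x0800
ETH_P_LLDP = 0x88CC


class RxPath:
    """Toy model (not kernel code) of protocol dispatch at the end of
    __netif_receive_skb_core: deliver to handlers registered for the
    EtherType, otherwise count the packet as dropped."""

    def __init__(self) -> None:
        self.handlers: Dict[int, List[Callable[[bytes], None]]] = defaultdict(list)
        self.rx_dropped = 0

    def register(self, ethertype: int, handler: Callable[[bytes], None]) -> None:
        # Roughly what binding a packet socket to an EtherType achieves.
        self.handlers[ethertype].append(handler)

    def receive(self, ethertype: int, frame: bytes) -> bool:
        handlers = self.handlers.get(ethertype)
        if not handlers:
            self.rx_dropped += 1  # no consumer for this protocol: count a drop
            return False
        for handler in handlers:
            handler(frame)
        return True


# Before lldpad: only IPv4 has a handler, so the LLDP frame is dropped.
path = RxPath()
path.register(ETH_P_IP, lambda frame: None)
path.receive(ETH_P_LLDP, b"lldp-frame")
drops_before = path.rx_dropped

# After lldpad starts: EtherType 0x88CC now has a consumer.
received: List[bytes] = []
path.register(ETH_P_LLDP, received.append)
path.receive(ETH_P_LLDP, b"lldp-frame")
drops_after = path.rx_dropped
```

Under this model drops_before and drops_after are both 1: the first LLDP frame is counted as a drop, and once a handler exists the counter stops growing, which mirrors what was observed after starting lldpad.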

In summary: in our self-built data centers, machines use two network cards bonded together, and the virtualization and K8S setups built on top of the bond create bridges. Once a bridge is created, the bond network card is put into promiscuous mode by default and receives all packets, including LLDP packets (0x88CC). Because nothing on the host consumes 0x88CC packets (no LLDP agent such as lldpad is running), the kernel drops them and the drop count increases. Without a bridge, the bond network card is not in promiscuous mode, so the LLDP packets are filtered out by the card before they reach the kernel. Running tcpdump also puts the card into promiscuous mode, which is why tcpdump can see these packets.
PS: This is my last technical article of the year. Happy New Year to everyone, and I will keep sharing technical articles in the coming year!