Overview of eBPF Implementation on Linux

eBPF on Linux

Programs

Programs are the core component of eBPF. An eBPF program can be attached to various locations in the kernel and is invoked there like a function. Programs serve a wide range of purposes, such as logging information, modifying data, making decisions, or triggering side effects. Where a program can be attached and what it is allowed to do depend entirely on its program type.

When a program is invoked, the kernel passes it a context: a structure containing information the kernel has prepared for that program. Typical examples are a socket buffer (sk_buff) or the CPU registers. The type of context passed depends on the program type.

Like functions, programs also have return values, the meaning of which is determined by the program type. For example, the return value can indicate the number of bytes of a packet to keep, or it can be an action enumeration, such as dropping a packet, accepting a packet, or redirecting a packet.
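As a rough illustration of context and return value, the sketch below is a socket filter, a program type whose return value is the number of packet bytes to keep (0 drops the packet). The function name and the MAX_KEPT_LEN cutoff are hypothetical, and the SEC macro is defined locally as an assumption so that only the kernel UAPI header is needed; projects normally take it from libbpf's bpf_helpers.h.

```c
#include <linux/bpf.h>

/* Places the function in a named ELF section. Loader libraries such as
 * libbpf use the section name to infer the program type. Defined locally
 * here (an assumption) instead of including libbpf's bpf_helpers.h. */
#define SEC(name) __attribute__((section(name), used))

/* Hypothetical cutoff, chosen only for illustration. */
#define MAX_KEPT_LEN 1500

SEC("socket")
int keep_small_packets(struct __sk_buff *skb)
{
    /* The context for this program type is a struct __sk_buff;
     * skb->len is the packet length in bytes. */
    if (skb->len > MAX_KEPT_LEN)
        return 0;        /* keep 0 bytes, i.e. drop the packet */

    return skb->len;     /* keep the whole packet */
}
```

To produce an object a loader can consume, a file like this would typically be compiled with clang using the bpf target.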

eBPF programs are typically written in C and compiled with LLVM, but this is not the only option. Any toolchain that can generate bytecode conforming to the eBPF instruction set can be used to write eBPF programs. The compiled programs are usually serialized into relocatable ELF files.

Ultimately, eBPF programs are loaded into the kernel via the bpf() system call, and the user-space program that performs this operation is called a loader. In practice, the complexity of loaders varies widely, from simple applications that only load eBPF programs to complex systems that continuously interact with multiple programs and maps to provide advanced functionality. Loaders often use a loader library (such as libbpf), which provides a higher-level API than the raw system call and significantly simplifies development.
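To make the loader's job concrete, here is a minimal sketch that loads a two-instruction program (set r0 to 0, then exit) with the raw bpf() system call and no loader library. The function name is hypothetical, and the choice of a socket filter is only an assumption made to keep the example small. The call fails if the process lacks the required privileges, in which case the function returns a negative errno value.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns a file descriptor for the loaded program, or -errno on failure. */
int load_trivial_socket_filter(void)
{
    /* Hand-assembled bytecode: r0 = 0; exit. */
    struct bpf_insn insns[] = {
        { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
        { .code = BPF_JMP | BPF_EXIT },
    };
    union bpf_attr attr;
    long fd;

    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insn_cnt  = sizeof(insns) / sizeof(insns[0]);
    attr.insns     = (__u64)(unsigned long)insns;
    attr.license   = (__u64)(unsigned long)"GPL";

    /* The verifier runs inside this call; a rejected program yields an error. */
    fd = syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    return fd < 0 ? -errno : (int)fd;
}
```

A real loader would usually parse the ELF file, create maps, and apply relocations before this step, which is the work libraries such as libbpf take over.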

When a loader loads a program, the kernel checks the program for “safety” through a component called the verifier. Here, “safety” means that the program cannot cause the kernel to crash or corrupt critical components. eBPF programs must pass a series of stringent checks to be allowed to load into kernel memory. For more details, refer to the documentation related to the verifier.

Helper Functions

On their own, eBPF programs have limited capabilities: they can only read and write their own stack, perform arithmetic on registers, call internal functions, and execute conditional jumps, all within their own small scope. The one remaining capability is to call so-called helper functions. These are essentially regular C functions defined by the kernel, and together they form an internal API/ABI between eBPF programs and the kernel. Helper functions let eBPF programs perform tasks that would otherwise be impossible, because the same work done without a helper would not pass the verifier’s checks.

Helper functions accept up to 5 parameters and return a single value. Not all program types can call all helper functions; this is to comply with the verifier’s restrictions.

Helper functions have a variety of uses, from simply obtaining additional information (such as which CPU core the program is executing on) to triggering significant side effects (such as redirecting packets). For a complete overview, please refer to the helper functions page [1].
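The sketch below shows what a helper call can look like from the program side, using the CPU-core example above. It assumes the helper is declared by hand as a function pointer whose value is the helper's ID from linux/bpf.h, which mirrors the declarations libbpf ships; real projects would normally include libbpf's bpf_helpers.h instead. The program name and its policy (keep packets only on CPU 0) are hypothetical.

```c
#include <linux/bpf.h>

/* Defined locally as an assumption; normally provided by bpf_helpers.h. */
#define SEC(name) __attribute__((section(name), used))

/* A helper is referenced by its numeric ID. When compiled for the bpf
 * target, a call through this pointer becomes a call instruction that
 * carries the ID, which the kernel resolves at load time. This pointer
 * is not callable from ordinary user-space code. */
static __u32 (*const bpf_get_smp_processor_id)(void) =
    (void *)(long)BPF_FUNC_get_smp_processor_id;

SEC("socket")
int keep_only_on_cpu0(struct __sk_buff *skb)
{
    /* Ask the kernel which CPU core this invocation is running on. */
    __u32 cpu = bpf_get_smp_processor_id();

    /* Hypothetical policy: keep the packet only when handled on CPU 0. */
    return cpu == 0 ? skb->len : 0;
}
```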

Helper functions are part of the UAPI (the kernel’s user-space API), so they enjoy the well-known stability guarantees of UAPI. For more detailed information, please refer to the content related to helper functions.

[1] https://docs.ebpf.io/linux/helper-function/

KFuncs

KFuncs are kernel functions that are specially annotated and allowed to be called directly from eBPF programs. They essentially serve as an alternative mechanism to helper functions. In principle, the upstream kernel no longer accepts new helper functions, so any new functionality that needs to be exposed to eBPF programs should be implemented through KFuncs.

Unlike helper functions, KFuncs do not belong to UAPI and do not enjoy the same stability guarantees. Therefore, defensive programming should be adopted when using KFuncs to handle cases where functions may be unavailable or their behavior may change.

For more detailed information, please refer to the KFuncs page [2].

[2] https://docs.ebpf.io/linux/concepts/kfuncs/

Maps

eBPF maps are data structures residing in the kernel. Both eBPF programs and user-space programs can access them, which makes maps a communication layer between eBPF programs and user space, as well as a place to persist data between program invocations. Like all other BPF objects, maps are shared across the host, so multiple programs can access the same map simultaneously. Maps can therefore also be used to pass information between programs of different types attached at different points.

Examples of these maps include BPF_MAP_TYPE_ARRAY (an array containing arbitrary values) or BPF_MAP_TYPE_HASH (a hash map with arbitrary key and value types). For more detailed information, please refer to the map types overview [3].

For more details on how to use maps, please refer to the maps concept page [4].
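The user-space side of map access can be sketched with the raw bpf() system call: create a BPF_MAP_TYPE_ARRAY, write one value, and read it back. The function name, the 4-entry size, and the 64-bit value type are assumptions made for illustration. The function returns 0 on success or a negative errno value, for example when the process lacks the privileges to create maps.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static long sys_bpf(int cmd, union bpf_attr *attr)
{
    return syscall(SYS_bpf, cmd, attr, sizeof(*attr));
}

/* Stores `value` at index `key` of a fresh array map and reads it back
 * into *out. Returns 0 on success, -errno on failure. */
int array_map_roundtrip(__u32 key, __u64 value, __u64 *out)
{
    union bpf_attr attr;
    int map_fd, err = 0;

    /* Create the map; array maps require a 4-byte key (the index). */
    memset(&attr, 0, sizeof(attr));
    attr.map_type    = BPF_MAP_TYPE_ARRAY;
    attr.key_size    = sizeof(key);
    attr.value_size  = sizeof(value);
    attr.max_entries = 4;
    map_fd = (int)sys_bpf(BPF_MAP_CREATE, &attr);
    if (map_fd < 0)
        return -errno;

    /* Write the value from user space. */
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = map_fd;
    attr.key    = (__u64)(unsigned long)&key;
    attr.value  = (__u64)(unsigned long)&value;
    attr.flags  = BPF_ANY;
    if (sys_bpf(BPF_MAP_UPDATE_ELEM, &attr) < 0) {
        err = -errno;
        goto out;
    }

    /* Read it back; an eBPF program sharing this map would see the same data. */
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = map_fd;
    attr.key    = (__u64)(unsigned long)&key;
    attr.value  = (__u64)(unsigned long)out;
    if (sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr) < 0)
        err = -errno;

out:
    close(map_fd);   /* dropping the last reference releases the map */
    return err;
}
```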

[3] https://docs.ebpf.io/linux/map-type/

[4] https://docs.ebpf.io/linux/concepts/maps/

Objects

Both eBPF programs and maps are BPF objects, along with some other objects that we have not yet mentioned. All of these objects are managed in roughly the same way. A BPF object is created by the loader, which receives a file descriptor for it. The file descriptor is used for further interaction with the object, and it also serves as a reference that keeps the object “alive”. Once there are no more references to the object, it is released.

Applications can pass copies of these file descriptors to other processes using inter-process communication techniques such as UNIX sockets, which is a very general method. A more eBPF-specific technique is called pinning, which allows the loader to reference BPF objects through a special file called a pin. Pins can only be created in a special BPF filesystem that needs to be mounted at a certain location (usually /sys/fs/bpf, but it may vary across Linux distributions). As long as the pin exists, the object it references remains alive. Any process with permission to access the pin can open it and obtain a reference to the object in this way. Therefore, multiple programs can share the same object simultaneously.
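Pinning can be sketched with two bpf() commands: BPF_OBJ_PIN creates the pin for an existing object, and BPF_OBJ_GET later turns the pin back into a file descriptor, typically from another process. The function name is hypothetical, and the sketch assumes a BPF filesystem is mounted wherever the caller's pin_path points.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Pins the BPF object behind obj_fd at pin_path, then opens the pin again.
 * Returns a new file descriptor for the same object, or -errno on failure. */
int pin_and_reopen(int obj_fd, const char *pin_path)
{
    union bpf_attr attr;
    long fd;

    /* Create the pin; it holds its own reference to the object. */
    memset(&attr, 0, sizeof(attr));
    attr.pathname = (__u64)(unsigned long)pin_path;
    attr.bpf_fd   = obj_fd;
    if (syscall(SYS_bpf, BPF_OBJ_PIN, &attr, sizeof(attr)) < 0)
        return -errno;

    /* Any process allowed to access the pin can do this step on its own. */
    memset(&attr, 0, sizeof(attr));
    attr.pathname = (__u64)(unsigned long)pin_path;
    fd = syscall(SYS_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
    return fd < 0 ? -errno : (int)fd;
}
```

Removing the pin again is an ordinary unlink() on the path; the object is released once the pin and all file descriptors are gone.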

Capabilities

Starting with Linux 5.8, the capabilities required for eBPF have become more granular. A list of program types and the capabilities they require can be found in the kernel source [5].

  • CAP_BPF: Allows loading eBPF programs and creating eBPF maps.
  • CAP_PERFMON: Required to load tracing programs, on which use of the bpf_trace_printk() function depends.
  • CAP_NET_ADMIN: Required to load network programs.

More detailed information can be found in the kernel header files [6].

The program type BPF_PROG_TYPE_CGROUP_SKB is an exception. Unprivileged users can load such programs but cannot attach them at the corresponding locations.
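A loader may want to check its capabilities before attempting to load anything. The sketch below queries the effective capability set with the capget system call and combines the results for the network-program case. The function names are hypothetical. Treating CAP_SYS_ADMIN as sufficient on its own is an addition not stated above, based on the kernel accepting it as a superset of the BPF-specific capabilities. The fallback definitions cover headers older than Linux 5.8.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for UAPI headers that predate Linux 5.8. */
#ifndef CAP_PERFMON
#define CAP_PERFMON 38
#endif
#ifndef CAP_BPF
#define CAP_BPF 39
#endif

/* Returns 1 if the calling process has `cap` in its effective set,
 * 0 if it does not, and -1 on error or for an out-of-range value. */
int has_effective_cap(int cap)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,   /* 0 means the calling process */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    if (cap < 0 || cap >= 32 * _LINUX_CAPABILITY_U32S_3)
        return -1;
    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;

    /* Capabilities are stored as bits across consecutive 32-bit words. */
    return (data[cap / 32].effective >> (cap % 32)) & 1;
}

/* Network programs: CAP_BPF plus CAP_NET_ADMIN, or CAP_SYS_ADMIN alone
 * (the latter is an assumption added in this sketch). */
int may_load_network_programs(void)
{
    if (has_effective_cap(CAP_SYS_ADMIN) == 1)
        return 1;
    return has_effective_cap(CAP_BPF) == 1 &&
           has_effective_cap(CAP_NET_ADMIN) == 1;
}
```

A check like this is only advisory: the kernel performs the authoritative check when the program is loaded, and other settings can still cause a load to be refused.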

[5] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/syscall.c#n2644

[6] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/uapi/linux/capability.h#n382

Last updated: September 2, 2024

Created on: January 25, 2023

Src

https://docs.ebpf.io/linux/
