An update on GCC BPF support
By Daroc Alden, April 2, 2025, LSFMM+BPF. Gemini-1.5-flash translation of https://lwn.net/Articles/1015747/
José Marchesi and David Faust opened a session on BPF (Berkeley Packet Filter) at the 2025 Linux Storage, Filesystems, Memory Management, and BPF Summit with a long discussion of their work to support compiling to BPF in GCC. Overall, the project is slowly progressing towards full support for BPF, with most self-tests now passing when the patches that Faust is working on are applied. Along the way, however, they discovered some problems with Clang’s support for BPF, which called for extended discussion to find a way forward for both projects.
Marchesi first stated that there is not much to report this year. Pessimistically, progress is slowing down. Optimistically, support for BPF is stabilizing, and “there is less to do because much of the work is already done.” Based on the remainder of his talk, the optimistic view seems more appropriate to me.
Currently, the GCC version supporting BPF has been added to Fedora and will eventually make its way into CentOS, RHEL, and other distributions that pull from Fedora’s package ecosystem. Most major distributions, including Debian, Ubuntu, Gentoo, Arch, and all their derivatives, now support compiling to BPF using GCC. Marchesi mentioned that Gentoo is actually using GCC to build certain packages’ BPF components.
Getting GCC with BPF support into distributions matters not only because it makes the compiler easy to obtain, but also because it means the project has started receiving bug reports. Marchesi stated that they are very grateful for the bug reports, as these will help solidify GCC’s support for BPF. Inclusion in various distributions also means that last year’s plan to introduce a BPF-specific maintenance branch turned out to be unnecessary, which is good, as it reduces the workload for everyone.
BPF support has also proven useful for other parts of GCC, particularly for support of NVPTX (the architecture used by some Nvidia GPUs). Marchesi noted that this is also a limited architecture that requires special attention.
Tagging misadventures
GCC has now passed more BPF self-tests. Specifically, some work submitted for review upstream before the summit fixed how GCC generates BPF Type Format (BTF) tags for kernel pointers. BTF represents composite types as chains of different tags; for example, a kernel pointer to an integer (which differs from a regular pointer in BPF so that the verifier can handle it specially) is represented as follows:
    // In C:
    int __kptr * name;

    // In BTF:
    pointer type tag -> kernel pointer annotation -> integer
The fact that the annotation marking the pointer type as a kernel pointer appears after the pointer type itself is somewhat unusual. Other annotations, such as const, are represented by tags appearing before the pointer type. Previously, GCC emitted the kernel pointer annotation at the same position as all other annotations; changing this to match Clang’s behavior significantly increased the number of BPF self-tests passing through GCC.
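To make the two orderings concrete, here is a toy model in C. It is only a sketch: the enum, the two arrays, and format_chain() are hypothetical names invented for this illustration, and real BTF encodes types as numbered records rather than arrays like these. It also assumes a const pointer (int * const name) as the example of an ordinary annotation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds for this sketch; real BTF uses numeric
 * kinds and type IDs rather than an enum like this one. */
enum node_kind { NODE_PTR, NODE_CONST, NODE_KPTR_TAG, NODE_INT };

static const char *node_name(enum node_kind kind)
{
    switch (kind) {
    case NODE_PTR:      return "pointer";
    case NODE_CONST:    return "const";
    case NODE_KPTR_TAG: return "kptr tag";
    case NODE_INT:      return "int";
    }
    return "?";
}

/* Write the chain as "a -> b -> c" into buf and return buf. */
static char *format_chain(const enum node_kind *chain, size_t len,
                          char *buf, size_t buf_size)
{
    size_t used = 0;

    if (buf_size == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < len && used < buf_size; i++) {
        int n = snprintf(buf + used, buf_size - used, "%s%s",
                         i ? " -> " : "", node_name(chain[i]));
        if (n < 0)
            break;
        used += (size_t)n;
    }
    return buf;
}

/* int * const name;   the qualifier comes before the pointer node */
static const enum node_kind const_ptr_chain[] = {
    NODE_CONST, NODE_PTR, NODE_INT
};

/* int __kptr * name;  the tag comes after the pointer node */
static const enum node_kind kptr_chain[] = {
    NODE_PTR, NODE_KPTR_TAG, NODE_INT
};
```

Formatting kptr_chain gives "pointer -> kptr tag -> int", while const_ptr_chain gives "const -> pointer -> int"; the difference in where the annotation sits relative to the pointer node is the inconsistency under discussion.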
However, Marchesi is not satisfied with this, as he believes that the kernel pointer annotation should be placed at the same position as all other annotations. He suggested that Clang and libbpf should be changed to do it this way. Yonghong Song mentioned that this possibility was discussed during the initial design of BTF, and he wondered if any changes have occurred since then to justify revisiting that decision. Faust suggested that allowing both orderings could be an option, although he was unsure how pahole would recognize the difference. Pahole maintainer Arnaldo Carvalho de Melo stated that the program could simply check the BTF producer, so this would not be an issue — although the goal remains to eventually remove pahole from the BPF compilation pipeline, so GCC needs to generate tags in a way that meets kernel expectations.
Faust inquired how Clang represents type tags on non-pointer types; Song replied that it does not, and if that needs to be done, changes would be required. Faust stated that the ability to place type tags on non-pointers is useful — this point re-emerged later in the discussion about representing inline BPF functions. Alexei Starovoitov requested that Faust and Marchesi submit a kernel patch to add support for their preferred tag ordering in the verifier, to which Faust agreed.
However, any changes to how GCC handles BTF tags will almost certainly need to wait until the final release of GCC 15 at the end of April, as many GCC maintainers are busy with that. Faust also has another tagging-related change awaiting review: an extension to DWARF debugging information discussed at the 2024 Linux Plumbers Conference, which allows merging lists of debugging information with matching tails. Since GCC generates BTF information from its internal DWARF representation, this should also lead to a more compact BTF section for BPF programs.
Marchesi explained that the size of BTF information is not really a problem for the kernel, “but we need to satisfy [core GCC developers].” Song mentioned that Clang does not yet need to save space by sharing debugging information because it has fewer type tags. Marchesi expressed hope that Clang will eventually need it too, and Faust believes that this feature has other uses.
Google Summer of Code
GCC participates in Google Summer of Code every year; this time, Marchesi mentioned that he has proposed having someone write infrastructure for running GCC’s BPF tests on specific kernels. Currently, Ihor Solodrai maintains the continuous integration tests for the bpf-next tree (which he later presented during the conference), which tests each submitted patch to ensure that nothing regresses in what the verifier accepts. Marchesi hopes that GCC’s changes can also achieve this, so that GCC contributors can quickly receive feedback on whether new optimizations break the verifier.
He suggested that students could write a testing tool to launch the appropriate kernel in a virtual machine and load BPF programs into it, integrating it into GCC’s existing testing infrastructure. However, some attendees expressed skepticism about this. Starovoitov pointed out that launching a virtual machine for each test would be very slow and wondered if BPF programs could be run on the host. Marchesi stated that there are no existing tests using virtual machines, but GCC does have some tests that rely on specific external hardware, which is not much different from the perspective of the testing tool.
One attendee asked which kernel version they plan to test. “Something stable,” Faust replied. Starovoitov remarked that working on these tests seems like a good direction for GCC, but it might be a bit too much for a Google Summer of Code project. Marchesi noted that the application deadline is April 8, so they will soon know if anyone thinks they can take on this challenge.
Non-lexical annotations
A more substantive issue that Faust and Marchesi encountered while trying to match Clang’s behavior is how to handle preserve_access_index. This attribute is one of the core parts that makes “compile once, run everywhere” (CO-RE) relocation possible. It instructs the BPF loader to use information about field names in the BTF debugging information to rewrite accesses to fields in structures at load time, accommodating any changes in structure layout. This allows BPF programs compiled against header files from one kernel version to run on any sufficiently similar kernel, even if the exact order of fields in the structure has changed.
Today, the basic attribute works correctly in GCC. The issue lies in how to handle cases where the attribute is specified as part of a nested structure. Consider the following example:
    struct other {
        char c;
        int i;
    };

    struct outer_attr {
        struct other other;
        struct {
            long l;
        } nested;
    } __attribute__((preserve_access_index));
Given that the attribute is written only on the outer structure, how should the compiler handle access to nested.l or other.i? Clang's current handling of this situation, which almost certainly does not conform to the standard, is to propagate the attribute to nested, but not to other. If the program were written in C++, this would make sense, as C++ defines nested structures differently from non-nested structures, but the C standard does not. In fact, Marchesi pointed out that the C standard does not have the concept of nested structures at all; the ability to write a structure definition within another structure definition is purely a syntactical convenience.
On the other hand, some attendees felt that propagating the attribute intuitively makes sense. Refusing to emit CO-RE relocation information for nested seems to offer no benefit. However, Marchesi stated that implementing Clang's behavior is difficult for GCC. GCC's parser un-nests nested structure definitions during parsing, so by the time it generates code the compiler no longer knows whether a structure was nested. "If we want to do what Clang does, and we don't want to, we need to hook into the parser." He hopes that Clang can treat this as an error and require programmers to manually write __attribute__((preserve_access_index)) on any nested structures.
Song stated that this Clang behavior is both intentional and BPF-specific; they do not want to force users to add attributes to every nested structure. Another attendee suggested setting aside the question of nesting and, for consistency, giving referenced structures (like other above) the same behavior. They argued that if the outer structure is CO-RE relocatable, then logically, the inner structure should also be CO-RE relocatable, regardless of how it came to be inside that structure.
Faust generally agreed but pointed out that these attributes apply to types; if a structure is used on its own in one place but is embedded in a structure carrying the attribute in another, that suggests it should be CO-RE relocatable only in the latter case. However, there is no way to express this in BTF, as both uses refer to the same type. Ultimately, the group failed to reach a consensus on the desired semantics of this attribute, meaning that users wishing to write BPF programs with GCC may need to manually apply the attribute to their nested structures, at least for now.
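Until the semantics are settled, the manual approach might look like the sketch below. CORE_RELOC is a hypothetical macro name, and the __has_attribute guard is an assumption added so that the same file also builds with compilers or targets that do not know the attribute. The structures are the ones from the example above; the two accessor functions are invented for illustration.

```c
/* Fall back gracefully on compilers without __has_attribute. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* CORE_RELOC is a hypothetical helper macro for this sketch. */
#if __has_attribute(preserve_access_index)
#define CORE_RELOC __attribute__((preserve_access_index))
#else
#define CORE_RELOC
#endif

/* Referenced structure: annotated explicitly, since the attribute
 * on outer_attr is not propagated to it. */
struct other {
    char c;
    int i;
} CORE_RELOC;

struct outer_attr {
    struct other other;
    /* Nested structure: also annotated explicitly, so the result
     * does not depend on whether the compiler propagates. */
    struct {
        long l;
    } CORE_RELOC nested;
} CORE_RELOC;

/* Field accesses of the kind that CO-RE would relocate. */
static long read_nested_l(const struct outer_attr *o)
{
    return o->nested.l;
}

static int read_other_i(const struct outer_attr *o)
{
    return o->other.i;
}
```

Annotating every structure is more verbose, but it makes the intent explicit and should behave the same whether or not the compiler propagates the attribute to nested definitions.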
Actual changes
Marchesi stated that it was previously unclear whether programs including standard library headers should use the version installed on the build host when compiling to BPF. Well, this question has been answered: the C standard (since C99) actually requires the build environment to provide these headers. Therefore, GCC has been changed to match Clang’s behavior and provide BPF programs with (slightly modified) standard library headers available on the host. This could become an issue if BPF programs ultimately depend on certain parts of the GNU C Library, as from the compiler’s perspective, BPF is effectively a “bare metal target.” Users wanting the old behavior can pass the -fno-hosted command line option.
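As a rough illustration of code that should not care which mode is in use, the sketch below relies only on <stddef.h> and <stdint.h>, two of the headers that the C standard requires even of freestanding implementations. The function names are hypothetical; __STDC_HOSTED__ is the standard macro that is 1 for a hosted build and 0 for a freestanding one.

```c
#include <stddef.h>
#include <stdint.h>

/* Report which environment the compiler was told to assume:
 * 1 for hosted, 0 for freestanding. */
static int built_hosted(void)
{
    return __STDC_HOSTED__;
}

/* Uses only types from the two headers above, so it does not
 * depend on any C library function being available. */
static uint32_t sum_bytes(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```

Keeping BPF code within this subset avoids the concern raised in the session about accidentally depending on parts of the GNU C Library.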
Faust is also extending support for the BPF may_goto instruction. Currently, this instruction can only be used in inline assembly; Song stated that Clang’s C frontend will never generate it. However, Faust claimed that the work required to have C code generate it is “very simple,” so GCC’s code generator may soon utilize it.
Marchesi asked whether the bpf_fastcall attribute is still optional, that is, whether not implementing it would cause any issues other than suboptimal performance. Starovoitov replied, “Theoretically yes,” but “over time, it has become less optional.” This prompted Faust to ask the assembled BPF developers in what order they would prefer to see may_goto, the fix for BPF atomic memory ordering, bpf_fastcall, preserve_static_offset, and the fixes for BTF tags implemented.
Starovoitov stated that type tags are the most important, followed by support for may_goto, then bpf_fastcall, and finally preserve_static_offset. It seems that no one disagreed with that assessment.
Another potential improvement that Marchesi is looking forward to is including additional “must” annotations in the compiler, such as musttail. This annotation was recently added and is being increasingly used; it will signal an error if the compiler cannot ensure that the last call in a function is a tail call. Another possible annotation of this type is the “must inline” annotation, which would be stronger than always_inline, as the latter may silently fail to inline a function if it is too large.
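For readers who have not used musttail, a minimal sketch in plain C follows. It is compiled for the host rather than for BPF, MUST_TAIL and sum_to() are hypothetical names, and the __has_attribute guard is an assumption so that compilers without the attribute fall back to an ordinary call.

```c
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* MUST_TAIL is a hypothetical helper macro for this sketch. */
#if __has_attribute(musttail)
#define MUST_TAIL __attribute__((musttail))
#else
#define MUST_TAIL /* unsupported: the call below is an ordinary call */
#endif

/* Sum the integers 1..n using an accumulator, so that the recursive
 * call is the last thing the function does. */
static long sum_to(long n, long acc)
{
    if (n <= 0)
        return acc;
    /* With musttail, the compiler must either turn this into a jump
     * that reuses the current stack frame or report an error. */
    MUST_TAIL return sum_to(n - 1, acc + n);
}
```

Calling sum_to(10, 0) returns 55. The point of the annotation is the guarantee rather than the result: a failure to emit a tail call becomes a compile-time error instead of silent stack growth.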
John Fastabend asked whether this would allow writing recursive functions in BPF programs. Starovoitov’s succinct answer was, “Maybe.” Fastabend elaborated that the last thing preventing Tetragon from using regular function calls in its BPF programs is the lack of support for recursion; theoretically, having musttail support would enable the verifier’s existing loop handling logic to handle recursive calls. Starovoitov cautioned that it is not that simple and may require additional changes to the verifier.
Runtime BTF and debugging BTF
At the end of the allocated time, Marchesi raised the fact that BTF serves a dual purpose. On one hand, BTF information should serve as a debugging format, so ideally, it should reflect the source program written by the user. On the other hand, BTF is used by the verifier to understand the program and is used for CO-RE relocation, for which it should reflect the contents in the binary form of the program.
For example, GCC has several optimization passes that can change function signatures. The main one is “Interprocedural Scalar Replacement of Aggregates” (ISRA), which includes removing unused parameters and converting certain parameters to pass by value. If the compiler generates BTF before performing optimizations, the BTF will reflect the original signatures of the functions, which may not contain the information the verifier wants. If the compiler generates BTF after performing optimizations, users will be unable to use it to understand their programs, as it will no longer correspond to what they wrote.
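The effect can be pictured with a hand-written before-and-after pair. This is a sketch only: both functions are hypothetical, the second is written by hand to resemble what the optimizer might produce, and whether GCC actually performs the transformation depends on the optimization level and the surrounding code.

```c
struct pair {
    int a;
    int b;
};

/* As written by the programmer: takes a pointer to an aggregate
 * plus a parameter that is never used. BTF generated before
 * optimization would describe this signature. */
static int sum_pair(const struct pair *p, int unused_flag)
{
    (void)unused_flag;
    return p->a + p->b;
}

/* Hand-written approximation of an ISRA-style clone: the unused
 * parameter is gone and the two fields are passed by value. BTF
 * generated after optimization would describe this signature. */
static int sum_pair_after_isra(int a, int b)
{
    return a + b;
}
```

Both functions compute the same value, but their signatures differ, which is exactly why BTF generated at one point in the pipeline cannot serve the debugger and the verifier equally well.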
Currently, both GCC and Clang emit BTF before optimizations. This may lead to issues where the verifier cannot find the corresponding BTF for functions optimized with ISRA. Everyone present agreed that this is a problem, but not everyone was satisfied with the suggestion made by another pahole maintainer: to generate both. While it has the benefit of being a relatively simple fix, it is not particularly elegant. The discussion of the pros and cons continued for a while and was eventually cut short to keep to the schedule.
LWN articles are licensed under CC BY-SA 4.0.