A Detailed Explanation of Linux Core Dumps: From Basics to Practical Applications (Part 2)

Hello everyone, welcome to Lixin Embedded.

In Linux development, core dumps are powerful tools for debugging program crashes. However, on devices with limited storage space, core dump files that can be several megabytes in size can take up a lot of space. In the previous article, we discussed the principles and generation mechanisms of core dumps. Today, we will talk about how to slim down core dumps, retaining essential debugging information while alleviating storage constraints.

The Core Idea of Slimming Down Core Dumps

To reduce the size of core dump files, we first need to clarify what we actually need. When a program crashes, the essence of a core dump is to provide a stack trace, helping us locate the crash point. Therefore, the first step in slimming down is to eliminate irrelevant data while ensuring that the debugger can still function properly. In summary, our goals are:

  • Only retain the first N bytes of each thread’s stack to control the amount of stack data.
  • Exclude heap memory data to reduce unnecessary information.
  • Retain metadata required by the debugger, such as dynamic library mapping information.

After slimming down, the core dump transforms from a bulky file into a lean one, saving space while still fulfilling its purpose. Of course, this operation comes with a cost: variables allocated on the heap will no longer be visible, and the stack depth will be limited. However, based on practical experience, most crash debugging only requires local variables on the stack and the most recent few frames of call information, so these limitations usually have minimal impact.
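
To make these goals concrete, here is a minimal sketch, in C, of the kind of capture policy such a slimming step might follow. The structure and field names are hypothetical illustrations, not the API of any particular tool:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capture policy for a slimmed-down core dump.
 * The names here are illustrative only. */
struct coredump_policy {
    size_t max_stack_bytes;   /* keep only the first N bytes of each thread's stack */
    bool   keep_heap;         /* heap regions are dropped entirely */
    bool   keep_linker_state; /* r_debug / link_map data needed by GDB */
    bool   keep_elf_metadata; /* ELF headers, program headers, build IDs */
};

static const struct coredump_policy default_policy = {
    .max_stack_bytes   = 16 * 1024, /* example value, tuned per project */
    .keep_heap         = false,
    .keep_linker_state = true,
    .keep_elf_metadata = true,
};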


How to Locate Thread Stacks

To retain only stack data, we first need to locate it. Do you remember the structure mentioned in the previous article? A core dump records the mapped memory regions in PT_LOAD segments, and each thread's stack resides in a specific PT_LOAD segment. The question is, how do we locate these segments? The answer lies in each thread's stack pointer (SP).

The SP points to the top of the thread's live stack, which we can use to determine the memory region where the stack resides. Specifically, the PT_NOTE segment of the core dump contains an NT_PRSTATUS record for each thread, which includes that thread's general-purpose registers (pr_reg). From there we can read the SP value and then walk /proc/<pid>/maps to find the address range the SP falls into. For example:

77837a402000-77837a404000 rw-p 00000000 00:00 0

If the SP falls within this range, congratulations, you have found the stack! Next, simply copy from the SP toward the end of the mapping, up to N bytes or until the end of the region is reached (the stack grows downward on most architectures, so the live frames sit between the SP and the high end of the mapping). This way each thread's stack data is captured, and the size of the core dump drops significantly.
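
As a rough illustration, here is a minimal C sketch of that step, assuming the thread's stack pointer has already been read out of NT_PRSTATUS and the memory mappings have already been parsed into a simple array. The structure and function names are my own, not taken from any specific tool:

#include <stdint.h>
#include <stddef.h>

/* One entry parsed from /proc/<pid>/maps (or from the core's PT_LOAD headers). */
struct mem_region {
    uint64_t start;
    uint64_t end;
};

/* Given a thread's stack pointer, find the region it falls into and compute the
 * slice of stack to keep: from the SP upward, at most max_bytes, and never past
 * the end of the mapping. Returns 0 on success, -1 if the SP is not inside any
 * known region. */
static int compute_stack_slice(uint64_t sp,
                               const struct mem_region *regions, size_t nregions,
                               size_t max_bytes,
                               uint64_t *out_start, uint64_t *out_len)
{
    for (size_t i = 0; i < nregions; i++) {
        if (sp >= regions[i].start && sp < regions[i].end) {
            uint64_t avail = regions[i].end - sp;
            *out_start = sp;
            *out_len   = avail < max_bytes ? avail : (uint64_t)max_bytes;
            return 0;
        }
    }
    return -1;
}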

However, having just the stack is not enough. If we directly feed it into GDB, the debugger will be confused due to the lack of metadata related to dynamic linking. Next, we need to supplement these critical pieces of information.

The Savior of Dynamic Linking: r_debug Structure

Embedded programs often rely on dynamic libraries, such as OpenSSL's libssl for encryption and libc for basic operations. During a crash, the call stack may reach into these dynamic libraries, so we need to retain enough information for GDB to resolve them correctly. The problem is that Linux enables Address Space Layout Randomization (ASLR) for security, so the load addresses of dynamic libraries change with every run. How do we map the fixed link-time addresses to the randomized runtime addresses?

This is where the r_debug structure comes into play. It records the dynamic linker's runtime state, and the l_addr field of each link_map entry gives the load bias of that library, that is, the difference between its runtime and link-time addresses. GDB uses this to translate link-time addresses into the actual runtime addresses.
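
Assuming we already hold a pointer to r_debug (finding it is covered next), a minimal sketch of walking its link_map list looks like this. It shows the in-process view; a core dump processor would read the same structures out of the captured memory image instead of dereferencing live pointers:

#include <link.h>   /* struct r_debug, struct link_map (glibc) */
#include <stdio.h>

/* Walk the dynamic linker's list of loaded objects and print each one's
 * load bias (l_addr) and path: runtime address = l_addr + link-time address. */
static void dump_link_map(const struct r_debug *dbg)
{
    for (const struct link_map *m = dbg->r_map; m != NULL; m = m->l_next) {
        printf("bias=0x%lx  %s\n",
               (unsigned long)m->l_addr,
               (m->l_name && m->l_name[0]) ? m->l_name : "(main executable)");
    }
}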

To find r_debug, we need to look at the PT_DYNAMIC program header. This header contains a bunch of tags and values, and r_debug is hidden under the DT_DEBUG tag. By traversing the dynamic tags and extracting the memory range of r_debug, we can include it in the core dump, solving the dynamic library address mapping issue.
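
Here is a hedged sketch of that lookup, assuming the PT_DYNAMIC segment's contents have already been read into memory as an array of Elf64_Dyn entries (a 64-bit target is assumed purely for illustration):

#include <elf.h>     /* Elf64_Dyn, DT_DEBUG, DT_NULL */
#include <stdint.h>
#include <stddef.h>

/* Scan the dynamic section for DT_DEBUG and return the address it carries,
 * which at runtime points to the dynamic linker's r_debug structure.
 * Returns 0 if the tag is absent (e.g. in statically linked programs). */
static uint64_t find_r_debug_addr(const Elf64_Dyn *dyn, size_t count)
{
    for (size_t i = 0; i < count && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag == DT_DEBUG)
            return (uint64_t)dyn[i].d_un.d_ptr;
    }
    return 0;
}

Note that in the on-disk binary the DT_DEBUG value is zero; the dynamic linker fills it in at runtime, so the pointer has to be read from the process's memory image (or the core dump) rather than from the ELF file.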

Metadata for Dynamic Libraries: Don’t Forget the Build ID

In addition to r_debug, to ensure GDB correctly loads the symbols of the main program and dynamic libraries, we also need to collect some metadata, including:

  • ELF file header
  • All program headers
  • Build ID records

This information lets GDB know how to find the symbol table and resolve the parts of the call stack that pass through dynamic libraries. We can find all mapped ELF files from /proc/<pid>/maps, extract this metadata from each one, and ensure the debugger functions correctly.
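
A minimal sketch of pulling that metadata out of one mapped ELF file might look like the following. It reads the ELF header and program headers and scans PT_NOTE segments for an NT_GNU_BUILD_ID note; a 64-bit ELF is assumed and error handling is abbreviated:

#include <elf.h>      /* Elf64_Ehdr, Elf64_Phdr, Elf64_Nhdr, NT_GNU_BUILD_ID */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the ELF header, program headers, and GNU build ID from one ELF file. */
static int collect_elf_metadata(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr ehdr;
    if (fread(&ehdr, sizeof ehdr, 1, f) != 1 ||
        memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) {
        fclose(f);
        return -1;
    }

    /* Program headers: these also go into the slimmed-down core dump. */
    Elf64_Phdr *phdrs = calloc(ehdr.e_phnum, sizeof *phdrs);
    fseek(f, (long)ehdr.e_phoff, SEEK_SET);
    fread(phdrs, sizeof *phdrs, ehdr.e_phnum, f);

    /* Scan PT_NOTE segments for the GNU build ID note. */
    for (int i = 0; i < ehdr.e_phnum; i++) {
        if (phdrs[i].p_type != PT_NOTE)
            continue;
        unsigned char *notes = malloc(phdrs[i].p_filesz);
        fseek(f, (long)phdrs[i].p_offset, SEEK_SET);
        fread(notes, 1, phdrs[i].p_filesz, f);

        size_t off = 0;
        while (off + sizeof(Elf64_Nhdr) <= phdrs[i].p_filesz) {
            Elf64_Nhdr *nh = (Elf64_Nhdr *)(notes + off);
            size_t name_sz = (nh->n_namesz + 3) & ~(size_t)3;
            size_t desc_sz = (nh->n_descsz + 3) & ~(size_t)3;
            if (nh->n_type == NT_GNU_BUILD_ID && nh->n_namesz == 4 &&
                memcmp(notes + off + sizeof *nh, "GNU", 4) == 0)
                printf("%s: found a %u-byte build ID\n", path, nh->n_descsz);
            off += sizeof *nh + name_sz + desc_sz;
        }
        free(notes);
    }
    free(phdrs);
    fclose(f);
    return 0;
}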

Slimming Effect

After discussing all this, how much space can we save with the slimmed-down core dump? Let’s trigger a core dump using memfaultctl to see the effect.

The ordinary core dump, without any optimization, has a size of:

-rw-r--r-- 1 root root 2625K May 1 14:59 core-fdedc559-3b92-4a39-9156-fe575104b947.elf

2.6MB! On embedded devices with only a few dozen MB of storage space, this is not a small number. What about after optimization?

-rw-r--r-- 1 root root 75K May 1 14:56 core-e8afead6-782d-4f22-bb23-8789b4390f1c.elf

As we can see, it is only 75KB, roughly 35 times smaller. Although the exact savings vary with the code, this kind of space reduction is a lifesaver for older devices. Storing several optimized core dumps takes up less space than a single original core dump.

Sharing Practical Experience

In actual projects, slimming down core dumps is not just about saving space; it also helps improve debugging efficiency. For example, in some IoT devices, storage space is tight, and network bandwidth is limited, making uploading large files a nightmare. The optimized core dump not only fits but can also be quickly uploaded to the server to help locate issues.

Additionally, there are situations where, after the device is shipped, customers occasionally report faults on-site, and the core dump is the only clue for problem localization. However, the device may only have 128MB of flash memory, leaving very little space for debugging. In this case, the slimmed-down core dump can be useful, allowing multiple dump files to be stored without affecting the normal operation of the device.

Of course, there are some pitfalls here. For instance, truncating the stack may lead to the loss of deep call stacks, so extra caution is needed when debugging complex multithreaded programs. It is recommended to adjust the N value based on project requirements to balance space and debugging depth.

Device-Side Stack Tracing

Slimming down core dumps is just the first step. In the next article, we will go further and discuss how to perform stack tracing directly on the device. This not only further compresses data but also avoids the leakage of sensitive information.

