The Lost Art of Structure Packing in C Language

The Lost Art of Structure Packing

Eric S. Raymond
<[email protected]>

The Lost Art of Structure Packing

Table of Contents

  1. Who Should Read This Article
  2. The Reason I Wrote This Article
  3. Alignment Requirements
  4. Padding
  5. Struct Alignment and Padding
  6. Bit Fields
  7. Struct Member Reordering
  8. Tricky Scalar Cases
  9. Readability and Cache Locality
  10. Other Packing Techniques
  11. Overriding Alignment Rules
  12. Tools
  13. Proofs and Special Cases
  14. Other Languages 14.1. C++ 14.2. Go 14.3. Rust 14.4. Java 14.5. Swift 14.6. C#
  15. Support This Work
  16. Further Reading
  17. Version History

1. Who Should Read This Article

This article introduces a technique for reducing memory usage in compiled languages that use C-like structures—manually repacking these structure declarations to minimize their size. A basic knowledge of the C programming language is required to read this article.

If you plan to write code for memory-constrained embedded systems or operating system kernels, mastering this technique is essential. This technique is also useful if the data sets of your applications are so large that the program frequently hits memory limits. Understanding this technique is beneficial in any scenario where you care deeply about optimizing memory bandwidth usage and minimizing cache misses.

Finally, mastering this technique is a pathway to delving into other esoteric topics in the C language. You are not yet an advanced C programmer until you grasp these rules. If you cannot write this article yourself and make sound judgments about it, you are not yet a master of the C language.

The title of this article mentions “C,” but almost all of its content also applies to C++. Many of the techniques discussed here are applicable to the Go language and should be extendable to any compiled language that uses C-like structures. There is a discussion at the end regarding C++, Go, Rust, Java, Swift, and C#.

2. The Reason I Wrote This Article

I created this webpage because at the end of 2013, I found myself extensively using an optimization technique I learned over twenty years ago but had rarely used since.

I needed to reduce the memory footprint of a program that used thousands (sometimes hundreds of thousands) of C structure instances. This program is cvs-fast-export, and the problem was that it crashed due to out-of-memory errors when processing large code repositories.

In this case, significant memory savings can be achieved by carefully rearranging the order of structure members. This can yield substantial benefits—in my case, I was able to reduce the working set size by about 40%, allowing the program to handle larger code repositories without crashing.

However, during the process, as I reflected on what I was doing, I gradually realized that this technique has been largely forgotten today. A brief search online revealed that programmers seem to no longer discuss it much, at least not within the scope retrievable by search engines. There are a few Wikipedia entries that touch on the topic, but I found no comprehensive introduction to the technique.

In fact, this situation is not without reason. Computer science courses (rightly) guide people away from micro-optimizations towards finding better algorithms. The plummeting cost of machine resources has reduced the necessity of saving memory usage. Moreover, in the past, hackers learned this technique by hitting walls on peculiar hardware architectures, an experience that is less common today.

But in important scenarios, this technique still holds value; as long as memory is limited, it will always be valuable. This article aims to prevent programmers from having to rediscover this technique, allowing them to focus on more important matters.

4. Padding

Now, let’s look at a simple example of variable layout in memory. Consider the following series of variable declarations at the top level of a C module:

char *p;
char c;
int x;

If you know nothing about data alignment, you might think these three variables occupy contiguous byte ranges in memory. That is, on a 32-bit machine, a 4-byte pointer is followed by a 1-byte character, and then a 4-byte integer. On a 64-bit machine, the only difference is that the pointer would be 8 bytes.

In reality, the implicit assumption that the allocation order of static variables matches the source code order does not necessarily hold; the C standard does not enforce this. I will ignore this detail for the following reasons: first, this implicit assumption is usually correct anyway; second, the actual purpose of discussing padding and packing outside of structures is to prepare you for understanding the situation inside structures.

Now let’s look at the actual situation (on architectures like x86 or ARM that use self-aligning types). The storage for variable <span>p</span> starts at a self-aligned 4-byte or 8-byte boundary, depending on the machine word length. This is the alignment for pointers, which is the strictest alignment.

The storage for variable <span>c</span> follows immediately after <span>p</span>. However, since variable <span>x</span> has a 4-byte alignment requirement, this results in a gap in the layout; it is as if there were a fourth intermediate variable, as shown below:

char *p;      /* 4 or 8 bytes */
char c;       /* 1 byte */
char pad[3];  /* 3 bytes */
int x;        /* 4 bytes */

The character array <span>pad[3]</span><span> represents 3 bytes of wasted space in the structure. The values of the padding bytes are undefined; they are especially not guaranteed to be zeroed.</span>

Now let’s see what happens if <span>x</span> is a 2-byte short integer:

char *p;
char c;
short x;

In this case, the actual layout would be as follows:

char *p;      /* 4 or 8 bytes */
char c;       /* 1 byte */
char pad[1];  /* 1 byte */
short x;      /* 2 bytes */

On the other hand, if on a 64-bit machine, <span>x</span> is a long integer:

char *p;
char c;
long x;

The final layout would be as follows:

char *p;     /* 8 bytes */
char c;      /* 1 byte */
char pad[7]; /* 7 bytes */
long x;      /* 8 bytes */

If you have been reading carefully, you might now wonder what would happen if the shorter variable declarations were placed first:

char c;
char *p;
int x;

If the actual memory layout were written like this:

char c;
char pad1[M];
char *p;
char pad2[N];
int x;

What can we say about <span>M</span> and <span>N</span>?

First, in this case, <span>N</span> will be zero. The address of variable <span>x</span> follows immediately after <span>p</span>, which guarantees pointer alignment, and pointer alignment is at least as strict as integer alignment.

<span>M</span>‘s value is less predictable. If the compiler happens to map <span>c</span> to the last byte of a machine word, then the next byte (the first byte of <span>p</span>) will be the first byte of the next machine word, and it will be correctly pointer-aligned. In this case, <span>M</span> will be zero.

More likely, <span>c</span> will be mapped to the first byte of a machine word. In this case, <span>M</span> will be the number of padding bytes needed to ensure that <span>p</span> is pointer-aligned—3 bytes on a 32-bit machine, 7 bytes on a 64-bit machine.

There may also be intermediate cases. <span>M</span> can be any value from 0 to 7 (on a 32-bit machine, it can be from 0 to 3), since characters can start from any byte boundary of the machine word.

If you want these variables to occupy less space, you can achieve this by swapping the positions of <span>x</span> and <span>c</span><span> in the original sequence.</span>

char *p;     /* 8 bytes */
long x;      /* 8 bytes */
char c;      /* 1 byte */

Generally, for a small number of scalar variables in a C program, saving a few bytes by changing the order of declarations is not significant. However, when this technique is applied to non-scalar variables, especially structures, it becomes much more meaningful.

Before discussing structures, let’s take a look at scalar arrays. On platforms that use self-aligning types, there is no padding within character/short/integer/long/pointer arrays; each member will automatically self-align at the end of the previous member.

All these rules and examples can be applied to Go structures and Rust structures with the “repr(C)” attribute with only syntactical modifications.

In the next section, we will see that the situation with arrays of structures is not necessarily the same.

5. Struct Alignment and Padding

In general, the alignment of struct instances is the same as the alignment of their widest scalar member. The compiler does this to ensure that all members can self-align for fast access.

Additionally, in C (as well as Go and Rust), the address of a struct is the same as the address of its first member, meaning that structs do not have leading padding. However, this may not be the case in C++, as discussed in section 14.

(When you are uncertain about such cases, the ANSI C provides a <span>offsetof()</span><span> macro that can be used to read the offset of struct members.)</span>

Consider the following struct:

struct foo1 {
    char *p;
    char c;
    long x;
};

Assuming a 64-bit machine, any instance of <span>struct foo1</span><span> will adopt an 8-byte alignment. Its memory layout is as follows, which is not surprising:</span>

struct foo1 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
    char pad[7]; /* 7 bytes */
    long x;      /* 8 bytes */
};

Its layout is as if these types of variables were declared separately. But if we place <span>c</span> first, the situation changes.

struct foo2 {
    char c;      /* 1 byte */
    char pad[7]; /* 7 bytes */
    char *p;     /* 8 bytes */
    long x;      /* 8 bytes */
};

If these members were separate variables, <span>c</span> could start from any byte boundary, and the size of <span>pad</span> might vary. But since the alignment of <span>struct foo2</span> is determined by its widest member (the pointer), this cannot be the case. Now, <span>c</span> must be pointer-aligned, followed by 7 bytes of padding, which is fixed.

Now let’s talk about tail padding in structures. To explain this, I need to introduce a basic concept I call the stride address of a structure. It is the first address after the structure data that has the same alignment as the structure.

The general rule for tail padding in structures is that the compiler treats it as if the structure has tail padding to its stride address. This rule determines the return value of the <span>sizeof()</span><span> function.</span>

For example, on a 64-bit x86 or ARM machine:

struct foo3 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
};

struct foo3 singleton;
struct foo3 quad[4];

You might think that <span>sizeof(struct foo3)</span><span> should be 9, but in fact, it is 16. The stride address is at the address of </span><code><span>&p</span>[2]. Therefore, in the <span>quad</span> array, each member has 7 bytes of tail padding because the first member of each subsequent structure needs to be self-aligned to an 8-byte boundary. Its memory layout is as if the structure were declared like this:

struct foo3 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
    char pad[7];
};

In contrast, consider the following example:

struct foo4 {
    short s;     /* 2 bytes */
    char c;      /* 1 byte */
};

Because <span>s</span> only needs to be aligned to 2 bytes, the stride address is one byte after <span>c</span>, and the entire <span>struct foo4</span> only requires 1 byte of tail padding. Its layout is as follows:

struct foo4 {
    short s;     /* 2 bytes */
    char c;      /* 1 byte */
    char pad[1];
};

<span>sizeof(struct foo4)</span><span> will return 4.</span>

Finally, there is an important detail: if your structure contains other structure members, the internal structure must also be aligned according to its widest scalar member. Suppose you write:

struct foo5 {
    char c;
    struct foo5_inner {
        char *p;
        short x;
    } inner;
};

The member <span>char *p</span> in the inner structure will force both the outer and inner structures to be aligned to pointers. On a 64-bit machine, the actual layout is as follows:

struct foo5 {
    char c;           /* 1 byte */
    char pad1[7];     /* 7 bytes */
    struct foo5_inner {
        char *p;      /* 8 bytes */
        short x;      /* 2 bytes */
        char pad2[6]; /* 6 bytes */
    } inner;
};

This structure shows us the space that can be saved by rearranging structure members. Out of 24 bytes, 13 are padding bytes, resulting in over 50% wasted space!

Src

https://www.catb.org/esr/structure-packing/

Leave a Comment