Understanding the Implementation Principle of Zero-Length Arrays in Embedded C

I am Lao Wen, an embedded engineer who loves learning. Follow me and let's get better together!

Concept of Zero-Length Arrays:

As we all know, GNU/GCC has made practical extensions to the standard C/C++, and zero-length arrays (Arrays of Length Zero) are one of the well-known extensions.

In most cases, they are used to build variable-length structures, defined as follows:

struct Packet
{
    int state;
    int len;
    char cData[0]; // The zero-length array here provides excellent support for variable-length structures.
};

First, let’s explain the zero-length array, also known as a flexible array:

Purpose: The main purpose of a zero-length array is to satisfy the need for variable-length structures.
Usage: Declaring a zero-length array as the last member of a structure makes the structure variable-length. For the compiler, the zero-length array occupies no space: the array name is only a symbol that stands for an offset within the structure, an immutable address constant. (Note: an array name is never a pointer!) We can, however, choose the size of this array dynamically when we allocate the structure.

Note: If the structure is dynamically allocated using calloc, malloc, or new, the corresponding space must be released when it is no longer needed.

Advantages: Compared to declaring a pointer variable in the structure and then dynamically allocating space, this method is more efficient. This is because accessing the array content does not require indirect access, avoiding two memory accesses.

Disadvantages: In a structure, a zero-length array must be declared last, which imposes certain limitations on its usage.

For the compiler, the array name is merely a symbol; it does not occupy any space. In the structure, it only represents an offset, representing an immutable address constant!

Uses of Zero-Length Arrays:

Let’s imagine a scenario where we use a data buffer during network communication. The buffer includes a len field and a data field, which indicate the length of the data and the transmitted data, respectively. We commonly have several design ideas:

Fixed-length data buffer, setting a sufficiently large MAX_LENGTH for the data buffer.
Setting a pointer to the actual data, dynamically allocating space for the data buffer according to the length of the data each time it is used.

We will consider their advantages and disadvantages based on practical design applications, mainly considering the allocation, release, and access of buffer space.

1. Fixed-Length Packet (Allocation, Release, Access):

For example, if I want to send 1024 bytes of data, using a fixed-length packet, assuming the fixed-length packet length MAX_LENGTH is 2048, it will waste 1024 bytes of space and also cause unnecessary traffic waste:

Data structure definition:

// Fixed-length buffer
struct max_buffer
{
    int     len;
    char    data[MAX_LENGTH];
};

Data structure size: Considering alignment, the size of the data structure >= sizeof(int) + sizeof(char) * MAX_LENGTH

To guard against overflow, the data array in a fixed-length packet is generally made long enough to hold the largest possible payload, so the data array in max_buffer is rarely filled and the unused part is wasted.

Packet construction: If we want to send CURR_LENGTH = 1024 bytes, how do we construct this packet? Generally, we would return a pointer to the buffer data structure max_buffer:

// Allocation
if ((mbuffer = (struct max_buffer *)malloc(sizeof(struct max_buffer))) != NULL)    
{
  mbuffer->len = CURR_LENGTH;
  memcpy(mbuffer->data, "Hello World", sizeof("Hello World")); // copy only the bytes the literal holds; CURR_LENGTH would read past it
  printf("%d, %s\n", mbuffer->len, mbuffer->data);
}

Access: This memory is used in two parts. The first part, mbuffer->len (4 bytes here), serves as the header and describes the length of the data part that immediately follows it. Here it is 1024, so the header is assigned the value 1024 (the receiver has to be told how long the payload is, which is the role of len). The memory immediately following is the actual data part, accessed through mbuffer->data. Finally, memcpy() fills the data to be sent into this memory.
Release: When the data space is no longer needed, it can be released directly.

// Destruction
free(mbuffer);
mbuffer = NULL;

2. Summary:

Using fixed-length arrays as data buffers, to avoid buffer overflow, the size of the array is generally set to a sufficient space MAX_LENGTH, but in actual use, data reaching the length of MAX_LENGTH is rare, so in most cases, most of the buffer space is wasted.
However, the usage process is very simple, and the allocation and release of data space are straightforward, requiring no additional operations from the programmer.

3. Pointer Data Packet (Allocation, Release, Access):

If you replace the fixed-length array of length MAX_LENGTH with a pointer, dynamically allocating space of size CURR_LENGTH each time it is used, then it avoids wasting MAX_LENGTH - CURR_LENGTH space, only wasting the space of one pointer field:

Data packet definition:

struct point_buffer
{
  int     len;
  char    *data;
};

Data structure size: Considering alignment, the size of the data structure >= sizeof(int) + sizeof(char *)
Space allocation: However, this also means that when allocating memory, two steps are required.

// =====================
// Pointer array  Occupy - Allocate - Destroy
// =====================
///  Occupy    
printf("the length of struct test3:%zu\n", sizeof(struct point_buffer));
///  Allocate
if ((pbuffer = (struct point_buffer *)malloc(sizeof(struct point_buffer))) != NULL)
{
  pbuffer->len = CURR_LENGTH;
  if ((pbuffer->data = (char *)malloc(sizeof(char) * CURR_LENGTH)) != NULL)
  {
      memcpy(pbuffer->data, "Hello World", sizeof("Hello World")); // copy only the bytes the literal holds
      printf("%d, %s\n", pbuffer->len, pbuffer->data);
  }
}

First, memory space must be allocated for the structure; then, memory space must be allocated for the member variables of the structure.

This means that the two allocated memory spaces are not contiguous and must be managed separately. With a zero-length array, by contrast, the principle of single allocation is adopted: all the required memory is allocated at once.

Release: Releasing likewise takes two steps, in the reverse order:

/// Destruction
free(pbuffer->data);
free(pbuffer);
pbuffer = NULL;

Summary:
– Using a pointer as the buffer costs only one pointer-sized field and avoids the MAX_LENGTH array, so it prevents a large amount of wasted space.
– However, allocation requires a second allocation for the data field, and release requires explicitly releasing the data field as well. In practice, space is often allocated inside a function and returned to the user as a pointer to struct point_buffer; we cannot assume that the user knows our allocation details and releases the space in the agreed way, which can lead to inconvenience and even memory leaks.

4. Variable-Length Data Buffer (Allocation, Release, Access)

Fixed-length arrays are convenient to use, but they waste space. Pointer forms only use the space of one pointer, avoiding a large amount of space waste, but they require multiple allocations and releases. So is there an implementation method that does not waste space and is easy to use?

GNU C's zero-length arrays, also known as flexible arrays, are such an extension. With the characteristics of zero-length arrays, it is easy to construct variable-length structures, such as buffers, data packets, etc.:

Data structure definition:

// Zero-length array
struct zero_buffer
{
  int     len;
  char    data[0];
};

Data structure size: Such variable-length structures are commonly used in network communication to construct variable-length data packets without wasting space or network traffic, because char data[0]; is just an array name and does not occupy storage space:

sizeof(struct zero_buffer) = sizeof(int)

Space allocation: When we use it, we only need to allocate space once.

/// Allocation
if ((zbuffer = (struct zero_buffer *)malloc(sizeof(struct zero_buffer) + sizeof(char) * CURR_LENGTH)) != NULL)
{
    zbuffer->len = CURR_LENGTH;
    memcpy(zbuffer->data, "Hello World", sizeof("Hello World")); // copy only the bytes the literal holds
    printf("%d, %s\n", zbuffer->len, zbuffer->data);
}

Releasing space: Releasing space is also the same, only one release is needed.

/// Destruction
free(zbuffer);
zbuffer = NULL;

Summary:

// zero_length_array.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // memcpy

#define MAX_LENGTH      1024
#define CURR_LENGTH      512

// Zero-length array
struct zero_buffer
{
    int     len;
    char    data[0];
} __attribute__((packed));

// Fixed-length array
struct max_buffer
{
    int     len;
    char    data[MAX_LENGTH];
} __attribute__((packed));

// Pointer array
struct point_buffer
{
    int     len;
    char    *data;
} __attribute__((packed));

int main(void)
{
    struct zero_buffer  *zbuffer = NULL;
    struct max_buffer   *mbuffer = NULL;
    struct point_buffer *pbuffer = NULL;

    // =====================
    // Zero-length array  Occupy - Allocate - Destroy
    // =====================
    ///  Occupy
    printf("the length of struct test1:%zu\n", sizeof(struct zero_buffer));
    ///  Allocate
    if ((zbuffer = (struct zero_buffer *)malloc(sizeof(struct zero_buffer) + sizeof(char) * CURR_LENGTH)) != NULL)
    {
        zbuffer->len = CURR_LENGTH;
        // Copy only the bytes the literal holds; CURR_LENGTH would read past it
        memcpy(zbuffer->data, "Hello World", sizeof("Hello World"));
        printf("%d, %s\n", zbuffer->len, zbuffer->data);
    }
    ///  Destroy
    free(zbuffer);
    zbuffer = NULL;

    // =====================
    // Fixed-length array  Occupy - Allocate - Destroy
    // =====================
    ///  Occupy
    printf("the length of struct test2:%zu\n", sizeof(struct max_buffer));
    ///  Allocate
    if ((mbuffer = (struct max_buffer *)malloc(sizeof(struct max_buffer))) != NULL)
    {
        mbuffer->len = CURR_LENGTH;
        memcpy(mbuffer->data, "Hello World", sizeof("Hello World"));
        printf("%d, %s\n", mbuffer->len, mbuffer->data);
    }
    ///  Destroy
    free(mbuffer);
    mbuffer = NULL;

    // =====================
    // Pointer array  Occupy - Allocate - Destroy
    // =====================
    ///  Occupy
    printf("the length of struct test3:%zu\n", sizeof(struct point_buffer));
    ///  Allocate
    if ((pbuffer = (struct point_buffer *)malloc(sizeof(struct point_buffer))) != NULL)
    {
        pbuffer->len = CURR_LENGTH;
        if ((pbuffer->data = (char *)malloc(sizeof(char) * CURR_LENGTH)) != NULL)
        {
            memcpy(pbuffer->data, "Hello World", sizeof("Hello World"));
            printf("%d, %s\n", pbuffer->len, pbuffer->data);
        }
    }
    ///  Destroy
    if (pbuffer != NULL)
    {
        free(pbuffer->data);   // must not dereference pbuffer if its allocation failed
    }
    free(pbuffer);
    pbuffer = NULL;

    return EXIT_SUCCESS;
}

Zero-length arrays do not occupy memory space, while pointer methods require memory space.
For zero-length arrays, when applying for memory space, the principle of one-time allocation is adopted; for structures containing pointers, space must be allocated separately, and released separately.
Accessing zero-length arrays can be done using array notation.

Support for Zero-Length Arrays in the GNU Documentation:

Reference:

6.17 Arrays of Length Zero
C Struct Hack – Structure with variable length array

ISO C90 does not support zero-length arrays. They are a GNU C extension, so compilers that implement only the standard cannot compile them; for the extensions added by GNU C, GCC provides compilation options that flag them explicitly:

-pedantic generates warning messages wherever extension syntax is used.
-Wall enables most of GCC's commonly used warnings.
-Werror requires GCC to treat all warnings as errors.

// 1.c
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    char a[0];
    printf("%zu", sizeof(a));
    return EXIT_SUCCESS;
}

Let’s compile:

# Show all warnings
gcc 1.c -Wall
# no warnings or errors

# Show warnings for GNU extensions
gcc 1.c -Wall -pedantic
1.c: In function ‘main’:
1.c:7: warning: ISO C forbids zero-size array ‘a’

# Show all warnings and treat GNU extensions warnings as errors
gcc 1.c -Werror -Wall -pedantic
cc1: warnings being treated as errors
1.c: In function ‘main’:
1.c:7: error: ISO C forbids zero-size array ‘a’

Zero-length arrays are actually a flexible use of arrays pointing to the contiguous memory space behind them:

struct buffer
{
  int     len;
  char    data[0];
};

Before zero-length arrays were introduced, people solved the problem using fixed-length arrays and pointers, but:

Fixed-length arrays define a sufficiently large buffer, which is convenient to use, but causes space waste every time.
Pointer methods require programmers to perform multiple free operations when releasing space, and in practice, we often return a pointer to the buffer from a function, and we cannot guarantee that everyone understands and follows our release method.

Therefore, GNU C added zero-length arrays as an extension. With data[0], a zero-length array, the array name itself does not occupy any storage space.

C99 later standardized a similar feature, the flexible array member, written as char payload[]. If you need to compile with -pedantic, you can change char payload[0] to char payload[] and the code will compile cleanly, provided your compiler supports C99; a very old compiler may not.

// 2.c payload
#include <stdio.h>
#include <stdlib.h>

struct payload
{
    int   len;
    char  data[];
};

int main(void)
{
    struct payload pay;
    printf("%zu", sizeof(pay));
    return EXIT_SUCCESS;
}

Using -pedantic for compilation, no warnings appear, indicating that this syntax is standard in C.

gcc 2.c -pedantic -std=c99

Thus, the end of the structure points to the memory data behind it. Therefore, we can use this type of structure as the header format for data packets, and the last member variable just happens to be the data content.

The GNU manual also provides two other structures to illustrate, making it easier to understand:

struct f1 
{
    int x;
    int y[];
} f1 = { 1, { 2, 3, 4 } };

struct f2
{
    struct f1 f1;
    int data[3];
} f2 = { { 1 }, { 5, 6, 7 } };

I changed the 2,3,4 in f2 to 5,6,7 for distinction. If you print the data, you will see the following information:

f1.x = 1
f1.y[0] = 2
f1.y[1] = 3
f1.y[2] = 4

That is, f1.y refers to the data {2,3,4} stored right after f1.x. Likewise, f2.f1.y refers to the memory that follows f2.f1, which is exactly where f2.data is stored. The printed data is:

f2.f1.x = 1
f2.f1.y[0] = 5
f2.f1.y[1] = 6
f2.f1.y[2] = 7

If you are not sure whether it occupies space, you can use sizeof to check. You will find that sizeof(struct f1) = 4, which means that int y[] does not occupy any space. However, this array must be the last member of the structure. If you do not place it at the end, you will encounter the following error during compilation:

main.c:37:9: error: flexible array member not at end of struct
                    int y[];
                            ^

At this point, you may wonder what happens if you replace int y[] in struct f1 with int *y. This involves the relationship between arrays and pointers: sometimes the two behave the same, and sometimes they differ.

First, note that the zero-length array extension is about arrays, so you cannot simply substitute int *y. The result of sizeof will differ. Change struct f1 to:

struct f3
{
    int x;
    int *y;
};

In 32/64 bit systems, int is 4 bytes, so sizeof(struct f1)=4, while sizeof(struct f3)=16. This is because int *y is a pointer, and in 64-bit systems, it is 64 bits, thus sizeof(struct f3) = 16; if in a 32-bit environment, sizeof(struct f3) would be 8, while sizeof(struct f1) remains unchanged. Therefore, int *y cannot replace int y[].

Code as follows:

// 3.c
#include <stdio.h>
#include <stdlib.h>

struct f1
{
    int x;
    int y[];
} f1 = { 1, { 2, 3, 4 } };

struct f2 
{
    struct f1 f1;
    int data[3];
} f2 = { { 1 }, { 5, 6, 7 } };

struct f3
{
    int x;
    int *y;
};

int main(void)
{
    printf("sizeof(f1) = %zu\n", sizeof(struct f1));
    printf("sizeof(f2) = %zu\n", sizeof(struct f2));
    printf("sizeof(f3) = %zu\n\n", sizeof(struct f3));

    printf("f1.x = %d\n", f1.x);
    printf("f1.y[0] = %d\n", f1.y[0]);
    printf("f1.y[1] = %d\n", f1.y[1]);
    printf("f1.y[2] = %d\n", f1.y[2]);

    printf("f2.f1.x = %d\n", f2.f1.x);
    printf("f2.f1.y[0] = %d\n", f2.f1.y[0]);
    printf("f2.f1.y[1] = %d\n", f2.f1.y[1]);
    printf("f2.f1.y[2] = %d\n", f2.f1.y[2]);

    return EXIT_SUCCESS;
}

Other Features of Zero-Length Arrays:

1. Why Zero-Length Arrays Do Not Occupy Storage Space:

What is the difference between zero-length arrays and pointer implementations, and why do zero-length arrays not occupy storage space?

Essentially, this comes down to the difference between arrays and pointers in C. Are char a[1] and char *b the same?

According to page 82 of Programming Abstractions in C (Roberts, E. S., Mechanical Industry Press, 2004.6):

“arr is defined to be identical to &arr[0]”.

This means that for char a[1], the name a in fact represents a constant equal to &a[0], while char *b is a real pointer variable that occupies storage. Therefore, a = b is not allowed, while b = a is allowed. Both support subscript access, so is there a fundamental difference between a[0] and b[0]? We can illustrate this with an example.

Refer to the following two programs gdb_zero_length_array.c and gdb_pzero_length_array.c:

// gdb_zero_length_array.c
#include <stdio.h>
#include <stdlib.h>

struct str
{
    int len;
    char s[0];
};

struct foo
{
    struct str *a;
};

int main(void)
{
    struct foo f = {NULL };
    printf("sizeof(struct str) = %zu\n", sizeof(struct str));
    printf("before f.a->s.\n");
    if(f.a->s)
    {
        printf("before printf f.a->s.\n");
        printf(f.a->s);
        printf("after printf f.a->s.\n");
    }
    return EXIT_SUCCESS;
}

// gdb_pzero_length_array.c
#include <stdio.h>
#include <stdlib.h>

struct str
{
    int len;
    char *s;
};

struct foo
{
    struct str *a;
};

int main(void)
{
    struct foo f = {NULL };
    printf("sizeof(struct str) = %zu\n", sizeof(struct str));
    printf("before f.a->s.\n");
    if (f.a->s)
    {
        printf("before printf f.a->s.\n");
        printf(f.a->s);
        printf("after printf f.a->s.\n");
    }
    return EXIT_SUCCESS;
}

We can see that although both programs have access exceptions, the segmentation fault occurs at different locations.

We will compile both programs into assembly and then use diff to see what differences exist in their assembly code.

gcc -S gdb_zero_length_array.c -o gdb_test.s
gcc -S gdb_pzero_length_array.c -o gdb_ptest.s
diff gdb_test.s gdb_ptest.s

1c1
<   .file   "gdb_zero_length_array.c"
---
>   .file   "gdb_pzero_length_array.c"
23c23
<   movl    $4, %esi
---
>   movl    $16, %esi
30c30
<   addq    $4, %rax
---
>   movq    8(%rax), %rax
36c36
<   addq    $4, %rax
---
>   movq    8(%rax), %rax

# printf("sizeof(struct str) = %d\n", sizeof(struct str));
23c23
<   movl    $4, %esi      # printf("sizeof(struct str) = %d\n", sizeof(struct str));
---
>   movl    $16, %esi     # printf("sizeof(struct str) = %d\n", sizeof(struct str));

From the 64-bit system assembly, we see that the size of the structure with the zero-length array is 4, while the size of the structure with the pointer is 16:

f.a->s
30c30/36c36
<   addq    $4, %rax
---
>   movq    8(%rax), %rax

We can see that:

For char s[0], the assembly code uses the addq instruction, addq $4, %rax
For char *s, the assembly code uses the movq instruction, movq 8(%rax), %rax

addq computes %rax + sizeof(struct str), that is, the address just past the end of the str structure, which is the address of char s[0]. This step only computes an address; the same job is often done by the lea (load effective address) instruction, as in the next example. movq, by contrast, loads the content stored at that address into the register.

From this, we can see that accessing the member array name actually retrieves the relative address of the array, while accessing the member pointer retrieves the content in the relative address (this is the same as accessing other non-pointer or non-array variables):

Accessing the relative address will not cause the program to crash, but accessing the content of an illegal address will cause the program to crash.

// 4-1.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char a[0];
    printf("%p\n", a);
    return EXIT_SUCCESS;
}


// 4-2.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *a;
    printf("%p\n", a);
    return EXIT_SUCCESS;
}


$ diff 4-1.s 4-2.s
1c1
<       .file   "4-1.c"
---
>       .file   "4-2.c"
13c13
<       subl    $16, %esp
---
>       subl    $32, %esp
15c15
<       leal    16(%esp), %eax
---
>       movl    28(%esp), %eax

For char a[0], the assembly code uses the leal instruction: leal 16(%esp), %eax
For char *a, the assembly code uses the movl instruction: movl 28(%esp), %eax

2. Address Optimization:

// 5-1.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char a[0];
    printf("%p\n", a);

    char b[0];
    printf("%p\n", b);

    return EXIT_SUCCESS;
}


Since zero-length arrays are a GNU C extension and not part of the C standard, the results of cleverly written, quirky code depend on the compiler's implementation and optimization strategy.

For example, in the code above the compiler may place a and b at the same address, because neither a[0] nor b[0] is usable by the program. Does this remind us of anything?

Compilers often optimize the addresses of identical string literals to the same location to reduce space usage:

// 5-2.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *a = "Hello";
    printf("%p\n", a);

    const char *b = "Hello";
    printf("%p\n", b);

    const char c[] = "Hello";
    printf("%p\n", c);

    return EXIT_SUCCESS;
}

Original text: https://kernel.blog.csdn.net/article/details/64131322
