Understanding Arrays and Pointers in C Language: Part Three

3. Analyzing from the Perspective of Compiler Semantics

The previous section already covered much of how the compiler interprets the syntax and semantics of the C language. Here we examine the same logic through the compiler’s output, starting with the compilation results for the earlier source code:

p5.c: In function ‘main’:

p5.c:29:19: warning: assignment to ‘float *’ from incompatible pointer type ‘int *’ [-Wincompatible-pointer-types]

29| pointor_float = int_pointor; // Assigning integer pointer pi to float pointer pointor_float

| ^

p5.c:30:18: warning: assignment to ‘char *’ from incompatible pointer type ‘int *’ [-Wincompatible-pointer-types]

30| pointor_char = int_pointor; // Assigning integer pointer pi to char pointer pointor_char

| ^

p5.c:50:21: warning: assignment to ‘long int’ from ‘int (*)[10]’ makes integer from pointer without a cast [-Wint-conversion]

50| point_int_array = &int_array; // Note the compilation warning for this statement

| ^

p5.c:51:21: warning: assignment to ‘long int’ from ‘int *’ makes integer from pointer without a cast [-Wint-conversion]

51| point_int_array = int_array; // Note that this statement has no compilation warning

| ^

p5.c:56:18: warning: assignment to ‘char *’ from incompatible pointer type ‘int *’ [-Wincompatible-pointer-types]

56| pointor_char = int_array; // Note the compilation warning for this statement

| ^

Here we see that the gcc compiler outputs a total of 5 warnings for this program. Please read these warning messages carefully, as they will provide us with a lot of information worth considering.

For example, the following compilation warning output is for the statement point_int_array = &int_array:

p5.c:50:21: warning: assignment to ‘long int’ from ‘int (*)[10]’ makes integer from pointer without a cast [-Wint-conversion]

50 | point_int_array = &int_array; // Note the compilation warning for this statement

| ^

The compiler clearly indicates here that it treats &int_array as a pointer to an array of 10 integers (‘int (*)[10]’), which is not the same type it gives to int_array itself. So, how does the compiler view int_array? The compilation warning for a later statement, pointor_char = int_array, provides the answer:

p5.c:56:18: warning: assignment to ‘char *’ from incompatible pointer type ‘int *’ [-Wincompatible-pointer-types]

56| pointor_char = int_array; // Note the compilation warning for this statement

| ^

The compiler states it plainly: assignment to ‘char *’ from incompatible pointer type ‘int *’. In other words, an integer pointer is being assigned to an incompatible character pointer. The compiler reports the type of int_array in this assignment as ‘int *’, a pointer to integer data. This shows that the compiler itself treats the array name int_array as a pointer to the data type of its array elements.

Furthermore, if we analyze the implementation of variable-length arrays (VLA), we can better understand the relationship between array variables and pointers. Below is the disassembly output of the VLA program (the source code can be found in the last section about “Why Arrays Were Invented”):

callq 401050 <__isoc99_scanf@plt>

/* Allocate storage space for array a[n], modify stack pointer %rsp */

mov -0x4c(%rbp),%ebx ; n

movslq %ebx,%rax

sub $0x1,%rax

mov %rax,-0x28(%rbp) ; index of array array_t ?

movslq %ebx,%rax

mov %rax,-0x60(%rbp) ; duplicate of n ?

movq $0x0,-0x58(%rbp)

movslq %ebx,%rax

mov %rax,%r14

mov $0x0,%r15d

movslq %ebx,%rax

lea 0x0(,%rax,4),%rdx ; offset of array_t[0]

mov $0x10,%eax

sub $0x1,%rax

add %rdx,%rax

mov $0x10,%ecx

mov $0x0,%edx

div %rcx ; segments number of array_t

imul $0x10,%rax,%rax ; bytes number of array_t

sub %rax,%rsp ; new stack pointer for function.

/* VLA storage space is usually placed at the bottom of the function stack area, otherwise adjustments of other variables would be very troublesome */

mov %rsp,%rax

add $0x3,%rax

shr $0x2,%rax

shl $0x2,%rax

mov %rax,-0x30(%rbp) ; pointer for VLA array_t, VLA always at the top of the stack

The last instruction stores the address of the first element of the variable-length array array_t in the memory location -0x30(%rbp). Because the size of a variable-length array is known only at run time, its address cannot be fixed in advance as a constant that accounts for all of the array’s elements. The compiler’s implementation therefore chose a compromise: it stores the address used by the array in a pointer variable and accesses the array through that pointer once the length is known. In other words, the storage space for a variable-length array is allocated by growing the stack during the function’s lifetime. Once the size is determined it cannot be changed, since the implementation only adjusts space and never migrates data. That is why the VLA has to be placed at the bottom of the function stack area (the top of the stack).

As for those who believe that sizeof() and typeof() prove that arrays are not pointers, they have not understood the underlying principles of sizeof() and typeof(). We often hear that sizeof() and typeof() are operators rather than functions; anyone who understands this point from the bottom up would not have such doubts. This is a topic that readers can explore on their own, and a detailed analysis may follow later if the opportunity arises. With some understanding of compiler principles, the concept is actually quite easy to grasp.

4. Analyzing from the Perspective of Definitions

Even after all this analysis, many people still have doubts. They argue that everything so far is only an implementation and does not represent the essence; the definition is the essence. So how is an array defined? Let’s take a look at the definition itself.

As the work of one of the inventors of the C language, Dennis M. Ritchie’s writing should be the most authoritative in this regard. “The C Programming Language”, the classic book he co-authored with Brian W. Kernighan, explains the relationship between arrays and pointers clearly. The C language defined in this book was later referred to as K&R C. Section 5.3 of the book, “Pointers and Arrays”, describes array variables as follows:

By definition, the value of a variable or expression of type array is the address of element zero of the array.

This definition is very clear: “the value is the address”.

Summary

To summarize the previous analysis:

First, the value of the array variable int_array is itself the address of the first element of the array, meaning that the array variable is a pointer.

Second, this pointer is a pointer to the first element of the array, so its type is a pointer to the data type of the array’s elements.

Third, assigning an array variable to a pointer whose target type matches the data type of the array’s elements needs no type conversion, whether explicit or implicit.

Fourth, although the array variable is essentially a pointer, it is usually a pointer constant, meaning you cannot modify this pointer itself; it can only be used as an rvalue.

Finally, although arrays are essentially pointers, the compiler still distinguishes between arrays and pointers when processing them. For most operators, the compiler treats arrays as pointers directly, and the resulting behavior is generally indistinguishable. For certain special operators such as sizeof() and typeof(), however, the compiler handles arrays specially: in those cases it treats the array, with all of its elements, as a whole.

Lastly, it is important to emphasize that arrays are still arrays. Although array variables are essentially pointers, there are definitely some differences between them; otherwise there would have been no need to invent arrays as a kind of variable. As a special type of pointer variable, an array variable follows some rules of its own because of the specific nature of the object it refers to. For example, sizeof() reports the storage space occupied by all of the array’s elements rather than by a single element, and typeof() likewise yields the type of the whole array rather than the data type of one element. For the sake of unification, it can actually be easier to view a variable of a basic data type as an array containing one element.
