Avoiding Pitfalls: A Guide for C Programmers

In 2014, a serious vulnerability was disclosed in OpenSSL, the open-source library that implements the TLS and DTLS secure transport protocols. The flaw sat in its TLS/DTLS Heartbeat extension and was a buffer over-read, often loosely described as a buffer overflow. This is the infamous Heartbleed vulnerability.

This vulnerability allowed attackers to read chunks of a server's memory, which could contain decrypted traffic, credentials, and even the private keys used for encryption. Since OpenSSL is part of the foundational infrastructure of internet services, Heartbleed affected a large share of internet companies worldwide, and the resulting losses are hard to measure.

The cause was simply a careless use of the memcpy() function in a piece of C code: the code copied as many bytes as the request claimed to contain, without checking that length against the data actually received.
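The missing check is small. The sketch below is not OpenSSL's code; echo_payload() and all of its parameters are hypothetical names chosen for illustration. It only shows the general idea of validating a peer-supplied length before calling memcpy():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical echo handler (illustration only, not OpenSSL's code).
 * record      : bytes actually received from the peer
 * record_len  : number of bytes actually received
 * claimed_len : payload length the peer claims to have sent
 * Returns the number of bytes copied into out, or 0 if the request is rejected. */
size_t echo_payload(const unsigned char *record, size_t record_len,
                    size_t claimed_len,
                    unsigned char *out, size_t out_cap)
{
    assert(record != NULL && out != NULL);

    /* Never trust the claimed length: it must fit both the data we
     * really received and the buffer we are copying into. */
    if (claimed_len > record_len || claimed_len > out_cap) {
        return 0;
    }
    memcpy(out, record, claimed_len);
    return claimed_len;
}
```

With a 4-byte record, a request claiming 64 bytes is rejected instead of copying 60 bytes of unrelated memory.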

If the C programmer who introduced the Heartbleed bug had kept C Traps and Pitfalls on their desk (perhaps they did), reviewing it carefully a few times might have kept them out of such a dangerous pit.


Next, we offer C programmers a guide to avoiding pitfalls in three areas: avoiding syntax and semantic errors, being cautious with pointers, and ensuring portability.

An Easy Way to Win a Free Lunch

Once, a colleague asked me to help her with a problem. She had written about a hundred lines of code, but it never ran as expected, and she couldn’t find out why.

I watched her compile and run it. The code looked correct, but the result was wrong, so I decided to take over and examine the code line by line.

Suddenly, I had a moment of clarity and saw the problem. I jokingly said, “I found it; if I solve it, buy me lunch.”

My colleague readily agreed, so I moved the cursor to the following code:

for (int i=0; i < N; i++);
    do_something();

My colleague took a serious look, covered her face in defeat, and exclaimed, “Just this problem? Let’s go, I’ll treat you to lunch.”

The issue was the semicolon at the end of the for statement. Once you look at this line, the problem is obvious: the stray semicolon is an empty statement that becomes the entire loop body, so the loop spins N times doing nothing, and do_something() runs only once, after the loop ends.

This kind of mistake can actually be avoided by changing the coding style.

By adding curly braces to all for loop bodies, you won’t be distracted by such issues; it’s that simple.

for (int i=0; i < N; i++) { // With curly braces, even if an extra semicolon is added, it becomes easy to spot
    do_something();
}

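For syntax and semantic issues in general, here are three good suggestions based on the book’s content:

When using control statements such as if, else, for, and while, always put the body in curly braces.
When comparing against a constant, write the literal before the variable. For example, with if (100 == x), the typo if (100 = x) fails to compile, whereas the typo if (x = 100) compiles and silently performs an assignment.
Use copy and formatting functions that take an explicit size limit to avoid buffer overflows. For example, avoid sprintf(), strcpy(), and strcat(); prefer snprintf(), or strncpy() and strncat() used with care. Note that strncpy() does not add a terminating null character when the source fills the buffer, and the size argument of strncat() is the maximum number of characters to append, not the size of the destination.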
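As a minimal sketch of the third suggestion, the helper below wraps snprintf() so that the destination is always null-terminated and truncation is reported. The name copy_bounded() and its return convention are assumptions made for this illustration, not something taken from the book:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst, which holds dst_cap bytes.
 * The result is always null-terminated.
 * Returns 1 if src fit completely, 0 if it had to be truncated. */
int copy_bounded(char *dst, size_t dst_cap, const char *src)
{
    assert(dst != NULL && src != NULL && dst_cap > 0);

    /* snprintf writes at most dst_cap bytes, including the terminating
     * null character, and returns the length the full string needed. */
    int needed = snprintf(dst, dst_cap, "%s", src);
    return needed >= 0 && (size_t)needed < dst_cap;
}
```

Copying "hello, world" into an 8-byte buffer leaves "hello, " in the buffer and returns 0, so the caller can decide how to handle the truncation instead of overflowing.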

Do These Two Things to Avoid Pointer Abuse

A sign of a C programmer’s growth is that they truly understand the pain of being abused by pointers. The concept of pointers in C is not hard to grasp, but using them can lead to various issues.

There are three types of issues related to pointer usage:

Wild Pointers: Using a pointer that has not been initialized, so that it points to an unknown area.

Dangling Pointers: A pointer that points to memory that has been freed but has not been set to NULL.

Out-of-Bounds Access: Accessing beyond the bounds while traversing an array or linked list, leading to illegal access.

The consequences of these issues are either corrupted data or an immediate crash with Segmentation fault (core dumped), a terrifying message that has haunted many C programmers.

To avoid these pitfalls in programming, we need to get two things right: first, using pointers with arrays; second, using pointers to dynamic memory.

First, let’s talk about operations with pointers and arrays.

Pointers are often used to access arrays for convenience. If you are not familiar with the characteristics of arrays, you may make mistakes unknowingly. First, let’s understand the basic characteristics of arrays:

In C, there are only one-dimensional arrays, and the elements of an array can be objects of any type, including other arrays.

For an array, you can only do two things: determine the size of the array and obtain a pointer to the element at index 0 of that array. It is important to note that the pointer to the first element of the array is a pointer constant.

This gives us two insights: multi-dimensional arrays in C are simulated by one-dimensional arrays whose elements are themselves arrays, and any indexing operation on an array can be rewritten as an equivalent pointer operation.

Below is an example of pointer operations with arrays:

int array[100]; // Declare an integer array with 100 elements, indexed from 0 to 99
int *p = array; // Declare an integer pointer p, pointing to the first element of array
*p = 123; // Equivalent to array[0] = 123
p++; // Move the pointer to the next element of the array, i.e., the position of array[1]
*p = 456; // Equivalent to array[1] = 456

A common mistake when using pointers to traverse arrays is to assume that p++ advances by one byte, and to write p = p + sizeof(int) instead. Pointer arithmetic is already scaled by the element size, so p++ moves to the next int, while p = p + sizeof(int) skips sizeof(int) elements.
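The following sketch makes both array properties concrete. The function names are hypothetical, and the grid width of 4 is an arbitrary choice for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Store 0, 1, 2, ... into the first n elements through a moving pointer. */
void fill_with_index(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int *p = a;
    for (size_t i = 0; i < n; i++) {
        *p = (int)i;  /* same effect as a[i] = (int)i */
        p++;          /* advances by one int, not by one byte */
    }
}

/* Distance in bytes between &a[0] and &a[1]; this equals sizeof(int). */
size_t element_stride_bytes(const int *a)
{
    assert(a != NULL);
    return (size_t)((const char *)(a + 1) - (const char *)a);
}

/* grid[r][c] written purely with pointer arithmetic: grid + r selects
 * row r (an array of 4 ints), and adding c selects the element in it. */
int grid_at(int (*grid)[4], size_t r, size_t c)
{
    assert(grid != NULL && c < 4);
    return *(*(grid + r) + c);
}
```

For an int g[3][4], grid_at(g, 1, 3) reads the same element as g[1][3], which is the sense in which a two-dimensional array is an array of arrays.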

Next, let’s talk about operations with pointers to dynamic memory.

Issues arising from pointer operations on dynamic memory are some of the most profound and obscure pitfalls in C. Based on the book’s content, I summarize the following principles for writing code:

Where memory is allocated, it should be freed there. That is, try to place malloc() and free() within the same module or abstraction layer, rather than scattering them across different levels.
Immediately initialize memory after calling malloc(). Do not assume that memory allocation functions will initialize memory; caution is the key to safety.
Immediately set the pointer variable to NULL after calling free(). Regardless of whether the subsequent code uses this variable, a simple operation can effectively avoid dangling pointer issues.
Check for memory allocation errors. After calling malloc(), check whether the returned pointer is NULL. Do not assume that memory allocation will always succeed; it can fail when memory is exhausted, for example after a long-running memory leak.
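These four principles fit in a few lines. The sketch below assumes a hypothetical byte_buffer type with paired init and release functions kept in the same module; the names and the 0 / -1 return convention are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner of one heap block. */
typedef struct {
    unsigned char *data;
    size_t size;
} byte_buffer;

/* Allocate and zero size bytes. Returns 0 on success, -1 on failure. */
int byte_buffer_init(byte_buffer *b, size_t size)
{
    assert(b != NULL && size > 0);

    b->data = malloc(size);
    if (b->data == NULL) {       /* allocation can fail: check it */
        b->size = 0;
        return -1;
    }
    memset(b->data, 0, size);    /* malloc does not initialize memory */
    b->size = size;
    return 0;
}

/* Free in the same module that allocated, then clear the pointer. */
void byte_buffer_release(byte_buffer *b)
{
    assert(b != NULL);
    free(b->data);               /* free(NULL) is a harmless no-op */
    b->data = NULL;              /* no dangling pointer left behind */
    b->size = 0;
}
```

Because release resets the pointer to NULL, calling byte_buffer_release() twice on the same buffer is harmless rather than a double free.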

Ensure Portability: Be Cautious About These Things

C programmers must convince themselves of one thing: if the program they write runs well on one platform, it will definitely have issues when ported to another platform.

This is the characteristic of portability work, perfectly fitting Murphy’s Law. The strangest phenomenon is, “It runs perfectly fine on my end; why does it fail on yours?”

I once fell into a pit while working. I was porting a functional library to an embedded platform; this library had been stably running on the server for a while. I thought there shouldn’t be any issues, so I directly cross-compiled and tested it.

As a result, it was always unstable, sometimes running normally, sometimes failing, erratic and unpredictable. I had to solve the problem, and since the embedded environment was limited, I couldn’t run gdb; I had to fill the code with print statements. Eventually, I discovered that an integer variable had an incorrect value.

This variable, declared as unsigned long, was incorrect when its value exceeded 4294967295. Using sizeof to probe, I found out that the server environment was 64-bit, with sizeof(long) = 8, while the embedded environment was 32-bit, with sizeof(long) = 4. (This is a very basic issue; I was early in my career, and I apologize for the oversight.)

Although ANSI C strives to make the same types and functions behave consistently across platforms, the standard leaves details such as the sizes of the integer types to each implementation, and these differences between compilers and platforms become real traps.

A misaligned bolt can cause a spaceship to explode. Luckily, C Traps and Pitfalls lists the hidden pitfalls you may encounter when porting code, and I will explain some representative ones.

Identifier Names Should Not Conflict with Library Functions

The book uses the malloc() function as an example: a custom Malloc() function is defined to track memory allocation. The two names differ only in case, and some platforms do not distinguish case in external identifiers, so the names can collide there. Relying on case alone is not a good idea.

The solution is to give custom function names a unique prefix that is unlikely to be repeated. For example, if the full project name is cactus_project, its abbreviation can be prepended to the function name, resulting in cac_malloc().
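A minimal sketch of such a prefixed tracking wrapper follows. Only the name cac_malloc() comes from the text above; cac_free(), cac_live_count(), and the simple counter are assumptions added for illustration, and the counter is not thread-safe:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of blocks handed out by cac_malloc() and not yet freed. */
static size_t cac_live_allocations = 0;

/* Prefixed wrapper: cannot collide with malloc() even on a platform
 * that ignores case in external identifiers. */
void *cac_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        cac_live_allocations++;
    }
    return p;
}

/* Hypothetical counterpart that keeps the count in step. */
void cac_free(void *p)
{
    if (p == NULL) {
        return;
    }
    assert(cac_live_allocations > 0);
    cac_live_allocations--;
    free(p);
}

/* A non-zero value at program exit suggests a leak. */
size_t cac_live_count(void)
{
    return cac_live_allocations;
}
```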

Standardize Predefined Integer Types

This solves the issue I mentioned earlier, where long has different sizes on different platforms. You can use #define macros, but typedef is the better tool for defining type names.

C99's <stdint.h> provides the standard exact-width types int64_t and uint64_t for this purpose. The following example defines int64 as a signed integer and uint64 as an unsigned integer that are at least 64 bits wide on every platform that supports long long (in practice, exactly 64 bits on mainstream platforms):

typedef long long int int64;
typedef unsigned long long int uint64;
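To see why the width matters, the sketch below uses the fixed-width types from <stdint.h> to reproduce the unsigned long story above without depending on the platform. The function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Sum of two 32-bit values computed in 64 bits, so it cannot wrap. */
uint64_t add_u32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a + b;
}

/* The same sum kept in 32 bits wraps modulo 2^32, which is what happens
 * to an unsigned long on a platform where long is 32 bits wide. */
uint32_t add_u32_narrow(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}

/* Storage size of unsigned long in bits on the current platform. */
unsigned unsigned_long_bits(void)
{
    unsigned bits = (unsigned)(sizeof(unsigned long) * CHAR_BIT);
    assert(bits >= 32);  /* the C standard guarantees at least 32 bits */
    return bits;
}
```

Adding 1 to 4294967295 gives 4294967296 in the wide version and 0 in the narrow one. Logging unsigned_long_bits() on each target is a quick way to notice a 32-bit long before it causes trouble.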

Handle NULL Carefully

If a character pointer is NULL, it points to no object, and reading through it is undefined behavior. This situation often arises in debug print statements.

Here is a simple example:

char *p = NULL;
printf("result: %s\n", p);

On some platforms, it may work normally, simply displaying result: (null), while on others, it may crash with a segmentation fault and exit.

The best response is to perform a non-null check before using the pointer. Of course, I understand that directly displaying null in print statements can also be meaningful, but this requires adding a check branch and extra handling, which C programmers may not be willing to do.

However, considering the portability of the program, being cautious while writing code can prevent falling into such unnecessary pitfalls, saving a lot of modification and debugging costs, which is definitely worthwhile.
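The extra check can be kept small by hiding it in a helper. In this sketch, str_or_null() and format_result() are hypothetical names, and the "(null)" placeholder is a choice made here to mimic what some platforms print:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns s itself, or a printable placeholder when s is NULL. */
const char *str_or_null(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Formats "result: <value>\n" into out without ever passing NULL to %s.
 * Returns what snprintf returns: the length the full text needs. */
int format_result(char *out, size_t cap, const char *value)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "result: %s\n", str_or_null(value));
}
```

The same helper works directly in debug output, for example printf("result: %s\n", str_or_null(p)), and behaves identically on every platform.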

Conclusion

The author of C Traps and Pitfalls is Andrew Koenig, who joined Bell Labs in 1977 and began researching the C language. In the C/C++ field, Andrew has shone brightly; his hundreds of research papers have inspired many programmers and helped them avoid one deep pit after another.

He organized the issues he encountered while using the C language and published them in a paper in 1985. Surprisingly, more than 2000 people requested copies of that paper from the Bell Labs library. Andrew then expanded the paper into the enduring C Traps and Pitfalls, which has thrived for 37 years.

For new C programmers struggling in the pits, quickly pick up this book to help yourself climb out; for veterans who haven’t read this book in years, don’t wait any longer. I believe that casually flipping through it will remind you of a painful experience, serving as a warning not to fall into even darker pits.

As one navigates through the world, keeping a guide to self-defense at hand means you won’t fear getting hurt!

Article Author: Arno Proofread by: Sun Ying, Guo Yongze

—END—
