Summary of Embedded C Language Knowledge Points

Introduction

How can one excel in embedded development? Master the C language! Today, I would like to recommend a summary of embedded C language knowledge points written by an expert.

Keywords in C Language

The keywords and operators of the C language can be grouped by function:

  • Data types (commonly used: char, short, int, long, unsigned, float, double)

  • Operators and control flow (=, +, -, *, while, do-while, if, goto, switch-case)

  • Data storage (auto, static, extern, const, register, volatile, restrict)

  • Structures (struct, enum, union, typedef)

  • Bitwise and logical operations (<<, >>, &, |, ~, ^, &&)

  • Preprocessing (#define, #include, #error, #if…#elif…#else…#endif, etc.)

  • Platform extension keywords (__asm, __inline, __syscall)

These keywords together form the syntax of C language for embedded platforms.

Embedded applications can logically be abstracted into three parts:

  • Data input, such as sensors, signals, and interface inputs

  • Data processing, such as protocol decoding and packaging, conversion of AD sampling values, etc.

  • Data output, such as GUI display, output pin status, DA output control voltage, PWM duty cycle, etc.

Data management runs through the whole of embedded application development: data types, storage management, bit and logical operations, and data structures. The C language supports all of these syntactically and provides optimization mechanisms suited to the constrained resources of embedded systems.

Data Types

The C language supports the common character, integer, and floating-point types. Some compilers, such as Keil, add bit and sfr (special function register) types for operating on specific addresses. The C standard only specifies a minimum value range for each basic type, so the same type may occupy different amounts of storage on different chips. Portable code has to take this into account, and typedef is the keyword used to handle it. Most cross-platform projects define fixed-width aliases such as the following (C99 and later provide the same names in <stdint.h>; the unsigned int alias assumes a platform where int is 32 bits):

    typedef unsigned char  uint8_t;
    typedef unsigned short uint16_t;
    typedef unsigned int   uint32_t;
    ......
    typedef signed int     int32_t;

Since the basic data widths differ across platforms, the width of a basic type such as int on the current platform is determined with the sizeof operator, as follows.

    printf("int size:%zu, short size:%zu, char size:%zu\n", sizeof(int), sizeof(short), sizeof(char));

Another important knowledge point is the width of pointers, such as:

    char *p;
    printf("pointer p size:%zu\n", sizeof(p));

This is related to the addressable width of the chip: on a 32-bit MCU a pointer is typically 4 bytes, while on a 64-bit processor it is 8 bytes. Printing it is sometimes a quick way to check the bit width of the target.

Memory Management and Storage Architecture

In C, where a variable is stored is decided when it is defined, with fine-grained control provided by scope and the keywords extern and static. Depending on the region used, there are three ways to allocate memory (excerpted from C++ High-Quality Programming):

  • Allocate from static storage area. Memory is allocated at compile time and exists throughout the program’s runtime. For example, global variables and static variables.

  • Create on the stack. During function execution, the storage units of local variables can be created on the stack, and these storage units are automatically released when the function execution ends. Stack memory allocation operations are built into the processor’s instruction set, making them very efficient, but the allocated memory capacity is limited.

  • Allocate from the heap, also known as dynamic memory allocation. The program can request any amount of memory at runtime using malloc or new, and the programmer is responsible for releasing it with free or delete. The lifetime of dynamic memory is determined by the programmer, which makes it very flexible, but it is also the source of the most problems.

Here is a simple C language example.

    //main.c
    #include <stdio.h>
    #include <stdlib.h>

    static int st_val;                      // Static global variable -- static storage area
    int ex_val;                             // Global variable -- static storage area

    int main(void)
    {
        int a = 0;                          // Local variable -- allocated on the stack
        int *ptr = NULL;                    // Pointer variable (the pointer itself is on the stack)
        static int local_st_val = 0;        // Static local variable -- static storage area

        local_st_val += 1;
        a = local_st_val;
        ptr = (int *)malloc(sizeof(int));   // Allocate space from the heap
        if (ptr != NULL)
        {
            *ptr = a;                       // malloc does not initialize the block, so write before reading
            printf("*p value:%d\n", *ptr);
            free(ptr);
            ptr = NULL;                     // After free, set ptr to NULL so later NULL checks catch the stale pointer
        }
        return 0;
    }

Scope in C describes not only where an identifier is accessible but also, together with the storage-class keywords, where a variable is stored. The file-scope variables st_val and ex_val are placed in the static storage area; there, the static keyword only controls whether the variable can be accessed from other files. The block-scope variables a, ptr, and local_st_val are placed in different areas according to how they are declared: a is a local variable on the stack; ptr is itself a local pointer on the stack, while the block it points to is allocated on the heap by malloc; and local_st_val, because of the static keyword, is placed in the static storage area. This brings out an important point: static means different things in file scope and block scope. In file scope it restricts the linkage of functions and variables (whether other files can access them), while in block scope it places the variable in the static storage area.

For C in general, the above is usually enough to understand memory management. In embedded C, however, a defined variable is not necessarily in memory (SRAM); it may be in FLASH or held directly in a register (variables declared register, or some local variables at high optimization levels). For example, global variables declared const are typically placed in FLASH, while local variables declared register may be kept in general-purpose registers. Knowing this matters for code maintenance, especially when optimizing for speed or when storage is tight.

In addition, embedded C compilers may extend the memory management mechanisms, for example with scatter loading and __attribute__((section("user-defined area"))), which allow variables to be placed in special regions such as SDRAM or SQI FLASH. This extends memory management to more complex application scenarios and requirements.

    LD_ROM 0x00800000 0x10000 {             ; load region size_region
        EX_ROM 0x00800000 0x10000 {         ; load address = execution address
            *.o (RESET, +First)
            *(InRoot$$Sections)
            .ANY (+RO)
        }
        EX_RAM 0x20000000 0xC000 {          ; rw Data
            .ANY (+RW +ZI)
        }
        EX_RAM1 0x2000C000 0x2000 {
            .ANY (MySection)
        }
        EX_RAM2 0x40000000 0x20000 {
            .ANY (Sdram)
        }
    }

    int a[10] __attribute__((section("MySection")));
    int b[100] __attribute__((section("Sdram")));

By adopting this method, variables can be placed in the required regions. This is sometimes necessary, for example when a GUI or web page needs to store many images and documents and the internal FLASH is too small; such data can then be declared in an external region. Likewise, some data in memory is critical, and giving it a separate SRAM region protects it from being overwritten by other content and causing fatal errors. These techniques are common and important in real product development, but for reasons of space only brief examples are given here. If you meet such a need in your work, study them in detail.

As for the heap, on embedded Linux it is used exactly as in standard C: check the result of malloc, and set pointers to NULL after free to avoid "wild pointers." On resource-constrained microcontrollers, however, malloc is rarely used. If memory blocks must be requested frequently, a memory manager is usually built on static storage divided into blocks. This is more efficient (the blocks are fixed-size and pre-divided, so allocation is a lookup by block number), keeps memory use under control, and avoids fragmentation. RTOS kernels and the lwIP network stack commonly adopt this mechanism. I personally prefer this method too, so the heap is not described in more detail here; if you wish to learn more, refer to material on heap and storage management.

Pointers and Arrays

Arrays and pointers are behind many program bugs, such as array out-of-bounds access, pointer out-of-bounds access, illegal address access, and unaligned access, so understanding and mastering them is essential for a qualified C developer.

An array consists of elements of the same type; when it is declared, the compiler allocates a block of memory sized according to its elements. C also provides multi-dimensional arrays for special scenarios. Pointers provide a symbolic way of using addresses, and a pointer is only meaningful when it points to a valid address. C pointers are extremely flexible: before being dereferenced they can be made to point at any address, which greatly simplifies hardware access but also demands more of the developer. Refer to the following code.

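    #include <stdio.h>
    #include <stdint.h>

    // Print an address as a decimal number without truncating it
    #define ADDR(x) ((unsigned long long)(uintptr_t)(x))

    int main(void)
    {
        char cval[] = "hello";
        int i;
        int ival[] = {1, 2, 3, 4};
        int arr_val[][2] = {{1, 2}, {3, 4}};
        const char *pconst = "hello";
        char *p;
        int *pi;
        int *pa;
        int **par;

        p = cval;
        p++;                        // addr increases by 1
        pi = ival;
        pi += 1;                    // addr increases by 4
        pa = arr_val[0];
        pa += 1;                    // addr increases by 4
        par = (int **)arr_val;      // cast only to show the step of an int **; par must not be dereferenced
        par++;                      // addr increases by 8 (size of a pointer on a 64-bit system)

        for (i = 0; i < (int)sizeof(cval); i++)
        {
            printf("0x%x ", cval[i]);
        }
        printf("\n");
        printf("pconst:%s\n", pconst);
        printf("addr:%llu, %llu\n", ADDR(cval), ADDR(p));
        printf("addr:%llu, %llu\n", ADDR(ival), ADDR(pi));
        printf("addr:%llu, %llu\n", ADDR(arr_val), ADDR(pa));
        printf("addr:%llu, %llu\n", ADDR(arr_val), ADDR(par));
        return 0;
    }
    /* Results on a 64-bit PC
    0x68 0x65 0x6c 0x6c 0x6f 0x0
    pconst:hello
    addr:6421994, 6421995
    addr:6421968, 6421972
    addr:6421936, 6421940
    addr:6421936, 6421944
    */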

Array elements are indexed from 0 to length-1, that is, over the half-open interval [0, length). Forward traversal rarely goes wrong, but when an array is read in reverse it is easy to start from length instead of length-1, which is an out-of-bounds access. In addition, the index variable i is sometimes declared as unsigned char to save space. An unsigned char typically holds 0 to 255, so if the array has more than 255 elements the index wraps around before it reaches the end condition and the loop never terminates. This is easy to avoid when the code is first written, but if requirements change later and the array grows, every other place that indexes the array becomes a hidden danger and needs special attention.

As noted earlier, the size of a pointer depends on the chip's addressing width: 4 bytes on 32-bit platforms, 8 bytes on 64-bit platforms. The step of pointer arithmetic depends on the pointed-to type: 1 for char and, typically, 4 for int. In the code above, par advances by 8 because it is a pointer to a pointer, so each step is the size of a pointer: 8 on a 64-bit platform and 4 on a 32-bit platform. None of this is hard to understand, but slight carelessness in engineering work leads to subtle problems.

Pointers also support type casting, which is very useful in some situations. Refer to the following code:

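    #include <stdio.h>

    typedef struct
    {
        int b;
        int a;
    } STRUCT_VAL;

    // __align(4) is a compiler-specific extension (Keil/ARMCC) that forces 4-byte alignment
    static __align(4) char arr[8] = {0x12, 0x23, 0x34, 0x45, 0x56, 0x12, 0x24, 0x53};

    int main(void)
    {
        STRUCT_VAL *pval;
        int *ptr;

        pval = (STRUCT_VAL *)arr;
        ptr = (int *)&arr[4];
        printf("val:0x%x, 0x%x\n", pval->b, pval->a);
        printf("val:0x%x\n", *ptr);
        return 0;
    }
    // Output on a little-endian target with 32-bit int:
    // val:0x45342312, 0x53241256
    // val:0x53241256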

Pointer casts like these give fast, efficient solutions for protocol parsing and data storage management, but alignment and endianness are common sources of mistakes. In the example, the character array arr is forced to 4-byte alignment with __align(4) so that later accesses through an int pointer cannot raise an unaligned-access exception. Without it, a char array is only guaranteed 1-byte alignment. That does not necessarily cause an exception (it depends on the overall memory layout and on whether the memory region in use supports unaligned access; some SDRAM, for example, may fault on unaligned access), but it means that adding or removing unrelated variables can change the behavior, and the place where the exception appears often has nothing to do with the variable that was added. Code that runs normally on one platform may also fault after a port. Such hidden problems are hard to trace and fix in embedded systems.

C pointers have further special uses, such as accessing a specific physical address through a cast and implementing callbacks through function pointers, as shown below:

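    #include <stdio.h>
    #include <stdint.h>

    typedef int (*pfunc)(int, int);

    int func_add(int a, int b)
    {
        return a + b;
    }

    int main(void)
    {
        pfunc func_ptr;                                 // pfunc is already a pointer type

        *(volatile uint32_t *)0x20001000 = 0x01a23131;  // write to a fixed physical address (valid only on a target that maps it)
        func_ptr = func_add;
        printf("%d\n", func_ptr(1, 2));
        return 0;
    }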

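Here, it should be noted that volatile marks an object whose value may change outside the normal flow of the program, so the compiler must not cache it or optimize accesses to it away. It is generally used in the following situations: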

  • Memory-mapped hardware registers of peripherals, such as status registers

  • Non-automatic variables accessed in an interrupt service routine

  • Variables shared by multiple tasks in multi-threaded applications

The volatile keyword ensures that code in the main flow and in exception or interrupt handlers always sees the current value of a shared variable instead of a stale copy held in a register. It does not make accesses atomic, so multi-step updates still need interrupt masking or another form of protection. When accessing hardware addresses, volatile also prevents the compiler from optimizing the access away, so the actual address is read or written. Mastering volatile is crucial in low-level embedded programming and is a basic requirement for embedded C practitioners.

Function pointers are not common in everyday embedded application code, but they are the basis of important mechanisms such as asynchronous callbacks and driver modules. I can only touch on the topic briefly here; many of the details are worth studying.

Structure Types and Alignment

The C language provides user-defined data types for describing a class of things with the same characteristics, mainly structures, enumerations, and unions. Enumerations restrict a value to a set of named constants, making the data more intuitive and readable, as follows:

    typedef enum {spring = 1, summer, autumn, winter} season;
    season s1 = summer;

Unions are data types that can store different types of data in the same storage space. The size of a union is determined by its largest member (rounded up, if necessary, to satisfy alignment), as shown below:

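    #include <stdio.h>
    #include <stdint.h>

    typedef union
    {
        char c;
        short s;
        int i;
    } UNION_VAL;

    UNION_VAL val;

    int main(void)
    {
        // All members share the same starting address
        printf("addr:0x%llx, 0x%llx, 0x%llx\n",
               (unsigned long long)(uintptr_t)&val.c,
               (unsigned long long)(uintptr_t)&val.s,
               (unsigned long long)(uintptr_t)&val.i);
        val.i = 0x12345678;
        if (val.s == 0x5678)
            printf("Little-endian\n");
        else
            printf("Big-endian\n");
        return 0;
    }
    /*
    addr:0x407970, 0x407970, 0x407970
    Little-endian
    */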

The main use of unions is to access internal segments of data by sharing memory addresses, providing a more convenient way to access certain variables. Additionally, testing the endianness of chips is a common application of unions. Of course, using pointer type casting can also achieve this purpose, implemented as follows:

    int data = 0x12345678;
    short *pdata = (short *)&data;
    if (*pdata == 0x5678)
        printf("%s\n", "Little-endian");
    else
        printf("%s\n", "Big-endian");

It can be seen that using unions avoids the misuse of pointers in certain situations.

Structures are collections of variables that belong together. Compared with C++ classes, they have no access restrictions and cannot contain functions directly, but with user-defined types and function pointers many class-like operations can still be achieved. For most embedded projects, structuring the data greatly helps overall architecture optimization and later maintenance. C structures can be accessed through pointers as well as variables, so any block of memory can be interpreted through a cast, as in the protocol parsing mentioned earlier. Packaging data and function pointers together and passing them by pointer is also the basis for switching driver-layer interfaces, which has great practical significance. Furthermore, combining bit fields, unions, and structures provides another form of bit manipulation, which is important for wrapping low-level hardware registers: individual bits can be accessed by name, a simple and intuitive approach on platforms with limited registers and memory.

Another important point about structures is alignment. Accessing aligned data can significantly improve runtime efficiency, but the padding that alignment adds changes the storage size and can also lead to errors. Alignment can be understood case by case:

  • Basic data types: aligned by their default lengths, such as char aligned to 1 byte, short aligned to 2 bytes, etc.

  • Arrays: aligned according to basic data types; if the first is aligned, the subsequent ones will naturally be aligned as well.

  • Unions: aligned according to the largest data type contained within them.

  • Structures: each member is aligned according to its own type, and the structure itself is aligned to its most strictly aligned member, with trailing padding added so that its size is a multiple of that alignment.

For the UNION_VAL union shown earlier, the size equals that of its largest member, int, which is 4 bytes on that platform. Printing member addresses and sizes shows that the actual memory layout and padding positions follow these rules; in fact, studying the padding is a quick and effective way to understand C's alignment mechanism.

Preprocessing Mechanism

The C language provides a rich preprocessing mechanism that makes cross-platform code easier to write. Its macro mechanism supports replacing data and code blocks, building identifiers and strings, and switching code segments, all of which matter in engineering practice. The commonly used preprocessing mechanisms are described below by function.

#include is the file inclusion directive. It effectively inserts the entire content of the included file at the current position. This applies not only to header files but also to parameter files and configuration files, which can be inserted at any chosen position in the code. Angle brackets <> search the standard (system) include paths, while quotes "" search the user's paths first.

#define defines macros, commonly constants or aliases for code segments. Combined with the ## operator it can also paste tokens together to generate identifiers, giving interfaces a uniform form, as shown below:

    #define MAX_SIZE  10
    #define MODULE_ON  1
    #define ERROR_LOOP() do { \
                           printf("error loop\n"); \
                         } while (0)
    #define global(val) g_##val

    int global(v) = 10;
    int global(add)(int a, int b)
    {
        return a + b;
    }

#if..#elif…#else…#endif, #ifdef..#endif, and #ifndef…#endif are the conditional directives. They are mainly used to switch code blocks in and out, and appear often in large and cross-platform projects that must meet varying requirements.

#undef cancels a macro definition, avoiding redefinition problems.

#error and #warning emit user-defined error and warning messages at compile time; combined with #if or #ifdef they can enforce constraints on the build configuration.

#pragma is a directive with implementation-defined parameters. A common example is #pragma pack(1), but on its own it changes the alignment of everything that follows in the file. Using push and pop confines the effect, as shown in the code below:

    #pragma pack(push)
    #pragma pack(1)
    struct TestA
    {
        char i;
        int b;
    } A;
    #pragma pack(pop)   // Remember to call pop; otherwise the rest of the file is aligned according to the pack value, leading to unexpected behavior

    // Equivalent to:
    // struct _TestB { char i; int b; } __attribute__((packed)) A;

Conclusion

Embedded C gives developers great freedom in handling hardware physical addresses, bit operations, and memory access. With arrays, pointers, and type casts, data copying during processing can be greatly reduced, which is necessary for low-level programming and benefits the architecture as a whole. For any embedded C developer, mastering these fundamentals is essential.
