Introduction: How do you excel in embedded systems? Ask this question and the answer will invariably be: master the C language! Today I recommend a comprehensive summary of embedded C knowledge points written by an expert, which is well worth reading.

From a syntactic perspective, C is not complex, but writing high-quality, reliable embedded C programs is not easy. It requires a deep understanding of hardware characteristics and limitations, as well as some knowledge of compilation principles and computer technology.

This article is based on embedded practice and draws on relevant materials to explain the C language knowledge and key points needed in embedded systems. I hope every reader gains something valuable from it.

1 Keywords

Keywords are reserved identifiers in C that have special functions. The keywords, operators, and directives used in embedded C can be grouped by function as follows:

1). Data types (commonly used: char, short, int, long, unsigned, float, double)
2). Operations and expressions (=, +, -, *, while, do-while, if, goto, switch-case)
3). Data storage (auto, static, extern, const, register, volatile, restrict)
4). Structures (struct, enum, union, typedef)
5). Bitwise and logical operations (<<, >>, &, |, ~, ^, &&)
6). Preprocessing (#define, #include, #error, #if…#elif…#else…#endif, etc.)
7). Platform-specific keywords (__asm, __inline, __syscall)

Together, these constitute the C syntax for embedded platforms.

Embedded applications can logically be abstracted into three parts:

1). Data input (such as sensors, signals, interface inputs)
2). Data processing (such as protocol decoding and packaging, conversion of AD sampling values, etc.)
3). Data output (GUI display, output pin states, DA output control voltage, PWM duty cycle, etc.)

Data management runs through the entire development of an embedded application, covering data types, memory management, bit and logical operations, and data structures. C supports all of these syntactically and provides corresponding optimization mechanisms to cope with the more constrained resources of embedded systems.

2 Data Types

C supports the common variable types: character, integer, and floating-point. Some compilers, such as Keil, also add bit and sfr (special function register) data types for operations on specific addresses. The C standard only specifies the minimum value range for each basic data type, so the same type may occupy different amounts of storage on different chip platforms. Code must therefore be written with future portability in mind, and typedef is the key keyword for handling this. Most cross-platform software projects adopt it, as shown below:
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
......
typedef signed int int32_t;
Because basic data widths differ across platforms, determining the width of a basic type such as int on the current platform requires the sizeof operator provided by C, used as follows.
printf("int size:%zu, short size:%zu, char size:%zu\n", sizeof(int), sizeof(short), sizeof(char));
Another important point is the width of pointers, for example:
char *p;
printf("pointer p size:%zu\n", sizeof(p));
This width depends on the chip's addressable width: on a 32-bit MCU a pointer is 4 bytes, while on a 64-bit MCU it is 8 bytes. Printing it is therefore sometimes used as a simple way to check the MCU's bit width.

3 Memory Management and Storage Architecture

C allows the storage location of a variable to be decided at the point of definition, with fine-grained control through scope and the keywords extern and static. Depending on the storage region, memory allocation falls into three categories (excerpted from C++ High-Quality Programming):

1). Static storage allocation. Memory is allocated at compile time and exists throughout the program's execution. Examples are global variables and static variables.
2). Stack allocation. When a function executes, the storage for its local variables can be created on the stack, and this storage is released automatically when the function returns. Stack allocation is built into the processor's instruction set, which makes it highly efficient, but the available capacity is limited.
3). Heap allocation, also known as dynamic memory allocation. The program can request any amount of memory at runtime using malloc or new, and the programmer is responsible for releasing it with free or delete. The lifetime of dynamic memory is decided by the programmer, which makes it very flexible, but it is also where the most problems occur.

Here is a simple C language example.
//main.c
#include <stdio.h>
#include <stdlib.h>

static int st_val;   // Static global variable -- static storage
int ex_val;          // Global variable -- static storage

int main(void)
{
    int a = 0;                     // Local variable -- allocated on stack
    int *ptr = NULL;               // Pointer variable (the pointer itself is on the stack)
    static int local_st_val = 0;   // Static local variable -- static storage

    local_st_val += 1;
    a = local_st_val;

    ptr = (int *)malloc(sizeof(int));   // Allocate space from heap
    if(ptr != NULL)
    {
        *ptr = a;                  // Initialize before reading; malloc does not clear the memory
        printf("*p value:%d\n", *ptr);
        free(ptr);
        ptr = NULL;                // After free, set ptr to NULL so later NULL checks still work and no dangling pointer remains
    }
    return 0;
}
Scope in C not only describes where an identifier is accessible but also determines the storage area of a variable. The file-scope variables st_val and ex_val are allocated in the static storage area; here the static keyword only controls whether the variable can be accessed from other files. The block-scope variables a, ptr, and local_st_val are allocated in different areas depending on how they are declared: a is a local variable allocated on the stack; ptr is itself a local pointer on the stack, while the memory it points to is allocated from the heap by malloc; and local_st_val is qualified by static, so it is allocated in the static storage area. This brings up an important point: static means different things in file scope and block scope. In file scope it restricts the external linkage of functions and variables (whether they can be accessed from other files), while in block scope it places the variable in the static storage area.

For C in general, the above is largely sufficient for memory management. In embedded C, however, defining a variable does not necessarily place it in memory (SRAM); it may live in FLASH or directly in a register (variables declared register, or some local variables at high optimization levels). For example, global variables defined as const are typically placed in FLASH, while local variables declared register may be kept directly in general-purpose registers. Understanding this matters for code maintenance, especially when optimizing for speed or working with limited storage.

In addition, embedded C compilers may extend the memory management mechanisms, for example by supporting scatter loading and __attribute__((section("user-defined area"))), which allow variables to be placed in special areas such as SDRAM or SQI FLASH. This strengthens memory management to suit complex application scenarios and requirements.
LD_ROM 0x00800000 0x10000 {            ;load region size_region
    EX_ROM 0x00800000 0x10000 {        ;load address = execution address
        *.o (RESET, +First)
        *(InRoot$$Sections)
        .ANY (+RO)
    }
    EX_RAM 0x20000000 0xC000 {         ;rw Data
        .ANY (+RW +ZI)
    }
    EX_RAM1 0x2000C000 0x2000 {
        .ANY(MySection)
    }
    EX_RAM2 0x40000000 0x20000 {
        .ANY(Sdram)
    }
}

int a[10] __attribute__((section("MySection")));
int b[100] __attribute__((section("Sdram")));
With this method, variables can be placed in the required areas, which is sometimes necessary. For example, GUI or web page development may need to store a large number of images and documents, and internal FLASH may not be enough; such variables can then be declared in external storage. Likewise, some data in memory may be critical, and to keep it from being overwritten by other content it may be necessary to give it a separate SRAM area, preventing fatal errors caused by unintended modification. These techniques are common and important in real product development, but for reasons of space only brief examples are given here. If you meet such needs in your work, it is worth studying them in detail.

As for the heap, on embedded Linux it is used just as in standard C: check the result of malloc, and set the pointer to NULL after free to avoid "dangling pointers." On resource-constrained microcontrollers, however, malloc is rarely used. If memory blocks are requested frequently, a memory management mechanism built on static storage and fixed-size block partitioning is often used instead. This approach is more efficient (fixed-size blocks are partitioned in advance and looked up directly when needed) and keeps block usage under control, effectively avoiding memory fragmentation. Common examples are RTOSes and the lwIP network stack, which adopt this mechanism. I personally prefer this approach, so heap details are not described here.
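4 Pointers and Arrays

#include <stdio.h>
#include <stdint.h>

// Print addresses in decimal so that they match the sample output below
#define ADDR(x) ((unsigned long long)(uintptr_t)(x))

int main(void)
{
    char cval[] = "hello";
    int i;
    int ival[] = {1, 2, 3, 4};
    int arr_val[][2] = {{1, 2}, {3, 4}};
    const char *pconst = "hello";
    char *p;
    int *pi;
    int *pa;
    int **par;

    p = cval;
    p++;                     //addr increases by 1
    pi = ival;
    pi += 1;                 //addr increases by 4
    pa = arr_val[0];
    pa += 1;                 //addr increases by 4
    par = (int **)arr_val;   //explicit cast; arr_val is not an array of pointers, this only demonstrates the step size
    par++;                   //addr increases by 8 (size of a pointer on a 64-bit system)

    for(i = 0; i < (int)sizeof(cval); i++)
    {
        printf("0x%x ", cval[i]);
    }
    printf("\n");
    printf("pconst:%s\n", pconst);
    printf("addr:%llu, %llu\n", ADDR(cval), ADDR(p));
    printf("addr:%llu, %llu\n", ADDR(ival), ADDR(pi));
    printf("addr:%llu, %llu\n", ADDR(arr_val), ADDR(pa));
    printf("addr:%llu, %llu\n", ADDR(arr_val), ADDR(par));
    return 0;
}
/* PC side 64-bit system running result
0x68 0x65 0x6c 0x6c 0x6f 0x0
pconst:hello
addr:6421994, 6421995
addr:6421968, 6421972
addr:6421936, 6421940
addr:6421936, 6421944
*/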
Array elements are generally accessed starting from index 0 and ending at length-1, that is, over the half-open interval [0, length). This rarely causes problems in forward traversal, but when reading an array in reverse it is easy to mistakenly start from length, which is an out-of-bounds access. Also, to save space the index variable i is sometimes declared as unsigned char. Since the range of unsigned char is 0-255, if the array has more than 255 elements the index wraps around to 0 before it can reach the end condition, and the loop never terminates. This is easy to avoid when the code is first written, but a later change that enlarges the array can introduce hidden risks elsewhere in the code that uses it, so it requires special attention.

As mentioned earlier, the space occupied by a pointer depends on the chip's addressing width: 4 bytes on 32-bit platforms and 8 bytes on 64-bit platforms. The step size of pointer arithmetic depends on the pointed-to type, for example 1 for char and 4 for int (where int is 4 bytes). Looking closely at the code above, the value of par increased by 8 because par points to a pointer, so its step is the size of a pointer type: 8 on a 64-bit platform and 4 on a 32-bit platform.

These concepts are not difficult to understand, but slight carelessness in engineering practice can lead to subtle problems. Pointers also support type casting, which is very useful in certain situations. Refer to the following code:
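#include <stdio.h>

typedef struct
{
    int b;
    int a;
}STRUCT_VAL;

// __align(4) is an ARM Compiler (armcc) extension; other toolchains use a different alignment syntax
static __align(4) char arr[8] = {0x12, 0x23, 0x34, 0x45, 0x56, 0x12, 0x24, 0x53};

int main(void)
{
    STRUCT_VAL *pval;
    int *ptr;

    pval = (STRUCT_VAL *)arr;
    ptr = (int *)&arr[4];
    printf("val:0x%x, 0x%x\n", pval->b, pval->a);   // b is at offset 0, a is at offset 4
    printf("val:0x%x\n", *ptr);
    return 0;
}
// Result on a little-endian platform:
//0x45342312 0x53241256
//0x53241256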
Pointer type casting gives an efficient and quick way to parse data in protocol handling and data storage management. However, it comes with common and easily overlooked pitfalls, chiefly data alignment and endianness. In the example above, the character array arr is forced to 4-byte alignment with __align(4), which ensures that the later access through an int pointer does not trigger an unaligned access exception. Without it, a char array is aligned to 1 byte by default. That does not necessarily trigger an exception: it depends on where arr ends up in the overall memory layout and on whether the memory actually in use supports unaligned access (some SDRAM, for example, may raise an exception on unaligned access). As a result, merely adding or removing another variable can shift arr and trigger an exception, and the location of the exception often has nothing to do with the variable that was added. Moreover, code that runs normally on one platform may fault after switching platforms. Such hidden problems are hard to trace and resolve in embedded systems.

C pointers also have two special uses: accessing a specific physical address through type casting, and implementing callbacks through function pointers, as shown below:
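#include <stdio.h>
#include <stdint.h>

typedef int (*pfunc)(int, int);

int func_add(int a, int b)
{
    return a+b;
}

int main(void)
{
    pfunc func_ptr;   // pfunc is already a function pointer type

    *(volatile uint32_t *)0x20001000 = 0x01a23131;   // Write to a fixed physical address (only valid on the target MCU)
    func_ptr = func_add;
    printf("%d\n", func_ptr(1, 2));
    return 0;
}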
Note that volatile marks a variable whose value can change unexpectedly. It is typically used in the following situations:

1) Hardware registers of parallel devices (e.g., status registers)
2) Non-automatic variables accessed in an interrupt service routine
3) Variables shared among multiple tasks in multi-threaded applications

volatile forces the variable to be re-read on every access, which addresses the visibility problem when normal code and an interrupt or exception handler access the same variable; it does not by itself make the access atomic. When accessing hardware addresses, volatile also prevents the compiler from optimizing away the access, ensuring that the actual address is read or written. Mastering volatile is crucial in embedded low-level programming and is a basic requirement for embedded C practitioners.

Function pointers are not commonly used in general embedded software development, but they make it simple to implement many things, such as asynchronous callbacks and driver modules. Of course, I can only scratch the surface here; many details are worth exploring and mastering.

5 Structure Types and Alignment

C provides user-defined data types to describe a class of things with the same characteristics, mainly structures, enumerations, and unions. Enumerations restrict values to a set of named aliases, making data more intuitive and readable, implemented as follows:
typedef enum {spring=1, summer, autumn, winter} season;
season s1 = summer;
Unions are data types that can store different types of data in the same storage space. The size of a union is determined by its largest member (plus any padding required for alignment), as shown below:
#include <stdio.h>
#include <stdint.h>

typedef union
{
    char c;
    short s;
    int i;
}UNION_VAL;

UNION_VAL val;

int main(void)
{
    // All members share the same start address
    printf("addr:0x%x, 0x%x, 0x%x\n", (unsigned)(uintptr_t)&val.c, (unsigned)(uintptr_t)&val.s, (unsigned)(uintptr_t)&val.i);
    val.i = 0x12345678;
    if(val.s == 0x5678)
        printf("Little-endian\n");
    else
        printf("Big-endian\n");
    return 0;
}
/*
addr:0x407970, 0x407970, 0x407970
Little-endian
*/
The main use of unions is to access different segments of data through shared memory addresses, providing a more convenient way to parse certain variables. Additionally, testing the endianness of chips is a common application of unions. Of course, using pointer type casting can also achieve this purpose, implemented as follows:
int data = 0x12345678;
short *pdata = (short *)&data;
if(*pdata == 0x5678)   // comparison (==), not assignment (=)
    printf("%s\n", "Little-endian");
else
    printf("%s\n", "Big-endian");
As can be seen, using a union avoids the misuse of pointers in certain situations.

Structures are collections of variables with common characteristics. Compared with C++ classes, they have no access restrictions and do not directly support member functions, but with custom data types and function pointers many class-like operations can still be achieved. For most embedded projects, structured data handling greatly helps overall architecture optimization and later maintenance. Here is an example:
#include <stdio.h>

typedef int (*pfunc)(int, int);

typedef struct
{
    int num;
    int profit;
    pfunc get_total;
}STRUCT_VAL;

int GetTotalProfit(int a, int b)
{
    return a*b;
}

int main(void)
{
    STRUCT_VAL Val;
    STRUCT_VAL *pVal;

    Val.get_total = GetTotalProfit;
    Val.num = 1;
    Val.profit = 10;
    printf("Total:%d\n", Val.get_total(Val.num, Val.profit));       //Variable access

    pVal = &Val;
    printf("Total:%d\n", pVal->get_total(pVal->num, pVal->profit)); //Pointer access
    return 0;
}
/*
Total:10
Total:10
*/
C structures can be accessed both through variables and through pointers, which allows any memory data to be parsed (as mentioned earlier, parsing protocols through pointer type casting). In addition, packaging data and function pointers into a structure and passing it by pointer is the basis for switching implementations behind a driver-layer interface, which is of great practical value. Furthermore, combining bit fields, unions, and structures provides another form of bit manipulation, which is important for encapsulating low-level hardware registers, as demonstrated in practice:
#include <stdio.h>

typedef unsigned char uint8_t;

union reg
{
    struct
    {
        uint8_t bit0:1;
        uint8_t bit1:1;
        uint8_t bit2_6:5;
        uint8_t bit7:1;
    }bit;
    uint8_t all;
};

int main(void)
{
    union reg RegData;

    RegData.all = 0;
    RegData.bit.bit0 = 1;
    RegData.bit.bit7 = 1;
    printf("0x%x\n", RegData.all);
    RegData.bit.bit2_6 = 0x3;
    printf("0x%x\n", RegData.all);
    return 0;
}
/*
0x81
0x8d
*/
Through unions and bit fields we can access individual bits of data, which gives a simple and intuitive way to handle registers and memory-constrained platforms. Note that bit-field layout is implementation-defined, so the result depends on the compiler.

Another important point about structures is alignment. Accessing aligned data can significantly improve execution efficiency, but the extra storage introduced by alignment padding can also lead to errors. Alignment can be understood as follows:

Basic data types: aligned to their own size by default, such as char to 1 byte, short to 2 bytes, etc.
Arrays: aligned according to the element type; once the first element is aligned, the rest are naturally aligned.
Unions: aligned according to the member with the largest alignment requirement.
Structures: each member must be aligned, and the structure itself is aligned to the largest alignment requirement among its members.
#include <stdio.h>
#include <stdint.h>

union DATA
{
    int a;
    char b;
};

struct BUFFER0
{
    union DATA data;
    char a;
    //reserved[3]
    int b;
    short s;
    //reserved[2]
}; //16 bytes

struct BUFFER1
{
    char a;
    //reserved[1]
    short s;
    union DATA data;
    int b;
}; //12 bytes

int main(void)
{
    struct BUFFER0 buf0;
    struct BUFFER1 buf1;

    printf("size:%zu, %zu\n", sizeof(buf0), sizeof(buf1));
    printf("addr:0x%x, 0x%x, 0x%x, 0x%x\n",
           (unsigned)(uintptr_t)&buf0.data, (unsigned)(uintptr_t)&buf0.a,
           (unsigned)(uintptr_t)&buf0.b, (unsigned)(uintptr_t)&buf0.s);
    printf("addr:0x%x, 0x%x, 0x%x, 0x%x\n",
           (unsigned)(uintptr_t)&buf1.a, (unsigned)(uintptr_t)&buf1.s,
           (unsigned)(uintptr_t)&buf1.data, (unsigned)(uintptr_t)&buf1.b);
    return 0;
}
/*
size:16, 12
addr:0x61fe10, 0x61fe14, 0x61fe18, 0x61fe1c
addr:0x61fe04, 0x61fe06, 0x61fe08, 0x61fe0c
*/
Here the size of the union equals that of its largest member, int, which is 4 bytes. The printed addresses show that the actual memory layout matches the padding positions marked in the comments. In fact, working out where the padding goes is an effective and quick way to understand C alignment.

6 Preprocessing Mechanisms

C provides a rich set of preprocessing mechanisms that make cross-platform code easier to implement. Macros are also used for data and code block replacement, string formatting, and code segment switching, all of which matter greatly in engineering practice. Below, the commonly used preprocessing mechanisms are described by function.

#include: the file inclusion directive. It effectively inserts the entire content of the included file at the current position. This applies not only to header files but also to parameter files and configuration files, which can be inserted at a specified position in the code. <> and "" indicate whether the search starts from the standard library path or from the user-defined path.

#define: macro definition, commonly used to define constants or aliases for code segments. It can also be combined with ## (token pasting) to build identifiers for unified interface handling, as shown below:
#define MAX_SIZE 10
#define MODULE_ON 1
#define ERROR_LOOP() do{ \
    printf("error loop\n"); \
}while(0)
#define global(val) g_##val

int global(v) = 10;
int global(add)(int a, int b)
{
    return a+b;
}
#if..#elif…#else…#endif, #ifdef..#endif, #ifndef…#endif: conditional compilation. It is mainly used to switch code blocks and is common in large, cross-platform projects that must meet varied requirements.

#undef: cancels a previously defined macro to avoid redefinition issues.

#error and #warning: emit user-defined error and warning messages. Combined with #if and #ifdef, they can be used to check predefined configurations.

#pragma: a compiler-specific directive that takes parameters. A common example is #pragma pack(1), but once it is used, the rest of the file is aligned according to the specified byte alignment. Using push and pop resolves this, as shown in the code below:
#pragma pack(push)
#pragma pack(1)
struct TestA
{
    char i;
    int b;
}A;
#pragma pack(pop)   //Remember to call pop; otherwise the rest of the file is also aligned according to the pack setting, leading to unexpected results

//Equivalent to (GCC-style attribute):
struct _TestB
{
    char i;
    int b;
}__attribute__((packed)) B;
7 Conclusion

If you have read this far, you should have a clearer understanding of the C language. Embedded C gives developers ample freedom in handling hardware physical addresses, bit operations, and memory access. With arrays, pointers, and type casting, data processing can avoid much unnecessary copying, which is necessary for low-level programming and benefits the overall architecture. However, this freedom also brings illegal access, overflow, and out-of-bounds problems, as well as alignment, data width, and endianness issues across hardware platforms. The original designers can usually handle these issues, but for those who take over a project later, a design that did not consider them clearly often means problems and trouble. Every embedded C practitioner therefore needs a firm grasp of these fundamentals.

With that, this preliminary summary of embedded C comes to an end. The key points and difficulties of applying embedded C are not limited to these, however. Examples include inline assembly, reliable communication, data storage checks, and integrity guarantees. These engineering techniques are hard to explain briefly. Techniques for locating and resolving problems after an exception are also worth a detailed treatment. Due to space constraints and the limits of my own understanding, I will stop here.