How do you excel in embedded systems? The usual answer is: master the C language! Today, I recommend a summary of key points in embedded C written by an experienced engineer, which is well worth reading.
From a syntactical perspective, C language is not complex, but writing high-quality and reliable embedded C programs is not easy. It requires not only a deep understanding of hardware characteristics and limitations but also a certain level of knowledge about compilation principles and computer technology. This article is based on embedded practice and combines relevant materials to explain the C language knowledge and key points that one needs to understand in embedded systems, hoping that everyone who reads this article can gain something.
01
Keywords
Keywords are reserved identifiers with special meaning in C. Together with the operators and preprocessor directives commonly used alongside them, they can be grouped by function as follows:
- Data types (commonly used: char, short, int, long, unsigned, float, double)
- Operations and expressions (=, +, -, *, while, do-while, if, goto, switch-case)
- Data storage (auto, static, extern, const, register, volatile, restrict)
- Structures (struct, enum, union, typedef)
- Bitwise operations and logical operations (<<, >>, &, |, ~, ^, &&)
- Preprocessing (#define, #include, #error, #if…#elif…#else…#endif, etc.)
- Platform extension keywords (__asm, __inline, __syscall)
These keywords together constitute the C syntax for embedded platforms.
An embedded application can logically be abstracted into three parts:
- Data input (e.g., sensors, signals, interface inputs)
- Data processing (e.g., protocol decoding and packaging, conversion of AD sampling values, etc.)
- Data output (e.g., GUI display, output pin states, DA output control voltage, PWM duty cycle, etc.)
Data management runs through the entire development of embedded applications, encompassing data types, memory management, bit and logical operations, and data structures. The C language syntactically supports the implementation of these functions and provides corresponding optimization mechanisms to cope with the more constrained resource environment in embedded systems.
02
Data Types
The C language supports the common variable types: character, integer, and floating-point. Some compilers, such as Keil, also add bit (single bit) and sfr (special function register) data types for operations on special addresses. The C standard only specifies the minimum value range of each basic data type, so the same type may occupy a different amount of storage on different chip platforms. Code therefore has to be written with future portability in mind, and the typedef provided by C is the key to handling this. Most cross-platform software projects adopt it, as shown below:
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
......
typedef signed int int32_t;
Since the widths of the basic data types differ across platforms, the width of a type such as int on the current platform is determined with the sizeof operator provided by C, as follows. Note that sizeof yields a size_t, which is printed with %zu, and that the arguments must be listed in the same order as the labels.
printf("int size:%zu, short size:%zu, char size:%zu\n", sizeof(int), sizeof(short), sizeof(char));
Another important point is the width of pointers, for example:
char *p;
printf("pointer p size:%zu\n", sizeof(p));
This width is related to the chip's addressable width: on a typical 32-bit MCU a pointer is 4 bytes, while on a 64-bit platform it is 8 bytes. Printing it is sometimes used as a simple way to check the bit width of the target.
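Since C99, the standard header <stdint.h> already supplies fixed-width typedefs such as uint8_t and uint32_t, so hand-written typedefs are mainly needed on older toolchains. The sketch below assumes a C11 compiler (for _Static_assert); the function names pointer_bits and print_widths are illustrative and not part of the original article.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Compile-time checks: the build fails if a width assumption is wrong. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
_Static_assert(sizeof(uint16_t) == 2, "uint16_t must be 2 bytes");
_Static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");

/* Returns the pointer width of the current target in bits. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Prints the widths of some basic types; size_t values use %zu. */
void print_widths(void)
{
    printf("int:%zu short:%zu char:%zu pointer:%zu\n",
           sizeof(int), sizeof(short), sizeof(char), sizeof(void *));
}
```

Checking widths at compile time catches a porting mistake before the firmware ever runs, which is usually cheaper than discovering it through a printf on the target.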
03
Memory Management and Storage Architecture
In C, where a variable lives in memory is decided when it is defined, and scope together with the extern and static keywords gives fine-grained control over this. Depending on the storage area, memory is allocated in three ways (excerpted from C++ high-quality programming):
- Allocation from static storage area. Memory is allocated at the time of program compilation, and this memory exists throughout the program’s runtime. For example, global variables and static variables.
- Creating on the stack. When a function is executed, the storage units for local variables within the function can be created on the stack, and these storage units are automatically released when the function execution ends. Stack memory allocation operations are built into the processor’s instruction set, making them very efficient, but the allocated memory capacity is limited.
- Allocation from the heap, also known as dynamic memory allocation. The program can request any amount of memory using malloc or new during runtime, and the programmer is responsible for when to use free or delete to release the memory. The lifetime of dynamic memory is determined by the programmer, making it very flexible, but it also encounters the most problems.
Here is a simple C language example.
//main.c
#include <stdio.h>
#include <stdlib.h>

static int st_val;   // Static global variable -- static storage area
int ex_val;          // Global variable -- static storage area

int main(void)
{
    int a = 0;                     // Local variable -- allocated on the stack
    int *ptr = NULL;               // Pointer variable (the pointer itself is on the stack)
    static int local_st_val = 0;   // Static local variable -- static storage area

    local_st_val += 1;
    a = local_st_val;

    ptr = (int *)malloc(sizeof(int));   // Allocate space from the heap
    if (ptr != NULL)
    {
        *ptr = a;                       // Initialize before reading; malloc does not clear the memory
        printf("*p value:%d\n", *ptr);
        free(ptr);
        ptr = NULL;   // Set ptr to NULL after free; otherwise a later NULL check passes and the dangling pointer may be used
    }
    return 0;
}
In C, scope not only describes where an identifier is accessible but is also tied to where the variable is stored. The file-scope variables st_val and ex_val are allocated in the static storage area; here the static keyword only controls whether the variable can be accessed from other files. The block-scope variables a, ptr, and local_st_val are allocated to different areas depending on how they are declared. a is a local variable on the stack. ptr is also a local variable on the stack, but the memory it points to is obtained from malloc and therefore lives on the heap. local_st_val is declared static, so it is placed in the static storage area. This brings out an important point: static means different things in file scope and block scope. In file scope it limits the external linkage of functions and variables (whether other files can access them), while in block scope it places the variable in the static storage area.
For general C programming, the knowledge above is usually enough for memory management. In embedded C, however, defining a variable does not necessarily put it in memory (SRAM); it may also live in FLASH or directly in a register (variables declared register, or some local variables at high optimization levels). For example, global variables defined as const are typically placed in FLASH, while local variables declared register may be kept in general-purpose registers. Knowing this is very useful for code maintenance when optimizing for speed or working with limited storage.
In addition, embedded C compilers extend the memory management mechanisms, for example with scatter loading and __attribute__((section("user-defined area"))), which allow variables to be placed in special regions such as SDRAM or SQI FLASH. This strengthens memory management so that it can adapt to complex application scenarios and requirements.
LD_ROM 0x00800000 0x10000 {          ; load region size_region
    EX_ROM 0x00800000 0x10000 {      ; load address = execution address
        *.o (RESET, +First)
        *(InRoot$$Sections)
        .ANY (+RO)
    }
    EX_RAM 0x20000000 0xC000 {       ; RW data
        .ANY (+RW +ZI)
    }
    EX_RAM1 0x2000C000 0x2000 {
        .ANY (MySection)
    }
    EX_RAM2 0x40000000 0x20000 {
        .ANY (Sdram)
    }
}

int a[10] __attribute__((section("MySection")));
int b[100] __attribute__((section("Sdram")));
With this method, variables can be placed in whichever region is required, which is necessary in some cases. For example, GUI or web applications need to store a large number of images and documents, and the internal FLASH may not be large enough; such data can then be declared in an external region. Likewise, some data in memory is critical, and placing it in a separate SRAM region helps prevent fatal errors caused by other code overwriting it. These techniques are common and important in real product development.
As for heap usage, on embedded Linux it is the same as in standard C: check the return value of malloc, and set the pointer to NULL after free to avoid "dangling pointers." On resource-constrained microcontrollers, however, malloc is rarely used. If memory blocks must be requested frequently, a memory manager built on static storage and fixed-size blocks is usually used instead. On one hand it is more efficient (the blocks are pre-divided into fixed sizes and looked up directly by index), and on the other hand block usage is controllable, which effectively avoids memory fragmentation. Common examples are RTOS kernels and the lwIP network stack, which adopt this mechanism.
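The fixed-size block scheme described above can be sketched as a free list threaded through a statically allocated array. This is a minimal illustration, not the allocator of any particular RTOS or of lwIP; the names pool_init, pool_alloc, pool_free and the block size and count are assumptions chosen for the example, and the code is not safe against concurrent use from interrupts or multiple tasks without extra locking.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  32u   /* bytes per block (illustrative value) */
#define POOL_BLOCK_COUNT 8u    /* number of blocks (illustrative value) */

/* A free block reuses its own storage to hold the link to the next free block.
 * The union also gives every block suitable alignment for a pointer. */
typedef union pool_block {
    union pool_block *next;
    uint8_t payload[POOL_BLOCK_SIZE];
} pool_block_t;

static pool_block_t pool_storage[POOL_BLOCK_COUNT];  /* static storage area */
static pool_block_t *pool_free_list;

/* Links all blocks into the free list; call once before first use. */
void pool_init(void)
{
    size_t i;
    pool_free_list = NULL;
    for (i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

/* Returns one block, or NULL when the pool is exhausted. O(1), no fragmentation. */
void *pool_alloc(void)
{
    pool_block_t *blk = pool_free_list;
    if (blk != NULL) {
        pool_free_list = blk->next;
    }
    return blk;
}

/* Returns a block obtained from pool_alloc to the pool. */
void pool_free(void *ptr)
{
    pool_block_t *blk = (pool_block_t *)ptr;
    if (blk != NULL) {
        blk->next = pool_free_list;
        pool_free_list = blk;
    }
}
```

Because every block has the same size, allocation and release are constant-time pointer swaps, and the worst-case memory use is known at link time.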
04
Pointers and Arrays
Arrays and pointers are often the main source of program bugs, such as array out-of-bounds, pointer out-of-bounds, illegal address access, and unaligned access. Pointers and arrays are usually lurking behind these problems, so understanding and mastering them is a necessary step toward becoming a qualified C developer.
An array consists of elements of the same type, and when it is declared, the compiler allocates a segment of memory based on the characteristics of its elements. C also provides multi-dimensional arrays for special scenarios. Pointers provide a symbolic way of using addresses, and they only make sense when pointing to a specific address. C pointers are extremely flexible: they can be pointed at any address before being accessed, which greatly facilitates hardware operations but also places higher demands on developers. Refer to the following code.
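#include <stdio.h>

int main(void)
{
    char cval[] = "hello";
    int i;
    int ival[] = {1, 2, 3, 4};
    int arr_val[][2] = {{1, 2}, {3, 4}};
    const char *pconst = "hello";
    char *p;
    int *pi;
    int *pa;
    int **par;

    p = cval;
    p++;                      // addr increases by 1
    pi = ival;
    pi += 1;                  // addr increases by 4
    pa = arr_val[0];
    pa += 1;                  // addr increases by 4
    par = (int **)arr_val;    // cast only to show pointer arithmetic; par must not be dereferenced
    par++;                    // addr increases by 8 (size of a pointer on a 64-bit system)

    for (i = 0; i < (int)sizeof(cval); i++)
    {
        printf("0x%x ", cval[i]);
    }
    printf("\n");
    printf("pconst:%s\n", pconst);
    // Addresses are printed in decimal so the increments are easy to compare
    printf("addr:%llu, %llu\n", (unsigned long long)cval, (unsigned long long)p);
    printf("addr:%llu, %llu\n", (unsigned long long)ival, (unsigned long long)pi);
    printf("addr:%llu, %llu\n", (unsigned long long)arr_val, (unsigned long long)pa);
    printf("addr:%llu, %llu\n", (unsigned long long)arr_val, (unsigned long long)par);
    return 0;
}
/* PC side 64-bit system running result
0x68 0x65 0x6c 0x6c 0x6f 0x0
pconst:hello
addr:6421994, 6421995
addr:6421968, 6421972
addr:6421936, 6421940
addr:6421936, 6421944
*/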
Array elements are normally accessed from index 0 up to length-1, that is, over the half-open interval [0, length). This rarely causes trouble, but when an array has to be read in reverse, it is easy to start from length by mistake, which is an out-of-bounds access. Also, to save space, the index variable i is sometimes defined as unsigned char. Since the range of unsigned char is 0-255, if the array has more than 255 elements the index wraps around to 0 before reaching the end, the loop condition never becomes false, and the result is an infinite loop. This is easy to avoid when the code is first written, but if requirements change later and the array grows, other places that use the array may hide the same danger, so it deserves special attention.
As mentioned earlier, the space occupied by a pointer is related to the chip's addressing width: 4 bytes on 32-bit platforms and 8 bytes on 64-bit platforms. The step size in pointer arithmetic depends on the pointed-to type, for example 1 for char and 4 for int. If you look carefully at the code above, you will see that par increases by 8. This is because par points to a pointer, so the step is the size of a pointer: 8 on a 64-bit platform and 4 on a 32-bit platform. These characteristics are not hard to understand, but slight negligence in engineering work can lead to problems that are hard to detect.
Pointers also support type casting, which can be quite useful in certain situations, as shown in the following code:
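#include <stdio.h>

typedef struct
{
    int b;
    int a;
} STRUCT_VAL;

// __align(4) is the Keil/ARMCC spelling; GCC uses __attribute__((aligned(4)))
static __align(4) char arr[8] = {0x12, 0x23, 0x34, 0x45, 0x56, 0x12, 0x24, 0x53};

int main(void)
{
    STRUCT_VAL *pval;
    int *ptr;

    pval = (STRUCT_VAL *)arr;
    ptr = (int *)&arr[4];
    printf("val:0x%x, 0x%x\n", (unsigned)pval->b, (unsigned)pval->a);   // b is at offset 0, a at offset 4
    printf("val:0x%x\n", (unsigned)*ptr);
    return 0;
}
// Values on a little-endian platform:
//0x45342312 0x53241256
//0x53241256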
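The cast-based parsing above depends on the buffer's alignment and on the byte order of the host. A hedged alternative, sketched below, copies the bytes with memcpy (which has no alignment requirement) or assembles the value byte by byte (which gives the same result regardless of host byte order). The helper names load_u32_host and load_u32_le are illustrative and do not come from the original article.

```c
#include <stdint.h>
#include <string.h>

/* Reads 4 bytes from any address, aligned or not.
 * The result follows the byte order of the host CPU. */
uint32_t load_u32_host(const uint8_t *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Decodes a 32-bit little-endian field byte by byte.
 * Works at any alignment and gives the same value on any host. */
uint32_t load_u32_le(const uint8_t *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

For wire protocols with a defined byte order, the byte-by-byte form is usually the safer default; the pointer cast remains an option where the alignment and endianness of the target are known and fixed.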
Pointer type casting gives an efficient and quick way to parse data in protocol handling and data storage management. However, alignment and endianness are common sources of mistakes in this kind of processing. For example, the character array arr above is forced to 4-byte alignment with __align(4), which is necessary to ensure that later accesses through an int pointer do not trigger unaligned access exceptions. Without it, a char array is aligned to 1 byte by default. That does not necessarily trigger an exception: it depends on where arr ends up in the overall memory layout and on whether the memory actually used supports unaligned access (some SDRAM, for example, raises an exception on unaligned access). As a result, adding or removing some other variable can shift the address of arr and suddenly trigger the exception, and the place where the exception occurs often has nothing to do with the variable that was added. Moreover, code that runs normally on one platform may trigger exceptions after switching platforms. This kind of hidden behavior is very difficult to trace and resolve in embedded systems.
C pointers have two more special uses: accessing a specific physical address through type casting, and implementing callbacks through function pointers, as shown below:
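#include <stdio.h>
#include <stdint.h>

typedef int (*pfunc)(int, int);

int func_add(int a, int b)
{
    return a + b;
}

int main(void)
{
    pfunc func_ptr;    // pfunc is already a function pointer type

    // Write to a fixed physical address; only valid on a target where 0x20001000 is writable memory
    *(volatile uint32_t *)0x20001000 = 0x01a23131;

    func_ptr = func_add;
    printf("%d\n", func_ptr(1, 2));
    return 0;
}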
Here, note that volatile marks a variable whose value may change outside the normal program flow, so the compiler must not optimize away or cache accesses to it. It is generally used in the following situations:
- Hardware registers of peripheral devices (e.g., status registers)
- Non-automatic variables accessed in an interrupt service routine
- Variables shared by several tasks in multi-threaded applications
When normal code and an interrupt handler access the same variable, volatile ensures that every access really reads or writes memory instead of a stale copy held in a register. It does not by itself make the access atomic, so multi-byte or read-modify-write accesses may still need interrupt masking or similar protection. When accessing hardware addresses, volatile likewise prevents the access from being optimized away, ensuring that the actual address is accessed. Mastering volatile is crucial in embedded low-level programming and is one of the basic requirements for embedded C practitioners.
Function pointers are not used much in everyday embedded application code, but they offer a simple way to implement important mechanisms such as asynchronous callbacks and driver modules.
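A minimal sketch of both ideas together: a volatile flag shared between an interrupt handler and the main loop, and a callback registered through a function pointer. All names here (uart_rx_isr, uart_poll, uart_register_callback, store_last_byte) are hypothetical. On real hardware the handler would read the byte from a data register; in this sketch it takes the byte as a parameter so the logic can be exercised on a PC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*rx_callback_t)(uint8_t byte);   /* hypothetical callback type */

static volatile bool rx_ready;       /* written in the ISR, read in the main loop */
static volatile uint8_t rx_byte;     /* single byte, so each access is one load/store */
static rx_callback_t rx_callback;

uint8_t last_byte;                   /* filled in by the example callback */

/* Example callback: just remembers the last received byte. */
void store_last_byte(uint8_t byte)
{
    last_byte = byte;
}

void uart_register_callback(rx_callback_t cb)
{
    rx_callback = cb;
}

/* Stand-in for the receive interrupt handler. */
void uart_rx_isr(uint8_t data)
{
    rx_byte = data;
    rx_ready = true;
}

/* Called from the main loop; returns true if a byte was dispatched. */
bool uart_poll(void)
{
    uint8_t b;

    if (!rx_ready) {      /* volatile forces a fresh read on every call */
        return false;
    }
    b = rx_byte;
    rx_ready = false;
    if (rx_callback != NULL) {
        rx_callback(b);
    }
    return true;
}
```

Keeping the handler short and deferring the real work to the callback in the main loop is one common design choice; it limits the time spent with the interrupt active.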
05
Structure Types and Alignment
The C language provides user-defined data types to describe a class of things with the same characteristics, mainly structures, enumerations, and unions. Among them, enumerations give names to integer constants and restrict a variable to that set of names, making data more intuitive and readable, as follows:
typedef enum {spring=1, summer, autumn, winter} season;
season s1 = summer;
Unions are data types that can store different types of data in the same storage space. The size of a union is determined by the variable that occupies the most space, as shown below:
#include <stdio.h>
#include <stdint.h>

typedef union
{
    char c;
    short s;
    int i;
} UNION_VAL;

UNION_VAL val;

int main(void)
{
    printf("addr:0x%x, 0x%x, 0x%x\n",
           (unsigned int)(uintptr_t)&val.c,
           (unsigned int)(uintptr_t)&val.s,
           (unsigned int)(uintptr_t)&val.i);
    val.i = 0x12345678;
    if (val.s == 0x5678)
        printf("Little-endian mode\n");
    else
        printf("Big-endian mode\n");
    return 0;
}
/*
addr:0x407970, 0x407970, 0x407970
Little-endian mode
*/
The main use of unions is to access internal segments of data by sharing memory addresses, providing a more convenient way to parse certain variables. Additionally, testing the endianness of chips is a common application of unions. Of course, using pointer type casting can also achieve this purpose, implemented as follows:
int data = 0x12345678;
short *pdata = (short *)&data;
if (*pdata == 0x5678)
    printf("%s\n", "Little-endian mode");
else
    printf("%s\n", "Big-endian mode");
It can be seen that using unions can avoid the misuse of pointers in certain situations.
Structures are collections of variables with common characteristics. Compared to C++ classes, they have no access restrictions and do not directly support member functions. However, with custom data types and function pointers, many class-like operations can still be achieved. For most embedded projects, organizing data in structures greatly helps overall architecture optimization and later maintenance, as illustrated below:
#include <stdio.h>

typedef int (*pfunc)(int, int);

typedef struct
{
    int num;
    int profit;
    pfunc get_total;
} STRUCT_VAL;

int GetTotalProfit(int a, int b)
{
    return a * b;
}

int main(void)
{
    STRUCT_VAL Val;
    STRUCT_VAL *pVal;

    Val.get_total = GetTotalProfit;
    Val.num = 1;
    Val.profit = 10;
    printf("Total:%d\n", Val.get_total(Val.num, Val.profit));      //Variable access

    pVal = &Val;
    printf("Total:%d\n", pVal->get_total(pVal->num, pVal->profit)); //Pointer access
    return 0;
}
/*
Total:10
Total:10
*/
C structures can be accessed both through variables and through pointers, and a structure pointer can be used to interpret arbitrary memory (as mentioned earlier, parsing protocols through pointer type casting). Packaging data together with function pointers and passing the package around by pointer is also an important foundation for switching driver-layer interfaces, which has great practical value. Furthermore, combining bit fields, unions, and structures gives another form of bit manipulation, which is very useful for wrapping low-level hardware registers, as shown below:
#include <stdio.h>

typedef unsigned char uint8_t;

union reg
{
    struct
    {
        uint8_t bit0:1;
        uint8_t bit1:1;
        uint8_t bit2_6:5;
        uint8_t bit7:1;
    } bit;
    uint8_t all;
};

int main(void)
{
    union reg RegData;

    RegData.all = 0;
    RegData.bit.bit0 = 1;
    RegData.bit.bit7 = 1;
    printf("0x%x\n", RegData.all);
    RegData.bit.bit2_6 = 0x3;
    printf("0x%x\n", RegData.all);
    return 0;
}
/*
0x81
0x8d
*/
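The order in which a compiler packs bit fields into a byte is implementation-defined, so the output above can differ between toolchains. Where that matters, the same register layout can be handled with explicit masks and shifts. The sketch below mirrors the layout of union reg (bit 0, bits 2 to 6, bit 7); the macro and function names are assumptions made for illustration.

```c
#include <stdint.h>

#define REG_BIT0_MASK    0x01u
#define REG_BIT7_MASK    0x80u
#define REG_FIELD_SHIFT  2u
#define REG_FIELD_MASK   (0x1Fu << REG_FIELD_SHIFT)   /* bits 2..6 */

/* Sets every bit that is 1 in mask. */
uint8_t reg_set_bits(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg | mask);
}

/* Clears every bit that is 1 in mask. */
uint8_t reg_clear_bits(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg & (uint8_t)~mask);
}

/* Replaces the 5-bit field in bits 2..6 with value, leaving other bits untouched. */
uint8_t reg_write_field(uint8_t reg, uint8_t value)
{
    reg = (uint8_t)(reg & (uint8_t)~REG_FIELD_MASK);
    return (uint8_t)(reg | (((unsigned)value << REG_FIELD_SHIFT) & REG_FIELD_MASK));
}

/* Extracts the 5-bit field from bits 2..6. */
uint8_t reg_read_field(uint8_t reg)
{
    return (uint8_t)((reg & REG_FIELD_MASK) >> REG_FIELD_SHIFT);
}
```

Starting from 0, setting bit 0 and bit 7 and then writing 0x3 into the field walks through the same two values as the bit-field version, but the result no longer depends on how the compiler orders bit fields.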
Through unions and bit fields, individual bits within a value can be accessed, which gives a simple and intuitive way of handling registers and of saving memory on constrained platforms.
Another important point about structures is alignment. Accessing aligned data can significantly improve running efficiency, but the padding that alignment introduces changes the storage size and is an easy source of errors. The alignment rules can be summarized as follows:
- Basic data types: aligned by their default lengths, such as char aligned to 1 byte, short aligned to 2 bytes, etc.
- Arrays: aligned according to basic data types; if the first is aligned, the subsequent ones will naturally be aligned as well.
- Unions: aligned according to the member with the largest alignment requirement.
- Structures: each member within a structure must be aligned, and the structure itself is aligned to the largest alignment requirement among its members.
#include <stdio.h>
#include <stdint.h>

union DATA
{
    int a;
    char b;
};

struct BUFFER0
{
    union DATA data;
    char a;
    //reserved[3]
    int b;
    short s;
    //reserved[2]
}; //16 bytes

struct BUFFER1
{
    char a;
    //reserved[1]
    short s;
    union DATA data;
    int b;
}; //12 bytes

int main(void)
{
    struct BUFFER0 buf0;
    struct BUFFER1 buf1;

    printf("size:%zu, %zu\n", sizeof(buf0), sizeof(buf1));
    printf("addr:0x%x, 0x%x, 0x%x, 0x%x\n",
           (unsigned int)(uintptr_t)&buf0.data, (unsigned int)(uintptr_t)&buf0.a,
           (unsigned int)(uintptr_t)&buf0.b, (unsigned int)(uintptr_t)&buf0.s);
    printf("addr:0x%x, 0x%x, 0x%x, 0x%x\n",
           (unsigned int)(uintptr_t)&buf1.a, (unsigned int)(uintptr_t)&buf1.s,
           (unsigned int)(uintptr_t)&buf1.data, (unsigned int)(uintptr_t)&buf1.b);
    return 0;
}
/*
size:16, 12
addr:0x61fe10, 0x61fe14, 0x61fe18, 0x61fe1c
addr:0x61fe04, 0x61fe06, 0x61fe08, 0x61fe0c
*/
Here the size of the union matches its largest member, which is 4 bytes. The printed sizes and addresses confirm that the actual memory layout and the padding positions match the comments in the code. Working out where the padding goes is, in fact, a quick and effective way to understand C's alignment rules.
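Instead of printing addresses, the layout can also be inspected with the standard offsetof macro from <stddef.h>. The sketch below reuses the two structures from the example; the helper names buffer0_padding, buffer1_padding, and print_layout are illustrative. The exact numbers depend on the platform's type sizes and alignment rules, so none are hard-coded.

```c
#include <stddef.h>
#include <stdio.h>

union DATA { int a; char b; };

struct BUFFER0 { union DATA data; char a; int b; short s; };
struct BUFFER1 { char a; short s; union DATA data; int b; };

/* Padding = total size minus the sum of the member sizes. */
size_t buffer0_padding(void)
{
    return sizeof(struct BUFFER0)
         - (sizeof(union DATA) + sizeof(char) + sizeof(int) + sizeof(short));
}

size_t buffer1_padding(void)
{
    return sizeof(struct BUFFER1)
         - (sizeof(char) + sizeof(short) + sizeof(union DATA) + sizeof(int));
}

/* Prints size, member offsets and padding for both layouts. */
void print_layout(void)
{
    printf("BUFFER0: size=%zu data@%zu a@%zu b@%zu s@%zu padding=%zu\n",
           sizeof(struct BUFFER0),
           offsetof(struct BUFFER0, data), offsetof(struct BUFFER0, a),
           offsetof(struct BUFFER0, b), offsetof(struct BUFFER0, s),
           buffer0_padding());
    printf("BUFFER1: size=%zu a@%zu s@%zu data@%zu b@%zu padding=%zu\n",
           sizeof(struct BUFFER1),
           offsetof(struct BUFFER1, a), offsetof(struct BUFFER1, s),
           offsetof(struct BUFFER1, data), offsetof(struct BUFFER1, b),
           buffer1_padding());
}
```

Ordering members so that small fields sit next to each other, as BUFFER1 does, is a simple way to reduce padding without resorting to packed structures.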
06
Preprocessing Mechanisms
The C language provides rich preprocessing mechanisms that make cross-platform code easier to implement. In addition, the macro mechanism, used for replacing data and code blocks, building identifiers and strings, and switching code segments, is very important in engineering practice. Common preprocessing mechanisms are described below by function.
#include inserts the entire content of the included file at the current position. The included file is usually a header file, parameter file, or configuration file, but any file can be inserted at a chosen position in the code this way. <> and "" indicate whether the search starts from the standard library path or from the user-defined path.
#define defines macros, commonly used for constants or as aliases for code segments. It can also be combined with ## to paste tokens together into new identifiers, giving interfaces a uniform naming scheme, as shown in the example below:
#define MAX_SIZE 10
#define MODULE_ON 1
#define ERROR_LOOP() do { \
        printf("error loop\n"); \
    } while (0)
#define global(val) g_##val

int global(v) = 10;
int global(add)(int a, int b)
{
    return a + b;
}
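To make the difference between the two macro operators concrete: ## pastes tokens into a new identifier, while # turns a macro argument into a string literal. The sketch below also shows why multi-statement macros are wrapped in do { ... } while (0) without a trailing semicolon: the macro then behaves as a single statement inside if/else. The names STR, STR_IMPL, CONCAT, SWAP_INT, g_counter, max_size_text, and min_of are made up for this illustration.

```c
#define MAX_SIZE 10

#define STR_IMPL(x)  #x            /* stringifies the argument as written */
#define STR(x)       STR_IMPL(x)   /* expands the argument first, then stringifies */
#define CONCAT(a, b) a##b          /* pastes two tokens into one identifier */

/* Multi-statement macro; the caller supplies the final semicolon. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

int CONCAT(g_, counter) = 0;       /* declares the variable g_counter */

/* Adjacent string literals are concatenated by the compiler. */
const char *max_size_text(void)
{
    return "MAX_SIZE=" STR(MAX_SIZE);
}

/* Returns the smaller value; counts the calls that needed no swap. */
int min_of(int x, int y)
{
    if (x > y)
        SWAP_INT(x, y);            /* expands to one statement, so the else still pairs correctly */
    else
        g_counter++;
    return x;
}
```

Note that STR_IMPL(MAX_SIZE) alone would produce the text "MAX_SIZE" rather than "10", which is why the two-level STR wrapper is used.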
#if...#elif...#else...#endif, #ifdef...#endif, and #ifndef...#endif are conditional directives, mainly used to switch code blocks. They are widely used in large and cross-platform projects to meet differing requirements.
#undef cancels a previous definition to avoid redefinition problems.
#error and #warning emit user-defined error and warning messages at compile time. Combined with #if and #ifdef, they can be used to reject invalid configuration settings.
#pragma is a compiler-specific directive that takes parameters. A common one is #pragma pack(1), but once it is used, the rest of the file is aligned to the byte alignment it sets. Using push and pop solves this problem, as shown in the code below:
#pragma pack(push)
#pragma pack(1)
struct TestA
{
    char i;
    int b;
} A;
#pragma pack(pop)   /* Remember to call pop; otherwise the rest of the file is aligned according to the pack value, leading to unexpected behavior */

/* Equivalent GCC-style form */
struct TestB
{
    char i;
    int b;
} __attribute__((packed)) B;
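As a hedged illustration of what packing changes, the sketch below declares the same three fields with and without #pragma pack and checks the resulting sizes at compile time. It assumes a compiler that supports #pragma pack(push, 1) (GCC, Clang, MSVC, and Arm compilers do) and C11 for _Static_assert. The type names and header_length are hypothetical, and the value read follows the host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)              /* save the current packing, then pack to 1 byte */
typedef struct {
    uint8_t  type;
    uint32_t length;
    uint16_t crc;
} packed_header_t;
#pragma pack(pop)                  /* restore the previous packing */

typedef struct {                   /* same fields with natural alignment */
    uint8_t  type;
    uint32_t length;
    uint16_t crc;
} natural_header_t;

_Static_assert(sizeof(packed_header_t) == 7, "packed header must have no padding");

/* Copies a raw frame into a packed header and returns its length field.
 * memcpy keeps this valid even if raw is not aligned. */
uint32_t header_length(const uint8_t *raw)
{
    packed_header_t h;
    memcpy(&h, raw, sizeof h);
    return h.length;
}
```

Packed structures save space and match on-wire layouts, but every access to a misaligned member may cost extra instructions, so they are usually limited to the boundary where data enters or leaves the system.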
07
Summary
If you have read this far, you should have a clearer understanding of the C language. Embedded C gives developers ample freedom in handling hardware physical addresses, bit operations, and memory access. With arrays, pointers, and type casting, data processing can avoid much copying, which is necessary for low-level programming and helps the development of the overall architecture. However, this freedom also brings illegal access, overflow, and out-of-bounds problems, as well as alignment, data width, and endianness issues across different hardware platforms. The original designers can generally handle these issues, but if the original design did not think them through clearly, they often become problems and trouble for whoever takes over the project later. Therefore, every embedded C practitioner needs a clear grasp of these fundamentals.
At this point, this preliminary summary of embedded C comes to an end, but the key points and difficulties in applying embedded C are not limited to these. Examples include inline assembly, reliable communication, and verification and integrity assurance for stored data. These engineering applications and techniques are difficult to explain briefly.