In China, very few embedded programmers hold a formal computer science degree; most come from automation or electronics-related majors. They have rich practical experience but lack theoretical knowledge. Meanwhile, a large share of computer science graduates end up working on online games or web applications that are largely independent of the operating system. They are also less willing to enter the embedded industry, because the path is a hard one. They have solid theoretical knowledge but know little about circuits and related areas, which makes it difficult for them to pick up the specific knowledge that embedded systems require.
Looking at embedded issues from the perspective of PC programming is the first step; learning to apply embedded programming concepts is the second step; combining PC and embedded thinking and applying it to actual projects is the third step.
I have not conducted an industry survey, but judging from what I have seen and the people I have recruited, engineers in the embedded industry lack either theoretical knowledge or practical experience. Rarely do we find someone who has both. The root cause lies in problems with university education in China. I will not discuss that here, to avoid unnecessary arguments. Instead, I want to list a few examples from my own practice to draw attention to certain issues that come up in embedded projects.

A colleague was developing a serial driver under uC/OS-II, where both the driver and interface encountered issues during testing. The application layer implemented a communication program, and the serial driver provided a function to query the number of characters in the driver buffer: GetRxBuffCharNum(). The upper layer needs a certain number of characters to arrive before it can parse a packet. My colleague's code can be represented in pseudocode as follows:
bExit = FALSE;
do {
    if (GetRxBuffCharNum() >= 30)
        bExit = ReadRxBuff(buff, GetRxBuffCharNum());
} while (!bExit);
This code checks whether the buffer currently holds at least 30 characters; if so, it reads all of them out, and it repeats until a read succeeds. The logic is clear and the thinking is straightforward. However, this code does not work reliably. On a PC it would run without any problem, but on an embedded system the outcome is uncertain. My colleague was frustrated and could not see why. When he came to me for help, I asked how GetRxBuffCharNum() was implemented. Looking at it, I found:
unsigned GetRxBuffCharNum(void)
{
    cpu_register reg;
    unsigned num;

    reg = interrupt_disable();
    num = gRxBuffCharNum;
    interrupt_enable(reg);
    return (num);
}
Clearly, the region between interrupt_disable() and interrupt_enable() is a global critical section that protects the integrity of gRxBuffCharNum. However, because the outer do { } while() loop calls this function continuously, interrupts are disabled and re-enabled over and over, and the window in which they are enabled is very short. As a result, the CPU may fail to service the UART interrupt in time. Whether this happens depends on the UART baud rate, the size of the hardware buffer, and the CPU speed. Our baud rate is very high, about 3Mbps. The UART start and stop signals take one bit each, so one byte takes 10 bit times on the wire. At 3Mbps, that is about 3.3us per byte. How many CPU instructions can run in 3.3us? On a 100MHz ARM, about 150. How long does it take to disable interrupts? On ARM, disabling interrupts generally takes more than 4 instructions, and enabling them takes more than 4 as well. The UART receive interrupt handler itself is more than 20 instructions. So there is a real possibility of losing received data, which shows up at the system level as unstable communication.
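The timing budget above can be checked with a few lines of C. This is only a sketch of the arithmetic: the function names UartFrameTimeNs() and CyclesInNs() are made up for illustration and are not part of the driver. It counts clock cycles rather than instructions, because the number of instructions per cycle depends on the core; the figure of about 150 instructions in the text corresponds to roughly two cycles per instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire for one UART frame, in nanoseconds (rounded down).
 * For 8N1 framing, bits_per_frame = 1 start + 8 data + 1 stop = 10. */
uint32_t UartFrameTimeNs(uint32_t baud, uint32_t bits_per_frame)
{
    assert(baud > 0);
    return (uint32_t)(((uint64_t)bits_per_frame * 1000000000u) / baud);
}

/* Number of CPU clock cycles that elapse in `ns` nanoseconds (rounded down). */
uint32_t CyclesInNs(uint32_t cpu_hz, uint32_t ns)
{
    return (uint32_t)(((uint64_t)cpu_hz * ns) / 1000000000u);
}
```

With the numbers from this project, UartFrameTimeNs(3000000, 10) gives 3333 ns and CyclesInNs(100000000, 3333) gives 333 cycles. If the UART has no hardware FIFO, that is the entire budget for servicing each received byte.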
Modifying this code is actually quite simple; the easiest way is to modify it from the higher level. That is:
bExit = FALSE;
do {
    DelayUs(20); // Delay 20us, generally implemented using empty loop instructions
    num = GetRxBuffCharNum();
    if (num >= 30)
        bExit = ReadRxBuff(buff, num);
} while (!bExit);
This gives the CPU time to run the interrupt handler, which avoids the data loss caused by disabling interrupts too frequently. In embedded systems, most RTOSes do not come with a serial driver. When developers write their own, they often give too little thought to how the code interacts with the kernel, which leads to deeper problems. An RTOS is called an RTOS because it responds quickly to events, and that quick response depends on how quickly the CPU can respond to interrupts. Linux drivers are tightly integrated with the kernel and run in kernel mode. An RTOS cannot copy the Linux structure, but there are lessons to borrow from it.
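A different approach, not the one used in this project, is to make the query itself free of any critical section, so that polling it can never block the UART interrupt. The sketch below assumes a single-core CPU on which aligned 32-bit loads and stores are atomic, an interrupt handler that is the only writer of the head index, and a task that is the only writer of the tail index. All names (RxIsrPutChar, GetRxBuffCharNumLockFree, ReadRxBuffLockFree, RX_BUF_SIZE) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 64u               /* assumed size; must be a power of two */

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;     /* written only by the receive ISR */
static volatile uint32_t rx_tail;     /* written only by the reading task */

/* Called from the (hypothetical) UART receive ISR.
 * Returns 1 on success, 0 if the buffer is full and the byte is dropped. */
int RxIsrPutChar(uint8_t c)
{
    uint32_t head = rx_head;
    if (head - rx_tail >= RX_BUF_SIZE)
        return 0;
    rx_buf[head & (RX_BUF_SIZE - 1u)] = c;
    rx_head = head + 1u;              /* publish the byte after storing it */
    return 1;
}

/* Number of buffered characters, without disabling interrupts.
 * Unsigned subtraction stays correct when the indices wrap around. */
unsigned GetRxBuffCharNumLockFree(void)
{
    return (unsigned)(rx_head - rx_tail);
}

/* Copies up to n characters into dst and returns how many were copied. */
unsigned ReadRxBuffLockFree(uint8_t *dst, unsigned n)
{
    assert(dst != NULL);
    uint32_t tail  = rx_tail;
    unsigned avail = (unsigned)(rx_head - tail);
    if (n > avail)
        n = avail;
    for (unsigned i = 0; i < n; i++)
        dst[i] = rx_buf[(tail + i) & (RX_BUF_SIZE - 1u)];
    rx_tail = tail + n;               /* release the space after copying */
    return n;
}
```

Because each index has exactly one writer, the count seen by the task may be slightly stale but is never corrupt. On multi-core parts, or on CPUs without atomic word access, this sketch would need memory barriers or atomic operations.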
From the above example, it is clear that embedded developers need to understand every aspect of the code.

A colleague was driving a 14094 serial-to-parallel chip. Because there was no dedicated hardware, the serial signal was simulated in software with IO pins. My colleague dashed off a driver, but after 3-4 days of debugging there were still problems. I could not stand by any longer, so I took a look. The parallel outputs were sometimes correct and sometimes not. I read through the code, which can be roughly represented in pseudocode as:
for (i = 0; i < 8; i++)
{
    SetData((data >> i) & 0x1);
    SetClockHigh();
    for (j = 0; j < 5; j++);
    SetClockLow();
}
The 8 bits of data are shifted out in order from bit0 to bit7, one on each high clock pulse. This should work. I could not see where the problem was. After thinking it over and rereading the 14094 datasheet, I understood. The 14094 requires the clock to stay high for at least 10ns and to stay low for at least 10ns as well. This code delays only during the high level and not during the low level. If an interrupt happens to arrive while the clock is low, the low level is stretched and the code may work. If no interrupt arrives during the low level, the CPU drives the clock high again almost immediately, the low level is too short, and the chip does not work properly. Hence the intermittent behavior.
The modification is also quite simple:
for (i = 0; i < 8; i++)
{
    SetData((data >> i) & 0x1);
    SetClockHigh();
    for (j = 0; j < 5; j++);
    SetClockLow();
    for (j = 0; j < 5; j++);
}
This now works. However, the code is still not very portable: if the compiler optimizes it, the two empty delay loops may be removed. Without them, the high and low levels are no longer guaranteed to last 10ns, and the chip will not work correctly. Truly portable code should therefore implement a nanosecond-level DelayNs(10) function.
As Linux does, at power-up first measure how long one NOP instruction takes, and from that work out how many NOPs are needed to cover 10ns. Then simply execute that many NOP instructions. Use compiler directives or special keywords to prevent the compiler from optimizing the delay away, for example in GCC:
__asm__ __volatile__("nop");
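Putting these pieces together, a DelayNs() along the lines described above might look like the following minimal sketch. It assumes GCC or a compatible compiler and a target with a `nop` instruction. The helper NopsForDelay() and the variable g_ps_per_nop are hypothetical, and the value 10000 ps (one NOP per cycle at 100MHz) is a placeholder for the result of a real power-up calibration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of NOPs needed to cover at least `ns` nanoseconds, rounded up.
 * ps_per_nop is the measured cost of one NOP in picoseconds. */
uint32_t NopsForDelay(uint32_t ns, uint32_t ps_per_nop)
{
    assert(ps_per_nop > 0);
    uint64_t ps = (uint64_t)ns * 1000u;
    return (uint32_t)((ps + ps_per_nop - 1u) / ps_per_nop);
}

/* Assumed placeholder; a real system would set this by calibration at power-up. */
static uint32_t g_ps_per_nop = 10000u;

/* Busy-waits for at least `ns` nanoseconds. Loop overhead only makes the
 * delay longer, which is safe for a minimum pulse-width requirement. */
void DelayNs(uint32_t ns)
{
    uint32_t n = NopsForDelay(ns, g_ps_per_nop);
    while (n--) {
        /* volatile asm: the compiler may not delete or merge these NOPs */
        __asm__ __volatile__("nop");
    }
}
```

In the 14094 driver, each of the two empty for loops would then be replaced by a call to DelayNs(10). The delay is a lower bound, not an exact value, which is all the minimum clock-width requirement needs.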
This example clearly shows that writing good code requires a broad base of knowledge. What do you think?