Understanding embedded problems from a PC programmer's perspective is the first step; learning to think like an embedded programmer is the second; combining PC and embedded thinking in real projects is the third. Many people move from PC programming to embedded programming. In China, few embedded programmers are computer science graduates; most come from automatic control or electronics. They have strong practical experience but lack theoretical grounding. Many computer science graduates, meanwhile, go to work on online games or web applications that are far removed from the operating system. They are often reluctant to enter the embedded industry because it is a hard road. They have a solid theoretical foundation but know little about circuits and related areas, which makes embedded systems difficult for them to learn.
Although I have not conducted an industry survey, judging from what I have observed and from the candidates I have recruited, engineers in the embedded industry tend to lack either theoretical knowledge or practical experience. People who have both are rare. The root cause lies in university education in China. I will not go into that topic here, to avoid unnecessary arguments. Instead, I would like to present a few examples from my own practice to draw attention to some issues that come up in embedded projects.
First Example:
A colleague was developing a serial port driver under uC/OS-II, and both the driver and its interface encountered issues during testing. A communication application had been built on top of it, and the serial port driver provided a function to query the number of characters in the driver's receive buffer: GetRxBuffCharNum().
The higher-level application needed to receive a certain number of characters before it could parse a packet. My colleague's code can be represented in pseudocode as follows:
bExit = FALSE;
do {
    if (GetRxBuffCharNum() >= 30)
        bExit = ReadRxBuff(buff, GetRxBuffCharNum());
} while (!bExit);
This code checks whether the driver buffer holds at least 30 characters and, if so, reads all of them into buff, repeating until the read succeeds. The logic is clear and the thinking is straightforward. However, this code does not work correctly. On a PC there would be no problem and it would run normally, but in an embedded system the outcome is uncertain. My colleague was frustrated and could not see why. He came to me for help, and when I saw the code I asked him how GetRxBuffCharNum() was implemented. Looking at it, I found:
unsigned GetRxBuffCharNum(void)
{
    cpu_register reg;
    unsigned num;

    reg = interrupt_disable();
    num = gRxBuffCharNum;
    interrupt_enable(reg);
    return (num);
}
Clearly, the code between interrupt_disable() and interrupt_enable() is a global critical section that protects the integrity of gRxBuffCharNum.
However, because the outer do { } while() loop calls this function back to back, interrupts are disabled and re-enabled continually, and the window in which they are enabled is very short. In practice, the CPU may fail to service UART interrupts in time. Whether that happens depends on the baud rate, the size of the hardware buffer, and the CPU speed. The baud rate we were using was very high, approximately 3 Mbps.
The UART start and stop signals occupy one bit each, so one byte takes 10 bit times on the wire. At a baud rate of 3 Mbps, it takes about 3.3 us to transmit one byte. How many CPU instructions can be executed in 3.3 us? On a 100 MHz ARM, approximately 150. How long does it take to disable interrupts? Generally, disabling interrupts on ARM takes more than 4 instructions, and re-enabling them takes more than another 4. The UART receive interrupt handler itself is more than 20 instructions. So it is entirely possible for received data to be lost, and at the system level this shows up as unstable communication.
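The timing budget above can be checked with a few lines of C. This is a minimal sketch, not part of the original driver: the function names are hypothetical, and it assumes the 10-bit frame described in the text (1 start bit, 8 data bits, 1 stop bit).

```c
#include <stdint.h>

/* Bits on the wire per byte: 1 start + 8 data + 1 stop (assumed framing). */
#define UART_BITS_PER_FRAME 10u

/* Time needed to receive one byte at the given baud rate, in nanoseconds. */
static uint32_t uart_byte_time_ns(uint32_t baud)
{
    return (uint32_t)((UART_BITS_PER_FRAME * 1000000000ull) / baud);
}

/* CPU clock cycles that elapse while one byte arrives. */
static uint32_t cycles_per_byte(uint32_t cpu_hz, uint32_t baud)
{
    return (uint32_t)(((uint64_t)cpu_hz * UART_BITS_PER_FRAME) / baud);
}
```

With a 3 Mbps baud rate and a 100 MHz clock, this gives roughly 3333 ns and 333 clock cycles per byte. The estimate of about 150 instructions would be consistent with that if one assumes a little over two cycles per instruction on average. Either way, the whole receive path, including any time spent with interrupts masked, has to fit inside that window.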
Modifying this code is actually quite simple; the easiest way is to change it from the higher level. That is:
bExit = FALSE;
do {
    DelayUs(20); // Delay 20us, generally implemented using a busy loop
    num = GetRxBuffCharNum();
    if (num >= 30)
        bExit = ReadRxBuff(buff, num);
} while (!bExit);
This gives the CPU time to run the interrupt handler, and so avoids the data loss caused by disabling interrupts too frequently.
In embedded systems, most RTOSes do not ship with serial port drivers. When developers write their own, they often do not fully consider how the code integrates with the kernel, which leads to deep-seated problems. An RTOS is called an RTOS because it responds quickly to events, and that quick response depends on how quickly the CPU can respond to interrupts. Drivers in systems like Linux are tightly integrated with the kernel and run in kernel mode. An RTOS cannot replicate the structure of Linux, but there are lessons to be learned from it.
From the above example, it is clear that embedded development requires developers to have a thorough understanding of all aspects of the code.
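As an alternative to the delay-based fix (a different technique, not the one used in the project described here), the receive buffer can be organized so that the reader never has to mask interrupts at all. The sketch below is a single-producer, single-consumer ring buffer. All names are hypothetical. It assumes a single-core CPU on which aligned 32-bit loads and stores are atomic, exactly one ISR writing, and exactly one task reading. On a multi-core or weakly ordered system it would also need memory barriers.

```c
#include <stdint.h>

#define RX_BUF_SIZE 64u                 /* must be a power of two */

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;       /* written only by the ISR    */
static volatile uint32_t rx_tail;       /* written only by the reader */

/* Called from the (hypothetical) UART receive ISR for each byte. */
static void rx_isr_put(uint8_t byte)
{
    if ((uint32_t)(rx_head - rx_tail) < RX_BUF_SIZE) {  /* drop when full */
        rx_buf[rx_head & (RX_BUF_SIZE - 1u)] = byte;
        rx_head = rx_head + 1u;         /* publish after the data is stored */
    }
}

/* Number of buffered bytes; no interrupt masking is needed. */
static uint32_t rx_count(void)
{
    return (uint32_t)(rx_head - rx_tail);
}

/* Copy up to max bytes out of the buffer; returns the number copied. */
static uint32_t rx_read(uint8_t *dst, uint32_t max)
{
    uint32_t n = rx_count();
    if (n > max)
        n = max;
    for (uint32_t i = 0; i < n; i++)
        dst[i] = rx_buf[(rx_tail + i) & (RX_BUF_SIZE - 1u)];
    rx_tail = rx_tail + n;              /* release the consumed slots */
    return n;
}
```

Because each index has exactly one writer, the count computed by the reader can only be stale in the safe direction: the ISR may have added bytes since the read, but it never removes any. The polling loop can then call rx_count() as often as it likes without delaying the UART interrupt.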
Second Example:
A colleague was driving a 14094 serial-to-parallel chip. The serial signal was bit-banged on IO pins because there was no dedicated hardware. The colleague wrote a driver without much thought, but after 3 or 4 days of debugging there were still problems.
I couldn't bear to watch any longer, so I took a look. The parallel outputs were sometimes correct and sometimes not. I reviewed the code, which in pseudocode was roughly:
for (i = 0; i < 8; i++) {
    SetData((data >> i) & 0x1);
    SetClockHigh();
    for (j = 0; j < 5; j++);
    SetClockLow();
}
This sends the 8 data bits in order from bit0 to bit7, one on each rising clock edge. It should work normally, and I couldn't see where the problem was. After thinking it over and checking the 14094 datasheet, I understood.
It turns out that the 14094 requires the clock high level to last for 10ns, and the low level must also last for 10ns. This code implements a delay for the high level only, and none for the low level. If an interrupt happens to occur while the clock is low, the low level is stretched and the code works. If no interrupt occurs and the CPU goes straight on to the next iteration, the low level is too short and the chip does not work correctly. Hence, it works intermittently.
Modifying it is also quite simple:
for (i = 0; i < 8; i++) {
    SetData((data >> i) & 0x1);
    SetClockHigh();
    for (j = 0; j < 5; j++);
    SetClockLow();
    for (j = 0; j < 5; j++);
}
This works perfectly. However, the code is not portable: an optimizing compiler may eliminate the two empty delay loops. If they are removed, the requirement that the high and low levels each last 10ns can no longer be guaranteed, and the chip will not work correctly.
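One common way to keep the compiler from deleting an empty delay loop is to make the loop counter volatile. The sketch below applies that to the corrected shift routine. The GPIO functions are stand-ins that only record pin activity so the code can run on a host, and ShortDelay() and ShiftOut14094() are hypothetical names. Note that volatile only guarantees the loop is executed; how long it takes still depends on the CPU and compiler.

```c
#include <stdint.h>

/* Hypothetical GPIO hooks; a real driver would write port registers here.
 * These stubs only record pin activity so the sketch runs on a host. */
static char     pin_trace[64];
static uint32_t pin_trace_len;

static void trace(char c)
{
    if (pin_trace_len < sizeof pin_trace - 1u)
        pin_trace[pin_trace_len++] = c;
}

static void SetData(unsigned bit) { trace(bit ? '1' : '0'); }
static void SetClockHigh(void)    { trace('H'); }
static void SetClockLow(void)     { trace('L'); }

/* Busy-wait; the volatile counter keeps the compiler from deleting the loop. */
static void ShortDelay(void)
{
    for (volatile unsigned j = 0; j < 5u; j++) { }
}

/* Shift one byte out, bit0 first, holding the clock high and low equally. */
static void ShiftOut14094(uint8_t data)
{
    for (unsigned i = 0; i < 8u; i++) {
        SetData((data >> i) & 0x1u);
        SetClockHigh();
        ShortDelay();       /* minimum high time */
        SetClockLow();
        ShortDelay();       /* minimum low time  */
    }
}
```

Putting the delay in one function also means there is a single place to swap in a calibrated delay later.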
Therefore, truly portable code should replace these loops with a nanosecond-level delay function such as DelayNs(10);
As Linux does, at power-up first measure how long a nop instruction takes to execute and work out how many nop instructions are needed to reach 10ns, then execute that many nop instructions. Use compiler directives or special keywords to keep the delay loop from being optimized away, for example in GCC:
__asm__ __volatile__("nop");
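The calibrate-then-delay idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: DelayCalibrate(), NsToNops() and NopLoop() are hypothetical names, the elapsed time passed to DelayCalibrate() is assumed to come from a hardware timer read before and after a test run of NopLoop(), and the code relies on GCC-style inline assembly.

```c
#include <stdint.h>

/* Picoseconds per loop iteration; the initial value is only a placeholder
 * until DelayCalibrate() has been called. */
static uint32_t g_ps_per_iter = 10000u;

/* Execute count nop iterations; asm volatile is never optimized away. */
static void NopLoop(uint32_t count)
{
    while (count--) {
        __asm__ __volatile__("nop");
    }
}

/* Record a measurement: iter_count iterations took elapsed_ns nanoseconds.
 * Measuring the whole loop accounts for loop overhead as well as the nop. */
static void DelayCalibrate(uint32_t iter_count, uint32_t elapsed_ns)
{
    if (iter_count == 0u)
        return;                         /* ignore a meaningless measurement */
    uint64_t ps = ((uint64_t)elapsed_ns * 1000u) / iter_count;
    g_ps_per_iter = (ps == 0u) ? 1u : (uint32_t)ps;
}

/* Iterations needed to wait at least ns nanoseconds (rounded up). */
static uint32_t NsToNops(uint32_t ns)
{
    uint64_t ps = (uint64_t)ns * 1000u;
    return (uint32_t)((ps + g_ps_per_iter - 1u) / g_ps_per_iter);
}

/* Busy-wait for at least ns nanoseconds, based on the last calibration. */
static void DelayNs(uint32_t ns)
{
    NopLoop(NsToNops(ns));
}
```

Rounding up means the delay errs on the long side, which is the safe direction for a minimum pulse width. If the measured iteration time is already longer than the requested delay, a single iteration is executed.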
This example clearly shows that writing good code requires a lot of supporting knowledge. What do you think?
Source: This article is an original piece by CSDN blogger “coolbacon” and follows the CC 4.0 BY-SA copyright agreement. Please include the original source link and this statement when reprinting.