Although I haven’t conducted a formal industry survey, my observations and hiring experience suggest that engineers in the embedded industry tend to lack either theoretical knowledge or practical experience; it’s rare to find someone with both. I believe the root cause lies in the problems of university education in China, but I won’t pursue that topic here to avoid a heated debate. Instead, I want to share a few examples from my own practice to draw attention to certain recurring issues in embedded projects.
The first example:

A colleague developed a serial port driver under uC/OS-II, and during testing both the driver and its interface ran into problems. The application implemented a communication protocol on top of the driver, and the driver provided a function to query the number of characters currently in its receive buffer: GetRxBuffCharNum().
The upper layer waits until a certain number of characters has arrived before parsing a packet. My colleague’s polling code can be expressed in pseudocode as follows:
bExit = FALSE;
do {
    /* Keep polling until at least 30 characters have accumulated. */
    if (GetRxBuffCharNum() >= 30)
        bExit = ReadRxBuff(buff, GetRxBuffCharNum());
} while (!bExit);
unsigned GetRxBuffCharNum(void)
{
    cpu_register reg;
    unsigned num;

    reg = interrupt_disable();    /* enter critical section  */
    num = gRxBuffCharNum;         /* snapshot shared counter */
    interrupt_enable(reg);        /* leave critical section  */
    return (num);
}
It is evident that, within the loop, the region between interrupt_disable() and interrupt_enable() is a global critical section that protects the integrity of gRxBuffCharNum.
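That counter is shared with the UART receive interrupt, which is what makes the critical section necessary in the first place. For context, a minimal sketch of the interrupt side might look like the following (the names UartRxIsr, uart_read_char, gRxBuff, gRxBuffWriteIdx and RX_BUFF_SIZE are hypothetical, not from the original driver):

/* Hypothetical receive ISR: stores one incoming character and
   increments the counter that GetRxBuffCharNum() reads. */
void UartRxIsr(void)
{
    unsigned char ch = uart_read_char();    /* fetch byte from hardware */
    if (gRxBuffCharNum < RX_BUFF_SIZE) {
        gRxBuff[gRxBuffWriteIdx] = ch;
        gRxBuffWriteIdx = (gRxBuffWriteIdx + 1) % RX_BUFF_SIZE;
        gRxBuffCharNum++;                   /* shared with task context */
    }
}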
However, because the outer do { } while() loop disables and re-enables interrupts so frequently, the window during which interrupts remain enabled is very short.
As a result, the CPU may fail to respond to the UART interrupts in time. Whether it does depends on the UART baud rate, the size of the hardware FIFO, and the speed of the CPU. The baud rate we were using was very high, approximately 3 Mbps.
A UART frame spends one bit each on the start and stop signals, so one byte occupies 10 bit times. At 3 Mbps, transmitting a byte therefore takes about 3.3 us.
How many CPU instructions can execute in 3.3 us?
On a 100 MHz ARM, roughly 150. And how long does toggling interrupts take? On ARM, disabling interrupts generally takes more than 4 instructions, and re-enabling them takes another 4 or so.
The UART receive interrupt handler itself needs more than 20 instructions. Since the polling loop spends most of its time with interrupts disabled, received data can be lost, which shows up at the system level as unstable communication.
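To make those instruction counts concrete, here is a minimal sketch of such a critical-section pair for a classic ARM7/ARM9-style core, written with GCC inline assembly (my assumption; the original article does not show this code). Masking IRQs is a read-modify-write of the CPSR, roughly four instructions, plus the call overhead:

typedef unsigned long cpu_register;

/* Mask IRQs and return the previous status word. */
cpu_register interrupt_disable(void)
{
    cpu_register old, tmp;
    __asm__ __volatile__(
        "mrs %0, cpsr\n"        /* read current status register */
        "orr %1, %0, #0x80\n"   /* set the I bit: mask IRQs     */
        "msr cpsr_c, %1\n"      /* write it back                */
        : "=r"(old), "=r"(tmp) : : "memory");
    return old;
}

/* Restore the status word saved by interrupt_disable(). */
void interrupt_enable(cpu_register reg)
{
    __asm__ __volatile__("msr cpsr_c, %0" : : "r"(reg) : "memory");
}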
Fixing this is actually quite simple; the easiest way is to change it at the higher level, pacing the polling loop so that interrupts get room to run:
bExit = FALSE;
do {
    DelayUs(20);                  /* delay 20 us, typically a busy loop */
    num = GetRxBuffCharNum();     /* query the counter only once        */
    if (num >= 30)
        bExit = ReadRxBuff(buff, num);
} while (!bExit);
With the 20 us pause, interrupts stay enabled for the vast majority of each iteration, so incoming characters are no longer dropped. This example makes it clear that embedded developers need a solid understanding of every detail of their code.
The second example:

A colleague was driving a 14094 serial-to-parallel (shift register) chip. Since there was no dedicated hardware, the serial signal was bit-banged on GPIO lines. My colleague dashed off a driver, but after 3 or 4 days of debugging it still misbehaved: the parallel outputs were sometimes correct and sometimes not. I couldn’t bear to watch any longer, so I took a look at the code, which was roughly:
for (i = 0; i < 8; i++)
{
    SetData((data >> i) & 0x1);   /* put the next bit on the data line */
    SetClockHigh();
    for (j = 0; j < 5; j++);      /* hold the clock high               */
    SetClockLow();
}
This shifts out the 8 bits of data, bit0 through bit7, one per clock pulse. It looked correct, and at first I couldn’t see where the problem was!
After thinking it over, I checked the 14094 datasheet and understood.
The 14094 requires the clock to stay high for at least 10 ns and to stay low for at least 10 ns as well. This code delays only during the high phase; the low phase gets no delay at all. If an interrupt happens to fire while the clock is low, the low phase is stretched and the code works; if no interrupt arrives, the low pulse is too short and the transfer fails. Hence it worked sometimes and not at others.
The fix is equally simple:
for (i = 0; i < 8; i++)
{
    SetData((data >> i) & 0x1);
    SetClockHigh();
    for (j = 0; j < 5; j++);      /* hold the clock high >= 10 ns */
    SetClockLow();
    for (j = 0; j < 5; j++);      /* hold the clock low >= 10 ns  */
}
This works perfectly. However, the code is still not truly portable: if the compiler optimizes aggressively, it may eliminate both empty delay loops. With the loops gone, nothing guarantees that the high and low levels last 10 ns, and the code stops working.
Therefore, truly portable code should express these delays as a nanosecond-granularity DelayNs(10).
Like Linux, which at boot measures how long a nop instruction takes and derives how many nops are needed to cover 10 ns, the delay can then simply execute the calibrated number of nops. A compiler directive or special keyword must keep the delay loop from being optimized away; in GCC, for example:
__asm__ __volatile__("nop");
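Putting these pieces together, a minimal sketch of such a calibrated delay might look like this (gNopsPer10Ns and its boot-time calibration are hypothetical names, not the original author’s code):

static unsigned long gNopsPer10Ns;    /* measured once at boot, e.g.
                                         against a hardware timer    */

void DelayNs(unsigned ns)
{
    /* Round up so that short delays never come out too short. */
    unsigned long n = (gNopsPer10Ns * ns + 9) / 10;
    while (n--)
        __asm__ __volatile__("nop");  /* volatile: never optimized out */
}

The bit-banging loop can then call SetClockHigh(); DelayNs(10); SetClockLow(); DelayNs(10); and survive both compiler optimization and a change of CPU clock speed.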
This example clearly shows that writing good code requires a lot of supporting knowledge. What do you think?
Source: https://blog.csdn.net/coolbacon/article/details/6842921
Reprinted from WeChat public account: Uncle Wheat