I am Lao Wen, an embedded engineer who loves learning. Follow me, and let’s become better together!
The first step is to look at embedded problems from the perspective of PC programming; the second is to learn to think in embedded terms; the third is to combine PC and embedded thinking and apply both to real projects.
Many friends transition from PC programming to embedded programming. In China, few embedded programmers graduate from computer science; most come from automatic control or electronics-related majors.
Engineers from those backgrounds have strong practical experience but lack theoretical knowledge. Many computer science graduates, meanwhile, end up working on online games or web applications that are independent of the operating system.
They are also reluctant to enter the embedded industry, because the path is challenging: they have solid theoretical knowledge but little knowledge of circuits and related areas, which makes the specifics of embedded systems hard to learn.
Although I haven’t conducted an industry survey, from my observations and the candidates I have recruited, engineers in the embedded industry either lack theoretical knowledge or practical experience.
It is rare to find someone who possesses both. The root cause lies in the issues with China’s university education. I won’t delve into this topic to avoid unnecessary arguments. I would like to list a few examples from my practice to draw attention to some issues when working on embedded projects.
The first example:
A colleague was developing a serial driver under uC/OS-II, and both the driver and interface encountered issues during testing. An application was developed for communication, and the serial driver provided a function to query the number of characters in the driver buffer: GetRxBuffCharNum().
The higher-level application needs to receive a certain number of characters before it can parse a packet. My colleague’s code can be represented in pseudocode as follows:

    bExit = FALSE;
    do {
        if (GetRxBuffCharNum() >= 30)
            bExit = ReadRxBuff(buff, GetRxBuffCharNum());
    } while (!bExit);

This code checks whether the driver buffer holds at least 30 characters and, if so, reads them all into buff, repeating until the read succeeds.
The logic is clear, and the thought process is straightforward. However, this code does not work correctly. If it were on a PC, there would be no issues, and it would function normally. But in embedded systems, the outcome is uncertain. My colleague was frustrated and didn’t understand why.
When he asked me to solve the problem, I looked at the code and asked him how GetRxBuffCharNum() was implemented. Upon inspection:
    unsigned GetRxBuffCharNum(void)
    {
        cpu_register reg;
        unsigned num;

        reg = interrupt_disable();
        num = gRxBuffCharNum;
        interrupt_enable(reg);
        return (num);
    }

The code between interrupt_disable() and interrupt_enable() is a global critical section that protects the integrity of gRxBuffCharNum. However, the outer do { } while() loop calls this function back to back, so interrupts are disabled and re-enabled continuously, and the window in which they are enabled is very short. As a result, the CPU may fail to service the UART interrupt in time. Whether it does depends on the UART baud rate, the size of the hardware buffer, and the CPU speed.

The baud rate we are using is very high, approximately 3 Mbps. The UART start and stop signals occupy one bit each, so one byte takes 10 bit times to transmit. At 3 Mbps, that is about 3.3 us per byte. How many CPU instructions can be executed in 3.3 us? On a 100 MHz ARM, that is roughly 330 clock cycles, or about 150 instructions. How long does it take to disable interrupts? Generally, disabling interrupts on ARM requires more than 4 instructions, and enabling them again requires more than 4 instructions. The UART receive interrupt handler itself consists of more than 20 instructions. This can therefore lead to a bug where communication data is lost, which shows up at the system level as unstable communication.

Fixing this code is actually quite simple; the easiest way is to change it at the higher level. That is:
    bExit = FALSE;
    do {
        DelayUs(20); // Delay 20us, generally implemented using a busy loop
        num = GetRxBuffCharNum();
        if (num >= 30)
            bExit = ReadRxBuff(buff, num);
    } while (!bExit);

This gives the CPU time to run the interrupt handler, avoiding the data loss caused by disabling interrupts too frequently.
In embedded systems, most RTOSes do not ship with a serial driver, so developers write their own. When designing that code, they often give too little thought to how it integrates with the kernel.
This leads to deep-seated issues in the code. An RTOS is called an RTOS because it responds quickly to events, and that quick response depends on how fast the CPU can respond to interrupts. Drivers in systems like Linux are highly integrated with the kernel and run in kernel mode.
Although RTOS cannot replicate the structure of Linux, there are certain lessons to be learned.
From the above example, it is clear that embedded development requires developers to have a thorough understanding of all aspects of the code.
The second example:
A colleague was driving a 14094 serial-to-parallel chip. Because there was no dedicated hardware, the serial signal was bit-banged on I/O pins. The colleague quickly wrote a driver, but after 3 or 4 days of debugging there were still issues.
I couldn’t watch it any longer, so I took a look. The parallel output was sometimes correct and sometimes not. I looked at the code, which in pseudocode was roughly:
    for (i = 0; i < 8; i++) {
        SetData((data >> i) & 0x1);
        SetClockHigh();
        for (j = 0; j < 5; j++);
        SetClockLow();
    }

This sends the 8 bits of data sequentially, from bit0 to bit7, one on each rising clock edge. It should work normally, and I couldn’t see where the problem was. After thinking about it and checking the 14094 datasheet, I understood.

It turns out that the 14094 requires the clock high level to last for 10 ns, and the low level must also last for 10 ns. This code delays only during the high level, not during the low level. If an interrupt happens to fire during the low level, the low level is stretched and the code may work. If the CPU runs straight through without being interrupted, the low level is too short and the chip does not work correctly. Hence, it works intermittently. The modification is also quite simple:
    for (i = 0; i < 8; i++) {
        SetData((data >> i) & 0x1);
        SetClockHigh();
        for (j = 0; j < 5; j++);
        SetClockLow();
        for (j = 0; j < 5; j++);
    }

This works perfectly. However, this code is not easily portable: an optimizing compiler may eliminate the two empty delay loops. If they are removed, the 10 ns minimum for the high and low levels can no longer be guaranteed, and the code will not work correctly. Truly portable code should therefore replace each loop with a nanosecond-level delay such as DelayNs(10). As Linux does, at power-up first measure how long a nop instruction takes to execute and how many nop instructions are needed to reach 10 ns, then execute that number of nop instructions. Use compiler directives or special keywords to prevent the compiler from optimizing away the delay loop, such as in GCC:

    __asm__ __volatile__("nop");

From the above examples, it is clear that writing good code requires broad supporting knowledge. What do you think?

Source: This article is an original piece by CSDN blogger ‘coolbacon’, licensed under CC 4.0 BY-SA. Please include the original source link and this statement when reprinting.