Although I have not conducted a formal industry survey, from what I have seen and from the people I have recruited, engineers in the embedded industry tend to lack either theoretical knowledge or practical experience; it is rare to find someone who has both. The root cause, I believe, lies in problems with university education in China, but I will not discuss that here, to avoid unnecessary arguments. Instead, I want to list a few examples from my own practice to draw attention to some issues that come up when doing embedded-systems projects.
The first example:

A colleague developed a serial (UART) driver under uC/OS-II, and problems showed up when both the driver and its interface were tested. An application developer wrote a communication program on top of it; the driver provided a function to query the number of characters currently in the receive buffer: GetRxBuffCharNum().
The higher layer needs to receive a certain number of characters before it can parse a packet. The colleague's code can be represented in pseudocode as follows:
bExit = FALSE;
do {
    if (GetRxBuffCharNum() >= 30) {
        bExit = ReadRxBuff(buff, GetRxBuffCharNum());
    }
} while (!bExit);

unsigned GetRxBuffCharNum(void)
{
    cpu_register reg;
    unsigned num;

    reg = interrupt_disable();   /* enter critical section */
    num = gRxBuffCharNum;
    interrupt_enable(reg);       /* leave critical section */

    return (num);
}
Clearly, the region between interrupt_disable() and interrupt_enable() in the loop is a global critical section, which guarantees that gRxBuffCharNum is read consistently.
However, because the outer do { } while() loop spins continuously, interrupts are disabled and re-enabled at a very high frequency, and the windows in which they are actually enabled are very short.
As a result, the CPU may fail to respond to the UART interrupt in time. Whether this actually happens depends on the UART baud rate, the size of the hardware FIFO, and the speed of the CPU. The baud rate we used was very high, about 3 Mbps.
The UART start and stop signals occupy one bit each, so each byte takes 10 bit times on the wire; at 3 Mbps, one byte takes about 3.3 us to transmit.
How many CPU instructions can be executed in 3.3 us?
On a 100 MHz ARM, roughly 150. And how long does manipulating interrupts take? On ARM, disabling interrupts generally requires more than 4 instructions, and re-enabling them more than 4 as well.
The UART receive interrupt handler itself easily exceeds 20 instructions. There is therefore a real possibility of losing incoming data, which shows up at the system level as unstable communication.
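For context, the receive side of such a driver typically looks something like the minimal sketch below. This is an illustration, not the original code: UART0_DR, RX_BUFF_SIZE, gRxBuff, and gRxBuffWr are assumed names, and a real handler also carries compiler-generated entry/exit code, which is how it ends up well past 20 instructions.

/* Minimal sketch of a UART receive ISR (assumed names, for illustration). */
void UartRxIsr(void)
{
    unsigned char ch = UART0_DR;            /* read the data register; clears the request */

    if (gRxBuffCharNum < RX_BUFF_SIZE) {    /* drop the byte if the ring buffer is full */
        gRxBuff[gRxBuffWr] = ch;
        gRxBuffWr = (gRxBuffWr + 1) % RX_BUFF_SIZE;
        gRxBuffCharNum++;
    }
}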
Fixing this is actually quite simple; the easiest way is to change it at the higher level, so the loop no longer spins flat out. That is:
bExit = FALSE;
do {
    DelayUs(20);    /* delay 20 us, generally implemented as a busy loop */
    num = GetRxBuffCharNum();
    if (num >= 30) {
        bExit = ReadRxBuff(buff, num);
    }
} while (!bExit);
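Another option worth noting: if gRxBuffCharNum is a single aligned 32-bit word that only the ISR writes, the query does not need a critical section at all, because an aligned word read is atomic on a 32-bit ARM. A sketch under that assumption (not the fix we actually shipped):

/* Sketch: lock-free query, assuming gRxBuffCharNum is a single aligned
   word written only by the ISR, so a plain read is already atomic. */
volatile unsigned gRxBuffCharNum;

unsigned GetRxBuffCharNum(void)
{
    return gRxBuffCharNum;   /* single word read; no need to mask interrupts */
}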
From the above example, it is clear that embedded developers need to have a thorough understanding of every aspect of the code.
The second example:

A colleague wrote a driver for a 14094 serial-in/parallel-out chip. There was no dedicated hardware interface, so the serial signal was bit-banged on GPIO pins. The colleague dashed off a driver and, after three or four days of debugging, it still had problems.
I couldn't stand it any longer and went to take a look: the parallel outputs were sometimes correct and sometimes not. The code looked roughly like this:
for (i = 0; i < 8; i++) {
    SetData((data >> i) & 0x1);
    SetClockHigh();
    for (j = 0; j < 5; j++)
        ;                        /* delay while the clock is high */
    SetClockLow();
}
This code shifts out the 8 bits of data, bit 0 through bit 7, one bit per clock pulse. It looked like it should work, and at first I could not see where the problem was.
After thinking carefully and looking at the 14094 datasheet, I understood.
It turns out that the 14094 requires the clock-high level to be held for at least 10 ns, and the clock-low level for at least 10 ns as well. This code delays only while the clock is high and not at all while it is low. If an interrupt happens to fire during the low phase, the low level gets stretched long enough and the code may work;
but if the CPU runs through the low phase uninterrupted, the low pulse is too short and the transfer fails, hence the intermittent behavior.
Modifying it is also quite simple:
for (i = 0; i < 8; i++) {
    SetData((data >> i) & 0x1);
    SetClockHigh();
    for (j = 0; j < 5; j++)
        ;                        /* hold the clock high */
    SetClockLow();
    for (j = 0; j < 5; j++)
        ;                        /* hold the clock low */
}
This works perfectly. However, the code is still not portable: if the compiler optimizes aggressively, it may eliminate both empty delay loops (declaring j as volatile prevents that, though the delay length still depends on CPU speed).
With the loops gone, the high and low levels are no longer guaranteed to last 10 ns, and the chip cannot work correctly.
Therefore, truly portable code should implement this delay as a nanosecond-level DelayNs(10).
Do what Linux does at power-up: measure how long one nop instruction takes, work out how many nops are needed to cover 10 ns,
and then execute that many. Use compiler directives or special keywords to keep the compiler from optimizing the delay loop away. For instance, in GCC:
__asm__ __volatile__("nop");
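Putting the pieces together, a calibrated delay might look like the rough sketch below. The calibration variable gNopsPerUs and the conversion are assumptions for illustration, in the spirit of Linux's boot-time delay-loop calibration rather than its actual implementation; gNopsPerUs would be measured once at power-up against a hardware timer.

/* Sketch of a calibrated nanosecond delay (assumed names, for illustration). */
static unsigned gNopsPerUs;   /* nops per microsecond, measured at power-up */

void DelayNs(unsigned ns)
{
    /* round up so the delay is never shorter than requested */
    unsigned n = (gNopsPerUs * ns + 999) / 1000;

    while (n--) {
        __asm__ __volatile__("nop");   /* volatile asm: cannot be optimized away */
    }
}

The bit-bang loop above can then hold each clock level with DelayNs(10) instead of a bare counting loop, and the timing survives both compiler optimization and a change of CPU clock.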
This example clearly shows that writing good code requires extensive knowledge to support it. What do you think?
Source: https://blog.csdn.net/coolbacon/article/details/6842921
Reposted from WeChat public account: Embedded Miscellany
