Although I haven’t conducted a formal industry survey, my observations and hiring experience suggest that engineers in the embedded industry tend to lack either theoretical knowledge or practical experience; it’s rare to find someone with both. I believe the root cause lies in the problems of university education in China, but I won’t pursue that topic here to avoid a heated debate. Instead, I want to share a few examples from my own practice to draw attention to certain recurring issues in embedded projects.
The first example:

A colleague developed a serial port driver under uC/OS-II, and during testing both the driver and its interface ran into problems. The application implemented a communication protocol on top of the driver, and the driver provided a function to query the number of characters currently in its receive buffer: GetRxBuffCharNum().
The upper layer waits until a certain number of characters has arrived before parsing a packet. My colleague’s polling code can be expressed in pseudocode as follows:
bExit = FALSE;
do {
    /* Keep polling until at least 30 characters have accumulated. */
    if (GetRxBuffCharNum() >= 30)
        bExit = ReadRxBuff(buff, GetRxBuffCharNum());
} while (!bExit);
unsigned GetRxBuffCharNum(void)
{
    cpu_register reg;
    unsigned num;

    reg = interrupt_disable();    /* enter critical section  */
    num = gRxBuffCharNum;         /* snapshot shared counter */
    interrupt_enable(reg);        /* leave critical section  */
    return (num);
}
It is evident that, within the loop, the region between interrupt_disable() and interrupt_enable() is a global critical section that protects the integrity of gRxBuffCharNum.
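That counter is shared with the UART receive interrupt, which is what makes the critical section necessary in the first place. For context, a minimal sketch of the interrupt side might look like the following (the names UartRxIsr, uart_read_char, gRxBuff, gRxBuffWriteIdx and RX_BUFF_SIZE are hypothetical, not from the original driver):

/* Hypothetical receive ISR: stores one incoming character and
   increments the counter that GetRxBuffCharNum() reads. */
void UartRxIsr(void)
{
    unsigned char ch = uart_read_char();    /* fetch byte from hardware */
    if (gRxBuffCharNum < RX_BUFF_SIZE) {
        gRxBuff[gRxBuffWriteIdx] = ch;
        gRxBuffWriteIdx = (gRxBuffWriteIdx + 1) % RX_BUFF_SIZE;
        gRxBuffCharNum++;                   /* shared with task context */
    }
}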
However, because the outer do { } while() loop disables and re-enables interrupts so frequently, the window during which interrupts remain enabled is very short.
As a result, the CPU may fail to respond to the UART interrupts in time. Whether it does depends on the UART baud rate, the size of the hardware FIFO, and the speed of the CPU. The baud rate we were using was very high, approximately 3 Mbps.
A UART frame spends one bit each on the start and stop signals, so one byte occupies 10 bit times. At 3 Mbps, transmitting a byte therefore takes about 3.3 us.
How many CPU instructions can execute in 3.3 us?
On a 100 MHz ARM, roughly 150. And how long does toggling interrupts take? On ARM, disabling interrupts generally takes more than 4 instructions, and re-enabling them takes another 4 or so.
The UART receive interrupt handler itself needs more than 20 instructions. Since the polling loop spends most of its time with interrupts disabled, received data can be lost, which shows up at the system level as unstable communication.
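To make those instruction counts concrete, here is a minimal sketch of such a critical-section pair for a classic ARM7/ARM9-style core, written with GCC inline assembly (my assumption; the original article does not show this code). Masking IRQs is a read-modify-write of the CPSR, roughly four instructions, plus the call overhead:

typedef unsigned long cpu_register;

/* Mask IRQs and return the previous status word. */
cpu_register interrupt_disable(void)
{
    cpu_register old, tmp;
    __asm__ __volatile__(
        "mrs %0, cpsr\n"        /* read current status register */
        "orr %1, %0, #0x80\n"   /* set the I bit: mask IRQs     */
        "msr cpsr_c, %1\n"      /* write it back                */
        : "=r"(old), "=r"(tmp) : : "memory");
    return old;
}

/* Restore the status word saved by interrupt_disable(). */
void interrupt_enable(cpu_register reg)
{
    __asm__ __volatile__("msr cpsr_c, %0" : : "r"(reg) : "memory");
}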
Fixing this is actually quite simple; the easiest way is to change it at the higher level, pacing the polling loop so that interrupts get room to run:
bExit = FALSE;
do {
    DelayUs(20);                  /* delay 20 us, typically a busy loop */
    num = GetRxBuffCharNum();     /* query the counter only once        */
    if (num >= 30)
        bExit = ReadRxBuff(buff, num);
} while (!bExit);
With the 20 us pause, interrupts stay enabled for the vast majority of each iteration, so incoming characters are no longer dropped. This example makes it clear that embedded developers need a solid understanding of every detail of their code.
The second example:

A colleague was driving a 14094 serial-to-parallel (shift register) chip. Since there was no dedicated hardware, the serial signal was bit-banged on GPIO lines. My colleague dashed off a driver, but after 3 or 4 days of debugging it still misbehaved: the parallel outputs were sometimes correct and sometimes not. I couldn’t bear to watch any longer, so I took a look at the code, which was roughly:
for (i = 0; i < 8; i++)
{
    SetData((data >> i) & 0x1);   /* put the next bit on the data line */
    SetClockHigh();
    for (j = 0; j < 5; j++);      /* hold the clock high               */
    SetClockLow();
}
This shifts out the 8 bits of data, bit0 through bit7, one per clock pulse. It looked correct, and at first I couldn’t see where the problem was!
After thinking it over, I checked the 14094 datasheet and understood.
The 14094 requires the clock to stay high for at least 10 ns and to stay low for at least 10 ns as well. This code delays only during the high phase; the low phase gets no delay at all. If an interrupt happens to fire while the clock is low, the low phase is stretched and the code works; if no interrupt arrives, the low pulse is too short and the transfer fails. Hence it worked sometimes and not at others.
The fix is equally simple:
for (i = 0; i < 8; i++)
{
    SetData((data >> i) & 0x1);
    SetClockHigh();
    for (j = 0; j < 5; j++);      /* hold the clock high >= 10 ns */
    SetClockLow();
    for (j = 0; j < 5; j++);      /* hold the clock low >= 10 ns  */
}
This works perfectly. However, the code is still not truly portable: if the compiler optimizes aggressively, it may eliminate both empty delay loops. With the loops gone, nothing guarantees that the high and low levels last 10 ns, and the code stops working.
Therefore, truly portable code should express these delays as a nanosecond-granularity DelayNs(10).
Like Linux, which at boot measures how long a nop instruction takes and derives how many nops are needed to cover 10 ns, the delay can then simply execute the calibrated number of nops. A compiler directive or special keyword must keep the delay loop from being optimized away; in GCC, for example:
__asm__ __volatile__("nop");
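Putting these pieces together, a minimal sketch of such a calibrated delay might look like this (gNopsPer10Ns and its boot-time calibration are hypothetical names, not the original author’s code):

static unsigned long gNopsPer10Ns;    /* measured once at boot, e.g.
                                         against a hardware timer    */

void DelayNs(unsigned ns)
{
    /* Round up so that short delays never come out too short. */
    unsigned long n = (gNopsPer10Ns * ns + 9) / 10;
    while (n--)
        __asm__ __volatile__("nop");  /* volatile: never optimized out */
}

The bit-banging loop can then call SetClockHigh(); DelayNs(10); SetClockLow(); DelayNs(10); and survive both compiler optimization and a change of CPU clock speed.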
This example clearly shows that writing good code requires a lot of supporting knowledge. What do you think?
Source: https://blog.csdn.net/coolbacon/article/details/6842921
Reprinted from WeChat public account: Uncle Wheat