Methods to Enhance the Reliability of Embedded Systems

Source | Internet
An engineer who takes real responsibility for a project has to consider many aspects, from a well-defined development cycle to strict execution and system checks. There are many techniques for developing high-reliability embedded systems.
This article introduces several techniques that are easy to apply and remain useful over the long term. They help the system run more reliably and catch abnormal behavior.

Select Reliable Hardware and Software Components

I usually do not recommend using the latest software version, because new versions may contain undiscovered bugs. If system reliability is a priority, it is better to use hardware and software that have been validated over a long period.
Hardware: Use rigorously tested, high-reliability processors, memory, and sensors. For example, industrial-grade embedded systems typically choose hardware components with proven long-term operational stability.
Software: Choose validated operating systems, drivers, and middleware to ensure compatibility and stability between software components. For instance, adopting a Real-Time Operating System (RTOS) can provide higher system stability and predictability.

Check Application CRC

A significant benefit for embedded engineers is that our IDEs and toolchains can automatically generate a checksum for an application or memory region, which can then be used to verify the application’s integrity. Interestingly, in many cases the checksum is used only when loading the program code onto the device.
However, if the CRC or checksum is kept in memory, the application can verify that it is still intact at startup (or even periodically, for long-running systems). This is an excellent way to catch unexpected corruption.


The probability of a programmed application changing is very low, but considering the billions of microcontrollers shipped each year and the potentially harsh environments they work in, the chance of an application image becoming corrupted is not zero. More likely, a defect in the system could cause a flash sector to be written or erased, compromising the integrity of the application.
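As a minimal sketch of such a startup check, the C code below computes a standard CRC-32 (reflected polynomial 0xEDB88320, the variant used by zlib and Ethernet) over an image and compares it with a stored value. The function names crc32_compute and app_image_is_intact are hypothetical, and it is only an assumption that the toolchain emits this CRC variant; the real algorithm and the location of the stored checksum depend on the tools in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and final
 * XOR of 0xFFFFFFFF). Slow but table-free, which suits a one-off startup check. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0u);

    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return ~crc;
}

/* Returns 1 when the image still matches the CRC recorded at build time.
 * Where expected_crc is stored is an assumption left to the caller. */
int app_image_is_intact(const uint8_t *image, size_t len, uint32_t expected_crc)
{
    return crc32_compute(image, len) == expected_crc;
}
```

On a real target, image would point at the start of the application’s flash region and expected_crc would be read from wherever the build tool placed it. On a mismatch the firmware could stay in the bootloader or enter a safe state instead of running the damaged application.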

Perform RAM Checks at Startup

To build a more reliable and robust system, it is essential to ensure that the system’s hardware is functioning correctly. After all, hardware can fail (luckily, software never fails; it only does what the code tells it to do, whether right or wrong). Verifying at startup that internal and external RAM are free of faults is a good way to ensure that the hardware operates as expected.
There are many ways to perform a RAM check, but a common one is to write a known pattern, wait a short period, and then read it back. The value read should match what was written.
In most cases the RAM check passes, which is the desired outcome. However, there is a very small possibility that the check fails, and that failure is an excellent opportunity to flag a hardware problem in the system.
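The write-then-read-back idea can be sketched in C as follows. This is an illustration under assumptions, not a complete memory test: ram_test_region is a hypothetical name, the two alternating-bit patterns are one common choice, and the test is destructive, so it only makes sense on memory whose contents are not yet needed (for example, before variables are initialized at startup). Because every word receives the same pattern, this sketch can catch stuck bits but not address-line faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Destructive pattern test over a RAM region.
 * Returns 1 if every word read back correctly, 0 on the first mismatch.
 * The region is left zeroed after a successful test. */
int ram_test_region(volatile uint32_t *base, size_t words)
{
    static const uint32_t patterns[] = { 0xAAAAAAAAu, 0x55555555u };

    assert(base != NULL);

    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        /* Write the whole region first, so some time passes before readback. */
        for (size_t i = 0; i < words; i++) {
            base[i] = patterns[p];
        }
        for (size_t i = 0; i < words; i++) {
            if (base[i] != patterns[p]) {
                return 0; /* flag a hardware fault to the caller */
            }
        }
    }

    for (size_t i = 0; i < words; i++) {
        base[i] = 0u;
    }
    return 1;
}
```

The volatile qualifier keeps the compiler from optimizing away the reads, so each comparison actually goes back to memory.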

Use a Stack Monitor

For many embedded developers, the stack seems to be a rather mysterious force. When strange things start happening and engineers run out of other explanations, they begin to wonder whether something is wrong with the stack. The result is often blind adjustment of the stack’s size, location, and so on. The error frequently turns out to be unrelated to the stack, but how can one be sure? After all, how many engineers have actually performed a worst-case stack size analysis?
Stack size is statically allocated at compile time, but the stack is used dynamically. As code executes, variables required by the application, return addresses, and other information are continually stored on the stack. This mechanism causes the stack to grow within its allocated memory. However, this growth sometimes exceeds the capacity limits determined at compile time, leading to corruption of data in adjacent memory areas.
One way to ensure the stack is behaving correctly is to implement a stack monitor as part of the system’s health-monitoring code (how many engineers do that?). The stack monitor creates a buffer zone between the stack and “other” memory areas and fills it with a known bit pattern. The monitor then continuously checks whether the pattern has changed. If it has, the stack has grown too large and is about to push the system into a dark hell! At this point, the monitor can log the event, the system state, and any other useful data for later diagnosis.
Most Real-Time Operating Systems (RTOS) or microcontroller systems with Memory Protection Units (MPU) provide stack monitors. Scarily, these features are often turned off by default or frequently intentionally disabled by developers. A quick search online reveals that many suggest turning off stack monitors in RTOS to save 56 bytes of flash memory. This is a penny-wise, pound-foolish approach!
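For systems without a built-in monitor, the buffer-zone idea described above might look roughly like this in C. The names, the guard size of 8 words, and the pattern 0xDEADBEEF are all illustrative assumptions; on a real target the guard array would have to be placed by the linker directly beyond the end of the stack in the direction the stack grows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS   8u          /* size of the buffer zone (assumed) */
#define GUARD_PATTERN 0xDEADBEEFu /* known bit pattern (assumed) */

/* Fill the buffer zone with the known pattern; call once at startup. */
void stack_guard_init(volatile uint32_t *guard)
{
    assert(guard != NULL);
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        guard[i] = GUARD_PATTERN;
    }
}

/* Count guard words that no longer hold the pattern.
 * 0 means the stack has not reached the buffer zone. */
size_t stack_guard_check(const volatile uint32_t *guard)
{
    size_t damaged = 0;

    assert(guard != NULL);
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (guard[i] != GUARD_PATTERN) {
            damaged++;
        }
    }
    return damaged;
}
```

A periodic health task could call stack_guard_check and, on a nonzero result, log the event and system state before moving to a safe state. The count gives a rough idea of how far the stack has intruded.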

Use MPU

In the past, it was difficult to find Memory Protection Units (MPU) in small and inexpensive microcontrollers, but this situation has started to change.
Now MPUs are available on microcontrollers from the high end to the low end, giving embedded software developers an opportunity to significantly improve the robustness of their firmware. See: The Benefits of Using Memory Protection Units (MPU) in Embedded RTOS Systems
MPUs are increasingly coupled with operating systems to set up memory spaces in which processes are separated, so that tasks can execute their code without fear of being stomped on.
If something does go wrong, the errant process is terminated and other protective measures are taken. Keep an eye out for microcontrollers with an MPU; if one is available, make good use of it.

Establish a Robust Watchdog System

A commonly seen watchdog implementation enables the watchdog (a good start) but then feeds (resets) it from a periodic timer whose operation is completely isolated from whatever is happening in the program. The purpose of a watchdog is to ensure that when an error occurs the watchdog is not fed, so that when work halts the system is forced through a hardware reset to recover. Feeding it from a timer that is independent of system activity keeps the watchdog fed even after the system has failed, which defeats that purpose.


Embedded developers need to think carefully about how application tasks integrate with the watchdog system. For instance, one technique requires each task that runs within a given period to indicate that it completed successfully, and the watchdog is fed only if every task has done so. If any task fails to check in, the watchdog is not fed and a reset is forced. There are also more advanced techniques, such as an external watchdog processor that monitors how the main processor is behaving, and vice versa.
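One way to sketch the task check-in technique in C is with a bitmask of flags, one per task. Everything here is an assumption for illustration: the three-task count, the function names, and hw_watchdog_kick, which stands in for the device-specific register write and only counts calls so the logic can be exercised on a host. A preemptive system would also need to make the flag updates atomic.

```c
#include <assert.h>
#include <stdint.h>

#define TASK_COUNT     3u
#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

static volatile uint32_t checkin_flags; /* one bit per task */
static unsigned hw_kick_count;          /* host-side stand-in for the hardware */

/* Hypothetical placeholder for the real watchdog refresh. */
static void hw_watchdog_kick(void)
{
    hw_kick_count++;
}

/* Each task calls this after finishing its work for the current period. */
void watchdog_task_checkin(unsigned task_id)
{
    assert(task_id < TASK_COUNT);
    checkin_flags |= (1u << task_id);
}

/* Called once per period by a supervisor. Feeds the watchdog only if every
 * task checked in; flags are cleared either way so stale check-ins from an
 * earlier period cannot mask a stalled task. Returns 1 if the watchdog was fed. */
int watchdog_service(void)
{
    uint32_t seen = checkin_flags;

    checkin_flags = 0u;
    if (seen == ALL_TASKS_MASK) {
        hw_watchdog_kick();
        return 1;
    }
    return 0; /* not fed: the hardware watchdog will eventually force a reset */
}
```

Unlike a free-running timer, this supervisor feeds the watchdog only when there is evidence that every task is still making progress.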

Avoid Dynamic Memory Allocation

Engineers not accustomed to working in resource-constrained environments may be tempted to use the dynamic memory allocation features of their programming language. After all, this is a technique commonly used on general-purpose computer systems, where memory is allocated only when needed. For example, when developing in C, engineers may tend to use malloc to allocate space on the heap. An operation is performed, and once it completes, the allocated memory is returned to the heap with free.
In resource-constrained systems, this can be a disaster! One problem with dynamic memory allocation is that errors or improper techniques can lead to memory leaks or heap fragmentation. If these occur, most embedded systems have neither the resources nor the knowledge to monitor the heap or handle the problem properly. And when they do occur, what happens if the application requests memory and the requested space is not available?
The problems that come with dynamic memory allocation are complex, and managing them properly can be a nightmare! An alternative is to keep things simple and allocate memory statically. For instance, simply create a 256-byte buffer in the program instead of requesting a buffer of that size via malloc. This memory remains allocated for the application’s entire lifetime, with no concerns about the heap or fragmentation.
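A minimal C sketch of the static alternative is shown below. The names rx_buffer and rx_buffer_store are hypothetical, and the truncation policy for oversized input is just one possible choice; the point is that the 256-byte buffer is reserved at link time, so there is no allocation that can fail at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUFFER_SIZE 256u

/* Reserved at link time and alive for the whole program: no malloc, no free,
 * no heap fragmentation. */
static uint8_t rx_buffer[RX_BUFFER_SIZE];

/* Copy up to RX_BUFFER_SIZE bytes into the static buffer.
 * Returns the number of bytes actually stored (input is truncated if larger). */
size_t rx_buffer_store(const uint8_t *src, size_t len)
{
    assert(src != NULL);

    if (len > RX_BUFFER_SIZE) {
        len = RX_BUFFER_SIZE;
    }
    memcpy(rx_buffer, src, len);
    return len;
}
```

The buffer’s cost shows up in the linker map at build time, so running out of memory becomes a build-time problem instead of a field failure.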

Declaration: This article’s material comes from the internet, and the copyright belongs to the original author. If there are copyright issues with the work, please contact me for deletion.
