Source: Online Resources
Many embedded engineers start out full of hope and ambition, but high-reliability code is not written overnight. It is a laborious process that requires developers to maintain and manage every bit and byte of the system. When an application is finally declared “successful,” there is often a sense of relief, but software that runs correctly under controlled conditions today is not guaranteed to keep doing so tomorrow or a year from now.
From a well-defined development cycle to strict enforcement and system checks, there are many techniques for developing high-reliability embedded systems. Here are 7 practical, long-lasting tips that help a system run more reliably and catch abnormal behavior.
Tip 1 – Fill ROM with Known Values
Software developers tend to be an optimistic group: as long as the code runs faithfully for a long time, all is well. The case where a microcontroller jumps out of the application space and starts executing from an unexpected region of memory seems quite rare. However, it is no less likely than a buffer overflow or a dereference of a bad pointer. It does happen! What the system does afterwards is undefined, because unused memory either holds its default value (typically 0xFF for erased flash) or, having never been written, holds values that nobody can predict.
However, linkers and IDEs offer fairly comprehensive ways to detect such an event and recover from it. The trick is to use the linker's FILL command to fill unused ROM with a known bit pattern. Many different patterns can be used, but if you want to build a more reliable system, the most obvious choice is a pattern that lands the processor in a fault-handling ISR. If the system encounters an error and the processor starts executing outside the program space, the ISR is triggered and provides an opportunity to store the processor, register, and system state before deciding on corrective action.
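In practice the fill is done by the linker, but its effect can be mimicked on a host with a few lines of C. The sketch below pads the unused tail of a ROM image with a 16-bit pattern. The function name `fill_unused_rom` and the value of `FILL_PATTERN` are hypothetical; on real hardware the pattern would be an opcode that traps or branches to the fault handler on that particular target.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder value only (an assumption): on a real target, choose an
 * opcode that traps or branches to the fault-handling ISR. */
#define FILL_PATTERN 0xDEADu

/* Fill image[used_bytes .. total_bytes) with a 16-bit pattern, stored
 * little-endian, similar to what a linker fill directive does for
 * unused ROM. Filling starts at the next even offset, so a single odd
 * leading byte (and an odd trailing byte) is left untouched.
 * Returns the number of bytes written. */
size_t fill_unused_rom(uint8_t *image, size_t used_bytes,
                       size_t total_bytes, uint16_t pattern)
{
    size_t filled = 0;
    size_t i = (used_bytes + 1u) & ~(size_t)1u; /* round up to even */

    for (; i + 1u < total_bytes; i += 2u) {
        image[i]      = (uint8_t)(pattern & 0xFFu);
        image[i + 1u] = (uint8_t)(pattern >> 8);
        filled += 2u;
    }
    return filled;
}
```

A call such as `fill_unused_rom(image, app_size, rom_size, FILL_PATTERN)` would typically run as a post-build step when the linker itself cannot do the fill.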
Tip 2 – Check Application CRC
A significant benefit for embedded engineers is that IDEs and toolchains can automatically generate a checksum for an application or memory region, so the integrity of the application can be verified against it. Interestingly, in many cases the checksum is used only while the program code is being loaded onto the device.
However, if the CRC or checksum is kept in memory, verifying that the application is still intact at startup (or even periodically, for long-running systems) is an excellent way to catch unexpected changes. The probability of a programmed application changing is low, but given the billions of microcontrollers shipped each year and the potentially harsh environments they operate in, the chance of an application image becoming corrupted is not zero. More likely, a defect in the system could cause a stray flash write or erase in some sector, compromising the integrity of the application.
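As a minimal sketch of such a startup check, the following C code computes a standard CRC-32 over the application image and compares it with a stored value. The names `crc32_compute` and `app_image_is_intact` are illustrative, and the choice of CRC-32 is an assumption; a real project should use whatever CRC variant its toolchain emits, and where the stored CRC lives depends on the linker setup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
 * final XOR 0xFFFFFFFF). Small but slow; a table-driven version or a
 * hardware CRC unit would usually replace it. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns true when the image still matches the CRC stored at build
 * time. Call at startup, or periodically on long-running systems. */
bool app_image_is_intact(const uint8_t *image, size_t len,
                         uint32_t stored_crc)
{
    return crc32_compute(image, len) == stored_crc;
}
```

If the check fails, the system can refuse to start the application, log the event, or fall back to a bootloader, depending on the product's safety requirements.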
Tip 3 – Perform RAM Checks at Startup
To establish a more reliable and robust system, it is crucial to ensure that the system hardware is functioning correctly. After all, hardware can fail. (Fortunately, software never fails; it only does what the code tells it to do, whether right or wrong). Verifying that there are no issues with internal or external RAM at startup is a good way to ensure that the hardware operates as expected.
There are many different methods to perform RAM checks, but a common approach is to write a known pattern and then wait a short period before reading it back. The result should be exactly what was written. The truth is that in most cases the RAM check passes, which is the desired outcome. However, there is a small chance that the check fails, and that failure is an excellent opportunity to flag a hardware problem in the system.
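The write-then-read-back idea can be sketched as follows. This is one simple variant, not a complete memory test: it uses the alternating-bit patterns 0x55555555 and 0xAAAAAAAA, omits the short delay mentioned above, and restores each word after testing it. The function name `ram_check` is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write alternating-bit patterns to every word in the region and read
 * them back. Each word's original value is saved and restored, so the
 * check can also run on memory that is already in use (with interrupts
 * disabled on a real target). Returns false on the first mismatch. */
bool ram_check(volatile uint32_t *base, size_t words)
{
    static const uint32_t patterns[] = { 0x55555555u, 0xAAAAAAAAu };

    for (size_t i = 0; i < words; i++) {
        uint32_t saved = base[i];

        for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
            base[i] = patterns[p];
            if (base[i] != patterns[p]) {
                return false; /* stuck or shorted bit detected */
            }
        }
        base[i] = saved;
    }
    return true;
}
```

The `volatile` qualifier keeps the compiler from optimizing away the read-back. More thorough schemes, such as march tests, also catch address-line faults that this simple pattern test can miss.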
Tip 4 – Use a Stack Monitor
For many embedded developers, the stack seems to be a rather mysterious force. When strange things start happening and engineers are finally stumped, they begin to wonder whether something is going on with the stack. The result is often blind adjustment of the stack size, position, and so on. Yet the error is often unrelated to the stack. But how can one be sure? After all, how many engineers have actually performed a worst-case stack size analysis?
The stack size is statically allocated at compile time, but the stack is used dynamically. As the code executes, variables, return addresses, and other information needed by the application are continuously stored on the stack. This mechanism causes the stack to grow within its allocated memory. However, this growth can sometimes exceed the limit fixed at compile time, so that the stack overflows and corrupts adjacent memory areas.
One way to ensure the stack is functioning correctly is to implement a stack monitor as part of the system’s “health” code (how many engineers actually do this?). The stack monitor creates a buffer zone between the stack and “other” memory areas and fills it with known bit patterns. The monitor then continuously checks for any changes in the pattern. If the bit pattern changes, it means the stack has grown too large and is about to push the system into a dark abyss! At this point, the monitor can log the occurrence, system state, and any other useful data for later diagnosis of the issue.
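The guard-zone monitor described above can be sketched in a few lines of C. The pattern value and function names are hypothetical, and the guard region is passed in as a pointer; on a real target it would be a block placed by the linker directly beyond the end of the stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary, recognizable pattern (an assumption; any value that is
 * unlikely to appear on the stack by chance will do). */
#define GUARD_PATTERN 0xC0DEC0DEu

/* Fill the guard zone once at startup, before the stack is in heavy use. */
void stack_guard_init(uint32_t *guard, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        guard[i] = GUARD_PATTERN;
    }
}

/* Call periodically from the system health code. Returns false as soon
 * as any guard word has been overwritten, i.e. the stack has grown
 * into the buffer zone. */
bool stack_guard_intact(const volatile uint32_t *guard, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (guard[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}
```

When `stack_guard_intact` returns false, the health code can log the event and the system state before forcing a controlled reset.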
Most real-time operating systems (RTOS) or microcontroller systems with memory protection units (MPUs) provide stack monitors. The scary part is that these features are often disabled by default or frequently turned off intentionally by developers. A quick search online reveals that many people suggest disabling stack monitors in real-time operating systems to save 56 bytes of flash space. This is a counterproductive approach!
Tip 5 – Use the MPU
In the past, it was challenging to find a memory protection unit (MPU) in a small and inexpensive microcontroller, but this situation has begun to change. Now, microcontrollers from high-end to low-end have MPUs, and these MPUs provide embedded software developers with an opportunity to significantly enhance the robustness of their firmware.
MPUs are increasingly coupled with operating systems to set up memory regions in which processes are isolated from one another, so that tasks can execute their code without worrying about being stomped on. If something does go wrong, the runaway process is terminated and other protective measures are carried out. Keep an eye out for microcontrollers with this component, and if one is available, make good use of its features.
Tip 6 – Establish a Robust Watchdog System
A commonly favored watchdog implementation enables the watchdog (a good start) but then resets it from a periodic timer whose operation is completely isolated from whatever is happening in the program. The purpose of a watchdog is to ensure that, if an error occurs, the watchdog is not reset, so that when work halts the system is forced through a hardware reset in order to recover. A timer that is independent of system activity defeats this purpose: it keeps resetting the watchdog even after the system has failed.
Embedded developers need to carefully consider and design how application tasks integrate into the watchdog system. For example, one technique has each task that runs within a certain period indicate that it completed its work successfully. If any task fails to report in, the watchdog is not reset, which forces a system reset. There are also more advanced techniques, such as an external watchdog processor that monitors the main processor and vice versa.
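The task check-in technique can be sketched as below. Everything here is illustrative: `TASK_COUNT`, the function names, and especially `hw_watchdog_kick`, which is a stand-in stub that only counts calls. On real hardware it would write to the watchdog peripheral, and the flag updates would need to be atomic if tasks can preempt one another.

```c
#include <stdbool.h>
#include <stdint.h>

#define TASK_COUNT     3u
#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

static uint32_t checkin_flags; /* one bit per task */
static unsigned kick_count;    /* stub state, for illustration only */

/* Stand-in for the hardware-specific watchdog refresh. */
static void hw_watchdog_kick(void)
{
    kick_count++;
}

/* Each task calls this once per period after finishing its work. */
void watchdog_task_checkin(unsigned task_id)
{
    if (task_id < TASK_COUNT) {
        checkin_flags |= (1u << task_id);
    }
}

/* Called once per period. The watchdog is refreshed only when every
 * task has checked in; otherwise it is left to expire and reset the
 * system. Flags are cleared for the next period either way. */
bool watchdog_service(void)
{
    bool all_alive = (checkin_flags == ALL_TASKS_MASK);

    if (all_alive) {
        hw_watchdog_kick();
    }
    checkin_flags = 0u;
    return all_alive;
}
```

Unlike a free-running timer that refreshes the watchdog unconditionally, this design ties the refresh to evidence that every task is still making progress.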
Establishing a robust watchdog system is crucial for a reliable system. Due to the many technologies involved, it is difficult to cover everything in these few paragraphs, but the author will publish related articles on this topic in the future.
Tip 7 – Avoid Dynamic Memory Allocation
Engineers who are not accustomed to resource-constrained environments may be tempted to use the dynamic memory allocation features of their programming language. After all, this is a common technique on general-purpose computers, where memory is allocated only when it is needed. For example, when developing in C, engineers may tend to use malloc to allocate space on the heap. An operation executes, and once it completes, the allocated memory is returned to the heap with free.
In resource-constrained systems, this can be a disaster! One of the problems with using dynamic memory allocation is that errors or improper techniques can lead to memory leaks or fragmentation. When these issues occur, most embedded systems do not have the resources or knowledge to monitor the heap or handle it properly. And when they do occur, what happens if the application requests space but there is no available space?
The problems that come with dynamic memory allocation are complex, and handling them properly can be a nightmare! A simpler alternative is to allocate memory statically. For example, simply declare a 256-byte buffer in the program instead of requesting a buffer of that size via malloc. The memory stays allocated for the whole lifetime of the application, and there are no concerns about the heap or memory fragmentation.
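A minimal sketch of the static alternative is shown below. The buffer name, the `message_store` function, and the truncation policy are assumptions made for illustration; the point is only that the 256 bytes are reserved at build time and never requested from the heap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_BUFFER_SIZE 256u

/* Reserved at build time: the size is known up front, it shows up in
 * the linker map, and it can never leak or fragment. */
static uint8_t msg_buffer[MSG_BUFFER_SIZE];
static size_t  msg_length;

/* Copy a message into the static buffer, truncating it if it does not
 * fit. Returns the number of bytes actually stored. */
size_t message_store(const uint8_t *src, size_t len)
{
    if (len > MSG_BUFFER_SIZE) {
        len = MSG_BUFFER_SIZE;
    }
    memcpy(msg_buffer, src, len);
    msg_length = len;
    return len;
}
```

Because the worst-case memory footprint is fixed when the program is linked, an out-of-memory condition shows up as a build error rather than as a failed malloc in the field.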
Conclusion
These are just a few methods that can help developers start building more reliable embedded systems. There are many other techniques, such as utilizing good coding standards, monitoring bit flips, performing array and pointer boundary checks, and using assertions. All these techniques are the secrets that allow designers to develop embedded systems with higher reliability.