If a large embedded project has no fault tolerance designed into its code, can you imagine the consequences? Experienced developers can easily picture the countless bugs in such a project, some of which are very difficult to trace. Today, let's discuss several common fault tolerance design methods for embedded code.
Using Assertions (Assert)
What is an assertion? Let me illustrate with an example.
Consider this array and function:
int Array[5] = {0xA1, 0xB2, 0xC3, 0xD4, 0xE5};
int Fun(char i) { return Array[i]; }
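If you call the Fun function like this, do you think it will cause an error?
int a; a = Fun(8);
The index 8 lies outside the five-element array, so the read is out of bounds and its result is undefined. Experienced readers have probably guessed that adding an assertion inside the Fun function can catch this kind of mistake.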
Assertions (Assert) are one of the most common fault tolerance designs in code, and many source code libraries feature assertions, such as the STM32 peripheral library:
void GPIO_Init(GPIO_TypeDef* GPIOx, GPIO_InitTypeDef* GPIO_InitStruct)
{
    /* Check the parameters */
    assert_param(IS_GPIO_ALL_PERIPH(GPIOx));
    assert_param(IS_GPIO_MODE(GPIO_InitStruct->GPIO_Mode));
    assert_param(IS_GPIO_PIN(GPIO_InitStruct->GPIO_Pin));
    /* ... */
}
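To make the idea concrete, here is a minimal sketch of how the Fun example above could be guarded. The names ASSERT_PARAM, assert_failed, ARRAY_LEN, and the g_assert_* variables are hypothetical and only loosely modeled on the assert_param style; the failure hook here just records the location, whereas a real project might log it and halt or reset. The index type is changed to unsigned int (an assumption) so a negative index cannot slip through.

```c
#include <stdint.h>

#define ARRAY_LEN 5u

static const int Array[ARRAY_LEN] = {0xA1, 0xB2, 0xC3, 0xD4, 0xE5};

/* Hypothetical bookkeeping: where the last assertion failed, and how often. */
static uint32_t    g_assert_fail_count;
static const char *g_assert_last_file;
static uint32_t    g_assert_last_line;

/* Hypothetical failure hook. On a target this could log and then halt. */
static void assert_failed(const char *file, uint32_t line)
{
    g_assert_fail_count++;
    g_assert_last_file = file;
    g_assert_last_line = line;
}

/* Evaluates expr; on failure reports the source location to the hook. */
#define ASSERT_PARAM(expr) \
    ((expr) ? (void)0 : assert_failed(__FILE__, (uint32_t)__LINE__))

/* Returns Array[i], or -1 when i is out of range. */
int Fun(unsigned int i)
{
    ASSERT_PARAM(i < ARRAY_LEN);   /* flags the caller's mistake */
    if (i >= ARRAY_LEN) {
        return -1;                 /* still refuse the out-of-bounds read */
    }
    return Array[i];
}
```

With this version, Fun(8) records an assertion failure and returns -1 instead of reading past the end of the array. Keeping the explicit range check next to the assertion means the function stays safe even in builds where assertions are compiled out.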
Clear Return Values and Error Codes
Commonly used protocol stacks, peripheral libraries, and operating systems often have APIs that are well-designed, providing reasonable return values to indicate the success or failure of operations. For example, using 0 to indicate success and non-zero values to indicate specific error codes.
For instance, the RTOS task creation function:
INT8U OSTaskCreate(void (*task)(void *p_arg), void *p_arg, OS_STK *ptos, INT8U prio)
{
    OS_STK *psp;
    INT8U   err;
#if OS_CRITICAL_METHOD == 3u    /* Allocate storage for CPU status register */
    OS_CPU_SR cpu_sr = 0u;
#endif

#ifdef OS_SAFETY_CRITICAL_IEC61508
    if (OSSafetyCriticalStartFlag == OS_TRUE) {
        OS_SAFETY_CRITICAL_EXCEPTION();
        return (OS_ERR_ILLEGAL_CREATE_RUN_TIME);
    }
#endif

#if OS_ARG_CHK_EN > 0u
    if (prio > OS_LOWEST_PRIO) {    /* Make sure priority is within allowable range */
        return (OS_ERR_PRIO_INVALID);
    }
#endif
    OS_ENTER_CRITICAL();
    if (OSIntNesting > 0u) {    /* Make sure we don't create the task from within an ISR */
        OS_EXIT_CRITICAL();
        return (OS_ERR_TASK_CREATE_ISR);
    }
    /* ... */
}
Designing reasonable return values and error codes for functions will also make your code more robust, especially when debugging.
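The same pattern can be applied to your own modules. The following is a minimal sketch, not taken from any particular library: buf_err_t, buf_t, buf_append, and BUF_CAPACITY are all hypothetical names. It uses 0 for success and a distinct non-zero code for each failure, so the caller can tell exactly what went wrong.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes: 0 means success, non-zero names the failure. */
typedef enum {
    BUF_OK = 0,
    BUF_ERR_NULL_PTR,
    BUF_ERR_BAD_LEN,
    BUF_ERR_FULL
} buf_err_t;

#define BUF_CAPACITY 8u

typedef struct {
    uint8_t data[BUF_CAPACITY];
    size_t  used;               /* number of bytes currently stored */
} buf_t;

/* Appends len bytes from src to buf, validating every argument first. */
buf_err_t buf_append(buf_t *buf, const uint8_t *src, size_t len)
{
    if (buf == NULL || src == NULL) {
        return BUF_ERR_NULL_PTR;
    }
    if (len == 0u || len > BUF_CAPACITY) {
        return BUF_ERR_BAD_LEN;
    }
    if (len > BUF_CAPACITY - buf->used) {
        return BUF_ERR_FULL;    /* not enough free space left */
    }
    for (size_t i = 0u; i < len; i++) {
        buf->data[buf->used + i] = src[i];
    }
    buf->used += len;
    return BUF_OK;
}
```

A caller would then check the result on every call, for example `if (buf_append(&b, src, n) != BUF_OK) { /* handle or log the error */ }`, rather than assuming the write succeeded.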
Logging
Why do we need to log? Detailed log information, including the time, location, and cause of each error, makes it much easier to trace and analyze bugs when they occur.
When we first learn embedded systems, we usually start with the printf function for output, and that same function is the simplest way to implement logging.
Besides storing logs locally, printf output can also be sent to another terminal (such as a host computer) and stored there.
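As a rough sketch of what such a log line might contain, the code below prefixes each message with a timestamp, a level, and the source location. Everything here is an assumption for illustration: log_write, the LOG_E macro, g_tick_ms (standing in for a system tick counter), and g_last_log (standing in for local storage) are hypothetical names, and on a target printf would typically be retargeted to a UART.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { LOG_ERROR = 0, LOG_WARN, LOG_INFO } log_level_t;

/* Hypothetical stand-in for a millisecond system tick. */
static uint32_t g_tick_ms;

/* Hypothetical stand-in for local log storage: keeps the latest line. */
static char g_last_log[128];

/* Formats "[time][level] file:line: message" and emits it via printf. */
void log_write(log_level_t level, const char *file, int line,
               const char *fmt, ...)
{
    static const char *const names[] = {"E", "W", "I"};
    const char *name = (level <= LOG_INFO) ? names[level] : "?";
    va_list args;

    int n = snprintf(g_last_log, sizeof g_last_log, "[%lu][%s] %s:%d: ",
                     (unsigned long)g_tick_ms, name, file, line);
    if (n < 0) {
        return;                               /* formatting failed */
    }
    if ((size_t)n < sizeof g_last_log) {      /* room left for the message */
        va_start(args, fmt);
        vsnprintf(g_last_log + n, sizeof g_last_log - (size_t)n, fmt, args);
        va_end(args);
    }
    printf("%s\n", g_last_log);               /* e.g. to UART or a host PC */
}

/* Convenience macro that captures the call site automatically. */
#define LOG_E(...) log_write(LOG_ERROR, __FILE__, __LINE__, __VA_ARGS__)
```

A call such as `LOG_E("read failed, code=%d", err);` then records when, where, and why the error happened in a single line, which is usually enough to start tracing a bug.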
Fatal Bug Restart Strategy
When the software hits a fatal fault, such as a hard fault (HardFault) or a memory management fault (MemManage), we can fall back on a restart strategy. Of course, the restart method should be chosen to fit the actual project, for example a core reset or a system reset.
1. Core Reset
This resets only the Cortex-M core and does not reset peripherals such as UART.
The Cortex-M core documentation describes it like this: by setting the VECTRESET bit in the Application Interrupt and Reset Control Register (AIRCR), the processor core can be reset without resetting other on-chip facilities. Note that VECTRESET exists only on ARMv7-M cores such as the Cortex-M3 and Cortex-M4, where Arm reserves it for debug use, so confirm that it is appropriate for your chip.
The core reset function (modified from core code) is as follows:
void NVIC_CoreReset(void)
{
    __DSB();
    SCB->AIRCR = ((0x5FA << SCB_AIRCR_VECTKEY_Pos) |
                  (SCB->AIRCR & SCB_AIRCR_PRIGROUP_Msk) |
                  SCB_AIRCR_VECTRESET_Msk);    // Set VECTRESET
    __DSB();
    while (1) {
        __NOP();
    }
}
2. System Reset
A system reset is also a software reset, but it uses a different register bit (SYSRESETREQ), and its target is the entire chip (except for the backup area).
The system reset function is as follows:
void NVIC_SysReset(void)
{
    __DSB();
    SCB->AIRCR = ((0x5FA << SCB_AIRCR_VECTKEY_Pos) |
                  (SCB->AIRCR & SCB_AIRCR_PRIGROUP_Msk) |
                  SCB_AIRCR_SYSRESETREQ_Msk);    // Set SYSRESETREQ
    __DSB();
    while (1) {
        __NOP();
    }
}
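One way to tie the two reset functions into a restart strategy is to record why the restart happened before triggering it, so the cause can be inspected after reboot. The sketch below is only an illustration under several assumptions: fault_restart, fault_record_t, and the enum names are hypothetical, the record is a plain static variable (on a target it would need to live in memory that survives the reset), and the two platform_* functions are host-side stubs that merely count calls where a real build would call NVIC_CoreReset() or NVIC_SysReset().

```c
#include <stdint.h>

typedef enum { RESET_CORE = 0, RESET_SYSTEM } reset_kind_t;
typedef enum { FAULT_NONE = 0, FAULT_HARD, FAULT_MEMMANAGE } fault_reason_t;

#define FAULT_RECORD_MAGIC 0xFA17C0DEu   /* marks the record as initialized */

typedef struct {
    uint32_t magic;
    uint32_t reason;   /* last fault_reason_t that caused a restart */
    uint32_t count;    /* restarts requested since the record was created */
} fault_record_t;

/* Assumption: on a target, place this in RAM that is not cleared on reset. */
static fault_record_t g_fault_record;

/* Host-side stubs so the sketch runs on a PC. On a target these would be
 * replaced by calls to NVIC_CoreReset() and NVIC_SysReset(). */
static uint32_t g_core_resets;
static uint32_t g_system_resets;
static void platform_core_reset(void)   { g_core_resets++; }
static void platform_system_reset(void) { g_system_resets++; }

/* Records the fault reason, then performs the requested kind of reset. */
void fault_restart(fault_reason_t reason, reset_kind_t kind)
{
    if (g_fault_record.magic != FAULT_RECORD_MAGIC) {
        g_fault_record.magic = FAULT_RECORD_MAGIC;   /* first use */
        g_fault_record.count = 0u;
    }
    g_fault_record.reason = (uint32_t)reason;
    g_fault_record.count++;

    if (kind == RESET_CORE) {
        platform_core_reset();
    } else {
        platform_system_reset();
    }
}
```

A HardFault handler could then end with `fault_restart(FAULT_HARD, RESET_SYSTEM);`, and the startup code could check the record to log the previous fault or to stop restarting after too many consecutive failures.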
Static Analysis Tools
Use static analysis tools to check the code for potential issues such as uninitialized variables, memory leaks, and buffer overflows. These tools can detect many issues before the code ever runs, thus improving code quality.
Although this is not strictly a fault tolerance design, it is an important part of the development process, and its role can sometimes exceed that of conventional fault tolerance designs.
Here, I recommend reading: Common Static Analysis Tools for Embedded Development
Code bugs come in countless forms, so in addition to conventional fault tolerance designs, coding standards are also very important.
Lastly, what fault tolerance designs do you consider when you write code? Feel free to leave comments.