Using Assertions (Assert)
What is an assertion? An example makes it clear.
int Array[5] = {0xA1, 0xB2, 0xC3, 0xD4, 0xE5};
int Fun(char i) { return Array[i]; }
int a;
a = Fun(8);   /* index 8 is past the end of the 5-element Array */
Experienced readers will have spotted the problem: Fun(8) reads past the end of the five-element array. Adding an assertion inside Fun catches this kind of error.
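A minimal sketch of that idea, using the standard assert macro from <assert.h>. The ARRAY_LEN constant and the switch from char to size_t for the index are assumptions made for this illustration, not part of the original example.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN 5u   /* assumed name for the array length */

static const int Array[ARRAY_LEN] = {0xA1, 0xB2, 0xC3, 0xD4, 0xE5};

/* Returns the element at index i.
 * In builds where NDEBUG is not defined, the assertion aborts the program
 * if the caller passes an out-of-range index such as 8. */
int Fun(size_t i)
{
    assert(i < ARRAY_LEN);
    return Array[i];
}
```

With this check in place, Fun(8) stops at the assertion instead of silently returning whatever lies beyond the array. Note that the standard assert is compiled out when NDEBUG is defined, so it guards development builds only.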
Assertions are one of the most common fault-tolerance techniques in code, and many source libraries use them, for example the STM32 peripheral library:
void GPIO_Init(GPIO_TypeDef* GPIOx, GPIO_InitTypeDef* GPIO_InitStruct)
{
  /* Check the parameters */
  assert_param(IS_GPIO_ALL_PERIPH(GPIOx));
  assert_param(IS_GPIO_MODE(GPIO_InitStruct->GPIO_Mode));
  assert_param(IS_GPIO_PIN(GPIO_InitStruct->GPIO_Pin));
  /* ... */
}
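The parameter checks above follow a simple pattern: evaluate a validity macro and call a failure handler with the file and line if it is false. The sketch below imitates that pattern with hypothetical names (MY_ASSERT_PARAM, my_assert_failed, IS_VALID_MODE, Pin_SetMode); it is not the STM32 library's code, and unlike a typical embedded handler that halts, this one only records the failure so it can be tried on a PC.

```c
#include <stdint.h>

/* Hypothetical failure record, filled in by the handler below. */
static const char *g_assert_file = 0;
static uint32_t    g_assert_line = 0;
static uint32_t    g_assert_count = 0;

/* Hypothetical handler: records where the check failed and returns.
 * A real project might log the location and then halt or reset. */
static void my_assert_failed(const char *file, uint32_t line)
{
    g_assert_file = file;
    g_assert_line = line;
    g_assert_count++;
}

/* Calls the handler with the source location when expr is false. */
#define MY_ASSERT_PARAM(expr) \
    ((expr) ? (void)0 : my_assert_failed(__FILE__, (uint32_t)__LINE__))

#define MODE_INPUT   0u
#define MODE_OUTPUT  1u
#define IS_VALID_MODE(m) (((m) == MODE_INPUT) || ((m) == MODE_OUTPUT))

static uint32_t g_mode = MODE_INPUT;

/* Hypothetical driver function that validates its parameter first. */
void Pin_SetMode(uint32_t mode)
{
    MY_ASSERT_PARAM(IS_VALID_MODE(mode));
    if (IS_VALID_MODE(mode)) {   /* handler returns here, so guard the write */
        g_mode = mode;
    }
}
```

Calling Pin_SetMode(7u) leaves g_mode unchanged and increments g_assert_count, with g_assert_file and g_assert_line pointing at the failed check.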
Clearly Defined Return Values and Error Codes
Commonly used protocol stacks, peripheral libraries, and operating systems generally have well-designed APIs whose functions return a value that reports whether the operation succeeded or failed. A common convention is 0 for success and a non-zero value for a specific error code.
For example, the task creation function of the µC/OS-II RTOS:
INT8U OSTaskCreate (void (*task)(void *p_arg), void *p_arg, OS_STK *ptos, INT8U prio)
{
    OS_STK *psp;
    INT8U   err;
#if OS_CRITICAL_METHOD == 3u        /* Allocate storage for CPU status register */
    OS_CPU_SR cpu_sr = 0u;
#endif

#ifdef OS_SAFETY_CRITICAL_IEC61508
    if (OSSafetyCriticalStartFlag == OS_TRUE) {
        OS_SAFETY_CRITICAL_EXCEPTION();
        return (OS_ERR_ILLEGAL_CREATE_RUN_TIME);
    }
#endif

#if OS_ARG_CHK_EN > 0u
    if (prio > OS_LOWEST_PRIO) {    /* Make sure priority is within allowable range */
        return (OS_ERR_PRIO_INVALID);
    }
#endif
    OS_ENTER_CRITICAL();
    if (OSIntNesting > 0u) {        /* Make sure we don't create the task from within an ISR */
        OS_EXIT_CRITICAL();
        return (OS_ERR_TASK_CREATE_ISR);
    }
    /* ... */
}
Giving your own functions well-defined return values and error codes makes the code more robust and, above all, makes bugs easier to locate.
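As an illustration of the same convention in application code, here is a minimal sketch. The BufStatus enum, the Buf_Write function, and the 16-byte capacity are all hypothetical names and values chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes: 0 means success, non-zero names the error. */
typedef enum {
    BUF_OK           = 0,
    BUF_ERR_NULL_PTR = 1,
    BUF_ERR_TOO_LONG = 2
} BufStatus;

#define BUF_CAPACITY 16u

static uint8_t g_buf[BUF_CAPACITY];
static size_t  g_len = 0;

/* Copies len bytes into the internal buffer.
 * Every failure path returns a distinct code, so the caller can tell
 * exactly which check rejected the request. */
BufStatus Buf_Write(const uint8_t *data, size_t len)
{
    if (data == NULL) {
        return BUF_ERR_NULL_PTR;
    }
    if (len > BUF_CAPACITY) {
        return BUF_ERR_TOO_LONG;
    }
    for (size_t i = 0; i < len; i++) {
        g_buf[i] = data[i];
    }
    g_len = len;
    return BUF_OK;
}
```

A caller would then compare the result against BUF_OK and handle or log the specific code instead of guessing why a write did not happen.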
Logging
Why do we need logging? A detailed log that records when an error happened, where, and why makes it much easier to trace and analyze bugs after they occur.
One of the first things we learn in embedded development is a print function such as printf, which is itself a basic form of logging.
In addition to storing logs locally, we can use printf to send output to another terminal (such as a host computer), which then stores the log.
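A minimal sketch of a log call that captures time, location, and reason is shown below. The names Log_Write, LOG_ERROR, g_tick_ms, and g_last_log are hypothetical, and the sketch assumes printf has already been retargeted to a UART or similar channel on the target.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical millisecond tick, normally advanced by a timer interrupt. */
static uint32_t g_tick_ms = 0;

/* Most recent log line, kept locally so it can be inspected or stored. */
static char g_last_log[128];

/* Formats "[time] file:line: reason", keeps a local copy, and prints it.
 * Returns the value reported by snprintf. */
int Log_Write(uint32_t tick_ms, const char *file, int line, const char *reason)
{
    int n = snprintf(g_last_log, sizeof g_last_log, "[%lu ms] %s:%d: %s",
                     (unsigned long)tick_ms, file, line, reason);
    printf("%s\n", g_last_log);
    return n;
}

/* Convenience macro that fills in the time and source location. */
#define LOG_ERROR(reason) Log_Write(g_tick_ms, __FILE__, __LINE__, (reason))
```

For example, Log_Write(1500u, "motor.c", 42, "overcurrent") produces the line "[1500 ms] motor.c:42: overcurrent". In practice you would call LOG_ERROR("overcurrent") and let the macro supply the location.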
Fatal Bug Restart Strategy
When the software hits a fatal error, such as a hard fault (HardFault) or a memory management fault (MemManage), restarting is one way to recover.
Which kind of restart to use, for example a core reset or a system reset, depends on the actual needs of the project.
1. Core Reset
A core reset resets only the Cortex-M core; UART and other on-chip peripherals are not reset.
The Cortex-M core documentation describes it like this: by setting the VECTRESET bit in the Application Interrupt and Reset Control Register (AIRCR), the processor core can be reset without resetting other on-chip facilities.
The core reset function (adapted from the core code) is as follows:
void NVIC_CoreReset(void)
{
    __DSB();
    SCB->AIRCR = ((0x5FA << SCB_AIRCR_VECTKEY_Pos) |
                  (SCB->AIRCR & SCB_AIRCR_PRIGROUP_Msk) |
                  SCB_AIRCR_VECTRESET_Msk);     // Set VECTRESET
    __DSB();
    while(1) { __NOP(); }
}
2. System Reset
A system reset is requested through a different bit of the same register, SYSRESETREQ, and it resets the whole chip (except for the backup area).
System reset function:
void NVIC_SysReset(void)
{
    __DSB();
    SCB->AIRCR = ((0x5FA << SCB_AIRCR_VECTKEY_Pos) |
                  (SCB->AIRCR & SCB_AIRCR_PRIGROUP_Msk) |
                  SCB_AIRCR_SYSRESETREQ_Msk);   // Set SYSRESETREQ
    __DSB();
    while(1) { __NOP(); }
}
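Both reset functions write the same kind of value to AIRCR: the 0x5FA write key in the upper half-word, the current priority grouping preserved, and one request bit. The sketch below works through that arithmetic on plain integers so it can be checked on a PC. The macro names and the Aircr_ResetValue helper are hypothetical; the bit positions are assumed to match the CMSIS headers for Cortex-M3/M4 (VECTKEY at bit 16, PRIGROUP at bits 10:8, SYSRESETREQ at bit 2, VECTRESET at bit 0).

```c
#include <stdint.h>

/* Assumed bit layout, mirroring the CMSIS Cortex-M3/M4 definitions. */
#define AIRCR_VECTKEY_POS      16u
#define AIRCR_VECTKEY          (0x5FAul << AIRCR_VECTKEY_POS)
#define AIRCR_PRIGROUP_MSK     (7ul << 8)
#define AIRCR_SYSRESETREQ_MSK  (1ul << 2)
#define AIRCR_VECTRESET_MSK    (1ul << 0)

/* Builds the value to write to AIRCR:
 * write key | current priority grouping | requested reset bit. */
uint32_t Aircr_ResetValue(uint32_t current_aircr, uint32_t request_mask)
{
    return (uint32_t)(AIRCR_VECTKEY |
                      (current_aircr & AIRCR_PRIGROUP_MSK) |
                      request_mask);
}
```

If the register currently reads 0xFA050300 (priority grouping 3), a system reset request becomes 0x05FA0304 and a core reset request becomes 0x05FA0301. Without the 0x5FA key in the upper half-word the write is ignored, which is why both functions include it.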
Static Analysis Tools
Static analysis tools check the code for potential issues such as uninitialized variables, memory leaks, and buffer overflows. They can detect many problems without running the program at all, which improves code quality early.
Strictly speaking this is not a fault-tolerance design, but it is an important part of the development process, and it can sometimes do more good than conventional fault-tolerance measures.
Author | strongerHuang