HardFault is a type of exception in ARM Cortex-M processors. It is triggered when the processor encounters an unresolvable error, or when the exception handler configured to handle specific types of errors (such as bus errors, memory management errors, usage faults) is disabled, or when another error occurs during the handling of these specific errors. It serves as a “catch-all” exception indicating that the system has encountered a serious problem.
Debugging HardFault requires patience and a systematic approach. The key steps are:
- Implement a HardFault_Handler that can capture sufficient information.
- Use a debugger to obtain the values of the fault status registers and the exception stack frame.
- Carefully interpret these values, especially CFSR, HFSR, MMFAR, BFAR, and the PC in the stack.
- Combine disassembly and source code to locate the specific instruction and code line that triggered the fault.
- Analyze common causes (pointers, out-of-bounds access, stack issues, alignment, MPU, etc.) and fix them.
When a HardFault occurs, the processor automatically pushes some critical registers onto the currently used stack (MSP or PSP) and jumps to the HardFault handler.
Our primary task is to write an effective HardFault handler to extract useful information.
Step 1: Implement an Effective HardFault Handler
The default HardFault_Handler is usually just an infinite loop, while(1);. We need to replace it with code that can capture and report fault information.
In your project (usually in stm32xxxx_it.c or a similar file), find the HardFault_Handler function and replace or modify it with the following code:
// Define a structure to store the register values extracted from the stack
typedef struct {
uint32_t r0;
uint32_t r1;
uint32_t r2;
uint32_t r3;
uint32_t r12;
uint32_t lr; // Link Register
uint32_t pc; // Program Counter
uint32_t psr; // Program Status Register
} HardFaultRegs_t;
// Global variable for viewing in the debugger
volatile HardFaultRegs_t stacked_regs;
volatile uint32_t cfsr_val;
volatile uint32_t hfsr_val;
volatile uint32_t dfsr_val;
volatile uint32_t afsr_val;
volatile uint32_t mmfar_val;
volatile uint32_t bfar_val;
volatile uint32_t stacked_sp; // Save the value of the stack pointer itself
// HardFault entry point
// __attribute__((naked)) suppresses the compiler-generated prologue/epilogue, so the
// stack pointer is untouched when we read it. GCC only supports basic asm inside a
// naked function, so this stub selects the active stack pointer and tail-calls a
// C function (HardFault_Handler_C, a helper name chosen here) that does the rest.
void HardFault_Handler(void) __attribute__((naked));
void HardFault_Handler_C(uint32_t frame_sp) __attribute__((used));
void HardFault_Handler(void) {
// TST LR, #4 tests bit 2 of LR (EXC_RETURN's bit 2)
// If bit 2 is 1, the frame was pushed on PSP; otherwise, on MSP
__asm volatile(
" TST LR, #4\n" // Test bit 2 of LR: 0 = MSP, 1 = PSP
" ITE EQ\n" // If-Then-Else based on the Z flag set by TST
" MRSEQ R0, MSP\n" // EQ (bit 2 is 0): copy MSP to R0
" MRSNE R0, PSP\n" // NE (bit 2 is 1): copy PSP to R0
" B HardFault_Handler_C\n" // Tail-call the C part; R0 is its first argument
);
}
// C part of the handler; frame_sp is the address of the stacked R0
void HardFault_Handler_C(uint32_t frame_sp) {
stacked_sp = frame_sp; // Keep the frame address visible to the debugger
// Load register values from the obtained stack pointer into the structure
// stacked_sp now points to the location of R0
stacked_regs.r0 = *((volatile uint32_t*)(stacked_sp + 0));
stacked_regs.r1 = *((volatile uint32_t*)(stacked_sp + 4));
stacked_regs.r2 = *((volatile uint32_t*)(stacked_sp + 8));
stacked_regs.r3 = *((volatile uint32_t*)(stacked_sp + 12));
stacked_regs.r12 = *((volatile uint32_t*)(stacked_sp + 16));
stacked_regs.lr = *((volatile uint32_t*)(stacked_sp + 20));
stacked_regs.pc = *((volatile uint32_t*)(stacked_sp + 24));
stacked_regs.psr = *((volatile uint32_t*)(stacked_sp + 28));
// Read the fault status registers
cfsr_val = (*((volatile uint32_t*)0xE000ED28));
hfsr_val = (*((volatile uint32_t*)0xE000ED2C)); // Note: HFSR address is 0xE000ED2C
dfsr_val = (*((volatile uint32_t*)0xE000ED30));
afsr_val = (*((volatile uint32_t*)0xE000ED3C));
// Check if MMFAR and BFAR are valid and read them
if (cfsr_val & (1 << 7)) { // MMARVALID bit in MMFSR
mmfar_val = (*((volatile uint32_t*)0xE000ED34));
} else {
mmfar_val = 0xFFFFFFFF; // Invalid
}
if (cfsr_val & (1 << 15)) { // BFARVALID bit in BFSR
bfar_val = (*((volatile uint32_t*)0xE000ED38));
} else {
bfar_val = 0xFFFFFFFF; // Invalid
}
// Here you can add code to print these variable values via UART, SWO, or other means
// printf("HardFault!\n");
// printf("SP = 0x%08X\n", stacked_sp);
// printf("R0 = 0x%08X\n", stacked_regs.r0);
// ... (print other registers)
// printf("PC = 0x%08X\n", stacked_regs.pc); // Address of the faulting instruction (for precise faults)
// printf("LR = 0x%08X\n", stacked_regs.lr);
// printf("PSR= 0x%08X\n", stacked_regs.psr);
// printf("CFSR=0x%08X\n", cfsr_val);
// printf("HFSR=0x%08X\n", hfsr_val);
// printf("MMFAR=0x%08X\n", mmfar_val);
// printf("BFAR=0x%08X\n", bfar_val);
// Set a breakpoint here, or enter an infinite loop waiting for the debugger to connect
__asm volatile("BKPT #0\n"); // Software breakpoint
// Or
// while(1);
}
Note:
- __attribute__((naked)) tells the compiler not to generate function entry and exit code (like push/pop), as we need precise control over the stack pointer.
- The volatile keyword ensures that the compiler does not optimize away reads and writes to these variables.
- The code includes assembly instructions to read MSP or PSP.
- You need to add code to print information based on your project configuration (like UART initialization).
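- Finally, using BKPT #0 triggers a software breakpoint when a HardFault occurs, so the debugger stops inside the handler for easy variable inspection. Without a debugger attached, executing BKPT inside the HardFault handler locks up the core, so use the while(1); variant in builds that run standalone.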
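The print step left as comments in the handler could be filled in along these lines. This is a minimal sketch, assuming snprintf is available in your C library and that you send the resulting buffer out through your own UART or SWO routine; the FaultReport type and the format_fault_report function are hypothetical names, not part of the handler above. It uses the PRIX32 macro from inttypes.h because uint32_t is not always unsigned int on ARM toolchains, so a bare %X can produce format warnings.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical snapshot type; the fields mirror the globals captured in the handler.
typedef struct {
    uint32_t pc;
    uint32_t lr;
    uint32_t psr;
    uint32_t cfsr;
    uint32_t hfsr;
    uint32_t mmfar;
    uint32_t bfar;
} FaultReport;

// Formats the report into buf.
// Returns the number of characters written (excluding the terminator),
// or -1 if the buffer was too small or formatting failed.
int format_fault_report(char *buf, size_t len, const FaultReport *r)
{
    int n = snprintf(buf, len,
                     "HardFault!\n"
                     "PC    = 0x%08" PRIX32 "\n"
                     "LR    = 0x%08" PRIX32 "\n"
                     "PSR   = 0x%08" PRIX32 "\n"
                     "CFSR  = 0x%08" PRIX32 "\n"
                     "HFSR  = 0x%08" PRIX32 "\n"
                     "MMFAR = 0x%08" PRIX32 "\n"
                     "BFAR  = 0x%08" PRIX32 "\n",
                     r->pc, r->lr, r->psr, r->cfsr, r->hfsr, r->mmfar, r->bfar);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    return n;
}
```

In the handler you would fill a FaultReport from stacked_regs and the *_val variables, call format_fault_report, and pass the buffer to a blocking transmit function, since interrupts cannot be relied on at this point.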
Step 2: Reproduce HardFault and Analyze with Debugger
Compile and download the code containing the above HardFault_Handler to the target board.
Connect the debugger (such as ST-Link, J-Link).
Run the code until HardFault occurs. If BKPT #0 is set, the program will automatically stop at the breakpoint. If no breakpoint is set and the handler ends with while(1);, manually pause the program after HardFault occurs; the program counter should stop in the while(1); loop.
Check variable values in the debugger’s Watch window or Memory window to view the values of stacked_regs, cfsr_val, hfsr_val, mmfar_val, bfar_val, etc.
Step 3: Interpret Fault Information
Analyze CFSR
MMFSR (bits [7:0]):
- IACCVIOL (bit 0): Instruction access violation (e.g., fetching instructions from XN region).
- DACCVIOL (bit 1): Data access violation (e.g., writing to read-only area).
- MUNSTKERR (bit 3): MemManage Fault stack error on exception return.
- MSTKERR (bit 4): MemManage Fault stack error on exception entry.
- MLSPERR (bit 5): MemManage Fault during floating-point lazy state saving.
- MMARVALID (bit 7): Address in MMFAR is valid.
BFSR (bits [15:8]):
- IBUSERR (bit 8): Bus error caused by instruction prefetch.
- PRECISERR (bit 9): Precise data bus error. BFAR is valid.
- IMPRECISERR (bit 10): Imprecise data bus error. BFAR is invalid. Usually caused by write buffers or caches, with a delay between the error point and the reporting point.
- UNSTKERR (bit 11): BusFault stack error on exception return.
- STKERR (bit 12): BusFault stack error on exception entry.
- LSPERR (bit 13): BusFault during floating-point lazy state saving.
- BFARVALID (bit 15): Address in BFAR is valid.
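UFSR (bits [31:16]):
- UNDEFINSTR (bit 16): Executed an undefined instruction.
- INVSTATE (bit 17): Attempted to execute an instruction in an invalid state, typically with the Thumb bit cleared (e.g., branching to a function address with LSB=0, or trying to run ARM-state code).
- INVPC (bit 18): Invalid PC load on exception return (e.g., a corrupted EXC_RETURN value or stack frame).
- NOCP (bit 19): Attempted to execute a coprocessor instruction while the coprocessor is absent or disabled (e.g., a floating-point instruction before the FPU is enabled).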
- UNALIGNED (bit 24): An unaligned access occurred (requires CCR.UNALIGN_TRP bit enabled).
- DIVBYZERO (bit 25): Executed a division by zero operation (requires CCR.DIV_0_TRP bit enabled).
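The bit tables above can be turned into a small lookup so that the handler (or a host-side tool) reports flag names instead of a raw hex value. This is a minimal sketch, assuming the ARMv7-M bit positions listed above; the CfsrFlag type and the cfsr_decode function are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

// One entry per CFSR flag listed above: bit position and mnemonic.
typedef struct {
    uint8_t bit;
    const char *name;
} CfsrFlag;

static const CfsrFlag cfsr_flags[] = {
    // MMFSR, bits [7:0]
    {0, "IACCVIOL"}, {1, "DACCVIOL"}, {3, "MUNSTKERR"},
    {4, "MSTKERR"}, {5, "MLSPERR"}, {7, "MMARVALID"},
    // BFSR, bits [15:8]
    {8, "IBUSERR"}, {9, "PRECISERR"}, {10, "IMPRECISERR"},
    {11, "UNSTKERR"}, {12, "STKERR"}, {13, "LSPERR"}, {15, "BFARVALID"},
    // UFSR, bits [31:16]
    {16, "UNDEFINSTR"}, {17, "INVSTATE"}, {18, "INVPC"},
    {19, "NOCP"}, {24, "UNALIGNED"}, {25, "DIVBYZERO"},
};

// Stores the names of the set flags in out[] (at most max entries, in
// ascending bit order) and returns the total number of flags that are set.
size_t cfsr_decode(uint32_t cfsr, const char *out[], size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if (cfsr & (UINT32_C(1) << cfsr_flags[i].bit)) {
            if (count < max) {
                out[count] = cfsr_flags[i].name;
            }
            count++;
        }
    }
    return count;
}
```

For example, a cfsr_val of 0x00008200 has bits 9 and 15 set, so it decodes to PRECISERR and BFARVALID: a precise data bus error whose address can be read from BFAR.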
Analyze HFSR
- VECTTBL (bit 1): Bus error when reading the vector table during exception entry.
- FORCED (bit 30): Indicates that the HardFault was caused by a configurable fault (MemManage, BusFault, UsageFault) that escalated because its handler was disabled or a new fault occurred during handling. Focus on analyzing CFSR in this case.
- DEBUGEVT (bit 31): Indicates that the HardFault was caused by a debug event that could not be handled (e.g., a BKPT instruction executed while both halting debug and the debug monitor are disabled).
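The three HFSR bits above are enough for a first triage of where to look next. This is a minimal sketch under that assumption; the HardFaultCause enum and the hfsr_classify function are hypothetical names, and the order of the checks is a choice made here so that an escalated fault is reported first.

```c
#include <stdint.h>

#define HFSR_VECTTBL  (UINT32_C(1) << 1)
#define HFSR_FORCED   (UINT32_C(1) << 30)
#define HFSR_DEBUGEVT (UINT32_C(1) << 31)

typedef enum {
    HF_CAUSE_UNKNOWN,       // none of the three bits is set
    HF_CAUSE_VECTOR_TABLE,  // VECTTBL: vector table read failed
    HF_CAUSE_ESCALATED,     // FORCED: look at CFSR for the real cause
    HF_CAUSE_DEBUG_EVENT    // DEBUGEVT: debug event raised the fault
} HardFaultCause;

// Maps an HFSR value to the most useful next step for the investigation.
HardFaultCause hfsr_classify(uint32_t hfsr)
{
    if (hfsr & HFSR_FORCED) {
        return HF_CAUSE_ESCALATED;
    }
    if (hfsr & HFSR_VECTTBL) {
        return HF_CAUSE_VECTOR_TABLE;
    }
    if (hfsr & HFSR_DEBUGEVT) {
        return HF_CAUSE_DEBUG_EVENT;
    }
    return HF_CAUSE_UNKNOWN;
}
```

When the result is HF_CAUSE_ESCALATED, the CFSR value captured in the handler is the place to continue.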
Analyze MMFAR and BFAR
If MMARVALID or BFARVALID is set, these two registers will tell you the exact address that caused the memory or bus error. Check if this address is within your expected memory range, whether it requires special access permissions (like MPU settings), or if it points to an invalid peripheral address.
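Checking a fault address against the expected memory ranges can be automated with a small table. This is a minimal sketch; the region names, base addresses, and sizes below are placeholder assumptions for a generic STM32-style part, so replace them with the values from your linker script and reference manual. MemRegion and find_region are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t base;
    uint32_t size;  // in bytes
} MemRegion;

// Placeholder memory map (assumed values, adjust for your device).
static const MemRegion memory_map[] = {
    {"FLASH",  0x08000000u, 512u * 1024u},
    {"SRAM",   0x20000000u, 128u * 1024u},
    {"PERIPH", 0x40000000u, 0x20000000u},
};

// Returns the region that contains addr, or NULL if addr is unmapped.
const MemRegion *find_region(uint32_t addr)
{
    for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        const MemRegion *r = &memory_map[i];
        // The subtraction form avoids overflow when base + size wraps.
        if (addr >= r->base && (addr - r->base) < r->size) {
            return r;
        }
    }
    return NULL;
}
```

A NULL result for mmfar_val or bfar_val suggests a wild pointer; an address near 0x00000000 usually means a NULL pointer plus a structure offset.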
Analyze the PC and LR in the stack frame
stacked_regs.pc: For most faults (undefined instruction, precise bus error, MPU violation, and so on), this is the address of the instruction that triggered the fault, so jump to it directly in the debugger’s disassembly window. For an imprecise bus error (IMPRECISERR), the stacked PC has already moved past the offending instruction; inspect the memory writes in the few instructions before it.
stacked_regs.lr: Link Register at the time of the fault. In ordinary code, LR usually holds the return address of the current function call, which shows where the faulting function was called from. If the HardFault occurred within an interrupt/exception handler, LR may still hold a special EXC_RETURN value (e.g., 0xFFFFFFF9, 0xFFFFFFFD, etc.) indicating the processor mode and the stack used upon return. This can help determine if the HardFault occurred in an interrupt context.
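Telling an EXC_RETURN value apart from an ordinary return address, and reading its fields, can be sketched as follows. This assumes the ARMv7-M encoding (Cortex-M3/M4/M7), where bits [31:5] are all ones, bit 4 selects the frame type, bit 3 the mode, and bit 2 the stack; ARMv8-M parts define additional bits that this sketch does not handle. ExcReturnInfo and decode_exc_return are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool is_exc_return;   // false: lr is an ordinary return address
    bool thread_mode;     // true: returns to Thread mode, false: Handler mode
    bool uses_psp;        // true: frame is on PSP, false: on MSP
    bool extended_frame;  // true: frame also contains FPU registers
} ExcReturnInfo;

// Decodes an LR value according to the ARMv7-M EXC_RETURN layout.
ExcReturnInfo decode_exc_return(uint32_t lr)
{
    ExcReturnInfo info = {false, false, false, false};

    // EXC_RETURN values have bits [31:5] all set.
    if ((lr & 0xFFFFFFE0u) != 0xFFFFFFE0u) {
        return info;
    }
    info.is_exc_return = true;
    info.thread_mode = (lr & (1u << 3)) != 0;
    info.uses_psp = (lr & (1u << 2)) != 0;
    info.extended_frame = (lr & (1u << 4)) == 0;
    return info;
}
```

With this layout, 0xFFFFFFF1 means a return to Handler mode (the fault interrupted another exception handler), 0xFFFFFFF9 a return to Thread mode on MSP, and 0xFFFFFFFD a return to Thread mode on PSP, which is typical for an RTOS task.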
Step 4: Locate and Fix Source Code
Based on the instruction address located in the disassembly window, use the .map file or the debugger’s symbol information to find the corresponding C source code line.
Analyze the causes:
- Null pointer/wild pointer: Check if the address pointed to by MMFAR or BFAR, or the pointer variable accessed by the faulting instruction, is NULL or points to an invalid/freed memory area.
- Array out-of-bounds: Check if the array index exceeds the boundary, leading to access of illegal memory.
- Stack overflow: If the value of stacked_sp is very close to or exceeds the defined stack area boundary, or if PC points to the stack area, it is likely a stack overflow. Check function call depth, local variable sizes, and interrupt nesting. You may try increasing the stack space (defined in the startup_stm32xxxx.s file).
- Unaligned access: Check if there are forced type casts and dereferences of pointers to multi-byte types like uint16_t, uint32_t where the pointer’s address is not a multiple of 2 or 4. For example: uint32_t* p = (uint32_t*)0x20000001; val = *p;. You can modify the data structure or use memcpy to avoid this.
- Division by zero error: Check if there are any cases in the code where the divisor is zero.
- MPU configuration error: If using MPU, check if the MPU region configuration is correct and if necessary read/write/execute permissions are allowed.
- Accessing invalid peripheral address: Check if BFAR points to an unclocked or nonexistent peripheral register address.
- Interrupt/RTOS issues: If HardFault occurs during interrupt handling or RTOS task switching, the issue may be more complex, possibly involving incorrect interrupt priority configuration, insufficient critical section protection, or too small task stacks. Checking the EXC_RETURN value in LR can help determine the context.
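Two of the checks above lend themselves to small helpers: reading a multi-byte value through memcpy instead of a cast pointer, and measuring how close a captured stack pointer is to the bottom of the stack. This is a minimal sketch; read_u32_unaligned and stack_headroom are hypothetical names, and the stack limit passed in is assumed to come from your linker script or RTOS task configuration.

```c
#include <stdint.h>
#include <string.h>

// Reads a 32-bit value from an address of any alignment.
// memcpy lets the compiler pick access instructions that are safe for
// the address, instead of a single word load through a cast pointer.
uint32_t read_u32_unaligned(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}

// Returns the number of bytes left between sp and the lowest valid stack
// address, or 0 if sp has already dropped to or below it. Cortex-M stacks
// grow downward, so a small result indicates a likely overflow.
uint32_t stack_headroom(uint32_t sp, uint32_t stack_limit)
{
    return (sp > stack_limit) ? (sp - stack_limit) : 0u;
}
```

For instance, calling stack_headroom(stacked_sp, limit) inside the handler and finding only a few bytes left, or 0, points toward the stack overflow case rather than a bad pointer.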
Modify the code based on the identified causes, recompile, download, and run the code to ensure that HardFault no longer occurs.