Locating Code Before HARD FAULT Occurrence

Recently, while working on a motor drive application on the BGI MCU, I was plagued during debugging by a randomly occurring HARD FAULT. The strange part was that the time it took to enter HARD FAULT was unpredictable: it could trigger just a few seconds after USB was connected, or the program could run for several minutes, even tens of minutes, before it happened. I suspected the PWM switching code, but at first I could not find the code that caused the problem. A search on Baidu turned up little. After some effort I finally tracked the problem down and learned something new along the way, so I will first walk through the debugging process.

First, I entered DEBUG mode in KEIL and ran the program. After the MCU “crashed,” I halted it and found it stuck in the while(1) loop of the HARD FAULT handler. Next, I opened the FAULT REPORTS window:

[Figure: the FAULT REPORTS window in KEIL]

We can see that the HARD FAULT was escalated from a USAGE FAULT, and the reason is INVSTATE. According to the PPT titled “Diagnosis of Common Hard+Faults in STM32,” INVSTATE means the MCU attempted to enter ARM state. The Cortex-M3 only executes Thumb instructions, so this is illegal and raises a USAGE FAULT. The PPT also contains the following description:

[Figure: excerpt from the PPT describing INVSTATE]
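The FAULT REPORTS window is essentially a decoded view of the core's fault status registers. As a rough sketch of what it reports, the helpers below test the relevant bits by hand. On a Cortex-M3 the values would normally be read from SCB->CFSR (address 0xE000ED28) and SCB->HFSR (address 0xE000ED2C). The function names are my own and are only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* UsageFault status bits as they appear in the 32-bit CFSR
 * (the UFSR occupies CFSR[31:16]). */
#define CFSR_UNDEFINSTR (1u << 16)
#define CFSR_INVSTATE   (1u << 17)
#define CFSR_INVPC      (1u << 18)
#define CFSR_NOCP       (1u << 19)
#define CFSR_UNALIGNED  (1u << 24)
#define CFSR_DIVBYZERO  (1u << 25)

/* HFSR.FORCED: a configurable fault was escalated to HardFault. */
#define HFSR_FORCED     (1u << 30)

/* Hypothetical helper: name the first UsageFault cause set in cfsr. */
const char *usage_fault_cause(uint32_t cfsr)
{
    if (cfsr & CFSR_UNDEFINSTR) return "UNDEFINSTR";
    if (cfsr & CFSR_INVSTATE)   return "INVSTATE";
    if (cfsr & CFSR_INVPC)      return "INVPC";
    if (cfsr & CFSR_NOCP)       return "NOCP";
    if (cfsr & CFSR_UNALIGNED)  return "UNALIGNED";
    if (cfsr & CFSR_DIVBYZERO)  return "DIVBYZERO";
    return "none";
}

/* Hypothetical helper: true for the situation described above,
 * a HardFault forced by a UsageFault with INVSTATE set. */
bool is_escalated_invstate(uint32_t hfsr, uint32_t cfsr)
{
    return (hfsr & HFSR_FORCED) != 0u && (cfsr & CFSR_INVSTATE) != 0u;
}
```

On the target, a call such as `is_escalated_invstate(SCB->HFSR, SCB->CFSR)` inside the HARD FAULT handler would give the same answer as the window, assuming the CMSIS device header is available.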

The Definitive Guide to the Cortex-M3 makes the same point: be careful with BLX, because it can also change the processor state. The LSB of the register it branches through must be 1 so that the core does not attempt to enter ARM state; if the LSB is 0, a FAULT occurs. Likewise, any value loaded into the PC must be odd (LSB = 1).
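The LSB rule is easy to express in code. The following is a minimal sketch, with a helper name of my own choosing, that checks whether a value is a legal target for BLX or a PC load on a Thumb-only core:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: on a Thumb-only core such as the Cortex-M3,
 * an interworking branch target must have bit 0 set. A target with
 * bit 0 clear requests ARM state and raises INVSTATE. */
static inline bool is_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0u;
}
```

For example, `is_thumb_target(0x080004FFu)` is true, while `is_thumb_target(0x00000066u)` is false. A check like this can be placed in front of an indirect call while hunting for a corrupted function pointer.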

Although I understood these points, they did not help me directly. Next, I found a PDF file named “keil_hardfault” on GOOGLE, written by KEIL. The latter part of the document uses an example to show how to locate the code that ran just before a HARD FAULT. After some exploration, I found the location of my problem with this method.


First, with the MCU halted in the HARD FAULT handler, I looked at the LR register. At this point, LR = 0xFFFF_FFF1. Inside an exception handler, LR does not hold an ordinary return address but an EXC_RETURN value. 0xFFFF_FFF1 means that the fault was taken while another handler was already running and that the registers were saved on the main stack (MSP), so LR by itself does not say where the fault came from. What the INVSTATE flag does tell us is that the program branched to an address whose LSB is 0, thus triggering the USAGE FAULT. The error flow was as follows:

[Correct Code] –(Error)–> [address with LSB = 0]. Next, I needed to find the code that caused the erroneous jump.
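To make the meaning of the LR value inside a handler concrete, here is a small sketch that decodes the three EXC_RETURN values a Cortex-M3 (no FPU) can produce. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for decoding the LR value seen in a handler. */
typedef struct {
    bool valid;       /* one of the three Cortex-M3 EXC_RETURN values   */
    bool to_handler;  /* the fault interrupted another exception handler */
    bool uses_psp;    /* the exception frame was stacked on the PSP      */
} exc_return_info;

/* Decode an EXC_RETURN value for a core without an FPU. */
exc_return_info decode_exc_return(uint32_t lr)
{
    exc_return_info info = { false, false, false };

    switch (lr) {
    case 0xFFFFFFF1u:            /* return to Handler mode, frame on MSP */
        info.valid = true;
        info.to_handler = true;
        break;
    case 0xFFFFFFF9u:            /* return to Thread mode, frame on MSP  */
        info.valid = true;
        break;
    case 0xFFFFFFFDu:            /* return to Thread mode, frame on PSP  */
        info.valid = true;
        info.uses_psp = true;
        break;
    default:                     /* not an EXC_RETURN value for this core */
        break;
    }
    return info;
}
```

For the value in this session, `decode_exc_return(0xFFFFFFF1u)` reports a return to Handler mode with the frame on the MSP, which is why the stack frame can be inspected directly at the address in SP.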

The SP (R13) register points to the current top of the stack. I entered the SP value, 0x2000A190, in the MEMORY window:

[Figure: the MEMORY window at the address in SP]

On the CM3 core, the stack grows downwards, and on exception entry the core automatically pushes eight registers. Reading upwards from the address in SP, they are R0, R1, R2, R3, R12, LR, PC and xPSR. This way, I could recover the register state at the time of the incident.

This diagram in the keil_hardfault.pdf provides a more intuitive view:

[Figure: exception stack frame diagram from keil_hardfault.pdf]

We can see the register state just before the jump to the HARD FAULT handler, for example, R0 = 0x0000_0066 and R2 = 0xFFFF_FFFF.
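Reading the MEMORY window by hand amounts to overlaying a fixed layout on the eight words at SP. The sketch below does the same in code. The struct and function names are my own, and it assumes the handler has not pushed anything else on top of the frame.

```c
#include <stdint.h>

/* Registers pushed automatically on exception entry, lowest address first. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* return address of the interrupted function's call */
    uint32_t pc;    /* address associated with the fault                 */
    uint32_t xpsr;
} exception_frame;

/* Hypothetical helper: copy the eight stacked words starting at sp.
 * On the target, sp would be the stack pointer value at handler entry,
 * e.g. (const uint32_t *)0x2000A190 in this debugging session. */
exception_frame read_exception_frame(const uint32_t *sp)
{
    exception_frame f;
    f.r0   = sp[0];
    f.r1   = sp[1];
    f.r2   = sp[2];
    f.r3   = sp[3];
    f.r12  = sp[4];
    f.lr   = sp[5];
    f.pc   = sp[6];
    f.xpsr = sp[7];
    return f;
}
```

Given a memory dump in which the first word is 0x00000066, the third is 0xFFFFFFFF and the sixth is 0x080004FF, the helper returns those values as r0, r2 and lr, which matches what is read from the window here.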

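At this point, we need to focus on the stacked LR value, which points to the instruction after the one that made the erroneous jump. The stack shows LR = 0x0800_04FF. Clearing the LSB (the Thumb bit) gives the return address 0x0800_04FE, and because the branch instruction here is 2 bytes long, the erroneous instruction is at 0x0800_04FC.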
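The arithmetic can be written out as a tiny helper. This is only a sketch with a name of my own; the instruction size must be supplied by the caller, since BLX through a register is a 2-byte Thumb instruction while BL is 4 bytes.

```c
#include <stdint.h>

/* Hypothetical helper: given the stacked LR and the size in bytes of the
 * call instruction, return the address of that call instruction.
 * The LSB of LR is the Thumb bit and is not part of the address. */
uint32_t call_site_from_lr(uint32_t stacked_lr, uint32_t call_size_bytes)
{
    uint32_t return_address = stacked_lr & ~1u;
    return return_address - call_size_bytes;
}
```

With the values from this session, `call_site_from_lr(0x080004FFu, 2u)` yields 0x080004FC.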

In the assembly window, I located this address as shown in the figure below:

[Figure: the disassembly window at address 0x0800_04FC]

We can see that the instruction is BLX R0, and the value of R0 at that time was 0x0000_0066, which is clearly an incorrect jump address: its LSB is 0. With the instruction address in hand, we can also view the corresponding C code, and it is easy to guess that the problem is an out-of-bounds array access caused by the value of EPindex. I then added some debugging code here and quickly confirmed that the fault occurred when EPindex was 0.
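The C code itself is not reproduced here, so the following is only a sketch of the kind of guard that prevents this class of fault. It assumes a hypothetical table of endpoint handlers indexed by EPindex - 1, so that an index of 0 would read the word in front of the table. All names and the table size are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*ep_handler_t)(void);

/* Hypothetical table size; the real value depends on the application. */
#define EP_HANDLER_COUNT 4u

static unsigned ep_calls;                  /* counts stub invocations */
static void ep_stub(void) { ep_calls++; }

/* Hypothetical handler table: entry i serves endpoint i + 1.
 * Unused entries are left as NULL. */
static ep_handler_t ep_handlers[EP_HANDLER_COUNT] = { ep_stub, ep_stub };

/* Call the handler for ep_index only if the index maps into the table
 * and the entry is populated. Returns false when the call is skipped. */
bool dispatch_ep(uint8_t ep_index)
{
    if (ep_index == 0u || ep_index > EP_HANDLER_COUNT) {
        return false;                      /* would index outside the table */
    }
    ep_handler_t handler = ep_handlers[ep_index - 1u];
    if (handler == NULL) {
        return false;                      /* no handler registered */
    }
    handler();
    return true;
}
```

With a check like this, an unexpected index is rejected instead of turning an arbitrary memory word into a BLX target. Whether to ignore the event or log it is an application decision.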

As for what happened next: having found the code that caused the HARD FAULT, I could finally breathe a sigh of relief. Overall, this method proved very effective for this problem.
