How to Locate HardFault in FreeRTOS

Source: Official WeChat Account 【Osprey Talks Microcontrollers】

Author: Osprey

ID: emOsprey

Hello everyone, I am Osprey. This update is a little late because other matters got in the way, but as always I want every post on this account to teach something practical that strengthens your core skills.

Thank you all for your continued support of Osprey.

Today, I will continue to talk about the common HardFault encountered during development. This issue has been with us since we started learning STM32 development, and many people encounter this problem without knowing how to locate it.

If you are developing independently and encounter this issue, generally, you just need to look at the code, modify the code, etc. These conventional methods are effective because you are most familiar with the code you wrote, and the changes are generally not too large, making it easier to narrow down the scope and easier to locate.

However, as products become more complex, the current development model is collaborative. Each person is responsible for their own module, leading to large amounts of code and high complexity, making it even more difficult to locate issues.

When you have just joined a company and a HardFault appears in code you do not yet know, it can be frustrating enough to make you want to run away (between you and the code, only one of you needs to run).

At this point, having an expert who can solve such tricky problems saves a lot of time. I have solved quite a few problems of this kind at my company, so I have plenty of experience and often play this role.

Osprey’s method for locating HardFault generally relies on KEIL online debugging + C language + authoritative guides.

Currently, Osprey’s bug-fixing process is roughly as follows:

1. For issues that can be reproduced, if familiar with the code, it can be resolved within a few hours.

2. For sporadic issues, the time needed depends on how often they occur. Generally, four or five occurrences are enough to locate the cause.

3. For hard-to-reproduce issues, you usually need to attach a recorder to record the runtime situation in real-time.

After going through so much, very few HardFault problems now take Osprey several days to solve (I still remember that when I first arrived in Shenzhen, a HardFault caused by a bug someone else had written kept me working several nights in a row. Without a stroke of luck, I might not have resolved it).

Here’s a small advertisement: if you find it hard to solve, you can hire Osprey to solve the HardFault problem.

However, I have recently been using C++ at work, which I am not very familiar with, so I resolve HardFaults more slowly. In addition, the project is built at optimization level -O2, which makes debugging harder, so mastering the following methods is very important:

Summary of several compilation optimization settings for MDK

Regarding HardFault, Osprey has previously shared quite a few notes, but I wonder how many people have seriously read them.

HardFault INVSTATE error location (1)

What the hell, after returning from the New Year, the board just hardfaulted?

Today, Osprey continues to share methods for locating HardFault in FreeRTOS.

Here we need a component written by a big shot: CmBacktrace (In fact, if online debugging is possible, Osprey does not need to rely on this component, but it is still quite useful in hard-to-reproduce situations).

Gitee repository: https://gitee.com/Armink/CmBacktrace

This component is probably known and used by many friends, but some may be using an older version that lacks the call-stack tracing function described below, so updating is recommended.

As you can see, when an error occurs, the component prints the function call stack (it may sometimes be wrong and must be checked against the actual code, so treat it only as a reference).

_call_main -> main -> fault_test_by_div0

This is quite practical.

Moreover, this note applies not only to locating HardFault in FreeRTOS; with modifications it can also be used with other RTOSes such as uC/OS and RT-Thread (bare metal is even easier).

The repository examples support these platforms: bare metal, RT-Thread, uC/OS-II, and FreeRTOS.

The focus here is on how to port this component to FreeRTOS (the repository’s documentation is also very detailed and worth consulting). Since FreeRTOS is constantly being updated, the component’s example may not fully apply to newer versions. Osprey happens to have just completed the port, so I’m recording it here for everyone’s convenience.

1. Copy the entire cm_backtrace folder (the source code files) from the repository into your own project folder.

2. Add these files to your project (open the demos -> os -> freertos project to see how it is done).

There are only two files, quite simple.

One is the core source code, and the other is the assembly code, which is the entry point of the code execution.

Note: Depending on the IDE, the selected assembly file may differ:

In fact, this just moves the default HardFault handler from startup_stm32f10x_hd.s to cmb_fault.S.

The handler in the startup file is declared weak, so at link time the linker does not use it and picks the one in cmb_fault.S instead:

To facilitate problem location, we need to modify this code a bit.

Note: If the hardfault code in your startup file has been modified and you do not understand assembly, it is recommended to restore it as above; otherwise, it may not run properly.

3. Initialization code in the main function.

The firmware name string passed here needs to match your project's output file name (modify it according to your project name):

Therefore, it is recommended to give the project an English name. The name is used when the error message is printed; if it does not match, you have to edit the printed command every time you check the call stack, which is quite troublesome.
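To illustrate why the name matters, here is a small host-side sketch of how a command line for addr2line could be assembled from the firmware name and a list of call-stack addresses. It is not the component's code: the helper name build_addr2line_cmd and the exact layout of the command are assumptions for illustration, and the text actually printed by your copy of the component may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: builds "addr2line -e <name>.axf -a -f <addr>..." from
 * the firmware name and a list of call-stack addresses.
 * Returns the number of characters written, or -1 if the buffer is too small. */
int build_addr2line_cmd(char *out, size_t out_len, const char *firmware_name,
                        const uint32_t *addrs, size_t count)
{
    int n = snprintf(out, out_len, "addr2line -e %s.axf -a -f", firmware_name);
    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    size_t used = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        n = snprintf(out + used, out_len - used, " %08lx",
                     (unsigned long)addrs[i]);
        if (n < 0 || (size_t)n >= out_len - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

If the name passed in is the same as the name of the .axf file, the resulting command can be pasted into a terminal unchanged; if not, the file name has to be corrected by hand each time.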

If the internal watchdog is enabled, it is recommended to freeze it while the core is halted for debugging:

// HAL library
__HAL_DBGMCU_FREEZE_IWDG1();
// Standard Peripheral Library
DBGMCU_Config(DBGMCU_IWDG_STOP, ENABLE);

Call the function cm_backtrace_assert in the code that handles a failed assertion:

This way, if the assertion fails, you can also see the call stack.
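A minimal sketch of such an assertion hook follows. It assumes that cm_backtrace.h provides cm_backtrace_assert(uint32_t sp) and a cmb_get_sp() helper, as I recall from the repository README; check your copy of the component. The two stand-in functions exist only so the sketch builds on a PC, and the names app_assert_failed and APP_ASSERT are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-ins so the sketch builds on a PC. On the target, delete these and
 * include cm_backtrace.h, which is assumed to provide both functions. */
static uint32_t g_reported_sp;                       /* what the stub received */
static uint32_t cmb_get_sp(void) { return 0x20001000u; /* made-up SP value */ }
static void cm_backtrace_assert(uint32_t sp) { g_reported_sp = sp; }

/* Hypothetical hook: report where the assertion failed, then let the
 * component print the call stack for the current stack pointer. */
static void app_assert_failed(const char *file, int line)
{
    printf("assert failed: %s:%d\r\n", file, line);
    cm_backtrace_assert(cmb_get_sp());
}

/* Hypothetical assert macro; wrapped in do/while so it is one statement. */
#define APP_ASSERT(expr) \
    do { if (!(expr)) { app_assert_failed(__FILE__, __LINE__); } } while (0)
```

In a FreeRTOS project the same hook could be reached from configASSERT in FreeRTOSConfig.h, and on the target the hook would normally stop in an endless loop after printing.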

4. Modify FreeRTOS kernel files (kernel version V10.2.1)

To analyze the faulting code, we need the stack information of each task. FreeRTOS does not necessarily keep all of it, so we need to add it.

tasks.c

FreeRTOS.h

Note that in older versions of FreeRTOS only one place needs to be modified, but in the new version two places must be modified; otherwise an assertion fails and the system cannot run.

It is recommended to add comments along with the modifications.

UBaseType_t     uxSizeOfStack;      /*< Support For CmBacktrace >*/

Related function modification in tasks.c, prvInitialiseNewTask():

Finally, add the following code at the end of tasks.c to obtain the stack address, size, and task name:

For convenience of copying, I will post the code here.

/*-----------------------------------------------------------*/
/*< Support For CmBacktrace >*/

/* Start address of the current task's stack. */
uint32_t * vTaskStackAddr(void)
{
    return pxCurrentTCB->pxStack;
}

/* Size of the current task's stack. */
uint32_t vTaskStackSize(void)
{
    #if ( portSTACK_GROWTH > 0 )
        return (pxCurrentTCB->pxEndOfStack - pxCurrentTCB->pxStack + 1);
    #else /* ( portSTACK_GROWTH > 0 ) */
        return pxCurrentTCB->uxSizeOfStack;
    #endif /* ( portSTACK_GROWTH > 0 ) */
}

/* Name of the current task. */
char * vTaskName(void)
{
    return pxCurrentTCB->pcTaskName;
}
/*-----------------------------------------------------------*/
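The reason two places must change can be sketched on a PC. The kernel keeps a private task control block and a public mirror of the same size (StaticTask_t in FreeRTOS.h), and xTaskCreateStatic() asserts that the two sizes are equal, so a field added to one must be added to the other. The structures below are simplified stand-ins with hypothetical demo_ names, not the real FreeRTOS definitions, and the sketch assumes the change in prvInitialiseNewTask() stores the stack depth (in words) given at task creation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned long demo_ubase_t;       /* stand-in for UBaseType_t */

/* Simplified stand-in for the kernel's private task control block. */
typedef struct {
    volatile uint32_t *pxTopOfStack;
    uint32_t          *pxStack;           /* lowest address of the stack */
    char               pcTaskName[16];
    demo_ubase_t       uxSizeOfStack;     /* field added for CmBacktrace */
} demo_tcb_t;

/* Simplified stand-in for the public mirror of that structure. */
typedef struct {
    void        *pxDummy1;
    void        *pxDummy2;
    uint8_t      ucDummy[16];
    demo_ubase_t uxDummy;                 /* must be added here as well */
} demo_static_task_t;

/* Same idea as the kernel's sanity check: both layouts must have one size. */
static bool demo_layouts_match(void)
{
    return sizeof(demo_static_task_t) == sizeof(demo_tcb_t);
}

/* Assumed equivalent of the one-line change at task creation:
 * remember how deep the stack is (counted in words). */
static void demo_init_task(demo_tcb_t *tcb, uint32_t *stack,
                           demo_ubase_t depth_words)
{
    tcb->pxStack       = stack;
    tcb->pxTopOfStack  = stack + depth_words - 1;  /* stack grows downwards */
    tcb->uxSizeOfStack = depth_words;
}

/* Stack size in bytes, which a backtrace tool needs to bound its scan. */
static size_t demo_stack_bytes(const demo_tcb_t *tcb)
{
    return (size_t)tcb->uxSizeOfStack * sizeof(uint32_t);
}
```

Removing uxDummy from the mirror structure makes demo_layouts_match() return false, which corresponds to the assertion failure mentioned above.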

5. Modify component configuration information based on the RTOS platform and chip core

cmb_cfg.h

1) You need to define the print output function, generally using printf to print, or you can use your custom print functions that are similar in functionality to printf.

#define cmb_println(...)               printf(__VA_ARGS__);printf("\r\n")

2) Enable RTOS support

#define CMB_USING_OS_PLATFORM

3) Specify RTOS as FreeRTOS

#define CMB_OS_PLATFORM_TYPE           CMB_OS_PLATFORM_FREERTOS

4) Select the chip core to match your hardware; M0, M3, M4, and M7 are currently supported.

#define CMB_CPU_PLATFORM_TYPE          CMB_CPU_ARM_CORTEX_M3

5) Dump the stack: print the raw stack contents when an error occurs, which may help with analysis.

#define CMB_USING_DUMP_STACK_INFO

6) Language support: English. It actually also supports Chinese, but it is recommended to use English (not configured, defaults to English).

#define CMB_PRINT_LANGUAGE             CMB_PRINT_LANGUAGE_ENGLISH

7) If the project is compiled as C++, errors may occur; in that case, define this at the beginning:

#define __CLANG_ARM
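For item 1), a custom print function "similar to printf" might look like the sketch below. It formats into a local buffer, appends CR LF, and hands the bytes to a transmit routine. Everything named demo_ is hypothetical, and demo_tx_write is a stand-in that appends to a buffer so the sketch can be checked on a PC; on the target it would be replaced by your own UART or RTT output routine.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a real transmit routine: collects the output in a buffer. */
static char   g_tx_log[256];
static size_t g_tx_len;

static void demo_tx_write(const char *data, size_t len)
{
    size_t room = sizeof g_tx_log - 1 - g_tx_len;
    if (len > room) {
        len = room;                        /* drop what does not fit */
    }
    memcpy(g_tx_log + g_tx_len, data, len);
    g_tx_len += len;
    g_tx_log[g_tx_len] = '\0';
}

/* printf-like line printer: format, append CR LF, send. */
static void demo_println(const char *fmt, ...)
{
    char line[128];
    va_list args;

    va_start(args, fmt);
    int n = vsnprintf(line, sizeof line - 2, fmt, args);  /* keep 2 bytes free */
    va_end(args);
    if (n < 0) {
        return;
    }

    size_t len = (size_t)n;
    if (len > sizeof line - 3) {
        len = sizeof line - 3;             /* the text was truncated */
    }
    line[len++] = '\r';
    line[len++] = '\n';
    demo_tx_write(line, len);
}

/* One statement, so it also behaves inside an if/else without braces. */
#define cmb_println(...) demo_println(__VA_ARGS__)
```

Because the component prints from inside the fault handler, a transmit routine that polls the peripheral is probably a safer choice than one that depends on interrupts.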

7. Modify the component as needed to facilitate use (let’s see if there is an opportunity to merge these into the big shot’s branch).

1) Since the functionality involved is small, you can change the related header file inclusion form to this, so you don’t need to modify the header file path, making it easier to port:

#include <cm_backtrace.h>  -->  #include "./cm_backtrace.h"
#include <cmb_cfg.h>  -->  #include "./cmb_cfg.h"
#include "cmb_def.h"  -->  #include "./cmb_def.h"

In main, you also do not need to include the header file; you can declare this function directly where it is needed, because it is the only function that has to be called from outside:

#include <cm_backtrace.h>  -->  void cm_backtrace_init(const char *firmware_name, const char *hardware_ver, const char *software_ver);

In this way, you don’t need to add the header file path anymore.

Alternatively, you can add the header file using a relative path:

#include "../../driver/cm_backtrace/cm_backtrace.h"

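Additionally, we can make the program stop automatically on entry to the HardFault handler, which lets us make better use of online debugging (see the earlier post “What exactly is a software breakpoint?”).

HardFault_Handler    PROC
    LDR     r0, =0xE000EDF0         ; DHCSR (Debug Halting Control and Status Register)
    LDR     r0, [r0, #0x00]
    AND     r0, r0, #0x00000001     ; keep bit 0 (C_DEBUGEN): set when halting debug is enabled
    CBZ     r0, not_in_debug        ; no debugger: skip the breakpoint
    BKPT    0                       ; debugger attached: halt here
not_in_debug
    MOV     r0, lr                  ; get lr
    MOV     r1, sp                  ; get stack pointer (current is MSP)
    BL      cm_backtrace_fault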

Because the information is most complete when entering Hardfault, and you don’t want to set a breakpoint every time, the above code achieves this function very well and will not affect the normal operation of the program (it will automatically determine whether it is in debugging mode).
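The same decision can be written in C. The address 0xE000EDF0 is the Cortex-M Debug Halting Control and Status Register (DHCSR), and bit 0 (C_DEBUGEN) is set while a debugger has halting debug enabled. The sketch keeps the bit test in a pure function so it can be checked off-target; the function name debugger_attached is hypothetical, and the register access is shown only as a comment because it is valid only on the chip.

```c
#include <stdbool.h>
#include <stdint.h>

#define DHCSR_ADDR       0xE000EDF0u   /* Debug Halting Control and Status Register */
#define DHCSR_C_DEBUGEN  (1u << 0)     /* set while halting debug is enabled */

/* Pure helper: decides from a DHCSR value whether a debugger is attached. */
static bool debugger_attached(uint32_t dhcsr_value)
{
    return (dhcsr_value & DHCSR_C_DEBUGEN) != 0u;
}

/* On the target, the value would be read from the register itself:
 *
 *     uint32_t dhcsr = *(volatile uint32_t *)DHCSR_ADDR;
 *     if (debugger_attached(dhcsr)) {
 *         __asm volatile ("bkpt 0");   // stop here only when debugging
 *     }
 */
```

Without this check, a BKPT executed with no debugger attached would itself escalate to a fault, which is why the handler tests the bit first.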

8. Experiment.

Once everything above is done, you can verify the effect. Here we can run it in simulation to see what happens (this requires changing the project configuration; Osprey has shared how to do that before, so I won’t elaborate).

After running the repository example, it should print the following information, telling us that a div 0 error occurred.

Your ported project should also print similar information (add test code: fault_test_by_div0();). If it does not print, there are two possibilities:

1. The print function was not initialized properly before entering Hardfault.

2. There is a problem with the print function.

Afterwards, we copy the last line and run it with the addr2line tool from the repository’s tools folder to see the call stack information:

In Git Bash, the command may fail to execute. You can prefix it with the path to the program, or add the tool’s path to the Windows environment variables. It may also report that the axf file cannot be found; copying that file into the tool folder is enough.

A cleaner approach is to place the tool in a directory on the C drive and add that directory to the environment variables; you can then open a Git Bash or cmd window in the directory containing the axf file and run the command there.

This is just for demonstration, so I won’t elaborate on these.

Finally, let me briefly introduce the implementation principle of this component:

If a HardFault occurs, execution first enters HardFault_Handler in the assembly file, which retrieves the current stack pointer and LR and uses LR to determine which stack the error occurred on (in this example, the PSP).

Based on the fault status registers, it determines the type of error (in this example, a division by zero).
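These two decisions can be sketched in C as follows. On Cortex-M3/M4, bit 2 of the EXC_RETURN value held in LR on exception entry tells which stack holds the exception frame, and the UsageFault flags sit in the upper half of the CFSR register at 0xE000ED28. This is an illustration with hypothetical function names, not the component's code; note that a division by zero is only reported when the DIV_0_TRP trap is enabled in the CCR register.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 2: 1 = frame was pushed on the PSP, 0 = on the MSP. */
static bool fault_frame_on_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0u;
}

/* UsageFault flags inside CFSR (ARMv7-M). */
#define CFSR_UNDEFINSTR  (1u << 16)   /* undefined instruction           */
#define CFSR_INVSTATE    (1u << 17)   /* invalid EPSR state (Thumb bit)  */
#define CFSR_INVPC       (1u << 18)   /* invalid EXC_RETURN or PC load   */
#define CFSR_NOCP        (1u << 19)   /* coprocessor not available       */
#define CFSR_UNALIGNED   (1u << 24)   /* unaligned access trapped        */
#define CFSR_DIVBYZERO   (1u << 25)   /* divide by zero trapped          */

/* Returns a short description of the first UsageFault flag that is set. */
static const char *usage_fault_name(uint32_t cfsr)
{
    if (cfsr & CFSR_DIVBYZERO)  { return "divide by zero"; }
    if (cfsr & CFSR_UNALIGNED)  { return "unaligned access"; }
    if (cfsr & CFSR_NOCP)       { return "no coprocessor"; }
    if (cfsr & CFSR_INVPC)      { return "invalid PC load"; }
    if (cfsr & CFSR_INVSTATE)   { return "invalid state"; }
    if (cfsr & CFSR_UNDEFINSTR) { return "undefined instruction"; }
    return "no usage fault flagged";
}
```

For a fault raised inside a FreeRTOS task, EXC_RETURN is typically 0xFFFFFFFD, so the frame is on the PSP, which matches the example above.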

Then it analyzes the stack contents together with LR and PC, checking them against the machine code in flash to find likely call (jump) instructions. The two jump addresses found here are 0x08001f96 and 0x08000368, which gives the call stack.

Therefore, to accurately obtain the call stack, there are two important preconditions (it is recommended to use optimization level -O0):

1. The stack is not corrupted.

2. The chip runs code that is consistent with the axf file.

Even so, you cannot guarantee that the call stack you find is correct; for example, if you use fault_test_by_unalign() for testing, the result is as follows:

There is an extra fputc in the middle.

Therefore, this printed information can only be used for reference.
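The scanning idea described above, and one plausible reason an unrelated function can show up, can be sketched on a PC. The heuristic treats a stack word as a return address if it is odd (Thumb bit set), lies inside the code range, and is immediately preceded by a BL or BLX instruction. The flash image, stack contents, and function names below are made up for the demonstration, and this is not the component's actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up flash image: 16 Thumb halfwords starting at CODE_BASE. Everything
 * is a NOP (0xBF00) except one BL at offset 4 (0xF000 0xF800). */
#define CODE_BASE 0x08000000u
static const uint16_t demo_code[16] = {
    0xBF00, 0xBF00, 0xF000, 0xF800, 0xBF00, 0xBF00, 0xBF00, 0xBF00,
    0xBF00, 0xBF00, 0xBF00, 0xBF00, 0xBF00, 0xBF00, 0xBF00, 0xBF00,
};

/* Made-up stack contents, as they might look after a fault. */
static const uint32_t demo_stack[5] = {
    0x20000101u,  /* odd, but in RAM: outside the code range        */
    0x08000009u,  /* return address right after the BL (Thumb bit)  */
    0x0800000Du,  /* odd and in range, but not preceded by a call   */
    0x00000005u,  /* small integer                                  */
    0x08000010u,  /* even: cannot be a Thumb return address         */
};

static uint16_t code_halfword(uint32_t addr)
{
    return demo_code[(addr - CODE_BASE) / 2u];
}

/* True if the instruction just before ret_addr is BL <label> or BLX <Rm>. */
static bool preceded_by_call(uint32_t ret_addr)
{
    uint16_t hw_prev2 = code_halfword(ret_addr - 4u);
    uint16_t hw_prev1 = code_halfword(ret_addr - 2u);
    bool is_bl  = (hw_prev2 & 0xF800u) == 0xF000u &&
                  (hw_prev1 & 0xD000u) == 0xD000u;
    bool is_blx = (hw_prev1 & 0xFF87u) == 0x4780u;
    return is_bl || is_blx;
}

/* Scans stack words and keeps those that look like return addresses. */
static size_t find_return_addresses(const uint32_t *stack, size_t words,
                                    uint32_t *out, size_t out_max)
{
    const uint32_t code_end = CODE_BASE + (uint32_t)sizeof demo_code;
    size_t found = 0;

    for (size_t i = 0; i < words && found < out_max; i++) {
        uint32_t value = stack[i];
        if ((value & 1u) == 0u) {
            continue;                          /* Thumb bit must be set */
        }
        uint32_t addr = value & ~1u;
        if (addr < CODE_BASE + 4u || addr > code_end) {
            continue;                          /* not inside the code   */
        }
        if (preceded_by_call(addr)) {
            out[found++] = addr;
        }
    }
    return found;
}
```

A stale return address left on the stack by an earlier call passes exactly the same test as a live one, so a scan of this kind cannot tell them apart; the printed call stack therefore always needs to be checked against the code.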

However, online debugging is different; it is more professional and less likely to produce incorrect calling relationships.

What Osprey wants to share ends here. See you next time!
