Project Background: A “Random Crash” Bug Cost Us Over 200 Chips and More Than 80,000 in Losses
At the time, we were developing a motor control board based on the STM32F407. After the boards went into mass production and were programmed, we found that:
- Some boards were unresponsive after startup
- Some hung after running for a while, with no reset and no output
- Some got stuck in an interrupt after a power loss and restart
We suspected Flash write failures or an abnormal clock during power loss, but after several days we still couldn’t pinpoint the issue.
Worse still, in order to “test” it we reprogrammed and debugged the boards over and over, damaging more than 200 chips (frequent erase/write cycles plus unstable power) for a direct loss of over 80,000.
In the end we switched to a set of three tools, moving from “blind guessing + logging” to:
✅ J-Link hardware breakpoints + Trace points
✅ STM32CubeMonitor for real-time variable monitoring
✅ ITM Trace for analyzing abnormal processes
The speed of locating issues increased tenfold, and all subsequent problems were resolved using this combination.
1. J-Link Hardware Breakpoints: Much More Stable Than Software Breakpoints, Precise Without Freezing
⚠️ Cause: Crashes Occurred in Flash, Regular Breakpoints Could Not Be Set
Initially, we used ST-Link + Keil for breakpoint debugging, but found that:
- Breakpoints on some functions were never hit
- Crashes occurred during the Flash write interrupt, and setting a breakpoint there triggered an exception by itself
- Some crash points could not be reproduced at all
✅ Switching to J-Link + Hardware Breakpoints Immediately Improved Stability
J-Link supports hardware breakpoints, which are handled by comparator units in the core and do not require patching a breakpoint instruction into Flash. We ran it as:
JLinkGDBServer + VSCode/Keil/IAR
We locked onto several key functions:
```c
// Flash write function
HAL_FLASH_Program(...)
// RTC wakeup interrupt
HAL_RTCEx_WakeUpTimerIRQHandler()
// System exception handler
HardFault_Handler()
```
After setting breakpoints with J-Link, we could halt reliably at these functions, even during Flash writes and nested interrupts.
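Once a breakpoint in `HardFault_Handler()` is hit, the most useful data is the register frame the core pushed on exception entry. The following is a minimal sketch of capturing that frame, assuming the basic Cortex-M frame without FPU state; `fault_frame_t`, `g_last_fault`, `g_fault_count`, and `fault_capture` are hypothetical names, not part of the HAL.

```c
#include <stdint.h>

/* Register order pushed by a Cortex-M core on exception entry
 * (basic frame, no floating-point state). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* address that was executing when the fault hit */
    uint32_t xpsr;
} fault_frame_t;

/* Hypothetical globals, kept at file scope so a debugger can read them. */
volatile fault_frame_t g_last_fault;
volatile uint32_t g_fault_count;

/* Copy the stacked frame into g_last_fault. On the target this would be
 * called from the fault handler with the stack pointer (MSP or PSP)
 * that was active when the fault occurred. */
void fault_capture(const uint32_t *stacked_sp)
{
    g_last_fault.r0   = stacked_sp[0];
    g_last_fault.r1   = stacked_sp[1];
    g_last_fault.r2   = stacked_sp[2];
    g_last_fault.r3   = stacked_sp[3];
    g_last_fault.r12  = stacked_sp[4];
    g_last_fault.lr   = stacked_sp[5];
    g_last_fault.pc   = stacked_sp[6];
    g_last_fault.xpsr = stacked_sp[7];
    g_fault_count++;
}
```

With the frame saved, inspecting `g_last_fault.pc` at the breakpoint points at the faulting code, rather than at the handler itself.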
Later, we did find an illegal parameter being passed to `HAL_FLASH_Program()`, which caused the Flash write to fail.
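An illegal parameter of this kind can be rejected before it ever reaches the Flash controller. The sketch below is one possible pre-check for word programming; it assumes the 1 MB Flash variant mapped at 0x08000000, and `flash_args_ok`, `APP_FLASH_BASE`, and `APP_FLASH_SIZE` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 1 MB of main Flash starting at 0x08000000.
 * Adjust for other density variants. */
#define APP_FLASH_BASE 0x08000000u
#define APP_FLASH_SIZE 0x00100000u

/* Hypothetical pre-check for a word (32-bit) program operation.
 * Returns true only if [addr, addr + len) lies inside Flash and both
 * the address and the length are 4-byte aligned. */
bool flash_args_ok(uint32_t addr, uint32_t len)
{
    if (len == 0u || (addr & 3u) != 0u || (len & 3u) != 0u) {
        return false;
    }
    if (addr < APP_FLASH_BASE) {
        return false;
    }
    uint32_t offset = addr - APP_FLASH_BASE;
    /* Compare without computing addr + len, which could overflow. */
    return offset < APP_FLASH_SIZE && len <= APP_FLASH_SIZE - offset;
}
```

Calling such a check at the top of the write path turns a silent Flash failure into an explicit error return that can be logged or caught with a breakpoint.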
2. STM32CubeMonitor: Real-Time Variable Monitoring, Viewing Values Without Setting Breakpoints
⚠️ Cause: Variables Suddenly Changed Before Crashes, but Logs Couldn’t Be Recorded in Time
We had a task that periodically wrote to Flash based on a flag:
```c
if (write_flag == 1) {
    SaveConfig();
    write_flag = 0;
}
```
Occasionally, Flash write crashes occurred, and we suspected the `write_flag` state was abnormal, but we couldn’t see it in the logs.
✅ Using CubeMonitor to Directly Monitor Variables
STM32CubeMonitor is ST’s official debugging tool, supporting:
- Real-time reading of MCU internal variables (via SWD)
- Graphing and setting threshold alarms
- Multi-variable synchronous monitoring, viewing values without setting breakpoints
Configuration Steps:
- Connect the device using J-Link or ST-Link
- Add a Session and select the ELF file
- Add variable monitoring: `write_flag`, `flash_state`, `error_code`
- Set refresh frequency (default 1Hz, can be set faster)
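A periodic refresh can miss a value that is wrong for only a few milliseconds. One way around that, sketched below, is to latch any illegal value into a second "sticky" variable that the monitor can pick up on its next poll. The variable `write_flag` comes from the example above; its type, the rule that only 0 and 1 are legal, and the names `write_flag_bad_value`, `write_flag_bad_count`, and `write_flag_check` are assumptions.

```c
#include <stdint.h>

/* File-scope variables have fixed addresses, which is what allows a
 * monitor to read them over SWD while the CPU keeps running.
 * volatile keeps the compiler from caching them in registers. */
volatile uint8_t  write_flag;
volatile uint8_t  write_flag_bad_value;   /* sticky: last illegal value seen */
volatile uint32_t write_flag_bad_count;   /* how many times it was illegal */

/* Call this from a fast periodic context (for example a timer tick).
 * Only 0 and 1 are treated as legal; anything else is latched. */
void write_flag_check(void)
{
    uint8_t v = write_flag;
    if (v > 1u) {
        write_flag_bad_value = v;
        write_flag_bad_count++;
    }
}
```

Monitoring `write_flag_bad_count` alongside `write_flag` then shows that a corruption happened even if the flag itself has already been overwritten again.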
We later discovered:
`write_flag` was erroneously set to 0xFF when a certain frame of data arrived (an out-of-bounds memory write), causing Flash write anomalies.
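The usual defense against this class of bug is to check the frame length against the buffer size before copying. The sketch below assumes a layout in which the flag sits directly after the receive buffer, which is how an overrun could reach it; `rx_state_t`, `FRAME_BUF_SIZE`, and `frame_store` are hypothetical names, and the real layout on the board is not known from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BUF_SIZE 32u   /* assumed buffer size */

/* Hypothetical layout: a receive buffer with the flag placed right
 * after it, so an overrun of the buffer would land on the flag. */
typedef struct {
    uint8_t buf[FRAME_BUF_SIZE];
    uint8_t write_flag;
} rx_state_t;

/* Copy a received frame, rejecting anything longer than the buffer
 * instead of letting it spill into the neighbouring fields. */
bool frame_store(rx_state_t *st, const uint8_t *data, size_t len)
{
    if (st == NULL || data == NULL || len > sizeof st->buf) {
        return false;
    }
    memcpy(st->buf, data, len);
    return true;
}
```

An oversized frame is dropped and reported to the caller, so `write_flag` keeps its value.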
3. Trace Instruction Flow Analysis: Knowing Exactly Where the System “Died” Without Guessing
⚠️ Cause: Some Crashes Never Enter HardFault, the Program Just “Stalls”
We had a batch of boards that crashed without even entering `HardFault_Handler`: the LEDs did not light and no logs were output, so we suspected the CPU was stuck in an abnormal state.
✅ Using J-Link Trace + ITM to Capture Instruction Flow
We connected a J-Link Pro with the SWO line wired up and enabled ITM Trace, viewing the output in one of:
J-Link SWO Viewer / Ozone / Keil Event Recorder
Configuration Steps:
- Enable SWO output (check in CubeMX)
- Configure `printf` to redirect to ITM
- Use `DWT->CYCCNT` to capture instruction execution time
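For the timing step, the arithmetic on two `DWT->CYCCNT` samples can be sketched as follows. The 168 MHz core clock is an assumption (it is the maximum for this part), and `cycles_elapsed`, `cycles_to_us`, and `CORE_CLOCK_HZ` are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed core clock: 168 MHz. Change this to the configured SYSCLK. */
#define CORE_CLOCK_HZ 168000000u

/* Elapsed cycles between two counter samples. Unsigned subtraction
 * gives the right answer across a single 32-bit wraparound. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to whole microseconds (rounds down). */
uint32_t cycles_to_us(uint32_t cycles)
{
    return cycles / (CORE_CLOCK_HZ / 1000000u);
}
```

On the target, `start` and `end` would be read from `DWT->CYCCNT` around the code being measured, after the counter has been enabled through `CoreDebug->DEMCR` and `DWT->CTRL`. At 168 MHz the 32-bit counter wraps roughly every 25 seconds, so this only works for intervals shorter than that.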
We printed the key function paths as follows:
```c
ITM_SendChar('A'); // Initialization
ITM_SendChar('B'); // Entering task
ITM_SendChar('C'); // Before Flash write
ITM_SendChar('D'); // Flash completed
```
On the crashed boards, the trace showed `C` but never reached `D`, indicating the crash occurred inside the Flash write function.
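The marker approach also works when no trace probe is attached, if each marker is mirrored into RAM before it is sent. The following sketch keeps a short history that a debugger can read after a hang; `trace_mark`, `trace_last`, `g_trace_log`, `g_trace_count`, and `g_trace_sink` are hypothetical names, and on the target the sink would be a small wrapper around `ITM_SendChar`.

```c
#include <stddef.h>

#define TRACE_LOG_SIZE 16u   /* assumed depth of the in-RAM history */

/* Kept at file scope so a debugger can read them after a hang. */
volatile char   g_trace_log[TRACE_LOG_SIZE];
volatile size_t g_trace_count;   /* total markers emitted so far */

/* Hypothetical hook: on the target, point this at a wrapper around
 * ITM_SendChar. NULL means the markers are kept in RAM only. */
void (*g_trace_sink)(char c) = NULL;

void trace_mark(char c)
{
    /* Store first, so the marker is kept even if the sink never returns. */
    g_trace_log[g_trace_count % TRACE_LOG_SIZE] = c;
    g_trace_count++;
    if (g_trace_sink != NULL) {
        g_trace_sink(c);
    }
}

/* Most recent marker, or 0 if none has been emitted yet. */
char trace_last(void)
{
    if (g_trace_count == 0u) {
        return 0;
    }
    return g_trace_log[(g_trace_count - 1u) % TRACE_LOG_SIZE];
}
```

Replacing the direct `ITM_SendChar('A')` calls with `trace_mark('A')` keeps the SWO output unchanged while leaving the last marker visible in memory.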
Further investigation revealed:
During the write, an RTC interrupt fired and the Flash was left unlocked; the next write attempt then went to an illegal address, and the system crashed.
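One common way to close this kind of window is to mask interrupts for the duration of the unlock/program/lock sequence and to relock on every exit path. The sketch below shows the control flow only, with stand-in functions so it can run off-target; all `port_*` and `sim_*` names and `flash_write_word_guarded` are hypothetical, and this is not necessarily the fix that was applied on the board.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins so the sketch runs off-target. On the board these would
 * wrap __get_PRIMASK/__disable_irq/__set_PRIMASK and
 * HAL_FLASH_Unlock, HAL_FLASH_Program, HAL_FLASH_Lock. */
static bool     sim_irq_masked;
static bool     sim_flash_locked = true;
static uint32_t sim_fail_addr = 0xFFFFFFFFu;  /* address that simulates a failure */

static bool port_irq_save(void)
{
    bool was = sim_irq_masked;
    sim_irq_masked = true;
    return was;
}
static void port_irq_restore(bool was) { sim_irq_masked = was; }
static bool port_flash_unlock(void)    { sim_flash_locked = false; return true; }
static void port_flash_lock(void)      { sim_flash_locked = true; }
static bool port_flash_program(uint32_t addr, uint32_t word)
{
    (void)word;
    /* The stand-in succeeds only while unlocked with interrupts masked. */
    return !sim_flash_locked && sim_irq_masked && addr != sim_fail_addr;
}

/* Program one word with interrupts masked, restoring the previous mask
 * state afterwards and relocking Flash on success and on failure. */
bool flash_write_word_guarded(uint32_t addr, uint32_t word)
{
    bool ok = false;
    bool was_masked = port_irq_save();
    if (port_flash_unlock()) {
        ok = port_flash_program(addr, word);
        port_flash_lock();
    }
    port_irq_restore(was_masked);
    return ok;
}
```

Saving and restoring the mask state, rather than unconditionally re-enabling interrupts, keeps the function safe to call from code that already runs with interrupts masked.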
4. Final Debugging Combination: We Now Rely on These Three “Weapons”
| Tool | Purpose | Scenario |
|---|---|---|
| J-Link Hardware Breakpoints | Precise debugging of interrupts/Flash segments | Software breakpoints are ineffective or cause crashes |
| CubeMonitor | Real-time variable monitoring | No breakpoints, no logs |
| Trace (ITM/SWO) | Reproducing abnormal execution paths | Can check even if the program did not enter exception handling |
Our debugging process has now become:
- First, use CubeMonitor to check for abnormal variable changes
- Then use Trace to track the program path and confirm the crash location
- Finally, use J-Link breakpoints to accurately locate variable values + stack status
The debugging speed has improved at least 10 times; what used to rely on guessing is now all about “seeing”.
Conclusion: Hard Debugging ≠ Hard Problem; Often It Is the Wrong Tools
| Problem | Conventional Method | Improved Method |
|---|---|---|
| Crash location is hard to find | Logging (nothing is printed) | ITM/SWO trace through J-Link |
| Cannot break inside interrupts | Software breakpoints (ineffective) | Hardware breakpoints |
| Sudden variable changes leave no trace | Logging (output arrives too late) | CubeMonitor real-time monitoring |