This article introduces a precise delay method based on the Cortex-M core
Introduction
Why learn this delay method?
When running an operating system, we usually dedicate a hardware timer, SysTick, to it. The OS tick rate is generally set between 100 and 1000 Hz, so an interrupt occurs every 1 ms to 10 ms. Many bare-metal tutorials build their delay functions on SysTick, which inevitably conflicts with the OS.
Some might ask: why not use a hardware timer? Timers are very precise, and I don't deny that. Suppose, though, that the system has to service timer interrupts constantly (every 10 us, 1 us or 0.5 us). It would then be interrupted so often that threads could not run smoothly. A delay timer also ties up a hardware timer that could be used for other tasks!
As for modifying the ST HAL library: ST's products are good, but I personally find their HAL library rather cumbersome. Its HAL_Delay() also relies on SysTick, which causes many inconveniences when porting an operating system. Fortunately, HAL_Delay() is defined as a weak function, as are HAL_GetTick() and HAL_InitTick(), so we can provide our own implementations. That is why I consider a core-based delay the best approach (my personal opinion). Of course, if you are comfortable writing a simple delay with a for loop, that works too.
I'm no authority myself, so let me quote The Definitive Guide to the ARM Cortex-M3: "The DWT also contains counters that are typically used for profiling program code. When programmed, they emit events (in the form of trace packets) when a counter overflows. The most typical use is to measure the number of cycles taken to execute a task with the CYCCNT register, which can also be used for time-related purposes (such as calculating CPU usage in an operating system)."
DWT in Cortex-M
In Cortex-M, there is a peripheral called DWT (Data Watchpoint and Trace) used for system debugging and tracing.

It has a 32-bit register called CYCCNT. This is an up-counter that records the number of core clock cycles: each core clock cycle increments it by 1. Its precision is therefore very high and is set by the core frequency. On the F103 series the core clock is 72 MHz, so the precision is 1/72 MHz ≈ 14 ns. Program execution times are usually measured in microseconds, so 14 ns is more than sufficient. The maximum time that can be recorded is 2^32 / 72,000,000 ≈ 60 s. On H7 chips running at 400 MHz, the precision reaches 2.5 ns (1 / 400,000,000 s = 2.5 ns). On a powerful processor such as the i.MX RT1052 at 528 MHz, one core clock cycle takes about 1/528 MHz ≈ 1.9 ns, and the maximum recordable time is 2^32 / 528,000,000 ≈ 8.13 s. When CYCCNT overflows, it wraps to 0 and continues counting up.
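Here is a small C sketch of the arithmetic above. It computes the length of one count (1/f) and the wrap period (2^32/f) for a given core clock. The function names are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Length of one CYCCNT count in nanoseconds: 1 / f. */
double cyccnt_resolution_ns(uint32_t core_hz)
{
    assert(core_hz != 0u);
    return 1e9 / (double)core_hz;
}

/* Time until the 32-bit CYCCNT wraps back to 0, in seconds: 2^32 / f. */
double cyccnt_wrap_seconds(uint32_t core_hz)
{
    assert(core_hz != 0u);
    return 4294967296.0 / (double)core_hz; /* 4294967296 = 2^32 */
}
```

Plugging in 72, 400 and 528 MHz gives the following values, which match the figures quoted above:

- 72 MHz: about 13.89 ns per count, wrapping after 59.65 s
- 400 MHz: about 2.50 ns per count, wrapping after 10.74 s
- 528 MHz: about 1.89 ns per count, wrapping after 8.13 s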

M3, M4 and M7 have been tested and work. I have not tried M0; note, however, that the ARMv6-M DWT used in Cortex-M0/M0+ does not implement CYCCNT, so this method is not available there. Precision: 1/core frequency (s).
Implementing the delay involves three registers: DEMCR, DWT_CTRL and DWT_CYCCNT. They are used, respectively, to enable the DWT, to enable CYCCNT, and to read the core clock cycle count.
DEMCR
To enable the DWT peripheral, bit 24 (TRCENA) of another core debug register, DEMCR, must be set to 1 (important, this might come up in an exam!!). DEMCR is located at address 0xE000EDFC.


About DWT_CYCCNT
Before enabling the DWT_CYCCNT register, clear it to 0. The ARM Cortex-M manual gives the address of DWT_CYCCNT as 0xE0001004. Its default reset value is 0 and it is readable and writable, so writing 0 to 0xE0001004 clears DWT_CYCCNT.

About CYCCNTENA
CYCCNTENA enables the CYCCNT counter. If not enabled, the counter does not count, and no event is generated for PS sampling or CYCCNTENA. Normally, the debugger must initialize the CYCCNT counter to 0.
It is bit 0 of the DWT control register (DWT_CTRL). Writing 1 enables it; otherwise the CYCCNT counter does not run.

In summary
Steps to use DWT’s CYCCNT:
First, enable the DWT peripheral, which is controlled by bit 24 of another core debug register, DEMCR; write 1 to enable.
Before enabling the CYCCNT register, clear it to 0.
Enable the CYCCNT register, which is controlled by DWT’s CYCCNTENA, i.e., bit 0 of the DWT control register; write 1 to enable.
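The three steps can be sketched in C as below. This is a hypothetical helper, dwt_cyccnt_start(), not the 野火 code in the next section. It takes the three registers as pointers, so the same sequence can be checked on a PC with ordinary variables; on the target you would pass the register addresses.

```c
#include <assert.h>
#include <stdint.h>

#define DEMCR_TRCENA        (1UL << 24) /* DEMCR bit 24: enable DWT (and ITM) */
#define DWT_CTRL_CYCCNTENA  (1UL << 0)  /* DWT_CTRL bit 0: enable CYCCNT */

/* Hypothetical helper: enable the DWT cycle counter in the documented order. */
void dwt_cyccnt_start(volatile uint32_t *demcr,
                      volatile uint32_t *dwt_ctrl,
                      volatile uint32_t *dwt_cyccnt)
{
    assert(demcr && dwt_ctrl && dwt_cyccnt);

    *demcr |= DEMCR_TRCENA;          /* step 1: enable the DWT peripheral */
    *dwt_cyccnt = 0u;                /* step 2: clear the counter */
    *dwt_ctrl |= DWT_CTRL_CYCCNTENA; /* step 3: start counting core cycles */
}
```

On the target, the call would look like this:

    dwt_cyccnt_start((volatile uint32_t *)0xE000EDFCu, (volatile uint32_t *)0xE0001000u, (volatile uint32_t *)0xE0001004u);

The three addresses are those of DEMCR, DWT_CTRL and DWT_CYCCNT. Using |= on DEMCR and DWT_CTRL leaves their other bits untouched.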
Code Implementation
Note: The right of interpretation of this code belongs to ®野火.
/**
  ******************************************************************
  * @file    core_delay.c
  * @author  fire
  * @version V1.0
  * @date    2018-xx-xx
  * @brief   Use core registers for precise delays
  ******************************************************************
  * @attention
  *
  * Experimental platform: 野火 STM32 development board
  * Forum  : http://www.firebbs.cn
  * Taobao : https://fire-stm32.taobao.com
  *
  ******************************************************************
  */

#include "./delay/core_delay.h"

/*
**********************************************************************
*                Timestamp related register definitions
**********************************************************************
*/
/*
 In Cortex-M, there is a peripheral called DWT (Data Watchpoint and Trace).
 This peripheral has a 32-bit register called CYCCNT, an up-counter that
 records the number of core clock cycles. The maximum time that can be
 recorded is:
   10.74 s = 2^32 / 400000000
 (assuming a core frequency of 400 MHz, one core clock cycle takes
 1/400 MHz = 2.5 ns)
 When CYCCNT overflows, it wraps to 0 and continues counting up.
 Steps to enable CYCCNT counting:
 1. First, enable the DWT peripheral, controlled by bit 24 of another core
    debug register, DEMCR; write 1 to enable
 2. Clear the CYCCNT register to 0 before enabling it
 3. Enable the CYCCNT register, controlled by DWT_CTRL (macro defined as
    DWT_CR) bit 0; write 1 to enable
*/


#define DWT_CR            (*(__IO uint32_t *)0xE0001000UL)
#define DWT_CYCCNT        (*(__IO uint32_t *)0xE0001004UL)
#define DEM_CR            (*(__IO uint32_t *)0xE000EDFCUL)


#define DEM_CR_TRCENA     (1UL << 24)
#define DWT_CR_CYCCNTENA  (1UL << 0)

/**
  * @brief  Initialize timestamp (overrides the weak HAL_InitTick())
  * @param  TickPriority: unused, kept to match the HAL prototype
  * @retval HAL_OK
  * @note   This function must be called before using the delay function;
  *         HAL_Init() already calls it
  */
HAL_StatusTypeDef HAL_InitTick(uint32_t TickPriority)
{
  (void)TickPriority;  /* SysTick is left to the OS, so no priority is used */

  /* Enable DWT peripheral */
  DEM_CR |= (uint32_t)DEM_CR_TRCENA;

  /* Clear DWT CYCCNT register to 0 */
  DWT_CYCCNT = (uint32_t)0u;

  /* Enable Cortex-M DWT CYCCNT register */
  DWT_CR |= (uint32_t)DWT_CR_CYCCNTENA;

  return HAL_OK;
}

/**
  * @brief  Read current timestamp
  * @param  None
  * @retval Current timestamp, i.e., the value of the DWT_CYCCNT register
  */
uint32_t CPU_TS_TmrRd(void)
{
  return ((uint32_t)DWT_CYCCNT);
}
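
/**
  * @brief  Current tick in milliseconds (overrides the weak HAL_GetTick())
  * @param  None
  * @retval DWT_CYCCNT converted to milliseconds
  * @note   Derived from the 32-bit CYCCNT, so it wraps every
  *         2^32 / SysClockFreq seconds (about 19.7 s at 218 MHz)
  */
uint32_t HAL_GetTick(void)
{
  /* Divide by cycles per millisecond; dividing by SysClockFreq first
     would truncate the result to whole seconds */
  return ((uint32_t)DWT_CYCCNT / (SysClockFreq / 1000U));
}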


/**
  * @brief  Use the CPU's internal 32-bit counter for a precise delay
  * @param  us : Delay length, in units of 1 us
  * @retval None
  * @note   The counter must first be enabled by calling HAL_InitTick()
  *         (HAL_Init() does this), or by enabling the macro
  *         CPU_TS_INIT_IN_DELAY_FUNCTION.
  *         The maximum delay is about 2^32 / (core clock in MHz) us,
  *         e.g. about 8 s at 528 MHz
  */
void CPU_TS_Tmr_Delay_US(uint32_t us)
{
  uint32_t ticks;
  uint32_t told, tnow, tcnt = 0;

  /* Initialize timestamp register inside the function */
#if (CPU_TS_INIT_IN_DELAY_FUNCTION)
  /* Initialize timestamp and clear */
  HAL_InitTick(5);
#endif

  ticks = us * (GET_CPU_ClkFreq() / 1000000);  /* Required tick count */
  tcnt = 0;
  told = (uint32_t)CPU_TS_TmrRd();             /* Counter value on entry */

  while (1)
  {
    tnow = (uint32_t)CPU_TS_TmrRd();
    if (tnow != told)
    {
      /* CYCCNT counts up. Unsigned subtraction is modulo 2^32, so this is
         also correct when the counter wraps from 0xFFFFFFFF to 0 */
      tcnt += tnow - told;

      told = tnow;

      /* Exit once the elapsed time reaches the requested delay */
      if (tcnt >= ticks) break;
    }
  }
}

/*********************************************END OF FILE**********************/
/* core_delay.h */
#ifndef __CORE_DELAY_H
#define __CORE_DELAY_H

#include "stm32h7xx.h"

/* Get core clock frequency */
#define GET_CPU_ClkFreq()   HAL_RCC_GetSysClockFreq()
#define SysClockFreq        (218000000)

/* If this macro is 1, the timestamp register is initialized inside the delay
   function, i.e. again on every call. Leave it at 0 and call HAL_InitTick()
   once at startup (HAL_Init() already does this) to avoid reinitializing
   every time */
#define CPU_TS_INIT_IN_DELAY_FUNCTION  0


/*******************************************************************************
 *                          Function declarations
 ******************************************************************************/
uint32_t CPU_TS_TmrRd(void);
HAL_StatusTypeDef HAL_InitTick(uint32_t TickPriority);

// The counter must be enabled with HAL_InitTick() before using the following
// functions, or enable the macro CPU_TS_INIT_IN_DELAY_FUNCTION
// Maximum delay: about 2^32 / (core clock in MHz) us, e.g. about 8 s at 528 MHz
void CPU_TS_Tmr_Delay_US(uint32_t us);
#define HAL_Delay(ms)          CPU_TS_Tmr_Delay_US((ms) * 1000U)
#define CPU_TS_Tmr_Delay_S(s)  HAL_Delay((s) * 1000U)


#endif /* __CORE_DELAY_H */
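The accumulation loop in CPU_TS_Tmr_Delay_US() can be reduced to a single unsigned subtraction, because uint32_t arithmetic wraps modulo 2^32. The code below is a hypothetical sketch, not the 野火 code. cycle_reader_t and the function names are made up for illustration. The counter is read through a function pointer so the arithmetic can also be checked on a PC; on the target you would pass CPU_TS_TmrRd.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: any function returning the current CYCCNT value. */
typedef uint32_t (*cycle_reader_t)(void);

/* Cycles between two CYCCNT readings. Unsigned subtraction wraps modulo
   2^32, so one counter overflow between the readings needs no branch. */
uint32_t cycles_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Core cycles needed for `us` microseconds at cpu_hz, computed in 64 bits
   so the multiplication itself cannot overflow. */
uint64_t us_to_cycles(uint32_t us, uint32_t cpu_hz)
{
    return (uint64_t)us * (cpu_hz / 1000000u);
}

/* Busy-wait for `us` microseconds. The cycle count must fit in 32 bits;
   keep it well below 2^32 so a slow poll cannot step past the window. */
void delay_us(cycle_reader_t read_cycles, uint32_t cpu_hz, uint32_t us)
{
    const uint64_t ticks = us_to_cycles(us, cpu_hz);
    assert(read_cycles != 0);
    assert(ticks <= UINT32_MAX);

    const uint32_t start = read_cycles();
    while (cycles_elapsed(start, read_cycles()) < (uint32_t)ticks) {
        /* spin until enough core cycles have passed */
    }
}
```

On the target this might be called as delay_us(CPU_TS_TmrRd, GET_CPU_ClkFreq(), 500). The cycle count is computed in 64 bits, so an overlong request trips the assert instead of silently wrapping. The 32-bit product us * (GET_CPU_ClkFreq() / 1000000) in the listing would wrap silently.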
Notes:
If you are not using the HAL library, comment out:
uint32_t HAL_GetTick(void)
{
  return ((uint32_t)DWT_CYCCNT / (SysClockFreq / 1000U));
}
It is also recommended to rename the HAL_InitTick() function (its HAL_StatusTypeDef return type also comes from the HAL).
Rewrite the following macro definitions according to your platform (SysClockFreq must equal your core clock in Hz):
/* Get core clock frequency */
#define GET_CPU_ClkFreq()   HAL_RCC_GetSysClockFreq()
#define SysClockFreq        (218000000)
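The order of operations in HAL_GetTick() matters. The expression DWT_CYCCNT/SysClockFreq*1000 divides first, so its result only changes once per full second. The sketch below uses hypothetical function names. It contrasts that ordering with dividing by the number of cycles per millisecond.

```c
#include <assert.h>
#include <stdint.h>

/* Divide-first ordering: everything below one whole second is truncated
   away before the multiplication by 1000. */
uint32_t cycles_to_ms_divide_first(uint32_t cycles, uint32_t hz)
{
    assert(hz != 0u);
    return cycles / hz * 1000u;
}

/* Dividing by cycles per millisecond keeps millisecond resolution and
   cannot overflow, since the result is never larger than the input. */
uint32_t cycles_to_ms(uint32_t cycles, uint32_t hz)
{
    assert(hz >= 1000u); /* cycles per ms must be non-zero */
    return cycles / (hz / 1000u);
}
```

At 218 MHz, 327,000,000 cycles (1.5 s) give 1000 with the first version and 1500 with the second. Both values are still derived from the 32-bit CYCCNT, though. The millisecond count therefore wraps after 2^32 / 218,000,000 ≈ 19.7 s rather than after 2^32 ms. HAL timeout checks of the form HAL_GetTick() - tickstart may misbehave across that wrap.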
Postscript
In fact, the uC/OS-III source code contains a function that measures interrupt-off time using the STM32 timestamp, i.e. it records the moment at which a given point in the program executes. If the moments before and after a piece of code are both recorded, the execution time of that code can be calculated. However, there is very little information about the core registers. Luckily, I found a document (the ARM manual) that describes them in detail, with the timestamp-related registers covered in chapters 10 and 11. If you would like a copy, feel free to contact me.
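Measuring a code segment this way comes down to subtracting two CYCCNT readings. Below is a hypothetical sketch in the spirit of that interrupt-off measurement. It is not the uC/OS-III code; section_timer_t and its functions are made-up names. It records the cycles spent in the most recent section and the worst case seen so far.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tracker for how long a code section takes, in core cycles. */
typedef struct {
    uint32_t start; /* CYCCNT value when the section was entered */
    uint32_t last;  /* cycles spent in the most recent section */
    uint32_t max;   /* worst case observed so far */
} section_timer_t;

void section_begin(section_timer_t *t, uint32_t now)
{
    t->start = now;
}

void section_end(section_timer_t *t, uint32_t now)
{
    /* Modulo-2^32 subtraction: correct even if CYCCNT wrapped once. */
    t->last = now - t->start;
    if (t->last > t->max) {
        t->max = t->last;
    }
}

/* Convert a cycle count to microseconds for a core clock of cpu_hz. */
uint32_t cycles_to_us(uint32_t cycles, uint32_t cpu_hz)
{
    assert(cpu_hz >= 1000000u);
    return cycles / (cpu_hz / 1000000u);
}
```

On the target you would call section_begin(&t, CPU_TS_TmrRd()) before the code under test and section_end(&t, CPU_TS_TmrRd()) after it. Then report cycles_to_us(t.max, GET_CPU_ClkFreq()). The result is only meaningful if the section is shorter than one CYCCNT wrap period.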
To get the source code, follow the public account "IoT Development" and send "DWT" to it as a message.
PS: If you find these articles good, feel free to share them; of course, it’s not mandatory. It’s all valuable content, and I hope we can all be on the path of sharing generously~