Source | MultiMCU EDU
An RTOS system tick typically runs at 1 kHz, although 100 Hz and 10 kHz are also seen.
At 1 kHz, the shortest tick-based delay is 1 ms, yet real-time control sometimes requires delays at the microsecond (us) level. What should we do?
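To make the numbers concrete, the tick period follows directly from the tick rate. The helper below is a minimal sketch; the name `tick_period_us` is hypothetical and not part of any RTOS API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: length of one RTOS tick in microseconds.
 * A tick-based sleep cannot be shorter than this value. */
uint32_t tick_period_us(uint32_t tick_hz)
{
    assert(tick_hz > 0u);          /* a tick rate of 0 is meaningless */
    return 1000000u / tick_hz;
}
```

At 1 kHz this gives 1000 us, and even at 10 kHz one tick is still 100 us, so tick-based delays alone cannot reach single-microsecond resolution.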
There are two approaches to achieving microsecond-level delays:
1. Increase the system clock
2. Use the MCU’s high-precision timer
1. Increase the system clock
Simply raising the tick rate has a cost: the faster the system tick, the more scheduling events occur per unit time, so the time spent on scheduling rises significantly at the expense of the threads themselves. The real work is done in the thread functions, and if the CPU could speak, it would complain bitterly about being rescheduled so often. Running threads is what the CPU is there for. Called to do a job but dragged away before finishing it, the CPU would say: “Fool, are you crazy? Didn’t you ask me to do the work? Why do you keep dragging me around instead of letting me finish my task?”
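The cost of a faster tick can be estimated with simple arithmetic. The sketch below assumes a fixed cost per tick (interrupt entry plus one scheduler pass); the 2 us figure used in the note afterwards is purely an illustrative assumption, not a measurement from this article, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Share of CPU time spent servicing the tick, in parts per thousand.
 * cost_ns is the ASSUMED cost of one tick interrupt plus scheduler pass. */
uint32_t tick_overhead_permille(uint32_t tick_hz, uint32_t cost_ns)
{
    /* Busy nanoseconds per second = tick_hz * cost_ns. One second is
     * 1e9 ns, so permille = busy / 1e6. 64-bit math avoids overflow. */
    return (uint32_t)(((uint64_t)tick_hz * cost_ns) / 1000000u);
}

/* Tick rate at which the CPU would do nothing but service ticks. */
uint32_t saturating_tick_hz(uint32_t cost_ns)
{
    assert(cost_ns > 0u);
    return 1000000000u / cost_ns;
}
```

With an assumed 2000 ns per tick, a 1 kHz tick costs about 0.2% of the CPU and a 10 kHz tick about 2%, while the CPU would be fully saturated at 500 kHz, well below the 1 MHz tick that 1 us resolution would need.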
2. Use the MCU’s on-chip peripheral timer
MCUs generally have on-chip high-precision timer peripherals that can be configured for 1 us resolution. If a timer is available, why not just use it, and why write an article about it? Because it is not as simple as turning the timer on: an RTOS delay must be a blocking delay, so a task that enters a delay has to give up the CPU and enter a blocked state. Busy-waiting on a timer inside an RTOS is the lazy way out; sleeping and yielding are what make multithreaded scheduling work well.
Because us-level delays are short, the probability that another thread starts a delay while one thread is already delaying is low. In a multithreaded system it can still happen, however: one thread requests a 500 us delay and, 100 us later, another thread requests a 200 us delay. This situation involves not only re-entrancy but also “time overlap” (the 200 us delay overlaps the remaining 400 us of the first thread’s delay), and a high-precision hardware timer alone cannot handle either.
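The overlap example can be worked through on paper, or with a small host-side sketch like the one below. It only illustrates the bookkeeping that the overlap case implies (which deadline comes next, and how long until then); the type `pending_delay_t` and the function `next_timeout_us` are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping record: one pending microsecond delay. */
typedef struct {
    uint32_t deadline_us;  /* absolute time at which the delay expires */
    int      active;       /* non-zero while the thread is still waiting */
} pending_delay_t;

/* Time from now_us until the earliest active deadline,
 * or UINT32_MAX if nothing is pending. */
uint32_t next_timeout_us(const pending_delay_t *d, size_t n, uint32_t now_us)
{
    uint32_t best = UINT32_MAX;
    assert(d != NULL || n == 0u);
    for (size_t i = 0; i < n; i++) {
        if (!d[i].active) {
            continue;
        }
        /* An already-passed deadline counts as "due now". */
        uint32_t remaining = (d[i].deadline_us > now_us)
                                 ? (d[i].deadline_us - now_us) : 0u;
        if (remaining < best) {
            best = remaining;
        }
    }
    return best;
}
```

With thread A's deadline at 500 us and thread B's at 300 us, the next timeout at t = 100 us is 200 us (B). Once B has been resumed at t = 300 us, A still has 200 us left, which is exactly the information a single one-shot timer loses if it is simply reloaded.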
Multithreading Delay Condition Analysis
Let’s look at a multithreading delay condition diagram, as shown:
For easier reading and further design implementation, I’ve added some annotations to the above diagram for a more detailed description of the multithreading conditions, as shown:
To make the discussion concrete, I will use Microsoft Azure RTOS ThreadX as the basis for this design. The goal is to present a general method, so the specific RTOS is not important as long as it supports multithreading, e.g. RT-Thread, FreeRTOS, etc.
In the diagram, A, B, C, and High-precision Timer are four threads. The High-precision Timer thread has the highest priority, but it is not a timed callback; rather, it is passively triggered. The following explains why the High-precision Timer thread has the highest priority and how it is passively triggered.
We know that when a thread waits for a semaphore with TX_WAIT_FOREVER and the semaphore’s value is 0, the thread is suspended on that semaphore. We use this behavior to implement the thread’s “passive trigger,” namely:
1. The initial value of the semaphore is set to 0
2. Release the semaphore once in an interrupt (i.e., increment the semaphore value by 1)
Thus, when the interrupt occurs, it immediately wakes the thread suspended on that semaphore, which completes the passive trigger. Once that thread is ready, its highest priority lets it preempt the running thread and execute immediately. After being woken by the semaphore, the High-precision Timer thread immediately resumes the threads whose delay has expired, which completes their us delay.
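The passive trigger can be pictured with a toy state model of a counting semaphore. This is a host-side sketch for illustration only, not ThreadX code: `sem_model_t`, `sem_model_get` and `sem_model_put` are hypothetical names, and the model assumes at most one waiting thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a counting semaphore with at most one waiting thread. */
typedef struct {
    unsigned count;    /* current semaphore value */
    bool     waiting;  /* true while the timer thread is suspended on it */
} sem_model_t;

/* Thread side: true if the caller may continue, false if it suspends. */
bool sem_model_get(sem_model_t *s)
{
    assert(s != NULL);
    if (s->count > 0u) {
        s->count--;
        return true;
    }
    s->waiting = true;   /* value is 0: suspend until a put arrives */
    return false;
}

/* Interrupt side: true if a suspended thread was woken by this put. */
bool sem_model_put(sem_model_t *s)
{
    assert(s != NULL);
    if (s->waiting) {
        s->waiting = false;  /* hand the token straight to the waiter */
        return true;
    }
    s->count++;
    return false;
}
```

Starting from a value of 0, the first get suspends the caller, and the put issued from the interrupt is what wakes it, which is the "passive trigger" described above.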
Now look again at threads A, B, and C in the diagram. Each thread’s line has two circles on it: reading from top to bottom, the first circle marks the thread suspending itself to start a delay, and the second marks the High-precision Timer thread resuming it once the time has elapsed.
That covers how to read the diagram. To implement it in code, we also need the relationship between the hardware timer and the High-precision Timer thread. The label to the left of the High-precision Timer column indicates that the thread resumes expired threads because the hardware timer generated an interrupt, which is the principle described above under “passive trigger.” Strictly speaking, the far right of the diagram should have one more column for the hardware timer. I left it out because the diagram also has to show re-entrancy, and that is already a lot for one picture: too little makes it incomplete, too much makes it confusing.
Code Implementation
us-level Timer Configuration
1. Timer Initialization
It’s most convenient to directly use the functions generated by CubeMX without any modifications, as follows:
/**
  * @brief TIM9 Initialization Function
  * @param None
  * @retval None
  */
static void MX_TIM9_Init(void)
{
  /* USER CODE BEGIN TIM9_Init 0 */
  /* USER CODE END TIM9_Init 0 */
  TIM_ClockConfigTypeDef sClockSourceConfig = {0};
  /* USER CODE BEGIN TIM9_Init 1 */
  /* USER CODE END TIM9_Init 1 */
  htim9.Instance = TIM9;
  htim9.Init.Prescaler = 215;
  htim9.Init.CounterMode = TIM_COUNTERMODE_UP;
  htim9.Init.Period = 65535;
  htim9.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
  htim9.Init.AutoReloadPreload = TIM_AUTORELOAD_PRELOAD_DISABLE;
  if (HAL_TIM_Base_Init(&htim9) != HAL_OK)
  {
    Error_Handler();
  }
  sClockSourceConfig.ClockSource = TIM_CLOCKSOURCE_INTERNAL;
  if (HAL_TIM_ConfigClockSource(&htim9, &sClockSourceConfig) != HAL_OK)
  {
    Error_Handler();
  }
  /* USER CODE BEGIN TIM9_Init 2 */
  /* USER CODE END TIM9_Init 2 */
}
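The prescaler value of 215 yields a 1 us count if TIM9 is clocked at 216 MHz. That clock figure is an assumption inferred from the prescaler itself; it is not stated in the generated code. The arithmetic can be checked with a small sketch (both helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Counter frequency for a given timer input clock and prescaler
 * register value. The hardware divides the clock by (PSC + 1). */
uint32_t timer_count_hz(uint32_t timer_clk_hz, uint32_t psc)
{
    assert(psc <= 0xFFFFu);        /* the prescaler register is 16 bits wide */
    return timer_clk_hz / (psc + 1u);
}

/* Number of counts in one full period of an up-counter that runs
 * from 0 to the auto-reload value: ARR + 1. */
uint32_t counts_per_period(uint32_t arr)
{
    assert(arr <= 0xFFFFu);        /* 16-bit counter assumed */
    return arr + 1u;
}
```

Under that assumption, 216 MHz / (215 + 1) = 1 MHz, so one count is 1 us, and a period of 65535 allows a single delay of up to roughly 65.5 ms.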
Since we need the timer’s interrupt, the NVIC must also be configured. CubeMX generates that code in another file; for convenience, we merge it with the initialization call as follows:
void bsp_InitHardTimer(void)
{
    __HAL_RCC_TIM9_CLK_ENABLE();
    HAL_NVIC_SetPriority(TIM1_BRK_TIM9_IRQn, 0, 0);
    HAL_NVIC_EnableIRQ(TIM1_BRK_TIM9_IRQn);
    MX_TIM9_Init();
}
Note that this only initializes the timer; it does not start it. By design, the timer is started only when a thread that needs a delay calls the delay function.
2. Function to Start the Timer
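void bsp_DelayUS(uint32_t n)
{
    /* Compensate for the roughly 30 us timer start/stop overhead
       (see Time Granularity Testing) when the request is longer than that */
    n = (n <= 30) ? n : (n - 30);
    HAL_TIM_Base_Stop_IT(&htim9);                /* stop any delay already in progress */
    htim9.Instance->CNT = htim9.Init.Period - n; /* counter overflows about n counts later */
    HAL_TIM_Base_Start_IT(&htim9);
}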
Note that we must stop the timer before starting it: in the “time overlap” case described above, the timer may already be running for another thread’s delay and has to be stopped first.
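The preload arithmetic in bsp_DelayUS() can be tried on a host machine without any hardware. The sketch below mirrors the two computations; the helper names are hypothetical, and the clamp for requests longer than the counter period is an added assumption that the original function does not contain.

```c
#include <assert.h>
#include <stdint.h>

/* Same compensation rule as bsp_DelayUS(): subtract 30 us of
 * overhead only when the request is longer than 30 us. */
uint32_t compensated_us(uint32_t n)
{
    return (n <= 30u) ? n : (n - 30u);
}

/* Counter preload so that an up-counter reaches `period` after
 * roughly the compensated number of counts. */
uint32_t preload_for_delay(uint32_t period, uint32_t n)
{
    uint32_t c = compensated_us(n);
    assert(period <= 0xFFFFu);     /* 16-bit timer assumed */
    if (c > period) {
        c = period;                /* ASSUMPTION: clamp instead of wrapping */
    }
    return period - c;
}
```

For a 100 us request with a period of 65535, the counter is preloaded with 65465 and overflows about 70 counts later. Note the step in the rule: a 30 us request is left at 30 counts, while a 31 us request is reduced to a single count.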
3. Timer Interrupt Function
/**
  * @brief This function handles TIM1 break interrupt and TIM9 global interrupt.
  */
void TIM1_BRK_TIM9_IRQHandler(void)
{
  /* USER CODE BEGIN TIM1_BRK_TIM9_IRQn 0 */
  /* USER CODE END TIM1_BRK_TIM9_IRQn 0 */
  HAL_TIM_IRQHandler(&htim9);
  /* USER CODE BEGIN TIM1_BRK_TIM9_IRQn 1 */
  tx_semaphore_put(&tx_semaphore_delay_us);
  HAL_TIM_Base_Stop_IT(&htim9);
  /* USER CODE END TIM1_BRK_TIM9_IRQn 1 */
}
This calls the Microsoft Azure RTOS ThreadX API tx_semaphore_put() to release the semaphore, which was created during initialization (the code to create the semaphore is omitted).
us Delay Function Interface
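TX_THREAD *thread_delay_us;

UINT tx_thread_sleep_us(ULONG timer_ticks)
{
    TX_THREAD_GET_CURRENT(thread_delay_us)  /* record the calling thread */
    bsp_DelayUS(timer_ticks);               /* start the hardware timer */
    tx_thread_suspend(thread_delay_us);     /* give up the CPU until resumed */
    return TX_SUCCESS;
}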
Here, a global variable thread_delay_us is defined. TX_THREAD_GET_CURRENT() stores the thread that called the us delay in it, and after the timer is started, that thread is suspended with tx_thread_suspend().
High-precision Timer Thread
extern TX_THREAD* thread_delay_us;
UINT status;

void threadx_task_delay_us_run(ULONG thread_input)
{
    (void)thread_input;
    while(1)
    {
        tx_semaphore_get(&tx_semaphore_delay_us, TX_WAIT_FOREVER);
        if(thread_delay_us)
        {
            status = tx_thread_resume(thread_delay_us);
        }
    }
}
The thread creation code is omitted here as well; only the thread body is shown. It waits on the semaphore tx_semaphore_delay_us to implement the passive trigger, then resumes the thread recorded in thread_delay_us.
Test Thread for the us-Level Delay
#include "pthread.h"

VOID *pthread_test_entry(VOID *pthread1_input)
{
    while(1)
    {
        //print_task_information();
        uint64_t now = get_timestamp_us();
        tx_thread_sleep_us(100);
        /* cast so the argument matches the format specifier */
        printf("delay_us: %llu\r\n", (unsigned long long)(get_timestamp_us() - now));
    }
}
This is a thread created using the POSIX interface API; those interested in POSIX can refer to the article “POSIX Interface of Azure RTOS ThreadX”.
Time Granularity Testing
ThreadX is said to achieve sub-microsecond context switching on a 200 MHz MCU, yet the time granularity measured by Sugar is relatively stable at 150 us. This does not mean that ThreadX performs poorly; rather, starting and stopping the STM32F7 timer takes about 30 us, so the timer should not be switched on and off when the required precision is finer than 30 us. For our design, however, the timer must be started and stopped in order to handle possible re-entrancy.
How do we know that starting and stopping takes 30us? The reason is shown in the diagram:


