
EEWorld

This installment looks at how FreeRTOS services can simplify microcontroller program design, and at how processing efficiency changes once FreeRTOS is introduced. For the experiment I picked a simple, common requirement: "background" printing to the serial port, which is often used for debug output during microcontroller development. The serial port (asynchronous serial communication port, UART) is a slow device relative to the CPU; a commonly used upper speed is 115200 bit/s. At that speed, sending one 8-bit character (1 start bit, 1 stop bit, no parity, so 10 bits on the wire) takes about 87 us. The time the CPU needs to write a character into the transmit data register is negligible, but because the UART's hardware FIFO is very small, the program must wait until the transmit data register can accept a new character before writing. A simple program therefore polls the status flag in a loop and writes one character each time the register becomes writable, until the whole string has been sent.
This approach is unacceptable when execution efficiency matters, especially in a multitasking environment: the polling loop keeps the CPU busy without doing useful work while other tasks wait. What we need is a "background" output method. Either the current task hands the string to the system and carries on without waiting for it to leave the serial port, or the current task does wait for transmission to finish but other tasks can run in the meantime.
Below is my experimental code, running on the ST Nucleo-L4R5ZI board. It can easily be adapted to other STM32 boards (bear in mind that the STM32L4R5 has a large SRAM, which makes it easy to be wasteful with memory). On this board the USB serial port is connected to LPUART1 of the STM32L4R5. I wrote a uart_puts(char *) function that outputs a string through LPUART1. Data to be sent is written to the TDR register of LPUART1, and the transmit status can be read from the ISR register. To run in the background, of course, interrupts are needed.
Instead of polling, we let LPUART1 generate an interrupt whenever its TDR register can be written, telling the CPU that the next character is due. Because the output happens in the background, there must be storage for the string passed to uart_puts(); if the storage can take the whole string, the function can return immediately. The interrupt service routine needs to know where this storage is so that it can fetch characters from it.
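The 87 us figure follows directly from the frame format: 8 data bits plus one start bit and one stop bit make 10 bit times per character. A minimal sketch of that arithmetic is shown here; the helper names are illustrative and do not come from the experimental program.

```c
#include <assert.h>
#include <stdint.h>

/* Time to shift one UART frame onto the wire, in microseconds.
 * frame_bits counts every bit on the line: start + data + parity + stop. */
static double uart_frame_time_us(uint32_t baud, unsigned frame_bits)
{
    assert(baud > 0);
    return (double)frame_bits * 1e6 / (double)baud;
}

/* Upper bound on throughput, in characters per second. */
static uint32_t uart_max_chars_per_sec(uint32_t baud, unsigned frame_bits)
{
    assert(frame_bits > 0);
    return baud / frame_bits;
}
```

With 10-bit frames this gives about 86.8 us per character at 115200 bit/s and about 1042 us per character at 9600 bit/s, that is, at most 11520 and 960 characters per second respectively.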
A FreeRTOS queue looks like a natural fit for this: uart_puts() writes characters to the queue, and the ISR reads them from it (the ISR could instead wake a task to do the reading, but that is unnecessary). Following this idea, I wrote uart_puts() as follows:

    QueueHandle_t uart_tx_queue;

    void uart_puts(char *str)
    {
        for (;; str++)
        {
            char ch = *str;
            if (ch == 0)
                break;
            else
            {
                // blocks the calling task while the queue is full
                xQueueSend(uart_tx_queue, &ch, portMAX_DELAY);
                LPUART1->CR1 |= USART_CR1_TXEIE_TXFNFIE;   // enable TXE IRQ
            }
        }
    }

The matching ISR is:

    void LPUART1_IRQHandler(void)
    {
        char data;
        BaseType_t waken = pdFALSE;

        if (xQueueReceiveFromISR(uart_tx_queue, &data, &waken) == pdPASS)
        {
            LPUART1->TDR = data;
            // switch to a task unblocked by the receive, if it has higher priority
            portYIELD_FROM_ISR(waken);
        }
        else    // no more data to transmit
        {
            LPUART1->CR1 &= ~USART_CR1_TXEIE_TXFNFIE;   // disable IRQ
        }
    }
In other words, a task that calls uart_puts() just writes characters into the queue and enables the interrupt; it does not need to care about anything else. The ISR only has to fetch characters from the queue, and it disables the interrupt when there is nothing left to transmit. The queue must of course be created beforehand, for example with room for 400 characters:
    uart_tx_queue = xQueueCreate(400, 1);

I also wrote two tasks that call uart_puts() to output different strings and then call vTaskDelay() to insert some delay, so that the amount of text written in a given period stays within the serial port's throughput. To observe what the CPU is doing, GPIO operations can be inserted into the functions above to light LEDs or to be captured with a digital oscilloscope. The STM32 runs at its default clock frequency of 4 MHz. The oscilloscope shows that at 9600 baud most of the serial port's throughput is used. In the figure below, the yellow trace is the UART TX signal and the cyan trace is produced by the "lighting" code placed around the xQueueSend() call in uart_puts().
In uart_puts(), adding one character to the queue took about 60 us. That is shorter than the 104 us the UART needs to output a single bit at 9600 baud. Next, let's probe the ISR by inserting the same "lighting" code at its entry and exit (this is not exact, because the time spent saving and restoring context on the stack is not included). The interrupt rate matches the rate at which characters leave the serial port.
Now for the cost of this implementation. As estimated above, adding one character to the queue takes about 60 us, which at 4 MHz is more than two hundred machine cycles. What happens if we change the baud rate to 115200? The result: the queue no longer does anything useful, and the calls to uart_puts() run in lockstep with the serial output. Strange! Zoomed in, it looks like this (the yellow trace is the UART TX signal; a high level on the cyan trace marks a call to xQueueSend()):
Looking at the ISR's execution made the cause clear at once. Compared with 9600 baud, the interrupt rate has gone up by an order of magnitude and demands more CPU time. As soon as uart_puts() writes a character to the queue it enables the interrupt, so the CPU runs the ISR right away. See the image (here the yellow trace marks ISR entry and exit).
The reason is that the queue operations for one character (a Send followed by two Receives) take longer than the UART needs to send that character. The queue therefore never acts as a buffer, and its own overhead simply wastes CPU time.
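The comparison can be put into numbers with a rough cycle budget. The sketch below is only an estimate under a stated assumption: the article measured about 60 us for xQueueSend() alone, and here each of the three queue calls per character is assumed to cost about the same. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles that elapse while one UART frame is on the wire. */
static uint32_t cycles_per_frame(uint32_t cpu_hz, uint32_t baud, unsigned frame_bits)
{
    assert(baud > 0);
    return (uint32_t)(((uint64_t)cpu_hz * frame_bits) / baud);
}

/* Convert a duration read off the oscilloscope into CPU cycles. */
static uint32_t us_to_cycles(uint32_t cpu_hz, uint32_t us)
{
    return (uint32_t)(((uint64_t)cpu_hz * us) / 1000000u);
}

/* ASSUMPTION: every queue call (one Send, two ReceiveFromISR) costs op_us.
 * Only the Send was actually measured in the article.
 * Returns 1 if the queue overhead fits into one 10-bit frame time. */
static int queue_keeps_up(uint32_t cpu_hz, uint32_t baud, uint32_t op_us)
{
    uint32_t overhead = 3u * us_to_cycles(cpu_hz, op_us);
    return overhead < cycles_per_frame(cpu_hz, baud, 10);
}
```

At 4 MHz one character lasts roughly 4166 cycles at 9600 baud but only about 347 cycles at 115200 baud, while three queue calls at 60 us each come to about 720 cycles. Under that assumption the queue fits comfortably at 9600 baud and cannot keep up at 115200 baud, which matches the oscilloscope observation.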
To replace the FreeRTOS queue, I allocated a block of memory as a buffer and used two integer variables, tail and head (the write and read positions, respectively), to track its usage. When tail and head are equal, the buffer is empty. Writing a byte into the buffer advances tail by 1; taking a byte out advances head by 1. It is a "circular buffer" because tail or head wraps back to 0 when it reaches the buffer size, so the maximum number of bytes the buffer can hold is fixed.
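The buffer described here can be sketched in portable C as follows. This is not the code from the attachment: the structure layout, the function names, and the choice to keep one slot unused (so that a full buffer can be told apart from an empty one) are assumptions made for illustration. As in the text, tail is the write index and head is the read index.

```c
#include <assert.h>
#include <stdint.h>

#define BUFSIZE 256u   /* assumed size; the article later tests 256 and 512 */

struct ringbuf {
    volatile uint16_t head;   /* read position, advanced by the consumer */
    volatile uint16_t tail;   /* write position, advanced by the producer */
    char buf[BUFSIZE];
};

static int ringbuf_empty(const struct ringbuf *rb)
{
    return rb->head == rb->tail;
}

/* Store one byte. Returns 0 if the buffer is full, 1 on success.
 * One slot always stays unused, so at most BUFSIZE-1 bytes are held. */
static int ringbuf_put(struct ringbuf *rb, char ch)
{
    uint16_t next = (uint16_t)((rb->tail + 1u) % BUFSIZE);
    if (next == rb->head)
        return 0;
    rb->buf[rb->tail] = ch;
    rb->tail = next;
    return 1;
}

/* Fetch one byte into *out. Returns 0 if the buffer is empty. */
static int ringbuf_get(struct ringbuf *rb, char *out)
{
    assert(out != 0);
    if (ringbuf_empty(rb))
        return 0;
    *out = rb->buf[rb->head];
    rb->head = (uint16_t)((rb->head + 1u) % BUFSIZE);
    return 1;
}
```

With a single producer (the task) and a single consumer (the ISR), each index is written by only one side, which is why such a buffer can get by without the bookkeeping a general-purpose queue carries.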
First, consider the implementation of the ISR, for example:

    void LPUART1_IRQHandler(void)
    {
        if (uartbuf.tail != uartbuf.head)   // not empty
        {
            LPUART1->TDR = uartbuf.buf[uartbuf.head];
            uartbuf.head = (uartbuf.head + 1) % BUFSIZE;
            if (uartbuf.tail != uartbuf.head)
                return;   // more data pending, keep the interrupt enabled
        }
        // buffer empty
        LPUART1->CR1 &= ~USART_CR1_TXEIE_TXFNFIE;   // disable IRQ
    }

In the DMA-based version, uart_puts() has to handle more cases, because the circular buffer is not as straightforward to use as in the previous example: the head variable no longer exists. The read position is determined by the DMA transfer address, which has to be obtained from the DMA hardware registers. I used memcpy() to fill the buffer, and the destination addresses have to be calculated case by case. If the buffer cannot hold the output string, the task has to block until the DMA has worked through the data in the buffer.

    void uart_puts(char *str)
    {
        uint16_t len;
        char done = 0, dma_stopped = 0;

        xSemaphoreTake(uart_mutex, portMAX_DELAY);
        taskENTER_CRITICAL();
        len = strlen(str);
        if (dmabuf.pend_ext)
        {
            uint16_t avail, dma_pos;
            // CNDTR holds the number of bytes the DMA still has to transfer
            dma_pos = dmabuf.tail - UART_DMAChannel->CNDTR;
            avail = dma_pos - dmabuf.pend_ext;
            if (avail >= len)
            {
                memcpy(dmabuf.buf + dmabuf.pend_ext, str, len);
                dmabuf.pend_ext += len;
                done = 1;
            }
        }
        else
        {
            if (dmabuf.pend_len)
            {
                uint16_t avail1, dma_pos;
                avail1 = BUFSIZE - dmabuf.pend_off - dmabuf.pend_len;   // till buffer end
                if (avail1 >= len)
                {
                    memcpy(dmabuf.buf + dmabuf.pend_off + dmabuf.pend_len, str, len);
                    dmabuf.pend_len += len;
                    done = 1;
                }
                else
                {
                    dma_pos = dmabuf.tail - UART_DMAChannel->CNDTR;
                    if (avail1 + dma_pos >= len)
                    {
                        // split the string: fill up to the buffer end, wrap the rest
                        memcpy(dmabuf.buf + dmabuf.pend_off + dmabuf.pend_len, str, avail1);
                        dmabuf.pend_len += avail1;
                        memcpy(dmabuf.buf, str + avail1, len - avail1);
                        dmabuf.pend_ext = len - avail1;
                        done = 1;
                    }
                }
            }
            else    // no pending transfer
            {
                if (UART_DMAChannel->CNDTR)   // not finished
                {
                    uint16_t avail1, dma_pos;
                    avail1 = BUFSIZE - dmabuf.tail;
                    if (avail1 >= len)
                    {
                        memcpy(dmabuf.buf + dmabuf.tail, str, len);
                        dmabuf.pend_off = dmabuf.tail;
                        dmabuf.pend_len = len;
                        done = 1;
                    }
                    else
                    {
                        dma_pos = dmabuf.tail - UART_DMAChannel->CNDTR;
                        if (avail1 + dma_pos >= len)
                        {
                            memcpy(dmabuf.buf + dmabuf.tail, str, avail1);
                            dmabuf.pend_off = dmabuf.tail;
                            dmabuf.pend_len = avail1;
                            memcpy(dmabuf.buf, str + avail1, len - avail1);
                            dmabuf.pend_ext = len - avail1;
                            done = 1;
                        }
                    }
                }
                else    // finished already
                {
                    dma_stopped = 1;
                }
            }
        }
        taskEXIT_CRITICAL();
        if (!done)
        {
            // the string did not fit: wait for the DMA, then send it in chunks
            if (!dma_stopped)
                wait_dma_finish();
            while (BUFSIZE < len)
            {
                memcpy(dmabuf.buf, str, BUFSIZE);
                uart_conf_dma(0, BUFSIZE);
                wait_dma_finish();
                len -= BUFSIZE;
                str += BUFSIZE;
            }
            memcpy(dmabuf.buf, str, len);
            uart_conf_dma(0, len);
        }
        xSemaphoreGive(uart_mutex);
    }
Note that, unlike before, I used taskENTER_CRITICAL() to create a critical section that keeps the DMA interrupt from preempting this code and suspends task scheduling. The bookkeeping for the buffer is now more complex, and if the DMA transfer state changed while uart_puts() was running, the function could easily misjudge the situation.
The complete program with the auxiliary functions used above is in the attachment, and I will not list all of it here.
To measure how much the DMA version improves on the interrupt-only version, I designed a test: one task prints a counter variable continuously. This task has high priority, so the serial port is kept busy all the time.

    volatile int counter;

    static portTASK_FUNCTION( vPrint, pvParameters )
    {
        char str[128];
        strcpy(str, " **** 0x");
        for (;;)
        {
            char *text = str + 8;   // hex digits go right after the 8-character prefix
            int8_t i;
            uint32_t x = counter;
            text[8] = 0;
            for (i = 7; i >= 0; i--)
            {
                unsigned char h = x % 16;
                x >>= 4;
                if (h < 10)
                    *(text + i) = '0' + h;
                else
                    *(text + i) = 'A' + h - 10;
            }
            LED_B_on();
            uart_puts(str);
            LED_B_off();
        }
    }

The counter variable is incremented in a lower-priority task:

    static portTASK_FUNCTION( vCount, pvParameters )
    {
        for (;;)
        {
            counter++;
        }
    }

Then I fix a time span of 10 seconds and see how far the counter has risen from 0, which indicates how much time the vCount task actually got to execute. This is done by a third task with the highest priority:

    static portTASK_FUNCTION( vControl, pvParameters )
    {
        vTaskDelay(10000);   // 10 s (10000 ticks, which implies a 1 kHz tick)
        vTaskSuspendAll();   // stop scheduling so the counter freezes
        for (;;)
        {
        }
    }
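The digit loop in vPrint can be tried out on a PC in isolation. The following is a minimal sketch of the same conversion, written with a lookup table instead of the if/else; the function name is hypothetical and the code is not taken from the attachment.

```c
#include <assert.h>
#include <stdint.h>

/* Write x as exactly 8 uppercase hex digits followed by a terminating NUL. */
static void u32_to_hex8(uint32_t x, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    assert(out != 0);
    for (int i = 7; i >= 0; i--) {
        out[i] = digits[x & 0xFu];   /* lowest nibble goes to the rightmost slot */
        x >>= 4;
    }
    out[8] = '\0';
}
```

Filling the digits from right to left keeps the output width fixed, so every printed line has the same length and the serial load stays constant during the 10-second measurement.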
For this comparison I removed the mutex code from uart_puts() in both implementations, because only one task calls it. The final counter values are:
Method | BUFSIZE=256 | BUFSIZE=512 |
DMA (Ex 3) | 0x4DC6F6 | 0x4E5AAD |
PIO (Ex 2) | 0x3D4DC3 | 0x3DF186 |
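To read the table as percentages, the hexadecimal counts can be compared directly. The sketch below is only a convenience calculation on the values above; it treats the counter value as a proxy for the CPU time left over for vCount, and the helper and constant names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Relative gain of count_a over count_b, in percent. */
static double gain_percent(uint32_t count_a, uint32_t count_b)
{
    assert(count_b > 0);
    return ((double)count_a / (double)count_b - 1.0) * 100.0;
}

/* Counter values copied from the table above. */
static const uint32_t DMA_256 = 0x4DC6F6u, DMA_512 = 0x4E5AADu;
static const uint32_t PIO_256 = 0x3D4DC3u, PIO_512 = 0x3DF186u;
```

Evaluated on these numbers, the DMA version leaves vCount about 26.9% more increments than the interrupt-driven version at BUFSIZE=256 and about 26.5% more at BUFSIZE=512. Doubling the buffer itself changes little: roughly 0.7% for DMA and 1.0% for PIO.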
To visualize: