Understanding Embedded Parallel Multithreading Processors

Hello everyone, I am the Mixed Bag Master.

Recently a friend gave me a small board, and the MCU on it is quite interesting: the MC3172, a parallel multithreading processor.

In simple terms, this MCU implements in hardware something similar to RTOS multithreading. The main differences between programming the MC3172 and programming with an RTOS are:

  • MC3172 threads run truly in parallel, with no context-switch jitter or overhead.

  • MC3172 does not have concepts like thread priority, priority inversion, deadlock, etc.

  • All interrupts in MC3172 can be handled by dedicated threads, so there is no interrupt nesting and no added latency.

  • Threads in MC3172 run synchronously and in parallel, without blocking or interfering with each other.

  • Thread response in MC3172 is more deterministic than under an RTOS.
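
The point about dedicated threads can be pictured as a thread that does nothing but watch one event source. The sketch below is a host-side illustration only: `EVENT_PENDING` and `poll_event_once` are hypothetical names, not MC3172 registers or SDK functions. On real hardware the status word would be a peripheral register, and the poll would sit inside the endless loop of its own thread.

```c
#include <stdint.h>

/* Hypothetical status bit; not taken from MC3172.h. */
#define EVENT_PENDING 0x1u

/* Poll one status word once. If the event bit is set, clear it,
 * count the event, and return 1; otherwise return 0. */
int poll_event_once(volatile uint32_t *status, uint32_t *handled)
{
    if ((*status & EVENT_PENDING) == 0u) {
        return 0;
    }
    *status &= ~EVENT_PENDING;   /* acknowledge the event */
    (*handled)++;                /* stand-in for the real handler work */
    return 1;
}
```

Because the polling thread never shares its execution slot with other work, the time between the flag being set and the handler running depends only on the length of this loop.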

Introduction to MC3172

MC3172 is a 32-bit RISC parallel multithreading real-time processor from Xiamen Goxin Technology. It is based on the RISC-V RV32IMC instruction set, executes every instruction in a single cycle, runs at up to 200 MHz, and scores 3.37 CoreMark/MHz. It is positioned as a replacement for a real-time operating system, with the goal of making programs modular and reusable.

Related materials can be downloaded from the Goxin official website. Link:

http://www.gxchip.cn/

MC3172 features:

[Image: MC3172 feature list]

MC3172 Practice

The development environment for MC3172 is MounRiver Studio, an IDE developed in China.


Download link for MounRiver Studio:

http://www.mounriver.com/download

Let’s take a simple look at the demo project for MC3172:

[Image: demo project structure]

1. MC3172 Folder

The MC3172 folder contains the core programming files for MC3172.

The thread configuration tool can configure each thread:

[Image: thread configuration tool]

For each thread it sets the clock source, frequency, stack space, memory allocation, and other parameters.

MC3172 supports 64 threads running synchronously in parallel, divided into 4 thread groups of 16 threads each. The thread numbers in each group are shown in the image above. Unused threads can be set as idle threads; an idle thread does not run and therefore consumes no power.

Each thread has its own independent stack, which can be placed anywhere within the available data space. The only constraint is that the total space used by all non-idle threads must not exceed the size of the data space.
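
The stack constraint above comes down to a simple sum. Here is a minimal sketch of that check, written for a host PC; it is not the configuration tool's actual algorithm, and `stacks_fit` together with the sizes passed to it are hypothetical. A stack size of 0 stands for an idle thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the stacks of all non-idle threads fit in the data space,
 * 0 otherwise. A stack size of 0 marks an idle thread. */
int stacks_fit(const uint32_t *stack_bytes, size_t thread_count,
               uint32_t data_space_bytes)
{
    uint64_t total = 0;   /* 64-bit sum so the addition cannot wrap */
    for (size_t i = 0; i < thread_count; ++i) {
        total += stack_bytes[i];
    }
    return total <= data_space_bytes;
}
```

In a real project the data space also holds global variables, so the budget left for stacks is smaller than the full data space size.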

MC3172.h contains macro definitions related to peripheral addresses and their configuration macros, such as:

[Image: excerpt of the macro definitions in MC3172.h]

This is similar to ST's stm32fxxx.h.

thread_config.h is the thread configuration file generated by the thread configuration tool:

[Image: contents of thread_config.h]

MC3172.lds is the linker script generated by the thread configuration tool:

[Image: contents of MC3172.lds]

thread_start.c is the source file related to starting threads:

#ifndef THREAD_START_C
#define THREAD_START_C
#include "./MC3172.h"
#include "./thread_config.h"

// thread0_initial is not shown in this excerpt.

void thread1_initial(void)
{
#ifdef ROTHD_THREAD1_VALID
    extern void thread1_main(void);
    // Set this thread's stack pointer, then enter the thread body
    rothd_set_sp_const(ROTHD_THREAD1_STACKCFG_VALUE|0x20000000);
    thread1_main();
#endif
}

void thread2_initial(void)
{
#ifdef ROTHD_THREAD2_VALID
    extern void thread2_main(void);
    // Set this thread's stack pointer, then enter the thread body
    rothd_set_sp_const(ROTHD_THREAD2_STACKCFG_VALUE|0x20000000);
    thread2_main();
#endif
}

// Omitted code......

// One entry per thread, indexed by thread ID
void (*thread_initial_pointer[64]) (void)={
    &thread0_initial,
    &thread1_initial,
    &thread2_initial
// Omitted code......
};

void thread_start(void)
{
    // Call the initial function that belongs to the current thread ID
    (*thread_initial_pointer[THREAD_ID])();
}

#endif // THREAD_START_C
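
In each threadN_initial function above, the stack pointer argument is formed by OR-ing a per-thread constant with 0x20000000. The sketch below shows that arithmetic on a host PC. It assumes, without confirmation from the source, that the STACKCFG value is an offset into a data space based at 0x20000000. `DEMO_DATA_BASE`, `demo_stack_pointer`, and `demo_offset_is_safe` are hypothetical names.

```c
#include <stdint.h>

/* The constant OR-ed into the stack pointer in thread_start.c.
 * Treating it as the data space base address is an assumption. */
#define DEMO_DATA_BASE 0x20000000u

/* Combine a stack offset with the base, as the listing does with '|'. */
uint32_t demo_stack_pointer(uint32_t stack_offset)
{
    return stack_offset | DEMO_DATA_BASE;
}

/* OR behaves like addition only if the offset does not already
 * contain the base bit; return 1 when that holds. */
int demo_offset_is_safe(uint32_t stack_offset)
{
    return (stack_offset & DEMO_DATA_BASE) == 0u;
}
```

For example, an offset of 0x1000 would give a stack pointer of 0x20001000 under this assumption.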

The program's entry function is thread_start, as the linker script shows:

[Image: entry point definition in the linker script]

THREAD_ID in thread_start is the thread ID value, read directly from the address 0x50000000:

#define THREAD_ID (*(volatile u8*)(0x50000000))

Speculation: the ID value read from address 0x50000000 changes depending on which thread is running, and through this mechanism each thread jumps to its own function in the thread_initial_pointer function pointer array.
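
To make the dispatch idea concrete, here is a host-side sketch of a function pointer table indexed by a thread ID. It is only an illustration of the pattern: the ID is passed in as a parameter instead of being read from 0x50000000, the table has 4 entries instead of 64, and all `demo_` names are hypothetical. The bounds and NULL checks are additions that the original thread_start does not have.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_THREAD_COUNT 4u     /* the real table has 64 entries */

typedef void (*thread_entry_fn)(void);

int demo_last_started = -1;      /* records which entry ran last */

void demo_thread0_initial(void) { demo_last_started = 0; }
void demo_thread1_initial(void) { demo_last_started = 1; }
void demo_thread2_initial(void) { demo_last_started = 2; }

/* Slot 3 stays NULL because the initializer is shorter than the array. */
thread_entry_fn demo_initial_table[DEMO_THREAD_COUNT] = {
    demo_thread0_initial,
    demo_thread1_initial,
    demo_thread2_initial,
};

/* Stand-in for thread_start(): thread_id replaces the THREAD_ID read.
 * Returns 0 after calling the entry, -1 if there is no entry. */
int demo_thread_start(uint8_t thread_id)
{
    if (thread_id >= DEMO_THREAD_COUNT ||
        demo_initial_table[thread_id] == NULL) {
        return -1;
    }
    demo_initial_table[thread_id]();
    return 0;
}
```

Calling `demo_thread_start(1)` runs `demo_thread1_initial`, which mirrors how every hardware thread could share the single entry point thread_start and still end up in its own function.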

Each threadN_initial function sets up the thread's stack and then calls the thread body, for example:

void thread_end(void)
{
    while(1);
}

void thread1_main(void)
{
    while(1){
        //user code section
    }
    thread_end();
}

This is user code: we write our application code inside each thread's main function.

2. Release Folder

The Release folder contains the compiled firmware, which can be downloaded to the development board with the programming tool:

[Image: development board programming tool]

3. USER_CODE Folder

The USER_CODE folder stores user code:

[Image: contents of the USER_CODE folder]

MC3172 is a parallel multithreading real-time processor, so let's take a look at how its threads execute in parallel.

We will write two threads with the same configuration, each toggling its own IO pin. The test code is as follows:

void LED0_GPIOA_PIN0_TEST(void)
{
    // Start GPIOA and set privilege group and clock frequency
    INTDEV_SET_CLK_RST(GPIOA_BASE_ADDR,(INTDEV_RUN|INTDEV_IS_GROUP0|INTDEV_CLK_IS_CORECLK_DIV2));

    // Enable GPIOA PIN0
    GPIO_SET_OUTPUT_EN_VALUE(GPIOA_BASE_ADDR, GPIO_PIN0, GPIO_SET_ENABLE);

    while(1)
    {
        // GPIOA PIN0 output 1
        GPIO_SET_OUTPUT_PIN_TO_1(GPIOA_BASE_ADDR, GPIO_PIN0);

        // Delay
        for (u32 var = 0; var < 5000; ++var)
        {
            NOP();
        }

        // GPIOA PIN0 output 0
        GPIO_SET_OUTPUT_PIN_TO_0(GPIOA_BASE_ADDR, GPIO_PIN0);

        // Delay
        for (u32 var = 0; var < 5000; ++var)
        {
            NOP();
        }
    }
}

void LED1_GPIOA_PIN1_TEST(void)
{
    // Start GPIOA and set privilege group and clock frequency
    INTDEV_SET_CLK_RST(GPIOA_BASE_ADDR,(INTDEV_RUN|INTDEV_IS_GROUP0|INTDEV_CLK_IS_CORECLK_DIV2));

    // Enable GPIOA PIN1
    GPIO_SET_OUTPUT_EN_VALUE(GPIOA_BASE_ADDR, GPIO_PIN1, GPIO_SET_ENABLE);

    while(1)
    {
        // GPIOA PIN1 output 1
        GPIO_SET_OUTPUT_PIN_TO_1(GPIOA_BASE_ADDR, GPIO_PIN1);

        // Delay
        for (u32 var = 0; var < 5000; ++var)
        {
            NOP();
        }

        // GPIOA PIN1 output 0
        GPIO_SET_OUTPUT_PIN_TO_0(GPIOA_BASE_ADDR, GPIO_PIN1);

        // Delay
        for (u32 var = 0; var < 5000; ++var)
        {
            NOP();
        }
    }
}

////////////////////////////////////////////////////////////

void thread_end(void)
{
    while(1);
}

////////////////////////////////////////////////////////////

void thread0_main(void)
{
    while(1){
        //user code section
    }
    thread_end();
}

////////////////////////////////////////////////////////////

void thread1_main(void)
{
    while(1){
        //user code section
        LED0_GPIOA_PIN0_TEST();
    }
    thread_end();
}

////////////////////////////////////////////////////////////

void thread2_main(void)
{
    while(1){
        //user code section
        LED1_GPIOA_PIN1_TEST();
    }
    thread_end();
}

After programming, the logic analyzer captures the level changes of GPIOA_PIN0 and GPIOA_PIN1 as follows:

[Image: logic analyzer capture of GPIOA_PIN0 and GPIOA_PIN1]

As the capture shows, the two waveforms are completely synchronized: the CPU is doing two things at the same time, which achieves the same effect as RTOS multithreading.
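
Why two threads running identical code in lock-step produce identical waveforms can be shown with a small host-side simulation. This is only a model under simplifying assumptions, not a description of the MC3172 pipeline: both simulated threads advance exactly one step per tick, and `DEMO_DELAY_STEPS` stands in for the 5000-iteration NOP loop. All `demo_` names are hypothetical.

```c
#include <stdint.h>

#define DEMO_DELAY_STEPS 5u   /* stands in for the 5000-iteration NOP loop */

/* State of one simulated thread: its pin level and delay countdown. */
struct demo_thread {
    uint8_t  pin;
    uint32_t remaining;
};

/* Advance one thread by one step: count down, or toggle and reload. */
void demo_step(struct demo_thread *t)
{
    if (t->remaining > 0u) {
        t->remaining--;
    } else {
        t->pin ^= 1u;
        t->remaining = DEMO_DELAY_STEPS;
    }
}

/* Step both threads together for `steps` ticks and return the number
 * of ticks on which their pin levels differed. */
uint32_t demo_count_mismatches(uint32_t steps)
{
    struct demo_thread a = {0u, 0u};
    struct demo_thread b = {0u, 0u};
    uint32_t mismatches = 0u;
    for (uint32_t i = 0u; i < steps; ++i) {
        demo_step(&a);
        demo_step(&b);
        if (a.pin != b.pin) {
            mismatches++;
        }
    }
    return mismatches;
}
```

In this model the mismatch count stays at zero for any number of ticks, because neither thread ever waits for the other.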

Insights and Summary

Embedded development combines software and hardware, and the two complement each other. When the hardware is powerful, the software can stay relatively simple; when the hardware is limited, the software has to take on more of the work.

For example:

  • An algorithm that fuses data from several sensors is relatively simple to implement. In practice, some sensors are often dropped to cut costs, and the software then has to work harder to remain stable and reliable.

  • Simple digital signal processing can run on a general-purpose MCU, but more complex processing may require an MCU with a DSP.

This is especially true inside the chip itself: when a dedicated hardware module already implements a function, the corresponding software becomes much simpler, and a hardware implementation is generally more efficient than a software one.

Hardware-implemented multithreading does have advantages over RTOS programming, but in real projects the software and hardware architecture has to weigh many other factors, such as chip stability and the software ecosystem.

Parallel multithreading real-time processors are promising, but they are still at an early stage and much remains to be improved. They deserve more support and promotion; only once an ecosystem is established will we have the chance to use them in the future.

That's all for this post. If you found the article helpful, please share it. Thank you!
