How the Linux System is Transformed into a Real-Time Kernel

1. Basic Concepts and Background of Linux RT Patches

1.1 Concept of Real-Time and Positioning of Linux RT

In modern embedded systems and industrial control fields, real-time performance has become a key indicator of operating system performance. Real-time performance means that a specific task is guaranteed to complete within a bounded time, so that its deadline (the latest acceptable completion time) is met under all circumstances. What matters is determinism, not raw speed.

Real-time systems are divided into two main categories: hard real-time systems and soft real-time systems. Hard real-time systems require tasks to be completed within strict time limits; exceeding these limits can lead to system failure or unacceptable results. For example, the response time of a flight control system must be in the millisecond range to ensure flight safety. Soft real-time systems have some flexibility regarding time constraints; occasional delays do not result in severe consequences, such as in multimedia processing applications.

The standard Linux kernel was designed for generality, high throughput, and fairness rather than deterministic response times, and several of its core mechanisms introduce sources of “uncertainty” that make it impossible to guarantee a strict upper bound on latency. The default CFS (Completely Fair Scheduler), for example, aims to let all normal processes share CPU time “fairly”; fairness alone cannot ensure that an urgent task obtains the CPU within the most stringent time requirements (e.g., microsecond level).

The real-time issues in the Linux kernel mainly manifest in three aspects: non-preemptible kernel and spinlocks, interrupt disabling, and other non-deterministic factors. To protect shared kernel data structures (such as task queues, memory management structures, etc.) from concurrent access corruption, the Linux kernel extensively uses spinlocks. When a CPU core acquires (holds) a spinlock, it enters a non-preemptible critical section. During this time, even if a higher-priority real-time task is ready, the current kernel code must continue executing until the lock is released.

In Linux, hardware interrupt handlers (ISRs) execute in interrupt context. To ensure that the execution of the ISR itself is not disturbed, the CPU automatically disables local interrupts when responding to an interrupt. While interrupts are disabled, that CPU core cannot respond to any new interrupts or perform task scheduling. Additionally, Linux uses virtual memory; if the code or data accessed by a real-time task is not in physical memory, a page fault is triggered and the data must be loaded from disk, which incurs a millisecond-level delay that is catastrophic for real-time tasks.

1.2 Core Features and Improvement Mechanisms

The Linux RT patch (commonly referred to as PREEMPT_RT or the RT patch) is an extension of the Linux kernel aimed specifically at real-time performance. The name stands for real-time preemption, and the preemption model it adds is called the “Fully Preemptible Kernel.” Its goal is to make the kernel almost fully preemptible, so that the system responds to external events with low latency and, above all, with greater determinism.

The core improvement mechanisms of the PREEMPT_RT patch include the following aspects:

Scheduler Architecture Transformation: The Linux kernel adopts a multi-level scheduler class (sched_class) architecture, where different types of processes are handled by different scheduler classes. Real-time tasks are handled by rt_sched_class, whose core data structure is rt_rq (Real-Time Runqueue): each CPU core has its own rt_rq instance, which manages the real-time tasks queued on that CPU.

The definition of the rt_rq structure is as follows:

struct rt_rq {
    struct rt_prio_array active;      // Priority array
    unsigned int rt_nr_running;       // Number of runnable real-time tasks queued here
    unsigned int rr_nr_running;       // Number of those that use SCHED_RR (time-sliced)
    struct {
        int curr;                     // Current highest priority
        int next;                     // Next highest priority
    } highest_prio;
    bool overloaded;                  // Is overloaded (needs load balancing)
    struct plist_head pushable_tasks; // List of tasks that can be migrated
    // ...
};

In this structure, the rt_prio_array structure efficiently manages priorities through a bitmap and an array of linked lists:

struct rt_prio_array {
    DECLARE_BITMAP(bitmap, MAX_RT_PRIO+1);  // Priority bitmap
    struct list_head queue[MAX_RT_PRIO];     // Array of priority queues
};
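
The bitmap-plus-queues idea can be sketched in user space as follows. This is only a minimal illustration, not kernel code: every demo_* name is hypothetical, a singly linked FIFO stands in for the kernel's list_head, and the GCC/Clang builtin __builtin_ctzl stands in for the kernel's own find-first-bit helper. As in the kernel, a lower value means a more urgent priority.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PRIO  100  /* mirrors MAX_RT_PRIO; 0 is the most urgent level */
#define DEMO_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS     ((DEMO_MAX_PRIO + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS)

struct demo_task {
    int prio;                   /* kernel-style value: lower = more urgent */
    struct demo_task *next;     /* FIFO link within one priority level */
};

struct demo_prio_array {
    unsigned long bitmap[DEMO_WORDS];       /* bit p set: level p is non-empty */
    struct demo_task *head[DEMO_MAX_PRIO];  /* oldest task of each level */
    struct demo_task *tail[DEMO_MAX_PRIO];  /* newest task of each level */
};

static void demo_init(struct demo_prio_array *a)
{
    memset(a, 0, sizeof(*a));
}

/* Append a task to the tail of its priority level and mark the level busy. */
static void demo_enqueue(struct demo_prio_array *a, struct demo_task *t)
{
    int p = t->prio;

    assert(p >= 0 && p < DEMO_MAX_PRIO);
    t->next = NULL;
    if (a->tail[p])
        a->tail[p]->next = t;
    else
        a->head[p] = t;
    a->tail[p] = t;
    a->bitmap[p / DEMO_WORD_BITS] |= 1UL << (p % DEMO_WORD_BITS);
}

/* Remove and return the most urgent task, or NULL if nothing is queued. */
static struct demo_task *demo_dequeue_highest(struct demo_prio_array *a)
{
    for (size_t w = 0; w < DEMO_WORDS; w++) {
        if (!a->bitmap[w])
            continue;
        /* The lowest set bit in the first non-zero word is the best level. */
        int p = (int)(w * DEMO_WORD_BITS) + __builtin_ctzl(a->bitmap[w]);
        struct demo_task *t = a->head[p];

        a->head[p] = t->next;
        if (!a->head[p]) {      /* level became empty: clear its bit */
            a->tail[p] = NULL;
            a->bitmap[w] &= ~(1UL << (p % DEMO_WORD_BITS));
        }
        return t;
    }
    return NULL;
}
```

The lookup cost depends only on the number of bitmap words, not on how many tasks are queued, which is what makes this layout attractive for a real-time scheduler.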

The Linux kernel provides several scheduling policies (SCHED_OTHER, SCHED_BATCH, SCHED_IDLE, SCHED_FIFO, SCHED_RR, and SCHED_DEADLINE), among which SCHED_FIFO (First-In-First-Out Real-Time Scheduling) and SCHED_RR (Round-Robin Real-Time Scheduling) are the classic policies for real-time tasks. Two numbering schemes are in use and are easy to confuse. Inside the kernel, real-time tasks occupy the priority range 0-99 and non-real-time tasks the range 100-139, and a lower number means a higher priority. From user space, SCHED_FIFO and SCHED_RR priorities range from 1 (low) to 99 (high); the kernel converts them with the inverse mapping, so user priority 99 becomes kernel priority 0. A SCHED_FIFO task runs until it completes, blocks, or is preempted by a higher-priority task. The SCHED_RR policy is derived from SCHED_FIFO, with the difference that tasks of equal priority take turns, each running for a defined time slice (unless preempted by a higher-priority task).
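
As a minimal sketch of how an application selects one of these policies, the following uses the standard sched_setscheduler() call. The helper names make_fifo and rt_to_kernel_prio are hypothetical, and the mapping helper assumes MAX_RT_PRIO is 100, as in current kernels. Switching to SCHED_FIFO normally requires root or the CAP_SYS_NICE capability, so the call may fail with EPERM.

```c
#include <sched.h>
#include <string.h>

/*
 * Kernel-internal priority for a user-space real-time priority (1..99).
 * Assumes MAX_RT_PRIO == 100: user 99 maps to 0 (most urgent), user 1 to 98.
 */
static int rt_to_kernel_prio(int rt_priority)
{
    return 100 - 1 - rt_priority;
}

/*
 * Switch the calling thread to SCHED_FIFO at the given priority.
 * Returns 0 on success, or -1 with errno set (typically EPERM when the
 * caller lacks the privilege to use real-time policies).
 */
static int make_fifo(int rt_priority)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = rt_priority;
    return sched_setscheduler(0, SCHED_FIFO, &sp);  /* pid 0 = caller */
}
```

A typical caller would check the result, for example `if (make_fifo(80) != 0) perror("sched_setscheduler");`, and fall back to normal scheduling when the privilege is missing.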

In the Linux kernel, real-time processes always have a higher priority than normal processes. The scheduling of real-time processes is managed by the Real-Time Scheduler (RT Scheduler), while normal processes are managed by the CFS scheduler. The system allocates an RT run queue and RT scheduling entities for each CPU, and task groups participate in scheduling through the RT scheduling entities they contain.

2.2 Revolutionary Improvements in Interrupt Handling Mechanism

The Linux RT patch has made revolutionary improvements to the interrupt handling mechanism by introducing Threaded Interrupts technology. In the traditional Linux kernel, hardware interrupt handlers (ISRs) execute in interrupt context, during which local interrupts are disabled, preventing timely responses to high-priority interrupts. PREEMPT_RT addresses this issue by dividing interrupt handling into two phases:

Fast Interrupt Handling (Hard Interrupt Context): Only urgent matters are handled, such as acknowledging the device, with extremely short execution times.

Slow Interrupt Handling (Interrupt Thread): A preemptible kernel thread handles the remaining interrupt logic.

Interrupt threading is set up through the request_threaded_irq function, which registers both handlers (simplified excerpt):

int request_threaded_irq(unsigned int irq, irq_handler_t handler,
                         irq_handler_t thread_fn, unsigned long irqflags,
                         const char *devname, void *dev_id) {
    // ...
    action->handler = handler;        // Primary handler, runs in hard interrupt context
    action->thread_fn = thread_fn;    // Threaded handler, runs in a kernel thread
    action->flags = irqflags;         // Caller-supplied flags, e.g. IRQF_ONESHOT
    // ...
}

The PREEMPT_RT patch enforces threaded interrupt handling: all interrupt handlers run in thread context unless they are marked with the IRQF_NO_THREAD flag. The same mechanism can be force-enabled on a mainline kernel without the PREEMPT_RT patch through the kernel command-line option threadirqs, although the resulting behavior may differ slightly.

By default, interrupt threads are real-time threads that use the SCHED_FIFO scheduling policy with a real-time priority of 50. Their priority can be changed with the chrt command. Because the handlers run as threads, they can be preempted by higher-priority tasks, which significantly reduces the latency those tasks experience.
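
The two-phase split can be mimicked in user space with POSIX threads. The sketch below is only an analogy, not kernel code: all demo_* names are hypothetical, a function call stands in for the hardware interrupt, and an ordinary pthread stands in for the kernel's interrupt thread. The "primary handler" merely records the event and wakes the thread; the thread does the slow work where it can be preempted.

```c
#include <pthread.h>
#include <stdbool.h>

struct demo_irq {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    int pending;    /* events recorded but not yet processed */
    int handled;    /* events fully processed by the thread */
    bool stop;      /* set once no more events will arrive */
};

/* "Primary handler": record the event, wake the thread, return at once. */
static void demo_primary_handler(struct demo_irq *irq)
{
    pthread_mutex_lock(&irq->lock);
    irq->pending++;
    pthread_cond_signal(&irq->wake);
    pthread_mutex_unlock(&irq->lock);
}

/* "Threaded handler": drains pending events in a normal, preemptible thread. */
static void *demo_irq_thread(void *arg)
{
    struct demo_irq *irq = arg;

    pthread_mutex_lock(&irq->lock);
    for (;;) {
        while (irq->pending == 0 && !irq->stop)
            pthread_cond_wait(&irq->wake, &irq->lock);
        if (irq->pending == 0)
            break;                      /* stop requested, nothing left */
        irq->pending--;
        pthread_mutex_unlock(&irq->lock);
        /* The lengthy device-specific work would run here, lock released. */
        pthread_mutex_lock(&irq->lock);
        irq->handled++;
    }
    pthread_mutex_unlock(&irq->lock);
    return NULL;
}

/* Raise n simulated interrupts; returns how many the thread processed. */
static int demo_run(int n)
{
    struct demo_irq irq = { .pending = 0, .handled = 0, .stop = false };
    pthread_t tid;

    pthread_mutex_init(&irq.lock, NULL);
    pthread_cond_init(&irq.wake, NULL);
    if (pthread_create(&tid, NULL, demo_irq_thread, &irq) != 0)
        return -1;
    for (int i = 0; i < n; i++)
        demo_primary_handler(&irq);
    pthread_mutex_lock(&irq.lock);
    irq.stop = true;
    pthread_cond_signal(&irq.wake);
    pthread_mutex_unlock(&irq.lock);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&irq.wake);
    pthread_mutex_destroy(&irq.lock);
    return irq.handled;
}
```

Calling demo_run(3) raises three events and returns once the thread has processed all of them. In a real driver the equivalent roles are played by the handler and thread_fn arguments of request_threaded_irq.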

2.3 Fundamental Changes in Lock Mechanism

The Linux RT patch has fundamentally changed the kernel lock mechanism, primarily by replacing traditional spinlocks with real-time mutexes (rt_mutex) that support priority inheritance.

Conversion from Spinlocks to rt_mutex: In the non-PREEMPT_RT preemption models, spinlocks (spinlock_t) are mapped to raw spinlocks (raw_spinlock_t). A task waiting for a spinlock spins until the task holding it releases it. In PREEMPT_RT, spinlocks are mapped to rt_mutex_base and become “sleeping spinlocks,” while raw spinlocks retain their spinning behavior. A task waiting for a sleeping spinlock goes to sleep and is woken up when the lock is released.

Design of Real-Time Mutex (rt_mutex): Real-time mutexes are an enhanced version of traditional mutexes. In recent kernels the core state lives in struct rt_mutex_base, which struct rt_mutex wraps (simplified):

// Real-time mutex structures (simplified)
struct rt_mutex_base {
    raw_spinlock_t wait_lock;        // Protects the fields below
    struct rb_root_cached waiters;   // Waiters, sorted by priority
    struct task_struct *owner;       // Current owner
};

struct rt_mutex {
    struct rt_mutex_base rtmutex;
    // ...
};

Compared to traditional mutexes, real-time mutexes add three key features: tracking of waiting task priorities, saving the priority of the holding process, and dynamic priority adjustment.

Priority Inheritance Mechanism: Priority inversion is a common problem in real-time systems. When a low-priority task holds a lock required by a high-priority task, the high-priority task is blocked; if medium-priority tasks then preempt the lock holder, the blocking can last for an unbounded time. PREEMPT_RT implements the Priority Inheritance Protocol to address this issue:

// Simplified illustration of priority inheritance; the kernel's real
// rt_mutex_setprio() has a different signature and does more work.
// Kernel convention: a LOWER prio value means a HIGHER priority.
static void rt_mutex_setprio(struct rt_mutex *lock, struct task_struct *p,
                             struct task_struct *new_owner) {
    struct task_struct *old_owner = lock->owner;

    if (old_owner && old_owner != new_owner) {
        // Drop any boost the previous owner received through this lock
        rt_mutex_adjust_prio(old_owner);
    }

    if (new_owner) {
        // Boost the owner's effective priority to that of waiter p if p is
        // more urgent; normal_prio (the un-boosted priority) stays untouched
        new_owner->prio = min(new_owner->normal_prio, p->prio);
        // ...
    }
}

The workflow of priority inheritance is as follows: when a high-priority task P1 requests a lock held by a low-priority task P3, P3’s priority is temporarily raised to P1’s priority; once P3 leaves its critical section and releases the lock, its original priority is restored; P1 then acquires the lock and continues execution.
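
User-space programs can request the same protocol for their own locks through the POSIX mutex attribute PTHREAD_PRIO_INHERIT. The following is a minimal sketch; the helper name pi_mutex_init is hypothetical, while the pthread calls are the standard ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/*
 * Initialise *m as a priority-inheritance mutex.
 * Returns 0 on success or an errno-style error code, as pthread calls do.
 */
static int pi_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    /* While a higher-priority thread waits, the owner inherits its priority. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

After a successful call the mutex is used with pthread_mutex_lock() and pthread_mutex_unlock() as usual. The inheritance only has a visible effect when the contending threads run under a real-time policy such as SCHED_FIFO.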

2.4 Optimization of Timer Subsystem

The real-time effort has also reshaped the timer subsystem through High Resolution Timer (hrtimer) support, which grew out of the RT work and has long been part of the mainline kernel. High-resolution timers allow precise timing and remove the dependency of timers on the periodic scheduler tick (jiffies).

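The high-resolution timer system has three entry points for handling expired timers: before the switch to high-resolution mode, timers are checked and processed in the interrupt of each jiffy tick event; after the switch, they are processed in the expiry interrupt of the clock event device; and timers marked as soft are processed in the HRTIMER_SOFTIRQ softirq. High-resolution timers rely on the high-resolution clock event mode (highres=on), which in turn depends on high-resolution clock sources (constant TSC, HPET, ACPI PM timer, etc.).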

When high-resolution timers are enabled in the Linux kernel, nanosleep, itimers, and POSIX timers gain high-resolution behavior without any change to application source code. The core implementation is located in the kernel/time/ directory and mainly consists of three modules, of which kernel/time/hrtimer.c implements core operations such as timer creation, starting, and cancellation.
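
A common way for applications to benefit from high-resolution timers is a periodic loop built on clock_nanosleep() with absolute deadlines. The sketch below is one possible shape, not a reference implementation; the helper names timespec_add_ns and run_periodic are hypothetical, and the reported lateness depends entirely on the kernel configuration and system load.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance *t by ns nanoseconds (ns >= 0), keeping tv_nsec below one second. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_sec += ns / NSEC_PER_SEC;
    t->tv_nsec += ns % NSEC_PER_SEC;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/*
 * Sleep until `cycles` consecutive absolute deadlines, `period_ns` apart.
 * Returns the worst observed wake-up lateness in nanoseconds, or -1 on error.
 */
static long run_periodic(long period_ns, int cycles)
{
    struct timespec next, now;
    long worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;
    for (int i = 0; i < cycles; i++) {
        int rc;

        timespec_add_ns(&next, period_ns);
        do {    /* an absolute deadline makes restarting after a signal safe */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0 || clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
        long late = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                  + (now.tv_nsec - next.tv_nsec);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Using TIMER_ABSTIME instead of relative sleeps keeps the period from drifting, because each deadline is computed from the previous deadline rather than from the moment the task happened to wake up.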

2.5 Memory Management and Other Component Optimizations

The Linux RT patch has also optimized memory management to meet the strict requirements of real-time systems for memory access latency and consistency.

Memory Locking Mechanism: The mlock() and mlockall() functions lock part or all of the calling process’s virtual address space in RAM, preventing that memory from being swapped out. mlock() locks the pages in the range that starts at address addr and spans len bytes, while mlockall() locks all pages mapped into the address space of the calling process.
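
A typical start-up sequence for a real-time process, sketched under the assumption that it is allowed to lock memory (root, CAP_IPC_LOCK, or a sufficient RLIMIT_MEMLOCK), locks the address space and touches its buffers once so that no page fault occurs later on the time-critical path. The helper names lock_all_memory and prefault are hypothetical.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Lock all current and future mappings of the process into RAM.
 * Returns 0 on success, or -1 with errno set (for example when the
 * memory-lock limit of the process is too low).
 */
static int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/*
 * Write one byte per page so that every page of a fresh buffer is backed by
 * physical memory. Overwrites data, so call it before the buffer is in use.
 * Returns the number of pages touched.
 */
static size_t prefault(volatile unsigned char *buf, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    if (page <= 0)
        return 0;
    for (size_t off = 0; off < len; off += (size_t)page) {
        buf[off] = 0;
        touched++;
    }
    return touched;
}
```

The usual order is to call lock_all_memory() first and then prefault() on each buffer (and, if needed, on a stack-sized local array), all before the real-time loop starts.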

Real-Time Memory Pool: The real-time memory pool is a dynamic memory allocation mechanism tailored for real-time applications. It pre-allocates a large block of memory and divides it into fixed-size memory blocks to form a memory pool. When real-time applications need memory, they can quickly allocate it directly from the memory pool without performing complex heap management operations, significantly reducing memory allocation latency.
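
A fixed-size block pool of this kind can be sketched in a few lines. This is a minimal single-threaded illustration; the pool_* names and the block size and count are hypothetical choices, and a real application would add locking or use per-thread pools.

```c
#include <stddef.h>

#define POOL_BLOCK_SIZE 64  /* usable bytes per block (illustrative value) */
#define POOL_BLOCKS     8   /* number of blocks in the pool (illustrative) */

/* A free block stores the free-list link inside its own storage. */
union pool_block {
    union pool_block *next;
    unsigned char payload[POOL_BLOCK_SIZE];
};

struct pool {
    union pool_block storage[POOL_BLOCKS];  /* pre-allocated memory */
    union pool_block *free_list;            /* head of the free blocks */
};

/* Thread every block onto the free list; done once at start-up. */
static void pool_init(struct pool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
}

/* Constant-time allocation: pop the head of the free list, or NULL if empty. */
static void *pool_alloc(struct pool *p)
{
    union pool_block *b = p->free_list;

    if (b)
        p->free_list = b->next;
    return b;
}

/* Constant-time release: push the block back onto the free list. */
static void pool_free(struct pool *p, void *ptr)
{
    union pool_block *b = ptr;

    b->next = p->free_list;
    p->free_list = b;
}
```

Both operations take a fixed number of steps regardless of how much memory is in use, which is the property that matters for real-time code; the price is that every allocation has the same size and the pool can run out.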

RCU Mechanism Optimization: The RCU mechanism in mainline Linux is only preemptible when CONFIG_PREEMPT (preemption model: “low-latency desktop”) is set. The PREEMPT_RT preemption model also uses a preemptible RCU mechanism. Additionally, the PREEMPT_RT patch removes RCU processing from all intermediate contexts and handles it only in dedicated kernel threads.
