When Do We Need to Use RTOS?


Source: Internet

An RTOS plays an important role in embedded development, so when exactly do we need one?

■ When do we actually need a real-time operating system?

Do most embedded projects still require a real-time operating system? Considering the speed of today’s high-performance processors and the availability of real-time patches for Linux, Windows, and other general-purpose operating systems (GPOS), this is a good question.

The answer lies in the nature of embedded devices. These devices are often produced in the thousands or even millions, and at that scale even a $1 reduction in hardware cost can save the manufacturer a small fortune. In other words, these devices cannot afford the cost of a multi-gigahertz processor (not to mention the cooling it requires).

For example, in the automotive telematics market, a typical 32-bit processor runs at about 600 megahertz, far below the processors commonly found in desktops and servers. In such an environment, a real-time operating system designed to extract extremely fast and predictable response times from low-end hardware has significant economic advantages.

In addition to cost savings, the services provided by real-time operating systems make many computational problems easier to solve, especially when multiple activities compete for a system’s resources. For example, consider a system where a user expects (or needs) an immediate response to input. With a real-time operating system, developers can guarantee that user-initiated operations will take precedence over other system activities unless more important activities (such as operations that help protect user safety) must be executed first.

Also consider a system that must meet quality-of-service (QoS) requirements, such as a device that displays real-time video. If the device relies on software for any part of its content delivery, it may drop frames at a rate that users find unacceptable, and from the user's point of view the device becomes unreliable. With a real-time operating system, however, developers can precisely control the order in which software processes execute and ensure that playback occurs at an appropriate, consistent rate.

■ Real-time operating systems are not fair

In the embedded industry, demand for hard real-time behavior is still prevalent. The question is: what does a real-time operating system offer that a GPOS does not? How useful are the real-time extensions now available for some GPOS, and can they deliver reasonable real-time performance?

Let’s start with task scheduling. In GPOS, the scheduler typically uses a “fair” policy to assign threads and processes to the CPU. Such policies can achieve the high overall throughput required for desktop and server applications but cannot guarantee that high-priority, time-critical threads will execute before lower-priority threads.

For example, a GPOS may lower the priority of a high-priority thread, or dynamically adjust thread priorities to ensure fairness to the other threads in the system. As a result, a high-priority thread may be preempted by lower-priority threads. In addition, most GPOS have unbounded scheduling latencies: the more threads in the system, the longer it takes the GPOS to schedule a thread for execution. Any one of these factors can cause a high-priority thread to miss its deadline, even on a fast CPU.

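In a real-time operating system, by contrast, threads execute in order of priority: when a high-priority thread becomes ready, it takes over the CPU from any lower-priority thread within a short, bounded interval. Moreover, the high-priority thread can then run uninterrupted until it completes its task, unless an even higher-priority thread preempts it. This approach, known as priority-based preemptive scheduling, allows high-priority threads to meet their deadlines even when many other threads are competing for CPU time.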
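To make the contrast concrete, here is a small tick-by-tick simulation, written as a sketch under simplifying assumptions: one CPU, integer ticks, no blocking, and a "fair" policy modeled as plain round-robin that ignores priority. Real GPOS schedulers are more sophisticated than this; the names `Task` and `simulate` and the task set are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int  # larger number = more urgent (a convention chosen for this sketch)
    release: int   # tick at which the task becomes ready
    work: int      # CPU ticks the task needs to finish


def simulate(tasks, policy):
    """Run tasks tick by tick on one CPU and return {name: completion tick}.

    policy="priority": always run the highest-priority ready task, so a newly
                       ready urgent task preempts whatever is running.
    policy="fair":     rotate through the ready tasks one tick at a time,
                       ignoring priority (a crude stand-in for a fair scheduler).
    """
    if policy not in ("priority", "fair"):
        raise ValueError(f"unknown policy: {policy}")
    remaining = {t.name: t.work for t in tasks}
    finished = {}
    ready = []  # released, unfinished tasks in rotation order
    tick = 0
    while len(finished) < len(tasks):
        ready.extend(t for t in tasks if t.release == tick)
        if ready:
            if policy == "priority":
                current = max(ready, key=lambda t: t.priority)
            else:
                current = ready.pop(0)  # take the head of the rotation...
                ready.append(current)   # ...and move it to the back
            remaining[current.name] -= 1
            if remaining[current.name] == 0:
                finished[current.name] = tick + 1
                ready.remove(current)
        tick += 1
    return finished


if __name__ == "__main__":
    tasks = [Task("logger", 1, 0, 10), Task("ui", 2, 0, 10), Task("control", 9, 2, 3)]
    print(simulate(tasks, "priority")["control"])  # 5
    print(simulate(tasks, "fair")["control"])      # 11
```

With the priority policy, the urgent `control` task finishes three ticks after it becomes ready; under the round-robin stand-in it has to share the CPU with the two background tasks and does not finish until tick 11.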


■ Preemptive kernel

In most GPOS, the OS kernel is non-preemptive. Therefore, a high-priority user thread can never preempt a kernel call and must wait for the entire call to complete, even if the call was made by the lowest-priority process in the system. Furthermore, when a driver or other system service (often executed within a kernel call) performs work on behalf of a client thread, all priority information is typically lost. This behavior can lead to unpredictable delays and prevent critical activities from completing on time.

In a real-time operating system, on the other hand, kernel operations are preemptible. As in a GPOS, there are still time windows during which preemption cannot occur, but in a well-designed real-time operating system these windows are very short, typically on the order of hundreds of nanoseconds. In addition, a real-time operating system imposes an upper bound on how long preemption can be held off and how long interrupts can be disabled; this bound allows developers to determine worst-case latencies.

To achieve consistent predictability and timely completion of critical activities, the real-time operating system kernel must be as simple and elegant as possible. The best way to achieve this simplicity is to design a kernel that contains only services with short execution paths. By excluding work-intensive operations (such as process loading) from the kernel and assigning them to external processes or threads, real-time operating system designers can help ensure that there is an upper limit on the longest non-preemptive code paths through the kernel.
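The point about bounding the longest non-preemptible path can be illustrated with simple arithmetic. The sketch below uses a deliberately simplified latency model (longest kernel path, plus longest interrupts-disabled window, plus one context switch) and made-up nanosecond figures; the service names are hypothetical, and real worst-case analyses account for more factors.

```python
def worst_case_latency_ns(kernel_paths_ns, irq_disable_ns, context_switch_ns):
    """Simplified upper bound on scheduling latency, in nanoseconds.

    Assumes the worst case stacks up as: the longest non-preemptible kernel
    path, plus the longest interrupts-disabled window, plus one context
    switch. Only the *longest* kernel path matters, not the average one.
    """
    return max(kernel_paths_ns.values()) + irq_disable_ns + context_switch_ns


# Hypothetical figures for illustration only; they are not measurements.
paths_with_loader = {"send_message": 300, "arm_timer": 200, "load_process": 50_000}
# Same kernel, but process loading has been moved out to a user-space process.
paths_without_loader = {"send_message": 300, "arm_timer": 200}

if __name__ == "__main__":
    print(worst_case_latency_ns(paths_with_loader, 400, 600))     # 51000
    print(worst_case_latency_ns(paths_without_loader, 400, 600))  # 1300
```

Moving the one long operation out of the kernel shrinks the bound from 51,000 ns to 1,300 ns, even though the short services are unchanged.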

Some GPOS kernels add a degree of preemptibility. However, the intervals during which preemption cannot occur are still much longer than in a typical real-time operating system; the length of any such interval depends on the longest critical section of any module in the GPOS kernel (for example, networking). Furthermore, a preemptible GPOS kernel does not address other conditions that can cause unbounded latencies, such as the loss of priority information when a client calls a driver or other system service.

■ Avoiding priority inversion

In GPOS, and even in real-time operating systems, low-priority threads can inadvertently prevent high-priority threads from accessing the CPU, a situation known as priority inversion. When unbounded priority inversion occurs, critical deadlines may be missed, with results ranging from abnormal system behavior to complete failure. Unfortunately, priority inversion is often overlooked during system design. There are many examples of priority inversion, including the one that affected the Mars Pathfinder mission in July 1997.

Generally, priority inversion occurs when two tasks of different priorities share a resource and the higher-priority task cannot obtain the resource from the lower-priority task. To keep this blocking within a bounded time interval, real-time operating systems can provide mechanisms not available in GPOS, including priority inheritance and priority ceiling emulation. There is not room here to do justice to both mechanisms, so let's focus on an example of priority inheritance.

First, we must consider how task synchronization can lead to blocking, and how blocking can, in turn, lead to priority inversion. Suppose two tasks are running, Task 1 and Task 2, with Task 1 having a higher priority. If Task 1 is ready to execute but must wait for Task 2 to complete an activity, it will be blocked. This blocking may be due to synchronization; for example, Task 1 and Task 2 share a resource controlled by a lock or semaphore, and Task 1 is waiting for Task 2 to unlock that resource. Alternatively, it may be because Task 1 is requesting a service currently being used by Task 2.

Blocking allows Task 2 to run until the condition Task 1 is waiting for occurs (for example, Task 2 unlocking the resource shared by both tasks). At this point, Task 1 begins executing. The total time that Task 1 must wait is called the blocking factor. If Task 1 is to meet any deadlines, this blocking factor cannot vary with any parameters (such as the number of threads or system inputs). In other words, the blocking factor must be bounded.

Now let’s introduce a third task, Task 3, which has a higher priority than Task 2 but lower than Task 1 (see Figure 1). If Task 3 is ready to run while Task 2 is executing, it will preempt Task 2, and Task 2 will not be able to run again until Task 3 blocks or completes. Of course, this new task will increase the blocking factor for Task 1; that is, it will further delay the execution of Task 1. The total delay introduced by preemption is a priority inversion.

Figure 1 Task 1 is waiting for Task 2 to complete an activity, at which point Task 3 preempts Task 2. This new task further delays Task 1’s execution

In practice, multiple tasks can preempt Task 2 in this way, creating an effect known as chain blocking where Task 2 may be indefinitely preempted, resulting in unbounded priority inversion and causing Task 1 to miss any of its deadlines.
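The unbounded growth can be shown with a minimal simulation, assuming (hypothetically) that there is no priority inheritance and that all medium-priority tasks become ready at the moment Task 1 blocks. The function name and tick values are illustrative only.

```python
def task1_wait_ticks(cs_left, medium_work):
    """Ticks Task 1 waits for a lock held by Task 2, with no priority inheritance.

    Assumed starting state: at tick 0 Task 2 (low priority) holds the lock with
    `cs_left` ticks of its critical section remaining, Task 1 (high priority)
    has just blocked on the lock, and one medium-priority task per entry in
    `medium_work` becomes ready with that many ticks of work.
    """
    mediums = list(medium_work)  # remaining ticks for each medium task
    ticks = 0
    while cs_left > 0:
        runnable = [i for i, w in enumerate(mediums) if w > 0]
        if runnable:
            # Any ready medium task outranks Task 2, so Task 2 is preempted.
            mediums[runnable[0]] -= 1
        else:
            # Only now does Task 2 make progress toward releasing the lock.
            cs_left -= 1
        ticks += 1
    return ticks  # Task 2 unlocks at this tick; Task 1 can finally run


if __name__ == "__main__":
    # Blocking factor with 0, 1, 2, 3 medium tasks of 5 ticks each.
    print([task1_wait_ticks(2, [5] * n) for n in range(4)])  # [2, 7, 12, 17]
```

Each additional medium-priority task adds its entire run time to Task 1's blocking factor, so the wait depends on how many such tasks happen to be ready and has no fixed upper limit.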

This is where priority inheritance comes into play. If we return to our scenario and have Task 2 run at Task 1’s priority during synchronization, then Task 3 will not be able to preempt Task 2, thus avoiding priority inversion (see Figure 2).

Figure 2 Task 2 inherits the higher priority of Task 1, thus preventing Task 3 from preempting Task 2. Task 3 no longer delays the execution of Task 1.
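The bookkeeping behind priority inheritance can be sketched as a mutex that tracks its waiters and temporarily raises the owner's effective priority. This is a simplified illustration, not any particular RTOS's implementation: it assumes a single mutex with no nested locking (so unlocking simply restores the owner's base priority), larger numbers meaning higher priority, and hypothetical class and function names.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority       # priority assigned by the designer
        self.effective_priority = priority  # priority the scheduler actually uses


class PIMutex:
    """A single mutex with basic priority inheritance (no nesting, no chains)."""

    def __init__(self):
        self.owner = None
        self.waiters = []

    def lock(self, task):
        """Return True if `task` acquires the mutex, False if it must block."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        # The owner inherits the highest priority among the tasks waiting on it.
        top = max(w.effective_priority for w in self.waiters)
        if top > self.owner.effective_priority:
            self.owner.effective_priority = top
        return False

    def unlock(self):
        """Drop any inherited priority and hand the mutex to the top waiter."""
        self.owner.effective_priority = self.owner.base_priority
        if not self.waiters:
            self.owner = None
            return
        nxt = max(self.waiters, key=lambda w: w.effective_priority)
        self.waiters.remove(nxt)
        self.owner = nxt
        if self.waiters:  # the new owner inherits from anyone still waiting
            top = max(w.effective_priority for w in self.waiters)
            nxt.effective_priority = max(nxt.effective_priority, top)


def pick_next(ready):
    """A priority-based scheduler picks the ready task with the highest effective priority."""
    return max(ready, key=lambda t: t.effective_priority)


if __name__ == "__main__":
    t1, t2, t3 = Task("Task 1", 30), Task("Task 2", 10), Task("Task 3", 20)
    m = PIMutex()
    m.lock(t2)                          # Task 2 takes the shared resource
    m.lock(t1)                          # Task 1 blocks; Task 2 now runs at 30
    print(pick_next([t2, t3]).name)     # Task 2 (Task 3 cannot preempt it)
    m.unlock()
    print(t2.effective_priority, m.owner.name)  # 10 Task 1
```

While Task 1 waits, the scheduler chooses Task 2 over Task 3, which is the behavior shown in Figure 2; once Task 2 unlocks, it drops back to its base priority and Task 1 owns the mutex.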

■ Partitioned scheduler

For many systems, guaranteeing resource availability is critical. If a critical subsystem is starved of, say, CPU cycles, then the services it provides become unavailable to users. For example, in a denial-of-service (DoS) attack, a malicious user can bombard the system with requests that must be handled by a high-priority process. That process can then overload the CPU and starve other processes of CPU cycles, making the system unavailable to users.

Security vulnerabilities are not the only cause of process starvation. In many cases, adding software features can push a system to the "edge of collapse" and starve existing applications of CPU time. Time-critical applications or services may then stop responding as expected or required. Historically, the only ways to address this issue were to retrofit the hardware or to recode (or redesign) the software, both undesirable options. To resolve these issues, system designers need a partitioning scheme, enforced in hardware or software, that prevents any process or thread from monopolizing CPU cycles needed by other processes or threads. Because real-time operating systems already provide centralized access to the CPU, memory, and other computing resources, they are the natural place to perform CPU partitioning.

Some real-time operating systems provide a fixed partition scheduler. With this scheduler, system designers can divide tasks into groups, or partitions, and allocate a percentage of CPU time to each partition. The CPU time consumed by the tasks in any given partition then never exceeds the percentage statically defined for that partition. For example, suppose a partition is allocated 30% of the CPU. If a process in that partition later becomes the target of a denial-of-service attack, the partition still consumes no more than 30% of the CPU. This limit allows other partitions to remain available; for example, it can ensure that the user interface (such as a remote terminal) stays accessible, so operators can access the system and resolve the problem without having to press the reset switch.

However, this approach has a problem. Because the scheduling algorithm is fixed, a partition can never use CPU cycles allocated to other partitions, even if those partitions are not using their allocated cycles. This approach wastes CPU cycles and prevents the system from handling peak demands. Therefore, system designers must either use more expensive processors, tolerate slower systems, or limit the number of functions the system can support.
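The behavior of a fixed partition scheduler can be modeled over one scheduling window with a few lines of arithmetic. The sketch below rests on stated assumptions: CPU time is counted in whole percentage points of the window, each partition's demand for the window is known in advance, and the partition names and figures are hypothetical.

```python
def fixed_partition_window(budgets, demand):
    """One scheduling window under a fixed (static) partition scheduler.

    budgets: partition -> percentage of the window statically allocated to it
    demand:  partition -> percentage of the window its tasks would like to use
    Returns (cpu_used_per_partition, idle_percentage).
    """
    if sum(budgets.values()) > 100:
        raise ValueError("budgets exceed 100% of the CPU")
    # Each partition gets what it asks for, but never more than its budget,
    # and unused budget is never lent to anyone else.
    used = {p: min(demand.get(p, 0), budgets[p]) for p in budgets}
    idle = 100 - sum(used.values())
    return used, idle


if __name__ == "__main__":
    budgets = {"ui": 30, "network": 30, "control": 40}
    under_attack = {"ui": 10, "network": 100, "control": 40}
    print(fixed_partition_window(budgets, under_attack))
    # ({'ui': 10, 'network': 30, 'control': 40}, 20)
```

The attacked `network` partition is held to its 30%, which protects the other partitions, but 20% of the CPU sits idle even though `network` still has work queued.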

■ Adaptive partitioning

Another partitioning scheme, called adaptive partitioning, addresses the drawbacks of static partitioning with a more dynamic scheduling algorithm. Like static partitioning, adaptive partitioning lets system designers reserve CPU cycles for a process or group of processes, so they can ensure that the load on one subsystem or partition does not affect the availability of the others. Unlike the static approach, however, adaptive partitioning can dynamically reallocate CPU cycles from partitions that are not busy to partitions that can benefit from extra processing time; partition budgets are enforced only when the CPU is fully loaded. The system can therefore handle peak demands without requiring applications to change their scheduling behavior. Designers can also reconfigure partitions dynamically to optimize system performance.
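Extending the same kind of window model to adaptive partitioning: each partition is first guaranteed up to its budget, and only then is idle capacity lent to partitions that still have work. How spare time is shared is an assumption of this sketch (one percentage point at a time, round-robin across partitions with unmet demand); real adaptive partitioning schedulers may use other rules, such as giving free time to the highest-priority ready threads. The function name and figures are hypothetical.

```python
def adaptive_partition_window(budgets, demand):
    """One scheduling window under a simple adaptive partitioning model.

    budgets: partition -> guaranteed percentage of the window
    demand:  partition -> percentage of the window its tasks would like to use
    Returns (cpu_used_per_partition, idle_percentage).
    """
    if sum(budgets.values()) > 100:
        raise ValueError("budgets exceed 100% of the CPU")
    # 1. Guarantee: every partition gets up to its own budget first.
    used = {p: min(demand.get(p, 0), budgets[p]) for p in budgets}
    spare = 100 - sum(used.values())
    # 2. Redistribution: lend idle capacity to partitions that still want CPU,
    #    one percentage point at a time in round-robin order (assumed policy).
    while spare > 0:
        hungry = [p for p in budgets if used[p] < demand.get(p, 0)]
        if not hungry:
            break
        for p in hungry:
            if spare == 0:
                break
            used[p] += 1
            spare -= 1
    return used, spare


if __name__ == "__main__":
    budgets = {"ui": 30, "network": 30, "control": 40}
    print(adaptive_partition_window(budgets, {"ui": 10, "network": 100, "control": 40}))
    # ({'ui': 10, 'network': 50, 'control': 40}, 0)
    print(adaptive_partition_window(budgets, {"ui": 100, "network": 100, "control": 100}))
    # ({'ui': 30, 'network': 30, 'control': 40}, 0)
```

When some partitions are idle, the busy `network` partition absorbs their unused 20%; when every partition wants the whole CPU, each is held to exactly its budget, which is the sense in which budgets are enforced only under full load.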

As a result, adaptive partitioning lets the system achieve 100% CPU utilization while still enjoying the benefits of resource guarantees.

Equally important, adaptive partitioning can be overlaid on an existing system without redesign or code modification. System designers can simply launch existing POSIX-based applications in partitions, and the real-time operating system scheduler ensures that each partition receives its allocated budget. Within each partition, each task is scheduled according to priority-based preemptive rules.

Figure 3 Adaptive partitioning prevents high-priority tasks from consuming more than their allocated percentage of the CPU unless the system has unused CPU cycles. For example, Tasks A and D can run in time allocated to Partition 3 because Tasks E and F do not need the rest of their budgeted CPU cycles.

■ “Dual” kernel

GPOS, including Linux, Windows, and various flavors of Unix, typically lack the real-time mechanisms discussed so far. To fill this gap, GPOS vendors have developed a variety of real-time extensions and patches. One example is the dual-kernel approach, in which the GPOS runs as a task on top of a dedicated real-time kernel (see Figure 4). Real-time tasks preempt the GPOS whenever they need to execute and yield the CPU back to the GPOS only when they have completed their work.

Unfortunately, tasks running in the real-time kernel can make only limited use of existing GPOS services such as file systems and networking. In fact, if a real-time task requests any service from the GPOS, it becomes subject to the same preemption problems that prevent GPOS processes from behaving deterministically. As a result, new drivers and system services must be created specifically for the real-time kernel, even if the GPOS already has similar services. Furthermore, tasks running in the real-time kernel cannot benefit from the robust, memory-protected environment that most GPOS provide for regular, non-real-time processes, because they run unprotected in kernel space. Real-time tasks that contain common coding errors, such as corrupted C pointers, can therefore easily cause fatal kernel faults. This is a problem because most systems that require real-time capabilities also require high reliability. Compounding the complexity, different implementations of the dual-kernel approach use different APIs. In most cases, services written for a GPOS cannot easily be ported to a real-time kernel, and tasks written for one vendor's real-time extension may not run on another vendor's extension.

Figure 4 In a typical dual-kernel implementation, the GPOS runs as the lowest-priority task of a separate real-time kernel.

These solutions highlight the real challenges and vast scope of enabling GPOS to support real-time behavior. It’s not a matter of “real-time operating systems are good, GPOS are bad.” GPOS such as Linux, Windows, and various Unixes can serve well as desktop or server operating systems. However, when they are forced into deterministic environments for which they were not designed, such as automotive telematics units, medical devices, real-time control systems, and continuous media applications, they fail to meet the requirements.

■ Extending operating systems to meet application-specific needs

Whatever their shortcomings in deterministic environments, GPOS offer real benefits.

These benefits include support for widely used APIs and support for Linux’s open-source model. With open source, developers can customize operating system components to meet application-specific needs, saving significant troubleshooting time. Real-time operating system vendors cannot ignore these benefits. Broad support for the POSIX API (the same API used by Linux and various styles of Unix) is an important first step. Providing well-documented source code and customizable toolkits to meet embedded developers’ specific needs and design challenges is equally important.

The architecture of a real-time operating system also plays a role. For example, a microkernel-based real-time operating system can make the work of customizing the operating system fundamentally easier than other architectures do. In a microkernel real-time operating system, only a small core of fundamental services (such as signaling, timers, and scheduling) resides in the kernel itself. All other components, including drivers, file systems, protocol stacks, and applications, run outside the kernel as separate, memory-protected processes (see Figure 5). Because such extensions are ordinary user-space programs, they are as easy to develop as standard applications and can be debugged with standard source-level tools and techniques.
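The microkernel idea, a kernel that mainly routes messages while services run as separate processes, can be sketched in Python. This is only an analogy: threads stand in for memory-protected processes, `queue.Queue` stands in for kernel message passing, and `MicroKernel`, `register`, `send`, and `filesystem_service` are hypothetical names rather than any real RTOS API.

```python
import queue
import threading


class MicroKernel:
    """Toy 'kernel' whose only job is routing messages between services."""

    def __init__(self):
        self._channels = {}

    def register(self, name):
        """A service creates a named channel and gets back its receive queue."""
        channel = queue.Queue()
        self._channels[name] = channel
        return channel

    def send(self, name, message, timeout=1.0):
        """Synchronous send: block the caller until the service replies."""
        reply_box = queue.Queue(maxsize=1)
        self._channels[name].put((message, reply_box))
        return reply_box.get(timeout=timeout)  # raises queue.Empty on timeout


def filesystem_service(channel):
    """Hypothetical file-system server running outside the 'kernel'."""
    files = {"/etc/motd": "hello"}
    while True:
        (op, path), reply_box = channel.get()  # receive the next request
        if op == "read":
            reply_box.put(files.get(path, "ENOENT"))
        elif op == "stop":
            reply_box.put("stopped")
            return
        else:
            reply_box.put("EINVAL")


if __name__ == "__main__":
    kernel = MicroKernel()
    fs_channel = kernel.register("fs")
    threading.Thread(target=filesystem_service, args=(fs_channel,), daemon=True).start()
    print(kernel.send("fs", ("read", "/etc/motd")))  # hello
    print(kernel.send("fs", ("stop", None)))         # stopped
```

The design point is that the file-system code never runs inside the "kernel"; it is an ordinary program that receives requests and replies, so it can be developed, debugged, and replaced like any application.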

Figure 5 In a microkernel real-time operating system, system services run as standard user-space processes, simplifying the task of customizing the operating system.

For example, if a device driver attempts to access memory outside its process container, the operating system can identify the responsible process, indicate the erroneous location, and create a process dump file that can be viewed with source-level debugging tools. The dump file can contain all the information needed for the debugger to identify the source code that caused the problem, along with diagnostic information such as the contents of data items and function call history.

This architecture also provides better fault isolation and recovery: if a driver, protocol stack, or other system service fails, it can be fault-isolated and recovered without disrupting other services or the operating system kernel. In fact, “software supervision” can continually monitor such events and dynamically restart the offending service without requiring a full system reset or involving the user in any way. Similarly, drivers and other services can be dynamically stopped, started, or upgraded without shutting down the system.
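A software supervisor of this kind can be sketched as a wrapper that catches a service's failure, records it, and replaces the failed instance with a fresh one. In a real microkernel system the supervised service would be a separate process restarted from outside; here, as a simplifying assumption, everything runs in one Python process, and `DriverFault`, `FlakyDriver`, and `Supervisor` are hypothetical names.

```python
class DriverFault(Exception):
    """Stands in for a crash such as an invalid memory access."""


class FlakyDriver:
    """Hypothetical driver that crashes on one particular request."""

    def __init__(self):
        self.handled = 0  # requests served by this instance

    def handle(self, request):
        if request == "bad-pointer":
            raise DriverFault("simulated invalid memory access")
        self.handled += 1
        return f"ok:{request}"


class Supervisor:
    """Restarts a failed service instead of taking down the whole system."""

    def __init__(self, factory, max_restarts=3):
        self.factory = factory
        self.max_restarts = max_restarts
        self.restarts = 0
        self.log = []
        self.service = factory()

    def call(self, request):
        try:
            return self.service.handle(request)
        except DriverFault as exc:
            # The fault stays contained to this one service: record it,
            # swap in a fresh instance, and report the failure to the caller.
            self.log.append(f"{type(exc).__name__}: {exc}")
            if self.restarts >= self.max_restarts:
                raise RuntimeError("restart limit reached") from exc
            self.restarts += 1
            self.service = self.factory()
            return "error:restarted"


if __name__ == "__main__":
    sup = Supervisor(FlakyDriver)
    print(sup.call("read"))         # ok:read
    print(sup.call("bad-pointer"))  # error:restarted
    print(sup.call("read"))         # ok:read
```

The `max_restarts` limit is a common safeguard: a service that keeps crashing is escalated rather than restarted forever.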

These benefits should not be overlooked—unplanned system restarts are the greatest disruption to real-time performance! Even planned restarts to incorporate software upgrades can disrupt operations, albeit in a controlled manner. To ensure deadlines can always be met, developers must use operating systems that can remain continuously available, even in the event of software failures or service upgrades.

■ A strategic decision

Real-time operating systems help make complex applications both predictable and reliable; indeed, the precise control over timing that a real-time operating system provides adds a form of reliability that a GPOS cannot match. (If a GPOS-based system does not behave correctly because of incorrect timing, we can justifiably say that the system is unreliable.) Still, choosing the right RTOS can itself be a complex task. The underlying architecture of the real-time operating system is an important criterion, but other factors matter too. These include:

• Flexibility in scheduling algorithms—does the real-time operating system support a choice of scheduling algorithms (FIFO, round-robin, sporadic scheduling, etc.)? Can developers assign algorithms on a per-thread basis, or does the real-time operating system force them to use one algorithm for all threads in the system?

• Time partitioning—does the real-time operating system support time partitioning to guarantee processes a specified percentage of CPU cycles? Such guarantees simplify the integration of subsystems from multiple development teams or vendors. They can also ensure that critical tasks remain available and meet their deadlines, even when the system is subjected to denial-of-service (DoS) and other malicious attacks.

• Support for multicore processors—migration to multicore processors has become foundational for a wide range of high-performance designs. Does the real-time operating system support a choice of multiprocessing models (symmetric multiprocessing, asymmetric multiprocessing, bound multiprocessing) to help developers take full advantage of multicore hardware? Does it come with system tracing tools that let developers diagnose and optimize the performance of multicore systems? Without tools that can highlight resource contention, excessive thread migration, and other common multicore issues, optimizing a multicore system can quickly become a burdensome, time-consuming task.

• Tools for remote diagnostics—since many embedded systems cannot tolerate downtime, real-time operating system vendors should provide diagnostic tools that can analyze system behavior without interrupting the services the system provides. Look for a vendor that provides runtime analysis tools for system profiling, application profiling, and memory analysis.

• Open development platforms—do real-time operating system vendors provide development environments based on open platforms (such as Eclipse), allowing developers to “plug in” their favorite third-party tools for modeling, version control, etc.? Or is the development environment based on proprietary technology?

• Graphical user interfaces—does the real-time operating system use raw graphics libraries, or does it support various human-machine interface technologies (HTML5, Qt, OpenGL ES, etc.) and provide advanced graphical features such as multi-layer interfaces, multi-head displays, accelerated 3D rendering, and a true windowing system? Can the look and feel of the GUI be easily customized? Can the GUI simultaneously display, and accept input in, multiple languages (Chinese, Korean, Japanese, English, Russian, etc.)? Can 2D and 3D applications easily share the same screen?

• Standard APIs—does the real-time operating system lock developers into proprietary APIs, or does it provide certified support for standard APIs such as POSIX and OpenGL ES, making code easier to port between different environments? Additionally, does the real-time operating system provide comprehensive support for APIs, or does it only support a small subset of defined interfaces?

• Middleware for digital media—flexible support for digital media is becoming a design requirement for a range of embedded systems, including automotive radios, medical devices, industrial control systems, media servers, and of course consumer electronics. A system may need to handle multiple media sources (devices, streams, etc.), understand multiple data formats, and support various DRM schemes. By providing well-designed middleware for digital media, real-time operating system vendors can eliminate a significant amount of software work required to connect to multiple media sources, organize data, and initiate appropriate data processing paths. Moreover, a well-designed middleware solution will have the flexibility to support new data sources, such as the next generation of iPods, without requiring modifications to the user interface or other software components.
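To illustrate the first criterion above: the choice of scheduling algorithm matters most among threads that share a priority level. POSIX names two such policies SCHED_FIFO and SCHED_RR and allows them to be set per thread (for example, with `pthread_setschedparam`). The sketch below only models their effect on equal-priority threads that are all ready at tick 0; the function name, quantum, and workloads are hypothetical.

```python
def run_equal_priority(work, policy, quantum=2):
    """Order of CPU slices for threads that share one priority level.

    work:   thread name -> ticks of CPU it needs (all ready at tick 0)
    policy: "fifo" runs each thread until it finishes (it never blocks here);
            "rr" gives each thread at most `quantum` ticks, then rotates.
    Returns a list of (thread, ticks) slices in execution order.
    """
    if policy not in ("fifo", "rr"):
        raise ValueError(f"unknown policy: {policy}")
    ready = list(work.items())
    slices = []
    while ready:
        name, left = ready.pop(0)
        run = left if policy == "fifo" else min(quantum, left)
        slices.append((name, run))
        if left > run:
            ready.append((name, left - run))  # back of the queue for its next turn
    return slices


if __name__ == "__main__":
    print(run_equal_priority({"A": 3, "B": 2}, "fifo"))  # [('A', 3), ('B', 2)]
    print(run_equal_priority({"A": 3, "B": 2}, "rr"))    # [('A', 2), ('B', 2), ('A', 1)]
```

Under FIFO, B waits until A finishes; under round-robin, B gets the CPU after A's first quantum. Being able to pick per thread lets a designer use FIFO for a short control loop and round-robin for peers that should share the CPU.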

For any project team, choosing a real-time operating system is a strategic decision. Once real-time operating system vendors have given clear answers to the questions outlined above, you will be better positioned to choose the best fit for both now and the future.

Disclaimer: The content of this article is sourced from the internet, and the copyright belongs to the original author. If there are copyright issues, please contact me for removal.
