Real-Time Scheduling Algorithms for Autonomous Driving Safety

Follow the public account Abao1990, which focuses on autonomous driving and intelligent cockpits and provides automotive insights every day. We start with cars, but we are not limited to cars.

“Scheduling” refers to arranging tasks in a certain order. Mindtools offers an informal definition based on scheduling in everyday life: scheduling is the art of planning your activities so that you can achieve your goals and priorities in the time you have available. In other words, scheduling ensures that there is enough time for the most important tasks by prioritizing the most urgent ones. Translated to the world of hard real-time systems, the key function of scheduling is to guarantee the execution of safety-critical tasks. It is important not only that tasks are executed but also that they are executed correctly and on time.

We have previously discussed hard real-time systems and their requirements in the article “Future-Oriented Real-Time Automotive Safety Systems”. In his classic work Real-Time Systems, Professor Kopetz frames the scheduling problem around tasks meeting their specified deadlines. Each task requires computational and data resources to proceed, and the scheduling problem is concerned with allocating these resources so that all timing requirements are met.

Let us place this in the context of our industry, where automotive electrical/electronic (E/E) systems are evolving towards the integration of traditionally separate domains. As automotive functions are consolidated on common hardware, correct timing behavior of the system becomes increasingly important. This is especially true for autonomous driving (AD) systems, which combine complex functionality with stringent safety requirements. Their software functions must coexist and share resources with other software-driven automotive functions without compromising their real-time safety requirements, which raises the bar for automotive operating systems. Furthermore, AD imposes strict real-time requirements not only on individual functions but also on the precedence dependencies between sensors, control software, and actuator networks. According to TTTech lab scientist Silviu S., mixed-criticality systems require higher standards of safety compliance and of temporal and spatial isolation, which calls for new scheduling functionality at runtime and new configuration tools at design time.

With the emergence of automated, connected, electric, and shared (ACES) mobility, these challenges are compounded by the increasing number of digital functions in vehicles. To respond adequately, modern vehicles need not only additional computational power to run increasingly complex software but also mechanisms that can truly guarantee individual and collective safety levels as well as the priority of all automotive functions. This is why safe real-time scheduling is at the core of the future of automobiles.

What is Real-Time Scheduling?

Distributed real-time computer systems execute multiple system functions simultaneously on their nodes. To ensure consistent execution of functionalities across the distributed system, all incoming events must be processed by their respective nodes in the order they occur.

Therefore, all interconnected components of a distributed system must be properly synchronized with each other to ensure reliable behavior of the whole system. In real-time distributed systems, the local clocks of all nodes must also be synchronized with a reference clock that tracks physical time, for example by aligning it with International Atomic Time (TAI).

Nodes in real-time distributed systems must operate under the same global time

To define “correct” synchronization, imagine that each node counts the ticks of its local clock, which we will call micro-ticks. A subset of these micro-ticks, aligned with the global time scale within an acceptable tolerance, will be referred to as the macro-ticks (or simply ticks) of global time. When the macro-ticks are correctly selected from the micro-ticks of the synchronized local physical clocks of all nodes under consideration, the system gains a common notion of global time.
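A minimal Python sketch of this idea, under simplifying assumptions: micro-ticks of 1 µs, a macro-tick of 10 micro-ticks, and a constant residual offset per node after synchronization. The names `MICROTICKS_PER_MACROTICK`, `macrotick`, `precision`, and `timestamps` are hypothetical. It shows that when the macro-tick is coarser than the worst-case deviation between the clocks (Kopetz calls this requirement the reasonableness condition), the global-time stamps that different nodes assign to the same event differ by at most one tick.

```python
from typing import List

MICROTICKS_PER_MACROTICK = 10  # assumption: 1 µs micro-ticks, 10 µs macro-ticks

def macrotick(local_microticks: int) -> int:
    """Global time as seen by one node: every MICROTICKS_PER_MACROTICK-th
    micro-tick of its synchronized local clock advances the macro-tick count."""
    return local_microticks // MICROTICKS_PER_MACROTICK

def precision(offsets: List[int]) -> int:
    """Worst-case deviation between any two synchronized local clocks, in micro-ticks."""
    return max(offsets) - min(offsets)

def timestamps(physical_microticks: int, offsets: List[int]) -> List[int]:
    """Macro-tick timestamps that the nodes assign to the same physical event.
    offsets[i] is node i's residual synchronization error in micro-ticks."""
    return [macrotick(physical_microticks + off) for off in offsets]

offsets = [-3, 0, 4]  # hypothetical residual errors after synchronization
# The macro-tick must be coarser than the precision of the clock ensemble.
assert MICROTICKS_PER_MACROTICK > precision(offsets)
print(timestamps(1005, offsets))  # [100, 100, 100]
print(timestamps(998, offsets))   # [99, 99, 100]
```

The second event shows why timestamps of the same event can still differ by one tick: the node whose clock runs slightly ahead has already crossed the macro-tick boundary.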

In addition to synchronizing task execution across nodes, coordinating communication between nodes is essential for the successful operation of the entire system. Each task consists of multiple individual operations or steps. During the execution of a task, a sequential program reads the input data and the internal state of the node, computes and outputs results, and updates the internal state of the node.

All of this can be measured in time: the actual duration of a task is the time interval between its start and its end. The maximum possible duration of a task over all possible input data and execution scenarios is called the Worst-Case Execution Time (WCET). In practice, the closer the WCET bound is to the actual execution time, the more efficiently the schedule uses the available resources. Jitter is the difference between the WCET and the minimum duration of the task (its shortest possible execution time).
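These definitions reduce to simple arithmetic. The sketch below uses a hypothetical helper `timing_profile` with times in µs and assumes a WCET bound obtained elsewhere, for example by static analysis; measured durations alone can only show that the bound was not violated in the observed runs, never prove it.

```python
from statistics import mean
from typing import Dict, List

def timing_profile(durations_us: List[int], wcet_us: int) -> Dict[str, float]:
    """Relate measured task durations to an assumed WCET bound (all in µs)."""
    if max(durations_us) > wcet_us:
        raise ValueError("a measured duration exceeds the assumed WCET bound")
    shortest = min(durations_us)
    return {
        "min_duration": shortest,
        # Jitter as defined in the text: WCET minus minimum duration.
        "jitter": wcet_us - shortest,
        # Average reserved-but-unused time per activation.
        "avg_unused_budget": wcet_us - mean(durations_us),
    }

runs = [820, 900, 870, 950]  # hypothetical measured durations
loose = timing_profile(runs, 1000)
tight = timing_profile(runs, 960)
print(loose["jitter"], tight["jitter"])
```

With the bound tightened from 1000 µs to 960 µs, the jitter drops from 180 µs to 140 µs and the average unused budget from 115 µs to 75 µs, which is the efficiency effect described above.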

In hard real-time systems, all critical tasks must be completed on time

In hard real-time computer systems, safety-critical tasks must be executed under strict timing constraints. These tasks must meet all their deadlines and execute within very tight time intervals to achieve “global” end-to-end real-time guarantees. Scheduling is the method of allocating the resources of the computer system to tasks so that they execute correctly and on time. On this basis, real-time scheduling allocates tasks to the system’s nodes in such a way that no task on any node exceeds its maximum allowed jitter, which ensures predictable, and thus safe, behavior of the entire system.

Implementing Hard Real-Time Scheduling for Autonomous Vehicles

Real-time scheduling can be implemented in different ways. Let us consider the available options as shown below:

Classification of Real-Time Scheduling

The main difference between dynamic and static hard real-time scheduling is that dynamic scheduling is performed online at runtime, while static scheduling is performed offline before runtime. Dynamic schedulers are flexible and can make scheduling decisions “anytime, anywhere”: they adapt to evolving task flows by selecting the next task to run from the current set of ready tasks. The overhead of finding such schedules at runtime can be significant. Dynamic schedulers determine task start times at runtime, while static schedulers have already determined these times at design time. Static schedulers generate scheduling tables offline that contain complete information about all tasks; at runtime, tasks are simply dispatched according to these tables.
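To make the contrast concrete, here is a hypothetical Python sketch of what each kind of scheduler does at runtime. The static side only looks up an offline-computed table (`SCHEDULE_TABLE`, the task names, and the 10 ms cycle are invented for illustration); the dynamic side chooses among the ready tasks, using earliest-deadline-first (EDF) purely as an example of an online policy, since the text does not name one.

```python
from typing import Dict, List, Optional, Tuple

CYCLE_MS = 10  # hypothetical cycle length
# Static: (start offset within the cycle, task) pairs computed offline.
SCHEDULE_TABLE: List[Tuple[int, str]] = [(0, "sensor_read"), (2, "fusion"), (6, "control")]

def static_dispatch(now_ms: int) -> Optional[str]:
    """Static scheduler at runtime: a table lookup, no scheduling decisions."""
    offset = now_ms % CYCLE_MS
    for start, task in SCHEDULE_TABLE:
        if start == offset:
            return task
    return None

def dynamic_pick(ready: Dict[str, int]) -> Optional[str]:
    """Dynamic scheduler at runtime: choose from the currently ready tasks.
    `ready` maps task name -> absolute deadline; EDF is just an example policy."""
    if not ready:
        return None
    return min(ready, key=lambda name: ready[name])

print(static_dispatch(12))                         # fusion
print(dynamic_pick({"logging": 30, "brake": 12}))  # brake
```

The static dispatcher does a fixed, predictable amount of work per tick, whereas the dynamic one must evaluate the ready set every time it runs, which is where the runtime overhead comes from.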

Continuing with our classification diagram, we can see that both static and dynamic scheduling come in two variants: preemptive and non-preemptive. In preemptive scheduling, a running task can be interrupted (preempted) by a more urgent task. In non-preemptive scheduling, a running task keeps its resources until it decides to release them to other tasks (usually upon completion). If many short tasks are to be executed, it makes sense to schedule them non-preemptively.
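The effect of preemption can be seen in a small single-core simulation. This is a sketch under assumed conditions (fixed priorities where a smaller number means more urgent, unit time steps, and two made-up jobs), not a model of any particular operating system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Job:
    name: str
    release: int    # time at which the job becomes ready
    exec_time: int  # execution time in time units
    priority: int   # smaller number = more urgent (assumption)

def simulate(jobs: List[Job], preemptive: bool) -> Dict[str, int]:
    """Single-core simulation in unit time steps; returns each job's finish time."""
    remaining = {j.name: j.exec_time for j in jobs}
    finish: Dict[str, int] = {}
    running: Optional[Job] = None
    t = 0
    while len(finish) < len(jobs):
        ready = [j for j in jobs if j.release <= t and remaining[j.name] > 0]
        if ready:
            # Preemptive: always run the most urgent ready job.
            # Non-preemptive: only choose a new job once the current one has finished.
            if preemptive or running is None or remaining[running.name] == 0:
                running = min(ready, key=lambda j: j.priority)
            remaining[running.name] -= 1
            if remaining[running.name] == 0:
                finish[running.name] = t + 1
        t += 1
    return finish

jobs = [Job("L", release=0, exec_time=5, priority=2),
        Job("H", release=1, exec_time=1, priority=1)]
print(simulate(jobs, preemptive=True))   # {'H': 2, 'L': 6}
print(simulate(jobs, preemptive=False))  # {'L': 5, 'H': 6}
```

With preemption, the urgent job `H` finishes at time 2 instead of 6, at the cost of delaying the long job `L` by one time unit; without preemption, `L` runs to completion first.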

Finding a feasible schedule is a complex process, and a well-chosen schedule can improve efficiency across the entire distributed system. Certain task characteristics can assist in this search. One particularly useful characteristic for schedulability is the task request time, that is, the time at which a task is requested for execution. Based on request times, we can distinguish two types of tasks: periodic and sporadic. After the first request of a periodic task, all future request times are known: they are obtained by adding multiples of the known period to the initial request time. For a sporadic task, the request time is unknown before its activation. In this case, the key schedulability criterion is whether there is a known minimum interval between any two requests of the task.
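Both task types can be expressed in a few lines. In this sketch (hypothetical function names, abstract time units), periodic request times are generated from the first request and the period, while for sporadic requests only the minimum inter-arrival assumption can be checked.

```python
from typing import List

def periodic_requests(first_request: int, period: int, count: int) -> List[int]:
    """Request times of a periodic task: first request plus multiples of the period."""
    return [first_request + k * period for k in range(count)]

def respects_min_interarrival(request_times: List[int], min_interarrival: int) -> bool:
    """Sporadic requests cannot be predicted, but the assumed minimum gap between
    any two consecutive requests can be checked, e.g. against recorded traces."""
    times = sorted(request_times)
    return all(b - a >= min_interarrival for a, b in zip(times, times[1:]))

print(periodic_requests(3, 10, 4))                 # [3, 13, 23, 33]
print(respects_min_interarrival([0, 15, 27], 10))  # True
print(respects_min_interarrival([0, 15, 20], 10))  # False
```

At design time, a sporadic task with minimum inter-arrival time T can be budgeted like a periodic task with period T, because it can never request the processor more often.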

Considering the available options, what is the right approach to real-time scheduling for AD systems? As noted above, AD relies heavily on hard real-time behavior. Inside AD systems we find many concurrent tasks that share common resources and exchange data. In real-world scenarios, finding a suitable schedule means taking many dependencies into account, such as precedence constraints (some tasks must complete before others can start) and mutual exclusion constraints between tasks. Faced with these challenges, dynamic scheduling techniques struggle to guarantee the tight deadlines of these tasks, especially since communication between the nodes of the E/E architecture must also be managed. In other words, dynamic models are not expressive enough to capture all these real-life requirements, and once the system goes beyond the (greatly oversimplified) assumptions of the dynamic model, it becomes nearly impossible to provide guarantees. Therefore, to achieve the required level of predictability and effectively ensure the safety of autonomous driving functions, we need static, pre-runtime scheduling instead.

A static schedule is determined at design time based on all expected events and execution activities, and the approach is applicable to systems of varying complexity, including distributed real-time systems. It plans resource usage and access to the communication media of a distributed system in advance. Although a computer system cannot control when external interrupts occur, the time points at which these incoming events will be served can be defined a priori, based on assumptions about each class of events. Static scheduling can therefore be viewed as a search for a feasible schedule. The goal of this search is a complete timetable that respects all precedence relations and mutual exclusion constraints between tasks and ensures that every task completes before its deadline. Heuristic functions can be applied to guide and optimize the search. The award-winning paper “Mapping and Scheduling Automotive Applications on ADAS (Advanced Driver-Assistance Systems) Platforms using Metaheuristics” proposed such a search optimization strategy.
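The search can be sketched as a depth-first search with pruning. This is a deliberately simplified, hypothetical example: a single core, non-preemptive execution (which makes mutual exclusion implicit, because only one task runs at a time), precedence constraints, release times, and deadlines. Eligible tasks are explored in earliest-deadline order as a simple heuristic, and the search never inserts deliberate idle time, so it can miss schedules that require it. It is not the metaheuristic approach of the cited paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Task:
    name: str
    wcet: int
    deadline: int                # absolute deadline
    release: int = 0             # earliest start time
    preds: Tuple[str, ...] = ()  # precedence: these tasks must finish first

def find_schedule(tasks: List[Task]) -> Optional[Dict[str, int]]:
    """Search for a non-preemptive single-core timetable (task -> start time)."""

    def search(starts: Dict[str, int], now: int) -> Optional[Dict[str, int]]:
        if len(starts) == len(tasks):
            return dict(starts)  # every task placed: a complete timetable
        # On one core, every placed task has finished by `now`,
        # so a task is eligible once all of its predecessors are placed.
        eligible = [t for t in tasks
                    if t.name not in starts and all(p in starts for p in t.preds)]
        # Heuristic: try the most urgent eligible task first.
        for t in sorted(eligible, key=lambda c: c.deadline):
            start = max(now, t.release)
            if start + t.wcet > t.deadline:
                continue  # this branch would miss a deadline: prune it
            starts[t.name] = start
            found = search(starts, start + t.wcet)
            if found is not None:
                return found
            del starts[t.name]  # backtrack
        return None

    return search({}, 0)

tasks = [
    Task("camera", wcet=2, deadline=4),
    Task("lidar", wcet=3, deadline=10, release=1),
    Task("fusion", wcet=2, deadline=9, preds=("camera", "lidar")),
    Task("plan", wcet=1, deadline=12, preds=("fusion",)),
]
print(find_schedule(tasks))  # {'camera': 0, 'lidar': 2, 'fusion': 5, 'plan': 7}
```

If the search returns `None`, no timetable of this form exists for the given constraints, which mirrors a design-time tool reporting that no feasible schedule could be found.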

Although static scheduling achieves high predictability in advanced systems, it can only handle situations in which the dependencies between tasks are known at design time. To address variable real-world use cases, several methods have been introduced to make static scheduling more flexible.

One such method is mode switching. ADAS applications operate in different modes, such as parking mode or highway mode, which do not require the same services and functions. By scheduling the tasks of each mode separately, resources can be used more efficiently depending on the active mode, since they are allocated when and where they are needed. System designers must identify all operational and emergency modes, compute a static schedule for each mode offline, and activate the corresponding schedule when a mode switch is requested at runtime.
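A runtime mode switch between precomputed tables might look like the following hypothetical sketch. The table contents, task names, and the rule that a switch takes effect only at the next cycle boundary are assumptions made for illustration; a real system may define other safe switch points.

```python
from typing import Dict, List, Optional, Tuple

Table = List[Tuple[int, str]]  # (offset within the cycle, task to activate)

class ModeScheduler:
    """Switches between schedule tables computed offline for each mode.
    Assumption: a requested switch takes effect at the next cycle boundary,
    so the current cycle always completes under a single table."""

    def __init__(self, tables: Dict[str, Table], initial_mode: str, cycle: int):
        self.tables = tables
        self.mode = initial_mode
        self.cycle = cycle
        self.pending: Optional[str] = None

    def request_mode(self, mode: str) -> None:
        # Only modes with a precomputed static schedule can be activated.
        if mode not in self.tables:
            raise KeyError(f"no precomputed schedule for mode {mode!r}")
        self.pending = mode

    def tick(self, now: int) -> List[str]:
        """Return the tasks to activate at time `now`."""
        offset = now % self.cycle
        if offset == 0 and self.pending is not None:
            self.mode, self.pending = self.pending, None
        return [task for start, task in self.tables[self.mode] if start == offset]

tables = {
    "highway": [(0, "lane_keeping"), (5, "adaptive_cruise")],
    "parking": [(0, "ultrasonic_scan"), (3, "park_path")],
}
sched = ModeScheduler(tables, "highway", cycle=10)
print(sched.tick(0))   # ['lane_keeping']
sched.request_mode("parking")
print(sched.tick(5))   # ['adaptive_cruise'] (switch still pending)
print(sched.tick(10))  # ['ultrasonic_scan'] (new mode active)
```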

These are the elements that static real-time scheduling contributes to ensure safe AD system behavior and optimal use of computational resources across a wide range of possible scenarios and their specific requirements. Together, they must ensure both the correctness and the responsiveness of the system.

Practicing Safe Real-Time Scheduling with MotionWise

The ACES developments mentioned above are driving the automotive industry towards software-defined, centralized E/E architectures. With this trend, vehicle operating systems are expected to evolve towards supporting all vehicle functions from a few central computers while ensuring the availability and safety of the entire system. The demand for testing and verification is increasing, and real scenarios need to be simulated to achieve accurate and reliable system behavior. This integration of domains, together with applications of different criticality levels integrated on the same system, requires a system design that guarantees the priority of critical software functions and their resilience against potential disturbances.

Enter MotionWise, a software safety platform that combines runtime services and design-time tools to address these issues and provides end-to-end guarantees for software behavior. By defining and enforcing execution boundaries for each application and implementing pre-runtime static scheduling, MotionWise orchestrates the mechanisms provided by the underlying operating systems to ensure real-time behavior across the entire system. Its tools support automotive software development seamlessly along the whole lifecycle, including design, integration, testing, and verification.

Let us take a closer look at the underlying technologies. MotionWise provides an execution manager that deterministically coordinates the scheduling of multiple applications across all hosts in the system. All communication between hosts takes place over deterministic networks and is coordinated by the MotionWise Communication Manager Stack. The global scheduling concept ensures that the task schedules on all hosts are consistent with the schedule of the backbone network that connects these hosts. Since global scheduling requires a common notion of time across all hosts, dedicated MotionWise functions must support reliable time synchronization. Furthermore, all of this is hidden from the user behind the MotionWise planning tools and the onboard execution software stack.
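One consequence of global scheduling can be checked mechanically: on a shared time base, a producer task, the network frame carrying its output, and the consumer task must line up. The sketch below uses hypothetical types (`Slot`, `Transfer`) with times in µs within one global cycle, treats the end of the frame slot as the moment the data is available at the receiver, and does not reflect the MotionWise tool interfaces.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    start: int   # µs from the start of the global cycle (shared time base)
    length: int

    @property
    def end(self) -> int:
        return self.start + self.length

@dataclass
class Transfer:
    producer: Slot  # producing task's slot on the sending host
    frame: Slot     # transmission slot of the frame on the backbone network
    consumer: Slot  # consuming task's slot on the receiving host

def consistent(t: Transfer) -> bool:
    """Host and network schedules line up if the frame is sent only after the
    producer finishes and the consumer starts only after the frame arrives."""
    return t.producer.end <= t.frame.start and t.frame.end <= t.consumer.start

ok = Transfer(Slot(0, 300), Slot(300, 50), Slot(400, 200))
too_early = Transfer(Slot(0, 300), Slot(300, 50), Slot(320, 200))
print(consistent(ok), consistent(too_early))  # True False
```

Such a check is only meaningful if all hosts and the network agree on time, which is why global scheduling depends on reliable time synchronization.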

The complexity of modern distributed systems, combined with extremely complex and unpredictable real driving conditions, creates a challenging starting point for finding appropriate schedules that can meet the safety requirements of highly automated, software-defined vehicles. Therefore, creating effective schedules requires a global view: system-wide planning and global time considerations. This often means working in multi-host, and sometimes even multi-ECU environments. The global scheduler creates schedules for the entire heterogeneous, multi-ECU, multi-SoC, multi-core system. This is achieved through MotionWise Creator, which generates effective schedules based on user-defined constraints. This tool can also provide information when a feasible schedule cannot be formulated.

MotionWise Global Scheduler Functionality

As mentioned earlier, real-life situations often introduce highly complex task flows, which require proper management of a large number of interdependent and mutually exclusive tasks. To organize and schedule all these tasks according to their interdependencies and latency requirements, MotionWise groups them into so-called computation chains (CCs). The tasks of a CC can be scheduled on any host and in different time slots. Ideally, deterministic execution of a CC at runtime is ensured by the WCETs of its tasks and message transmissions. If the WCET cannot be estimated accurately, there is no guarantee that a task will not exceed its scheduled time. In practice, runtime budgets are estimated iteratively to minimize the likelihood of overruns. In the rare cases where overruns still occur, MotionWise detects and reports them and triggers appropriate error responses. Incoming tasks can be categorized as event-driven, data-driven, or time-driven. For example, planning and control are time-driven, while handling multiple asynchronous sensor inputs is event-driven. An example of data-driven tasks is the perception layer, which consists of a series of processes with data dependencies between them.

Computation Chain
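The budget-monitoring idea can be sketched as follows. `ChainTask`, `run_chain`, the host names, and the callback are hypothetical, and the measured durations are simulated; the point is only the pattern of comparing each task's actual runtime with its budget and invoking an error response on an overrun.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ChainTask:
    name: str
    host: str
    budget_us: int  # runtime budget, e.g. an iteratively refined WCET estimate

def run_chain(chain: List[ChainTask],
              measure: Callable[[ChainTask], int],
              on_overrun: Callable[[ChainTask, int], None]) -> List[str]:
    """Run the tasks of one computation chain in order and return the names
    of tasks that exceeded their budget. `measure` stands in for executing
    the task and timing it."""
    overruns: List[str] = []
    for task in chain:
        elapsed = measure(task)
        if elapsed > task.budget_us:
            overruns.append(task.name)
            on_overrun(task, elapsed)  # report and trigger an error response
    return overruns

chain = [ChainTask("perception", "host_a", 500),
         ChainTask("planning", "host_b", 300),
         ChainTask("control", "host_b", 100)]
simulated = {"perception": 480, "planning": 350, "control": 90}
reports: List[str] = []
result = run_chain(chain, lambda t: simulated[t.name],
                   lambda t, us: reports.append(f"{t.name} on {t.host}: {us} us > {t.budget_us} us"))
print(result)   # ['planning']
print(reports)  # ['planning on host_b: 350 us > 300 us']
```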

As different applications run on different hosts based on their requirements and priorities, ADAS system architects decide which strategies to apply to which applications to meet all functional and safety requirements.

Time-triggered scheduling strategies can execute time-driven tasks with high determinism. These tasks are activated periodically at predetermined time points and execute within their respective fixed-length time slots. The scheduling tables are generated offline and deployed statically. This approach provides a high degree of non-interference; however, tasks, especially those with highly variable execution times, often do not need their entire allocated time budget, so resources should be used more efficiently in these scenarios. Since even safety-critical tasks can have variable execution times, using the resources of each host effectively is crucial. Moreover, the overall processing in our hard real-time systems involves not only periodic tasks but also sporadic tasks. This appears to be a weakness of static scheduling.
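The trade-off between non-interference and efficiency is visible in a tiny static slot table. In this hypothetical example (times in µs, invented task names and durations), the table is first checked for overlapping slots, and then the reserved but unused time per slot is computed from one cycle's actual execution times.

```python
from typing import Dict, List, Tuple

SlotEntry = Tuple[str, int, int]  # (task, slot start, slot length) in µs

def check_table(slots: List[SlotEntry], cycle: int) -> None:
    """Reject tables with overlapping slots or slots that spill past the cycle."""
    ordered = sorted(slots, key=lambda s: s[1])
    for (a, start_a, len_a), (b, start_b, _) in zip(ordered, ordered[1:]):
        if start_a + len_a > start_b:
            raise ValueError(f"slots {a!r} and {b!r} overlap")
    _, last_start, last_len = ordered[-1]
    if last_start + last_len > cycle:
        raise ValueError("table does not fit into the cycle")

def unused_time(slots: List[SlotEntry], actual: Dict[str, int]) -> Dict[str, int]:
    """Reserved but unused time per slot in one cycle (negative = overrun)."""
    return {task: length - actual[task] for task, _, length in slots}

table = [("planning", 0, 400), ("control", 400, 200), ("diagnostics", 600, 300)]
check_table(table, cycle=1000)
idle = unused_time(table, {"planning": 250, "control": 190, "diagnostics": 120})
print(idle)                # {'planning': 150, 'control': 10, 'diagnostics': 180}
print(sum(idle.values()))  # 340
```

Here 340 µs of the 1000 µs cycle are reserved but left idle, which is the inefficiency that strict time slots can cause for tasks with variable execution times.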

Clearly, the modern automotive industry needs flexible scheduling mechanisms to cover a variety of possible use cases. In response to real-world driving conditions, MotionWise scheduling capabilities embody the perfect synergy between flexibility and safety. Our time-aware solution architecture enables the execution of various applications by allowing the completion of event-driven and data-driven tasks while still providing time guarantees. For example, periodic sensor inputs (camera image frames) are inherently asynchronous and must be processed in an event-driven manner. MotionWise Scheduling Service handles time-driven and event-driven tasks by integrating event-driven tasks (sensor inputs) into the time-aware architecture. Unlike the predefined time slots for time-driven tasks,


Event-driven tasks in our time-aware architecture
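The text does not describe exactly how MotionWise integrates event-driven tasks, so the following hypothetical sketch shows only one possible technique: reserve event-handling windows at fixed offsets in the static cycle and serve each event in the next window. Because the windows are fixed at design time, the waiting time of an event is bounded by the largest gap between windows, assuming each window has enough capacity for the pending events and that all offsets lie within the cycle.

```python
from typing import List

def next_window_start(arrival: int, window_offsets: List[int], cycle: int) -> int:
    """Start of the first reserved event window at or after `arrival`.
    Windows repeat every `cycle` time units at the given offsets."""
    base = arrival - arrival % cycle  # start of the current cycle
    for k in (0, 1):  # look in the current cycle, then the next one
        for off in sorted(window_offsets):
            start = base + k * cycle + off
            if start >= arrival:
                return start
    raise ValueError("no event windows configured")

def worst_case_wait(window_offsets: List[int], cycle: int) -> int:
    """Largest gap between consecutive windows, including the wrap-around:
    an upper bound on how long an event waits before its window starts."""
    offs = sorted(window_offsets)
    gaps = [b - a for a, b in zip(offs, offs[1:])]
    gaps.append(cycle - offs[-1] + offs[0])
    return max(gaps)

windows = [2, 7]  # hypothetical offsets of event windows in a 10-unit cycle
print(next_window_start(8, windows, 10))  # 12
print(worst_case_wait(windows, 10))       # 5
```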

MotionWise also allows for data-driven scheduling, where users can define a set of data-driven tasks. The use case for data-driven methods is the dynamic sequential execution of tasks in the perception layer, where data-driven tasks can start immediately upon meeting prerequisites (input data being ready). Although the execution of individual tasks does not inherently depend on the completion of previous tasks, it is crucial that the end-to-end latency does not exceed the upper limit.

Optimizing resources through data-driven scheduling
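Data-driven execution can be sketched as a dependency graph in which each task starts as soon as all of its inputs are ready, followed by a check of the end-to-end latency against an upper bound. The task names, durations, and bound are invented, and the sketch assumes each task has its own core, so tasks wait only for data, never for a processor. It uses the standard `graphlib` module, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Tuple

def data_driven_finish_times(durations: Dict[str, int],
                             deps: Dict[str, Tuple[str, ...]],
                             input_ready: int = 0) -> Dict[str, int]:
    """Start each task as soon as all of its inputs are available.
    Simplification: every task has its own core, so tasks wait only for data."""
    finish: Dict[str, int] = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps.get(task, ())), default=input_ready)
        finish[task] = start + durations[task]
    return finish

def end_to_end_latency(finish: Dict[str, int], input_ready: int) -> int:
    """Time from input availability to completion of the last task."""
    return max(finish.values()) - input_ready

durations = {"camera_decode": 4, "lidar_clustering": 6, "fusion": 3, "tracking": 2}
deps = {"fusion": ("camera_decode", "lidar_clustering"), "tracking": ("fusion",)}
LATENCY_BOUND = 15  # hypothetical end-to-end upper bound

finish = data_driven_finish_times(durations, deps, input_ready=100)
latency = end_to_end_latency(finish, input_ready=100)
print(latency, latency <= LATENCY_BOUND)  # 11 True
```

The individual start times shift with the data, but the check that matters for safety is the last one: the chain as a whole must stay within its latency bound.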

MotionWise is both flexible and precise, offering different mechanisms that support this ultimate safety goal. As mentioned, mode switching is a very useful way to improve the efficiency of static scheduling. To allocate resources effectively, it is important to use separate schedules for the different operational modes of the system. In other words, a single scheduling table for all modes can lead to suboptimal resource usage and longer schedule generation times. MotionWise Scheduling Service supports switching at runtime between multiple pre-configured scheduling tables for mutually exclusive applications, such as highway navigation and parking assistance.

As discussed in this article, hard real-time scheduling is crucial for ADAS and for the development of AD as a whole. The MotionWise platform provides all the functionality needed to plan, build, and operate safe real-time E/E architectures today. As ACES mobility gradually becomes a reality, it is imperative that technological advancement never puts human lives at risk. At TTTech Auto, we understand the significance of this journey and work passionately every day to provide uncompromising safety solutions, which truly prosperous modern mobility demands.
