The automotive electrical/electronic (E/E) architecture is evolving towards centralized computing resources. This began with domain controllers and has since shifted to zonal/regional and centralized approaches. As multiple real-time functions are integrated into regional controllers, the demand for processor performance increases, and the complexity of operating systems and software rises with it. The industry is increasingly turning to Armv8-R based solutions, such as the Cortex-R52 and Cortex-R52+ CPUs (collectively referred to in this paper as Cortex-R52+), to realize this vision of software integration. Some automotive chip manufacturers have incorporated these processors into high-performance microcontroller designs for zonal platforms and safety islands. Meanwhile, automotive software vendors have developed solutions integrated with Armv8-R, leveraging the Arm architecture used in this processor family. This article provides an overview of this industry trend.
The hardware and software of new systems must simultaneously meet the requirements of each individual workload hosted on the device. These include:
— Meeting the software dependencies of workloads, including libraries, operating system calls (including access to input/output), and application binary interfaces (ABI). Specific versions of these may be required. When a single operating system cannot meet all dependencies simultaneously, system software may include multiple operating systems.
— Performance, including the determinism required for real-time workloads. These may have different hard real-time response time requirements, ranging from a few microseconds upwards; failure to meet stringent real-time demands leads to incorrect operation of the workload. Some workloads have more relaxed (soft) real-time requirements, where missing a deadline results only in degraded performance. Other workloads have no specific real-time requirements and run on a best-effort basis.
— For functionally safe workloads, it is fundamental to meet the assumptions regarding the correct execution environment for the workload and to provide any assumed external safety mechanisms. The automotive safety integrity level (ASIL) of the execution environment and safety mechanisms must be equal to or higher than the level assigned to the workload.
— For workloads with lower safety relevance, it is best not to increase the workload’s ASIL merely because there are other higher ASIL workloads present on the device.
— Meeting security requirements such as confidentiality, integrity, privacy, and authenticity by ensuring other workloads cannot access sensitive data.
— The ability to update individual workloads, including firmware over-the-air (FOTA) updates. This also covers a range of related topics, such as workload authentication, secure boot, and system-level updates. FOTA for hypervisors is a significant topic that is beyond the scope of this white paper.
— For workloads derived from existing (legacy) applications, the ideal is to integrate the workload with minimal adaptation. When workloads that previously ran on standalone hardware are integrated onto shared system hardware, the software must prevent any behavior whose side effects affect other applications in the system. For workloads associated with regulated applications, certification may be required (e.g., applications related to on-board diagnostics (OBD)). To avoid re-certification every time other workloads change, it must be demonstrated that other workloads do not interfere with the certified workload.
By hosting multiple workloads, the system hardware and software must provide appropriate isolation between workloads to ensure that one workload does not prevent another from meeting its requirements. If multiple operating systems are required, similar isolation requirements exist between the operating systems. For functional safety, this type of isolation is referred to as freedom from interference (FFI), which requires a mechanism to ensure that a failure related to one workload does not lead to failures in the execution environment and safety mechanisms provided to another workload. Systems that provide this level of isolation between workloads also have the advantage of allowing each workload to be developed (and debugged) in isolation from other workloads. This is particularly important if workloads come from different vendors.
The system hardware and software provide the following isolation mechanisms:
— Logical isolation. Using privilege models and memory protection mechanisms to isolate states belonging to different workloads.
— Temporal isolation. Scheduling private and shared resources, partitioning and monitoring shared resources, and using monitoring timers to detect timing violations.
Cortex-R
Arm has a portfolio of CPUs designed to address a wide range of computing, from the smallest, lowest-power microcontrollers to very high-performance server-class compute. Cortex-R processors were developed to meet the needs of applications requiring real-time processing and suit a range of different use cases, particularly automotive applications where systems must respond within short, deterministic time frames to meet system requirements. In many cases these applications also carry functional safety (and security) requirements, which add to the challenges faced by system integrators and developers. Cortex-R processors such as the Cortex-R52+ can be used in standalone microcontrollers (MCUs) or as additional cores in system-on-chip (SoC) designs, for example as a safety island.
The first Cortex-R processors, such as the Cortex-R5, were built on the Armv7-R architecture. However, since then, the architecture has evolved, and Arm’s Cortex-R52 and Cortex-R52+ processors implement the Armv8-R architecture, which helps address the increasingly complex issues of automotive real-time software and the transition from discrete dedicated controllers to functionally consolidated and combined controllers. The Armv8-R architecture adds support that enables better control of software within a single processor, providing code isolation and supporting repeatable and understandable behavior, including virtualization in real-time processors.
The Cortex-R52 and Cortex-R52+ processors are highly configurable and can be defined to meet the application requirements of implementers. Table 1 describes some configurability.
As part of the Armv8-R architecture, these processors provide, in addition to exception level 0 (EL0) for user space and exception level 1 (EL1) for the operating system, a new exception level 2 (EL2). EL2 can host a hypervisor/separation kernel that helps manage the software on the processor, simplifying how partners control software access to shared resources and the interactions between software components. This can be used to maintain isolation between tasks running on a single processor under the same operating system or across multiple operating systems.
Along with the new exception level, a two-stage memory protection unit (MPU) has been added to enforce the processor's access to different resources. The operating system controls the first stage of the MPU at EL1, while the processor can implement a second MPU stage that can only be configured from EL2, where the hypervisor runs.
Access to resources can be managed by software running at the new, more privileged EL2. Application tasks can request access to the resources they need through this software, which enforces access using the two-stage memory protection unit (MPU). This approach is not limited to two different criticality levels; it can support many different contexts with different protection attributes. Unlike a memory management unit (MMU), an MPU can provide access management from Cortex-R processors to system resources without introducing the additional, potentially non-deterministic costs of translation look-ups and loading page tables from memory, which are difficult to manage and hard to bound when guaranteeing timely completion.
The two levels of the MPU are:
— EL1 MPU, which is managed by the operating system to enforce separation between the operating system and application tasks/ISRs, and between application tasks/ISRs themselves. The EL1 MPU can be programmed by code running at EL2 or EL1.
— EL2 MPU, which can only be programmed by code running at EL2 and is used by a hypervisor to provide additional separation.
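The combined effect of the two MPU stages can be sketched as follows: an access succeeds only if both the EL1 stage (OS-controlled) and the EL2 stage (hypervisor-controlled) permit it. This is a minimal simulation with illustrative region structures, not the actual Cortex-R52+ programmer's model.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;   // region base address
    uint32_t limit;  // region limit (inclusive)
    bool write_ok;   // region allows writes
} mpu_region_t;

// Returns true if any region in the table covers the address
// with the required write permission.
static bool stage_allows(const mpu_region_t *regions, int count,
                         uint32_t addr, bool is_write) {
    for (int i = 0; i < count; i++) {
        if (addr >= regions[i].base && addr <= regions[i].limit &&
            (!is_write || regions[i].write_ok))
            return true;
    }
    return false;
}

// Effective check: the EL1 MPU (programmed by the OS) and the
// EL2 MPU (programmed only from EL2) must both permit the access.
bool access_permitted(const mpu_region_t *el1, int n1,
                      const mpu_region_t *el2, int n2,
                      uint32_t addr, bool is_write) {
    return stage_allows(el1, n1, addr, is_write) &&
           stage_allows(el2, n2, addr, is_write);
}
```

This mirrors how a hypervisor can constrain a guest OS: even if the OS programs its EL1 MPU permissively, the EL2 stage still bounds what the software can touch.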
The Cortex-R52+ exports information outside the core to enable the system to establish and maintain access control based on the software that is running. This is achieved by propagating the virtual machine ID (VMID) with bus transactions, allowing the system to manage access to resources. In the case of the Cortex-R52+, this is further extended to transactions and requests generated directly by the hypervisor running at EL2.
These Cortex-R processors integrate their own generic interrupt controller (GIC), shared by all CPUs within the cluster, to provide low-latency interrupts from the system. This can flexibly allocate and prioritize shared peripheral interrupts (SPIs) to any core within the cluster. The GIC supports the ability to signal both physical interrupts and virtual interrupts simultaneously and can capture interrupt accesses to EL2 for interrupt virtualization.
These processors have tightly coupled memory for highly deterministic, low-latency core access to code and data. They have multiple interfaces to external resources, including SRAM, main memory, and devices. Resources accessed via interfaces are allocated based on their address locations, allowing implementers to flexibly allocate space in their memory maps and manage the allocation of resources to be privately assigned to virtual machines.
Software Integration Mechanisms
Virtualization and Virtual Machines
As the amount of software in vehicles increases, more applications are integrated onto a single microcontroller. This is particularly evident in the powerful central vehicle computers bridging the domain/regional controllers.
A hypervisor and virtual machines on a microcontroller using the Cortex-R52+ can support integrating applications while providing the necessary separation between them. Each application runs in its own independent instance, commonly referred to as a partition or virtual machine (VM).
Virtual machines typically consist of:
— Some physical or virtual processor cores
— Some memory
— Some physical or virtual peripherals
— Some physical or virtual configuration registers
The software managing virtual machines is typically referred to as the system monitor, separation kernel, or VM manager, and runs at the privileged exception level 2 (EL2) on Cortex-R processors. To some extent, the system monitor gives the guest software running inside a virtual machine the illusion that it runs on its own microcontroller, without sharing the microcontroller with software in other virtual machines. A single physical processor core can host multiple virtual cores by context switching between them, just as an operating system context switches between processes. The context of a virtual core consists of the values of the general-purpose registers, the floating-point registers, some system configuration registers, and the configuration of the EL1 MPU. Where legacy software runs inside a VM, we want the virtual machine to appear as much like a real microcontroller as possible, so that the legacy software needs no changes other than relinking so that the guest software in each VM uses separate memory.
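The virtual core context listed above can be pictured as a plain data structure that the hypervisor saves and restores. The field names and sizes below are illustrative simplifications, not the real Cortex-R52+ register file layout.

```c
#include <stdint.h>
#include <string.h>

// Illustrative virtual core context: the state the hypervisor
// must swap when multiplexing virtual cores on one physical core.
typedef struct {
    uint32_t gpr[16];          // general-purpose registers
    uint64_t fpr[32];          // floating-point registers
    uint32_t sctlr_el1;        // selected EL1 system configuration
    uint32_t mpu_regions[48];  // EL1 MPU region configuration
} vcore_context_t;

// Context switch as performed at EL2: save the outgoing virtual
// core's live state, then load the incoming virtual core's state.
void vcore_switch(vcore_context_t *out, vcore_context_t *in,
                  vcore_context_t *hw /* stand-in for live registers */) {
    memcpy(out, hw, sizeof *hw);  // save outgoing context
    memcpy(hw, in, sizeof *hw);   // restore incoming context
}
```

On real hardware the "live registers" are of course not a memory struct; the point of the sketch is only to show what belongs to a virtual core's context, including the EL1 MPU configuration.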
Using SMPU and Peripheral Protection Mechanisms
The primary function of the SMPU is to control which bus masters (e.g., DMA controllers) can access which memory addresses. Cortex-R processor cores and other microcontroller components, such as Cortex-M cores and some peripherals, can act as bus masters. Typically, an SMPU will have a set of regions. Each region has a configurable starting address, a size, and is assigned to one or more bus masters (or, in more advanced designs, assigned to one or more virtual machines using the VM identifiers stored in the VSCTLR of the Cortex-R52+). Bus masters (or VMs) can only access memory within the regions assigned to them. The microcontroller can also include peripheral protection mechanisms to allow peripherals to be assigned to bus masters (or VMs). Here, a peripheral is assigned to one or more bus masters (or VMs), and then the peripheral protection mechanism prevents any other bus masters (or VMs) from accessing the registers of the peripheral.
By using the SMPU and peripheral protection mechanisms, we can achieve cluster-level separation. That is, the memory and peripherals of a microcontroller can be partitioned among multiple virtual machines, where one virtual machine contains all cores of a Cortex-R core cluster. If virtual machines have different safety levels (e.g., different ISO 26262 ASILs), then these mechanisms alone do not allow multiple virtual machines within the same cluster. Each cluster has a generic interrupt controller (GIC) to route interrupts to the cores in the cluster. Each core has a separate GIC redistributor to handle software-generated interrupts (SGIs) and private peripheral interrupts (PPIs), but the GIC distributor handling SPIs is common to all cores in the cluster. If we allowed multiple virtual machines in the same Cortex-R52 core cluster to write to the memory-mapped GIC distributor registers, one VM could interfere with another VM's interrupt configuration by changing it (accidentally or maliciously).
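An SMPU check of the kind described above can be sketched as a table lookup keyed by the VMID carried with each bus transaction. Region layout and the VMID allow-mask are illustrative; real SMPUs differ per vendor.

```c
#include <stdbool.h>
#include <stdint.h>

// Illustrative SMPU region: an address window plus a mask of
// VM identifiers allowed to access it. On the Cortex-R52+, the
// VMID is propagated with transactions (from VSCTLR).
typedef struct {
    uint32_t base, limit;   // address window (limit inclusive)
    uint16_t vmid_mask;     // bit n set => VMID n may access
} smpu_region_t;

// A transaction passes only if some region covers the address
// AND lists the transaction's VMID; otherwise it is blocked.
bool smpu_check(const smpu_region_t *r, int count,
                uint32_t addr, uint8_t vmid) {
    for (int i = 0; i < count; i++)
        if (addr >= r[i].base && addr <= r[i].limit &&
            (r[i].vmid_mask & (1u << vmid)))
            return true;
    return false;  // no matching region: block the transaction
}
```

With regions assigned per VM, a bus master acting on behalf of one VM cannot read or write another VM's memory, which is the cluster-level separation described above.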
Using EL2 for Para-virtualization
In addition to protection mechanisms such as the SMPU and peripheral protection provided by the microcontroller, the Cortex-R52+ itself includes features to support virtualization. One of these is the EL2 privilege level. EL2 is more privileged than the EL1 (supervisor) level used by the operating system and the EL0 (user) level used by application code. A hypervisor runs at EL2, and code in guest VMs runs at EL1 or EL0.
Guest operating systems can issue requests to the hypervisor using HVC (hypervisor call) instructions, just as application software issues requests to the operating system using SVC (supervisor call) instructions. When software running at EL1 executes an HVC instruction, the Cortex-R52+ core switches to EL2 and takes a hypervisor call exception. The hypervisor handles this exception and then returns to the guest OS at EL1. HVC instructions enable para-virtualization: the guest OS knows it is running in a virtual machine and uses an API (built on HVC instructions) provided by the hypervisor to make requests to the hypervisor, to device drivers inside the hypervisor (EL2 device drivers), or to device drivers running in other virtual machines.
For example, consider how to use virtualization to allow multiple virtual machines to exist in the same cluster, despite sharing the GIC distributor. The SMPU (or core MPU) is configured to prevent virtual machines from accessing the memory-mapped registers of the GIC. When guest software wants to change its interrupt configuration, it issues an API request to the hypervisor. The hypervisor performs the necessary GIC configuration after first checking that the requested change will not interfere with other VMs. Virtualization can also be used to allow peripherals to be shared and create virtual peripherals. External devices, such as Ethernet controllers, can be shared similarly to the GIC. Fully virtual peripherals can also be created. For example, a virtual Ethernet controller can be created for communication between virtual machines running on the same microcontroller. In both cases, the hypervisor will contain an EL2 device driver that either manages access to shared peripherals or implements virtual peripherals. This is similar to how an operating system uses device drivers to manage access to peripherals shared by multiple processes or tasks.
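The GIC example above can be sketched as a hypervisor-side hypercall handler that validates ownership before touching the shared distributor. The function names, the SPI ownership table, and the priority array standing in for the memory-mapped GICD registers are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SPIS 64

static uint8_t spi_owner[NUM_SPIS];    // VMID owning each SPI
static uint8_t spi_priority[NUM_SPIS]; // stand-in for GICD registers

// Hypervisor-side setup: record which VM owns an SPI.
void assign_spi(uint32_t spi, uint8_t vmid) {
    spi_owner[spi] = vmid;
}

// Handler reached via an HVC instruction. Returns false if the
// calling VM does not own the interrupt, so one VM cannot disturb
// another VM's interrupt configuration in the shared distributor.
bool hvc_set_spi_priority(uint8_t caller_vmid, uint32_t spi,
                          uint8_t priority) {
    if (spi >= NUM_SPIS || spi_owner[spi] != caller_vmid)
        return false;               // reject cross-VM interference
    spi_priority[spi] = priority;   // the real code would write GICD
    return true;
}
```

The key design point is that the validity check runs at EL2, where guest software cannot bypass it, because the SMPU/core MPU blocks the guests' direct access to the GIC registers.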
Para-virtualization can be used as a solution for peripherals that do not support, or only partially support, hardware virtualization. Ideally, peripherals will follow the virtualization principles for real-time systems described in Arm's "Device Virtualization Principles for Real-Time Systems" paper, avoiding the need for para-virtualization, at least in the data plane. Compared to device pass-through (where peripherals are driven directly by guest software), para-virtualization (and trap-and-emulate) always adds some extra cycles.
Interrupt Virtualization
EL2 alone does not allow us to share or virtualize interrupt-driven peripherals. The Armv8-R architecture defines that, typically, an interrupt is taken at the current privilege level: if code is running at EL1 when an IRQ occurs, the interrupt is handled at EL1 using the IRQ entry in the EL1 vector table, but if the IRQ occurs while code is running at EL2, it is handled at EL2 using the IRQ entry in the EL2 vector table. To address this, the Cortex-R52+ supports interrupt virtualization. When interrupt virtualization is enabled (by setting the IMO and FMO flags in the HCR register and configuring the ICH_HCR register), FIQ or IRQ interrupts (exceptions) always cause the Cortex-R52+ to switch to EL2 and use the EL2 vector table. The hypervisor can then handle the interrupt itself or use the Cortex-R52+ core's list registers to virtualize it, so that when guest software runs at EL1/EL0 it receives virtual interrupts at EL1 via the EL1 vector table. From the guest software's point of view, virtual interrupts behave like physical interrupts.
Since all interrupts are initially handled by the hypervisor, the hypervisor can decide whether an interrupt should be handled by the hypervisor itself, by an EL2 device driver, or should be virtualized and injected into the VM. This allows for the handling of shared/virtual peripherals driven by interrupts. If necessary, EL2 device drivers can also inject virtual interrupts into virtual machines. Interrupt virtualization also allows for “remote control” of virtual machines. For example, a privileged management VM or EL2 device driver running on one physical core can generate interrupts in a second physical core to request the hypervisor to do something on the second core, such as shutting down or restarting the VM running on the second core.
Of course, nothing is free, and interrupt virtualization increases the total time required to process interrupts. The exact overhead depends on many factors, including the arrival patterns of interrupts and how many different GIC interrupts are used. There are two timing issues related to interrupt virtualization:
1. Virtualizing interrupts at EL2 consumes processor time.
2. Because the priority of virtual interrupts is independent of the priority of physical interrupts, guest software may experience timing anomalies.
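The queue-and-inject flow described above (interrupts for a non-running virtual core are buffered in software and delivered via list registers when it is scheduled in) can be sketched as follows. Queue sizes and the flat list-register array are illustrative; the real Cortex-R52+ has a fixed, small number of GIC list registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 8

// Software queue of virtual interrupts for one virtual core.
typedef struct {
    uint32_t pending[MAX_PENDING];
    int count;
} virq_queue_t;

// Called by the hypervisor when an interrupt arrives for a
// virtual core that is not currently running.
bool virq_queue_push(virq_queue_t *q, uint32_t intid) {
    if (q->count >= MAX_PENDING) return false;
    q->pending[q->count++] = intid;
    return true;
}

// Called when the virtual core is scheduled in: drain queued
// interrupts into the (simulated) list registers so the guest
// receives them at EL1. Returns how many were injected.
int virq_inject_all(virq_queue_t *q, uint32_t *list_regs, int nregs) {
    int injected = 0;
    while (q->count > 0 && injected < nregs)
        list_regs[injected++] = q->pending[--q->count];
    return injected;
}
```

A real hypervisor also has to handle the case where more interrupts are pending than list registers exist, keeping the remainder queued until the guest acknowledges some.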
Virtual Processor Cores
We can also leverage interrupt virtualization to support virtual processor cores. For example, timer interrupts can be handled by the system monitor and used to drive a virtual core scheduler that decides when to context switch between virtual cores. Since guest software cannot mask interrupts taken at EL2, corrupted or malicious guest software cannot deny processor time to other guest software. Interrupts for virtual cores that are not currently running can be virtualized, queued in software, and injected into the virtual core when it next runs. Context switching between virtual cores typically requires reprogramming the EL2 MPU. It is also necessary to consider the CP14 and CP15 configuration registers (these configure various aspects of the processor, such as endianness, whether caches are enabled, and whether the EL1 MPU is enabled). Some of these registers affect all virtual cores on the physical core as well as the hypervisor; others, such as the EL1 MPU region registers, are specific to a virtual core. The latter class of registers must be part of the virtual core's context.
If multiple virtual cores are hosted by a single physical core, consideration must be given to how the virtual cores are scheduled. The simplest approach is a static TDMA (time-division multiple access) algorithm. TDMA algorithms have very low runtime overhead, are easy to understand, and make it straightforward to calculate, in wall-clock time, when each virtual core will run. The disadvantage of purely static algorithms is that they can lead to long delays in handling asynchronous events (such as interrupts). By carefully constructing the static VM schedule, long delays can be avoided, ensuring that the virtual core that will handle an interrupt never has to wait too long before running; however, this may require a detailed understanding of worst-case interrupt handling times. Another approach, for systems that must handle asynchronous events with short latencies, is to use a dynamic scheduling algorithm. An example is reservation-based scheduling, where each virtual core acts as a deferrable server; this algorithm has been used in some versions of the ETAS RTA-HVR hypervisor and provides shorter latencies when handling asynchronous events.
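The low runtime overhead of static TDMA comes from the fact that picking the running virtual core is just a table lookup over the current position within a repeating frame. The slot table below is a made-up example schedule.

```c
#include <stdint.h>

// One slot of an illustrative static TDMA frame.
typedef struct {
    uint8_t vcore;    // virtual core to run in this slot
    uint32_t len_us;  // slot length in microseconds
} tdma_slot_t;

// Given the current time, return the virtual core whose slot is
// active. The frame repeats, so wall-clock behavior is trivially
// predictable, which is the main attraction of static TDMA.
uint8_t tdma_pick(const tdma_slot_t *slots, int nslots,
                  uint64_t now_us) {
    uint64_t frame = 0;
    for (int i = 0; i < nslots; i++) frame += slots[i].len_us;
    uint64_t t = now_us % frame;  // position within the frame
    for (int i = 0; i < nslots; i++) {
        if (t < slots[i].len_us) return slots[i].vcore;
        t -= slots[i].len_us;
    }
    return slots[0].vcore;  // unreachable given t < frame
}
```

The weakness discussed above is also visible here: an interrupt for virtual core 1 arriving during virtual core 0's slot simply waits until the frame reaches a slot for core 1, which is why a dynamic (e.g., reservation-based) scheduler can offer shorter asynchronous-event latency.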
Using virtual processor cores provides system designers with flexibility:
— A virtual machine can be created containing more cores than are physically available. Additional cores may make the software structure easier, similar to using threads in an operating system.
— Multiple virtual machines can be hosted by a single physical core.
However, context switching between virtual cores incurs overhead, as the hypervisor must save a virtual core's general-purpose registers, floating-point registers, associated configuration registers, and EL1 MPU settings, and then reload these registers for another virtual core. Virtual core context switches are similar to context switches between processes in an operating system. Using virtual cores also impacts interrupt latency. With static scheduling algorithms, an interrupt arriving for a virtual core that is not currently running will not be processed until the next time that virtual core runs. With a dynamic scheduling algorithm that automatically switches to the virtual core handling the interrupt, the virtual core context-switch time is added to the interrupt latency. System designers need to assess which approach suits their applications.
Recommendations for Software Integration
Deciding whether hypervisor-based separation or OS-level separation is better
The mechanisms described above allow a hypervisor to create the illusion that guest software has its own microcontroller and to enforce separation between the virtual machines running that software. If the applications to be integrated are tightly coupled (meaning that applications call each other synchronously or depend on activities in all applications being scheduled by the same scheduler), then integration using operating system containers may be more appropriate. The AUTOSAR operating system allows tasks and ISRs to be grouped into containers known as "OS-Applications", and the MPU can be used to separate these containers. OS-level integration means a single scheduler schedules all tasks, which better supports tightly coupled communication. New initiatives such as Flex-CP are being developed to support more dynamic integration of applications into Classic AUTOSAR systems.
Using core-local resources and cluster-local resources
It is best to use memory and peripherals that are "close" to the cores using them. Each Cortex-R core has tightly coupled memory (TCM) that is accessed faster than other types of RAM. Typically, microcontrollers have cluster-local flash and RAM, and in some cases peripherals (such as CAN and LIN controllers) can be assigned to a cluster. Besides typically being faster, using cluster-local resources usually results in less memory bus contention, because cores accessing cluster-local resources may not have to compete for the memory bus with cores outside the cluster. Note that using resources close to a core may limit any runtime migration of virtual processor cores between physical cores (if such migration is supported at all). For example, if a virtual core uses flash local to core cluster 0, the virtual core will run slower if migrated to cluster 1.
Considering Interrupt Latency and Real-Time Requirements
The table below summarizes which techniques are applicable to applications with different interrupt latency and real-time requirements.
Recommendations for Future Microcontrollers
Providing Fine-Grained Peripheral Allocation for Virtual Machines
If peripherals can be allocated to virtual machines at a fine granularity, the need for EL2 device drivers is reduced. For example, being able to allocate a single controller area network (CAN) channel, or even a single general-purpose input/output (GPIO) pin, to a virtual machine is beneficial. This reduces the need for para-virtualization or trap-and-emulate, thereby improving performance. While complete peripheral virtualization is ideal (see below), a compromise is to use para-virtualization/trap-and-emulate for the initialization and configuration of peripherals while allowing direct access to the data plane.
Supporting Peripheral Virtualization
Even better than fine-grained allocation to virtual machines is complete peripheral virtualization. Here, a peripheral is configured by the hypervisor to present separate "views" to multiple virtual machines. Each virtual machine can then use the peripheral as if it had been allocated its own unique instance of the peripheral.
Ensuring DMA is Virtualization Aware
When VMs use DMA transfers, or use peripherals that perform DMA transfers, those DMA transfers must not allow a VM to read or write memory addresses that the SMPU or core MPU would normally prohibit. Ideally, a DMA module/channel will automatically inherit the VM identifier of the VM that configures it, or of the peripheral using it. The SMPU then checks the VM identifier on each DMA transfer and blocks the transfer if the VM does not have permission to read or write the memory involved. To support this behavior, Armv8-R VMIDs should be distributed to peripherals and DMA controllers. If automatic inheritance of the VM identity is not possible, the DMA module/channel should be programmatically assignable to a virtual machine by a privileged software entity so that SMPU control over DMA transfers can still be achieved. It is important that, where a single physical core may run multiple VMs, DMA access control is based on VMs and not just on bus masters.
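The VMID-inheritance idea can be sketched as follows: a DMA channel carries the VMID of the VM that programmed it, and the transfer is started only if an SMPU-style check allows that VMID to touch both the source and destination ranges. The channel layout and the per-VM address-window policy are purely illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

// Illustrative DMA channel: carries the VMID it inherited from
// the VM that configured it.
typedef struct {
    uint8_t vmid;              // inherited VM identifier
    uint32_t src, dst, len;    // transfer source, destination, length
} dma_channel_t;

// Trivial stand-in for the SMPU policy: VM n owns the address
// window [n*0x10000, (n+1)*0x10000). Real policies are region tables.
static bool vm_may_access(uint8_t vmid, uint32_t addr) {
    return addr / 0x10000u == vmid;
}

// The transfer is permitted only if the channel's VMID may read
// the whole source range and write the whole destination range.
bool dma_start(const dma_channel_t *ch) {
    return vm_may_access(ch->vmid, ch->src) &&
           vm_may_access(ch->vmid, ch->src + ch->len - 1) &&
           vm_may_access(ch->vmid, ch->dst) &&
           vm_may_access(ch->vmid, ch->dst + ch->len - 1);
}
```

Checking the VMID, rather than only the DMA controller's bus-master identity, is what prevents one VM from using a shared DMA engine to reach another VM's memory.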
Adopting Available Virtualization Standards to Simplify Software Portability
For microcontrollers typically used to run classic AUTOSAR systems, virtualization is a relatively new topic, and naturally, microcontroller manufacturers are adding features to give themselves an edge over competitors. However, users of these microcontrollers want to develop software that can be moved to different microcontrollers if necessary. Therefore, the combination of microcontrollers and hypervisors needs to provide a relatively standard set of features and usage methods for users of microcontrollers. For this purpose, where industry standards exist, they should be adopted. Some work has already been done regarding the Armv8-R architecture, with a good example found in Arm’s paper on “Device Virtualization Principles for Real-Time Systems.”
Conclusion
The evolution of the electrical/electronic (E/E) architecture, including regional controllers, requires further real-time software integration solutions. Classic AUTOSAR is effectively a standard in the automotive real-time software world, but further integration options, such as for legacy software, must be advanced. The Armv8-R architecture, with its EL2 separation capabilities, is an excellent basis for enabling such integration. However, how this integration is performed depends strongly on the applications, and the specific application requirements will determine which integration approach fits best. This paper has presented various techniques that can be used to support the integration of different types of applications, helping system designers understand how best to use the Cortex-R52+ core to support application integration.