Author: ARM Strategic Information Technology Expert Andrew Hopkins
Today, the automotive industry is undergoing rapid transformation, with design, usage, and sales models evolving quickly. Driver safety technologies, traffic congestion, environmental issues, and the fundamental premise of vehicles as transportation tools are all influencing the development of the next generation of automobiles. To address these challenges, many automakers are trying to enhance computing capabilities to optimize vehicle control. The new standards issued by the European New Car Assessment Programme (EuroNCAP) stipulate that safety assistance functions such as lane change support are necessary conditions for obtaining a five-star safety rating.
The number of onboard processors is steadily increasing across all segments, with an average of 40-50 processors per vehicle, while some high-end models are now equipped with nearly 120 processors. According to Semicast Research, by 2022, the market size for electronic control unit (ECU) components under the engine hood will reach nearly $86 billion, a compound annual growth rate of 7% compared to $53 billion in 2015. Semiconductor manufacturers will have the opportunity to tap into a gold mine in automotive electronics.
High-tech chips can improve power system emissions, enhance safety performance, and utilize cellular networks to achieve interconnectivity between vehicles and road infrastructure. However, as systems become more complex, ensuring driver safety becomes even more critical, necessitating the creation of more automated, systematic, and proactive solutions—commonly referred to as “functional safety.”
What is Functional Safety?
In short, the goal of functional safety is to ensure that a product operates safely and continues to provide protection even when something goes wrong. Based on this concept, ARM treats safety as a top priority rather than simply following market trends, and continues to invest in research and development to launch more functional safety-related products.
Every industry establishes standards to guide future development and set minimum entry thresholds. In the automotive electronics industry, this standard is ISO 26262, which defines functional safety as the “absence of unreasonable risk due to hazards caused by malfunctioning behaviour of E/E systems”, where E/E stands for electrical and/or electronic.
Standards in different fields are not entirely consistent; for instance, IEC 61508 for electrical and electronic systems and DO-254 for aircraft electronic hardware have their own definitions. It is also worth noting that they all have specialized terminologies and provide engineering development guidance, including target parameters. Therefore, determining the target market and establishing appropriate processes before starting product development is crucial, as modifying development processes midway will inevitably lead to inefficiencies. Figure 1 shows the different application standards for silicon IP. In practice, if multiple standards need to be met, common ground can be sought while listing exclusive requirements and then applying general quality management principles; safety must be the priority from the outset.

In practice, functional safety systems must be certified by independent assessors and comply with all applicable safety standards. Achieving functional safety requires the ability to predict fault modes and to assess in real time whether the system is fully functional, partially degraded, or must shut down for a reboot or reset.
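The three-way assessment described above can be sketched as a small classifier. This is only an illustration: the function name, the fault-count inputs, and the threshold are hypothetical, and a real system would derive them from its own safety analysis.

```python
from enum import Enum


class SystemState(Enum):
    FULLY_FUNCTIONAL = "fully functional"
    DEGRADED = "partially degraded"
    SHUTDOWN = "shut down for reboot or reset"


def assess_state(recoverable_faults: int, unrecoverable_faults: int,
                 degraded_threshold: int = 1) -> SystemState:
    """Map current fault counts to one of the three states named in the text.

    The inputs and the threshold are assumptions made for this sketch.
    """
    if unrecoverable_faults > 0:
        # Any unrecoverable fault means the system can no longer be trusted.
        return SystemState.SHUTDOWN
    if recoverable_faults >= degraded_threshold:
        # Faults were handled, but the system is no longer at full capability.
        return SystemState.DEGRADED
    return SystemState.FULLY_FUNCTIONAL
```

Calling `assess_state(0, 0)` yields the fully functional state, while any unrecoverable fault forces a shutdown regardless of the other count.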
Not all faults immediately lead to serious accidents. For example, a failure in a car’s power steering system may cause sudden erroneous steering, but because of the inherent time delays in the electrical and mechanical design, the fault does not have immediate consequences; this delay is typically several milliseconds or more. ISO 26262 calls this the fault tolerant time interval (FTTI), and its duration depends on the type of potential accident and on the system design. It follows that the higher the safety requirements for a system, the larger the share of potentially unsafe faults that must be avoided.
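One way to reason about this interval is as a time budget: the time needed to detect a fault plus the time needed to react to it has to fit inside the interval. The sketch below works through that arithmetic; the function names and the millisecond figures in the usage note are hypothetical and chosen only for illustration.

```python
def ftti_margin_ms(detection_ms: float, reaction_ms: float, ftti_ms: float) -> float:
    """Time left in the interval after the fault has been detected and handled.

    A negative result means the safe state would be reached too late.
    """
    return ftti_ms - (detection_ms + reaction_ms)


def within_ftti(detection_ms: float, reaction_ms: float, ftti_ms: float) -> bool:
    """True when detection plus reaction fits inside the tolerance interval."""
    return ftti_margin_ms(detection_ms, reaction_ms, ftti_ms) >= 0
```

With an assumed 20 ms interval, 5 ms to detect and 10 ms to react leaves a 5 ms margin, whereas 15 ms to detect and 10 ms to react overruns the budget.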
Ideally, functional safety should not affect system performance; however, in reality, many existing safety measures can severely impact system performance, power, and area (PPA). How to mitigate the adverse effects on system performance and the rising costs of design and manufacturing while ensuring functional safety is a major challenge faced by designers.
Why is Functional Safety Necessary?
The functional safety of chip IP was once a niche area, of interest only to a few chip and system developers in the automotive, industrial, aerospace, and similar markets. However, with the rise of various automotive applications in recent years, the situation has changed dramatically. Beyond automotive, many other industries can also benefit from the increase in electronic devices, with functional safety being a prerequisite. Medical electronics and aerospace are two typical examples.
Autonomous driving has attracted a lot of attention in recent years but has always seemed out of reach. The era of highly automated driving is still distant, yet with the proliferation of Advanced Driver Assistance Systems (ADAS) and rich-media in-vehicle infotainment (IVI) systems, the prospects for autonomous vehicles have become increasingly clear. Drones of various sizes and the growing Internet of Things also urgently require functional safety, and ARM’s technology will provide significant support.
ARM Functional Safety Technology
As in other technology markets, emerging functional safety applications are driven by semiconductors. This is not just theory: rapid product innovation has already sparked strong interest among ARM partners. Most functional safety embedded systems need two core elements, safety protection and real-time processing. The ARM Cortex-R series processors are tailored for these needs, providing high-performance computing for embedded systems that require high reliability, availability, fault tolerance, and/or deterministic real-time response. These features lay the foundation for achieving high safety integrity in ADAS and IVI systems, allowing for critical behavior processing, responding to safety-related interrupt events, communicating with other systems, and supervising complex functions with lower safety integrity levels.
What is a Fault?
A fault can be systematic (for example, caused by human error in the specification and design process) or related to the tools used. One way to reduce such faults is to implement rigorous quality control processes, which must include detailed planning, reviews, and quantitative assessments. Proper planning for the use of certified tools is essential, as is the ability to manage and track requirement changes. ARM’s Compiler 5 has been certified by TÜV SÜD, which supports safe development without the need for additional certification of the compiler.
Another type of fault is the random hardware fault. These can be permanent faults, such as the short circuit shown in Figure 2, or soft faults caused by natural radiation. Such faults can be addressed with countermeasures built into both hardware and software, which makes system-level techniques equally important. For example, Built-In Self-Test (BIST) can be applied during system startup and shutdown to distinguish between soft and permanent faults.
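The idea of using a self-test to tell soft faults from permanent ones can be sketched in software. The example below is a toy model, not a description of any real BIST engine: `SimulatedMemory`, its stuck-at-zero mask, and `classify_cell` are hypothetical names, and the rule applied (a mismatch that survives rewriting with test patterns is permanent, one that disappears is soft) is a simplifying assumption.

```python
class SimulatedMemory:
    """Toy byte-wide memory used only for this illustration."""

    def __init__(self, size, stuck_at_zero=None):
        self.cells = [0] * size
        # address -> mask of bits that always read back as 0 (permanent fault)
        self.stuck_at_zero = dict(stuck_at_zero or {})

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        return self.cells[addr] & ~self.stuck_at_zero.get(addr, 0) & 0xFF

    def flip_bit(self, addr, bit):
        """Model a one-off soft error, such as a radiation-induced upset."""
        self.cells[addr] ^= 1 << bit


def classify_cell(mem, addr, expected):
    """Return 'ok', 'soft', or 'permanent' for one memory cell."""
    if mem.read(addr) == expected:
        return "ok"
    # The content is wrong. Rewrite the cell with two complementary patterns:
    # a permanent defect keeps failing, a soft error does not come back.
    for pattern in (0x55, 0xAA):
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            return "permanent"
    mem.write(addr, expected)  # restore the intended content
    return "soft"
```

A cell with a flipped bit is reported as soft and restored, while a cell with a stuck bit keeps failing the pattern test and is reported as permanent.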

Countermeasures
The selection and design of fault detection and control measures is a favorite part of the job for designers, as it lets them showcase both system-level and micro-architecture-level techniques. Establishing a Failure Mode and Effects Analysis (FMEA) is a good start: it lists all possible failure modes and the severity of their consequences. With this information, along with the designer’s deep understanding of complex systems, the most serious failure modes can be identified, and countermeasures can be designed.
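As a rough illustration of how such an analysis feeds the choice of countermeasures, the sketch below stores failure modes with a severity score and picks out the most serious ones. The 1 to 10 severity scale, the threshold, and every entry in `EXAMPLE_FMEA` are hypothetical and not taken from any real analysis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    component: str
    description: str
    severity: int  # assumed scale: 1 (negligible) to 10 (catastrophic)


def most_serious(modes: List[FailureMode], min_severity: int = 8) -> List[FailureMode]:
    """Return the failure modes at or above the threshold, worst first."""
    selected = [m for m in modes if m.severity >= min_severity]
    return sorted(selected, key=lambda m: m.severity, reverse=True)


# Illustrative entries only.
EXAMPLE_FMEA = [
    FailureMode("interrupt controller", "interrupt lost", 6),
    FailureMode("memory", "uncorrectable bit error", 9),
    FailureMode("bus arbiter", "grant stuck to one master", 8),
    FailureMode("debug port", "trace output corrupted", 2),
]
```

Here `most_serious(EXAMPLE_FMEA)` returns the memory and bus arbiter entries, which would be the first candidates for dedicated countermeasures.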
There are several common techniques for addressing potential faults, including:
• Diversified checkers: Using another circuit to check whether the main circuit has failed. For example, a checker can count interrupts, continuously tracking the total number raised.
• Complete lockstep replication: This technique, used mainly with the Cortex-R5 processor, instantiates an IP component (such as a processor) multiple times and runs the copies with a delay between them, which provides both temporal and spatial redundancy. Large memories are usually shared between the instances to reduce the required area. Although this technique is very reliable, it is also very expensive.
• Selective hardware redundancy: In this scheme, only the critical parts of the hardware, such as arbiters, are replicated.
• Software redundancy: Hardware redundancy is often very complex and incurs indirect costs, which is not always a reasonable use of resources. An alternative to hardware redundancy is to run the same calculation on multiple processor cores and check whether the results match.
• Error detection and correction codes: Another well-known technique, typically used to protect memories and buses. Various code types exist, but they all share the goal of achieving higher redundancy with a minimal number of additional bits, without duplicating all the underlying data. In automotive systems, it is common to add enough redundancy to detect two-bit errors in a memory word and to support error correction, typically of single-bit errors.
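To make the last technique concrete, here is a minimal sketch of a single-error-correcting, double-error-detecting code: an extended Hamming(8,4) code protecting a 4-bit value. It illustrates the general principle only. Real memories protect wider words with different code parameters, and the function names and bit layout here are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into 8 bits (index 0 holds the overall parity)."""
    code = [0] * 8
    # Data bits go to Hamming positions 3, 5, 6, 7.
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    # Parity bits at positions 1, 2, 4 cover the positions sharing that bit.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Overall parity makes the total number of 1s even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'double_error'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = sum(code) % 2  # 0 when the overall parity is consistent
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: treat as a single flipped bit at index `syndrome`
        # (index 0 means the overall parity bit itself) and flip it back.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped, not correctable.
        return "double_error", None
    data = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, data
```

Four extra bits per four data bits is a high overhead, which is why practical schemes apply the same idea to wider words, where the share of check bits is much smaller.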
Fault Logging
Once a fault is detected, it must be logged so that supervisory software can assess the system’s health and safety status. Safe faults (such as corrected memory errors) and hazardous faults (such as unrecoverable hardware failures) must be logged separately.
Fault logging typically begins with fault counting: the number of signaled events (similar to interrupts) can be recorded either by the system-level architecture or by counters inside the IP. To understand why an event occurred, it also helps to refer to past events. To support this and to facilitate debugging and troubleshooting, some IP can capture additional information such as the memory address being monitored. Since this address is usually preserved across a soft reset, it can be read during system startup and self-check.
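A software view of such a log might look like the sketch below, which keeps separate counters for safe and hazardous faults, retains a history of past events, and remembers the last captured address. The class and field names are hypothetical; actual IP exposes this information through its own registers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FaultRecord:
    kind: str                      # "safe" or "hazardous"
    source: str                    # which checker reported the event
    address: Optional[int] = None  # captured memory address, if any


@dataclass
class FaultLog:
    safe_count: int = 0
    hazardous_count: int = 0
    history: List[FaultRecord] = field(default_factory=list)

    def record(self, kind: str, source: str, address: Optional[int] = None) -> None:
        """Count the event in the matching category and keep it in the history."""
        if kind == "safe":
            self.safe_count += 1
        elif kind == "hazardous":
            self.hazardous_count += 1
        else:
            raise ValueError("kind must be 'safe' or 'hazardous'")
        self.history.append(FaultRecord(kind, source, address))

    def last_address(self) -> Optional[int]:
        """Most recent captured address, e.g. for inspection after a soft reset."""
        for rec in reversed(self.history):
            if rec.address is not None:
                return rec.address
        return None
```

Keeping the two counters apart lets supervisory software react differently to a rising count of corrected errors than to a single hazardous event.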
One point to remember is that faults can also occur within the safety architecture itself. Unlike faults in functional hardware, which are typically detected quickly during use, a fault within a safety checker may go unnoticed: the checker can no longer detect hazardous faults, yet nothing signals that it has failed. Such faults are called latent faults, and testing the checkers regularly is a good way to uncover them.
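Regular testing of a checker can be sketched as a small self-test that feeds it one known mismatch and one known match. The checker functions and test values below are hypothetical; `stuck_checker` models a checker with a latent fault that never raises an error.

```python
def healthy_checker(a: int, b: int) -> bool:
    """Comparator-style checker: returns True when it flags a mismatch."""
    return a != b


def stuck_checker(a: int, b: int) -> bool:
    """Checker with a latent fault: its error output is stuck inactive."""
    return False


def checker_self_test(checker) -> bool:
    """Return True if the checker flags a forced mismatch and stays quiet on a match."""
    flags_mismatch = checker(0x5A, 0xA5)
    quiet_on_match = not checker(0x5A, 0x5A)
    return flags_mismatch and quiet_on_match
```

Running `checker_self_test` periodically exposes the stuck checker, which would otherwise look healthy because it never reports anything.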
Safety Integrity Levels
Different standards express safety integrity levels in different ways, but their main purpose is to reflect how critical a function is. For example, an ECU controlling windshield wipers, airbags, or brakes must have higher integrity than an ECU controlling a speedometer or parking sensors: visibility is crucial, and sudden braking or airbag deployment could have fatal consequences for the driver, whereas speedometers and parking sensors are far less critical to safety.
In other words, the safety integrity level is related to the necessity and ability of people to avoid dangerous situations; the role of various standards is to guide people on how to define safety integrity levels and provide relevant parameters to help quantify system integrity.
IEC 61508 divides safety integrity levels (SIL) into four levels, with level 4 being the highest integrity. Similarly, ISO 26262 defines Automotive Safety Integrity Levels (ASIL), from ASIL A (lowest) to ASIL D (highest). Furthermore, as shown in Table 2, ISO 26262 provides recommended target values for the single-point fault metric, the latent fault metric, and the probabilistic metric for random hardware failures (PMHF, usually expressed in failures in time, or FIT) for ASIL B to ASIL D. The proportion of faults that can be detected is referred to as diagnostic coverage.
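The sketch below shows how these metrics can be computed from failure rates and compared with per-ASIL targets. Several things are assumptions here: the simplified formulas, the function names, and the target values in `ASIL_TARGETS`, which are the figures commonly cited from ISO 26262-5 and should be confirmed against the standard itself. Failure rates are in FIT (failures per 10^9 hours), and the PMHF is taken as a given input rather than derived.

```python
from typing import Optional

# (minimum single-point fault metric, minimum latent fault metric,
#  maximum PMHF in FIT); commonly cited values, to be checked against the standard.
ASIL_TARGETS = {
    "D": (0.99, 0.90, 10.0),
    "C": (0.97, 0.80, 100.0),
    "B": (0.90, 0.60, 100.0),
}


def diagnostic_coverage(detected_fit: float, total_fit: float) -> float:
    """Share of the failure rate that the safety mechanisms detect."""
    return detected_fit / total_fit


def single_point_fault_metric(total_fit: float, spf_rf_fit: float) -> float:
    """1 minus the share of single-point and residual faults."""
    return 1.0 - spf_rf_fit / total_fit


def latent_fault_metric(total_fit: float, spf_rf_fit: float, latent_fit: float) -> float:
    """1 minus the share of latent faults among the remaining faults."""
    return 1.0 - latent_fit / (total_fit - spf_rf_fit)


def highest_asil_met(spfm: float, lfm: float, pmhf_fit: float) -> Optional[str]:
    """Return the highest ASIL whose three targets are all met, or None."""
    for level in ("D", "C", "B"):
        min_spfm, min_lfm, max_pmhf = ASIL_TARGETS[level]
        if spfm >= min_spfm and lfm >= min_lfm and pmhf_fit < max_pmhf:
            return level
    return None
```

For a hypothetical block with a total failure rate of 1000 FIT, of which 8 FIT are single-point or residual and 50 FIT are latent, the single-point fault metric is 99.2% and the latent fault metric is about 95%; with a PMHF of 8 FIT this would meet the ASIL D figures assumed above.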

Although these figures are often treated as hard requirements of the standard, in practice they are recommendations, and suppliers can set their own target values. The most important goal is to create safe products, rather than simply adding a few numbers to the product specification. Returning to the earlier examples, the safety levels of windshield wipers, brakes, and airbags may reach ASIL D, while speedometers and parking sensors may be ASIL B or lower, depending on the overall system safety design.
No matter how high the diagnostic coverage, appropriate processes must be followed when developing functional safety applications—this is also the greatest benefit of the standards system. Furthermore, regardless of the functional safety measures adopted, strict quality processes can enhance the overall quality of any application.
Design Process for Functional Safety IP
When developing functional safety application IP, it is crucial to “follow the rules”. This process must incorporate safety considerations from the very beginning and must foster a culture that supports safety.
A complete development process must include the following key aspects:
• Safety management: This covers the team’s organizational structure, including defining roles and responsibilities, building a safety culture, defining the safety lifecycle, and defining the level of functional safety support. Setting up the safety lifecycle includes developing a sound plan, selecting appropriate development tools, and ensuring the team receives adequate training.
• Requirement management and traceability of fault detection and control measures (countermeasures): To achieve accurate requirement traceability, requirement definitions must be clear, precise, and unique. The level of traceability depends on the integrity requirements, and the documents involved may be high level; traceability needs to cover everything from fault detection to verification, and plans must be well founded and validated in detail.
• Quality management: This is an extension of requirement traceability. Errata must be properly managed and acted upon, an area in which ARM has rich experience.
Documenting and communicating processes is equally important. The safety document package for IP development is one way ARM supports its partners, and our relationship does not end when the customer receives the IP. For functional safety-related IP development, ARM defines two levels of safety document package:
• Support for standards up to ASIL B
• Extended support for standards up to ASIL D
Each safety document package includes a safety manual detailing the processes followed, the fault detection and control functions, applicable scenarios, and other information. We also provide a Failure Mode and Effects Analysis (FMEA) report and case studies explaining how to achieve higher diagnostic coverage with the IP, and we support customers’ own chip-level analyses. Furthermore, the document package clearly defines the development interfaces between ARM and the licensee.
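The requirement traceability called for in the development process above lends itself to a mechanical check: every requirement should have a unique identifier and be linked to at least one countermeasure and one verification item. The sketch below illustrates this; the `Requirement` fields and the identifiers in the usage note are hypothetical and do not reflect any particular requirements tool.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Requirement:
    req_id: str
    text: str
    countermeasures: List[str] = field(default_factory=list)
    verification: List[str] = field(default_factory=list)


def duplicate_ids(reqs: List[Requirement]) -> List[str]:
    """Identifiers used more than once, which breaks uniqueness."""
    counts = Counter(r.req_id for r in reqs)
    return sorted(i for i, n in counts.items() if n > 1)


def untraced(reqs: List[Requirement]) -> List[str]:
    """Requirements lacking a linked countermeasure or a verification item."""
    return sorted({r.req_id for r in reqs
                   if not r.countermeasures or not r.verification})
```

For example, a requirement `REQ-2` linked to an ECC countermeasure but to no test would be reported by `untraced`, flagging a gap between fault detection and verification.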
Safety Element out of Context
Building and using a safety case is a step-by-step process. The safety case is provided by the chip developer, and information from all suppliers must be taken into account before it is delivered to the customer. Most licensed chip IP is developed as a “Safety Element out of Context” (SEooC), meaning that its designers do not need to know how the chip will be used later. The safety manual must therefore set out the IP developer’s recommendations and instructions for use in order to prevent misuse. Similarly, Tier 1 suppliers to OEMs can also use the SEooC model to develop safety functions. The IP-level safety document package can thus be used throughout the value chain and is an important part of IP development.
Disclaimer: This article is reprinted from BC-AUTO, sourced from Electronic Innovation Network.