TÜV Nord Functional Safety Discussion: Software Fault Injection Method (Part 1)

On the 26th of Every Month

TÜV Nord Industrial Services Functional Safety Discussion


Abstract

Software fault injection is an important technique for verifying functional safety. This article gives a basic overview of software fault injection and introduces existing fault injection techniques.

Introduction to Fault Injection Methods

In safety-critical applications, intensive testing is essential to ensure that new systems and their built-in fault tolerance mechanisms work as expected. Verifying that a system keeps operating correctly in the presence of a failure (fail-operational behavior) is a harder problem than the one traditional testing addresses. The process of deliberately introducing faults into a system to evaluate its behavior and to measure the efficiency of its fault tolerance mechanisms, in terms of coverage (the proportion of faults that are detected or handled) and latency (the time taken to detect them), is called fault injection.

The development of fault injection methods has progressed in parallel with digitalization.

Early digital systems consisted of relatively simple hardware. The first fault injection methods therefore injected physical faults into the target hardware (e.g., using radiation, pin-level manipulation, or power supply disturbances), assuming simple hardware fault models such as bit flips or stuck-at ("frozen") bits.
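The hardware fault models mentioned above, bit flips and stuck-at bits, can be expressed in a few lines. The following Python sketch assumes a 32-bit memory word; the names `bit_flip` and `stuck_at` are illustrative and not taken from any particular tool. A bit flip models a transient fault (the bit is inverted once), while a stuck-at fault models a permanent one (the bit is forced to a fixed level).

```python
WORD_BITS = 32                       # assumed word width
WORD_MASK = (1 << WORD_BITS) - 1


def _check_bit(bit: int) -> None:
    if not 0 <= bit < WORD_BITS:
        raise ValueError(f"bit index must be in [0, {WORD_BITS})")


def bit_flip(value: int, bit: int) -> int:
    """Transient fault model: invert a single bit (e.g., a particle strike)."""
    _check_bit(bit)
    return (value ^ (1 << bit)) & WORD_MASK


def stuck_at(value: int, bit: int, level: int) -> int:
    """Permanent fault model: force a single bit to 0 or 1."""
    _check_bit(bit)
    if level not in (0, 1):
        raise ValueError("level must be 0 or 1")
    if level == 1:
        return (value | (1 << bit)) & WORD_MASK
    return value & ~(1 << bit) & WORD_MASK


if __name__ == "__main__":
    word = 0b1010
    print(bin(bit_flip(word, 0)))     # 0b1011
    print(bin(stuck_at(word, 1, 0)))  # 0b1000
    print(bin(stuck_at(word, 2, 1)))  # 0b1110
```

A stuck-at fault only behaves as a permanent fault if the injector re-applies it after every write to the word; a single call merely shows the resulting value.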

As hardware grew more complex, these physical methods became difficult, if not impossible, to apply. This made a new approach popular: emulating hardware faults in software at runtime, known as Software-Implemented Fault Injection (SWIFI).

As critical systems spread into other application areas, their software components become increasingly complex and contribute more and more to system failures. The maiden flight of the Ariane 5 rocket (June 4, 1996) is a well-known example: less than a minute after launch, the rocket deviated from its flight path and exploded, causing a loss of $500 million. The explosion was caused by an erroneous data conversion in the software, from a 64-bit floating-point value to a 16-bit signed integer, which overflowed. The vulnerability stemmed from reusing software subsystems from previous missions without substantial retesting, because developers assumed they were compatible with the new system.

SWIFI tools inject errors into the program state (e.g., data and address registers, stack, and heap memory) and into the program code (e.g., the memory areas where code is stored, before or during execution). However, in complex, software-intensive systems, SWIFI cannot accurately emulate the impact of real software faults. This limitation is increasingly significant, since the amount of code in automobiles has grown exponentially over the past thirty years, from tens of thousands of lines to hundreds of millions.
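To illustrate the SWIFI idea of corrupting program state at runtime, the Python sketch below uses a toy workload whose running total stands in for a register or memory word. The injection hook, the `(step, bit)` parameter, and the duplicate-and-compare check are assumptions made for illustration; real SWIFI tools typically act on machine-level state (for example through debug features, traps, or instrumented binaries) rather than through hooks written into the source.

```python
def flip_bit(word: int, bit: int) -> int:
    """Invert one bit of an integer (bit-flip error model)."""
    return word ^ (1 << bit)


def checked_sum(values, inject=None):
    """Toy workload: sum `values` with a simple error-detection mechanism.

    inject: optional (step, bit) pair (hypothetical interface). Just before
    element `step` is processed, the primary running total is corrupted by
    a bit flip, emulating what a SWIFI tool does to a register or memory word.
    """
    total = 0   # primary copy (target of the injection)
    shadow = 0  # redundant copy used for duplicate-and-compare detection
    for step, v in enumerate(values):
        if inject is not None and step == inject[0]:
            total = flip_bit(total, inject[1])
        total += v
        shadow += v
    if total != shadow:
        raise RuntimeError("error detected: primary and shadow copies differ")
    return total


if __name__ == "__main__":
    data = [1, 2, 3]
    print(checked_sum(data))               # golden run
    try:
        checked_sum(data, inject=(2, 4))   # flip bit 4 before the third element
    except RuntimeError as exc:
        print(exc)
```

Repeating the run over many `(step, bit)` pairs and counting how often the check fires gives an estimate of detection coverage for this toy mechanism.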

Compared with the methods above, using fault injection to emulate the impact of real software faults (i.e., bugs), known as Software Fault Injection (SFI), is a relatively new approach. In practice, software fault injection introduces small changes into the target program code, creating different versions of the program, each containing one injected software fault.
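The idea of creating one program version per injected fault can be sketched with Python’s standard `ast` module (Python 3.9 or later, for `ast.unparse`). The fault operator used here, swapping a relational operator such as `<=` for `<` to emulate a boundary bug, is one illustrative choice among many; the `in_range` function, the `SWAP` table, and the helper names are hypothetical.

```python
import ast

SOURCE = """
def in_range(x, lo, hi):
    return lo <= x and x <= hi
"""

# Illustrative fault operator: replace a relational operator by its
# "off-by-one" counterpart (a typical boundary-condition bug).
SWAP = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}


class SwapOneOperator(ast.NodeTransformer):
    """Mutate only the `target`-th swappable operator, in visiting order."""

    def __init__(self, target: int):
        self.target = target
        self.seen = 0

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            if type(op) in SWAP:
                if self.seen == self.target:
                    # Replace the list entry; never mutate the operator object.
                    node.ops[i] = SWAP[type(op)]()
                self.seen += 1
        return node


def count_locations(source: str) -> int:
    """Number of candidate fault locations (swappable operators)."""
    return sum(
        1
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Compare)
        for op in node.ops
        if type(op) in SWAP
    )


def faulty_versions(source: str):
    """Yield the source of each program version, one injected fault each."""
    for k in range(count_locations(source)):
        tree = SwapOneOperator(k).visit(ast.parse(source))
        yield ast.unparse(tree)


if __name__ == "__main__":
    for version in faulty_versions(SOURCE):
        print(version, end="\n\n")
```

Each yielded version can be loaded with `exec` and run under the same workload as the original; a version whose outputs never differ from the original is either an equivalent fault or one the workload never triggers.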

The ISO 26262 standard specifies the use of error detection and handling mechanisms in software, as well as verification through fault injection.

Software fault injection is a "what-if" experiment: the injected faults emulate defects that may originate in any phase of software development, including requirements analysis, design, and coding. The target is executed under a given workload while faults are inserted into specific software components of the target system. The primary objective is to observe the system’s behavior in the presence of the injected faults, on the premise that they realistically reproduce faults that could affect those software components during operation.
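A single SFI experiment, as described above, runs a workload against a faulty version and compares its behavior with a fault-free ("golden") run. The Python sketch below assumes faulty versions are available as plain callables; the outcome categories, the saturating-addition example, and all names are illustrative. The dormant fault shows why the workload matters: a fault on a code path the workload never exercises produces no observable effect.

```python
from typing import Callable, Dict, Iterable, Tuple

LIMIT = 100  # assumed saturation limit


def golden(a: int, b: int) -> int:
    """Fault-free reference: saturating addition."""
    return min(a + b, LIMIT)


def missing_saturation(a: int, b: int) -> int:
    return a + b  # injected fault: saturation omitted


def dormant_fault(a: int, b: int) -> int:
    if a < 0:  # injected fault on a path only negative inputs reach
        return 0
    return min(a + b, LIMIT)


def out_of_range_index(a: int, b: int) -> int:
    operands = [a, b]
    return min(operands[0] + operands[2], LIMIT)  # injected fault: bad index


def run_experiment(faulty: Callable, workload: Iterable[Tuple[int, int]]) -> str:
    """Execute the workload on one faulty version and classify the outcome."""
    for args in workload:
        expected = golden(*args)
        try:
            actual = faulty(*args)
        except Exception:
            return "crash"
        if actual != expected:
            return "wrong output"
    return "no observable effect"


WORKLOAD = [(1, 2), (50, 60), (99, 0)]
FAULTS: Dict[str, Callable] = {
    "missing_saturation": missing_saturation,
    "dormant_fault": dormant_fault,
    "out_of_range_index": out_of_range_index,
}

if __name__ == "__main__":
    for name, version in FAULTS.items():
        print(f"{name}: {run_experiment(version, WORKLOAD)}")
```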

Key Characteristics of Fault Injection Methods

A fault is the adjudged or hypothesized cause of an error, i.e., of an incorrect part of the system’s state. A failure is an event that occurs when the service delivered by the system deviates from the correct service; it is perceived as such by the user or by an external system.
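The fault, error, and failure chain can be made concrete with a toy example. In the Python sketch below (all names and the threshold are hypothetical), the fault is a floor division where true division was intended. It produces an error only for some inputs, and that error becomes a failure only if it changes the service delivered to the user, here a temperature alarm.

```python
THRESHOLD = 90  # assumed alarm threshold


def mean_intended(samples):
    return sum(samples) / len(samples)


def mean_faulty(samples):
    # FAULT: floor division (//) where true division (/) was intended
    return sum(samples) // len(samples)


def alarm(mean_value) -> bool:
    """Service delivered to the user: raise an alarm above the threshold."""
    return mean_value > THRESHOLD


def trace(samples) -> str:
    intended, actual = mean_intended(samples), mean_faulty(samples)
    if intended == actual:
        return "fault dormant: no error"
    if alarm(intended) == alarm(actual):
        return "error masked: no failure"
    return "failure: delivered service deviates"


if __name__ == "__main__":
    for samples in ([80, 80], [80, 81], [90, 91]):
        print(samples, "->", trace(samples))
```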

The accuracy of the results obtained from fault injection activities largely depends on several key characteristics of the experiment, namely:

Representativeness

Refers to the ability of the fault load and workload to represent the real faults and inputs that the system will experience during operation. By defining a realistic fault model and accurately reproducing that fault model during the experiment, representativeness can be achieved.

Non-Intrusiveness

Requires that the instrumentation used during fault injection (e.g., for fault insertion and data collection) does not significantly alter the system’s actual behavior. For example, executing additional code to corrupt the software state can itself perturb the system, for instance by changing its timing, and thus be intrusive.

Repeatability

Requires that repeating a fault injection campaign with the same program in the same environment yields statistically equivalent results. This is not easy to achieve because computer systems contain many sources of nondeterminism, such as thread scheduling and event timing.
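One practical contribution to repeatability is making the faultload itself reproducible. The Python sketch below (function name and parameters are hypothetical) draws bit-flip targets from a dedicated, seeded random number generator, so recording the seed is enough to regenerate the same faultload. It does not remove the nondeterminism of thread scheduling or event timing, which still has to be controlled or measured separately.

```python
import random
from typing import List, Tuple


def plan_campaign(seed: int, n_faults: int, n_locations: int,
                  word_bits: int = 32) -> List[Tuple[int, int]]:
    """Return (memory location, bit) pairs for a bit-flip campaign.

    A private random.Random instance is used so that other code calling
    the global `random` functions cannot disturb the sequence.
    """
    rng = random.Random(seed)
    return [(rng.randrange(n_locations), rng.randrange(word_bits))
            for _ in range(n_faults)]


if __name__ == "__main__":
    first = plan_campaign(seed=2023, n_faults=5, n_locations=1024)
    rerun = plan_campaign(seed=2023, n_faults=5, n_locations=1024)
    print(first == rerun)  # True: same seed, same faultload
```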

Practicality

Refers to the cost- and time-effectiveness of fault injection. The relevant factors include the time required to implement and set up the fault injection environment, the time to execute the experiments, and the time to analyze the results. Meeting time and budget constraints requires that experiments be supported by automated tools.

Portability

Requires that fault injection techniques or tools can be easily applied to different systems for comparison. The portability of fault injection tools also refers to the ability of the tool to support multiple fault models and be extended with new fault models.
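The extensibility aspect of portability is often addressed by decoupling the injection engine from the fault models it applies. The Python sketch below shows one such design, a registry keyed by name; the decorator, the model names, and the `inject` function are illustrative assumptions rather than the interface of any existing tool.

```python
from typing import Callable, Dict

FaultModel = Callable[[int, int], int]  # (value, bit) -> corrupted value
FAULT_MODELS: Dict[str, FaultModel] = {}


def fault_model(name: str):
    """Decorator that registers a fault model under `name`."""
    def register(fn: FaultModel) -> FaultModel:
        if name in FAULT_MODELS:
            raise ValueError(f"fault model {name!r} is already registered")
        FAULT_MODELS[name] = fn
        return fn
    return register


@fault_model("bit_flip")
def _bit_flip(value: int, bit: int) -> int:
    return value ^ (1 << bit)


@fault_model("stuck_at_0")
def _stuck_at_0(value: int, bit: int) -> int:
    return value & ~(1 << bit)


@fault_model("stuck_at_1")
def _stuck_at_1(value: int, bit: int) -> int:
    return value | (1 << bit)


def inject(model: str, value: int, bit: int) -> int:
    """Apply a registered fault model; the engine has no model-specific code."""
    return FAULT_MODELS[model](value, bit)


# Extending the tool: a new model is added without touching `inject`.
@fault_model("adjacent_double_flip")
def _adjacent_double_flip(value: int, bit: int) -> int:
    return value ^ (0b11 << bit)


if __name__ == "__main__":
    for name in FAULT_MODELS:
        print(name, bin(inject(name, 0b0100, 2)))
```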

Characteristics of Software Faults

Injecting software faults requires a precise definition of the faults to be injected, which in turn requires a clear understanding and description of software faults. This is not easy to achieve, because software faults originate from human mistakes made during development; they manifest as erroneous instructions in the program or as defects in other software artifacts.

To improve software reliability, several fault classification schemes have been proposed. Among them, Orthogonal Defect Classification (ODC) is one of the models most widely adopted by researchers and practitioners, and it has been used in multiple studies to define fault models for software fault injection. ODC is a framework for classifying software defects in order to obtain metrics and quantitative feedback on the software development process.
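As an illustration of how ODC can feed a fault model, the Python sketch below encodes ODC’s eight defect types (Function, Interface, Checking, Assignment, Timing/Serialization, Build/Package/Merge, Documentation, Algorithm) and tags a small, hypothetical faultload with them. The faultload entries and their classification are assumptions for illustration; comparing such a distribution with defect data from the field is one way to argue for representativeness.

```python
from collections import Counter
from enum import Enum


class OdcDefectType(Enum):
    FUNCTION = "Function"
    INTERFACE = "Interface"
    CHECKING = "Checking"
    ASSIGNMENT = "Assignment"
    TIMING_SERIALIZATION = "Timing/Serialization"
    BUILD_PACKAGE_MERGE = "Build/Package/Merge"
    DOCUMENTATION = "Documentation"
    ALGORITHM = "Algorithm"


# Hypothetical faultload: each injectable fault tagged with an ODC defect type.
FAULTLOAD = [
    ("missing variable initialization", OdcDefectType.ASSIGNMENT),
    ("'<' used instead of '<=' in a branch condition", OdcDefectType.CHECKING),
    ("missing range check on a sensor input", OdcDefectType.CHECKING),
    ("arguments swapped in a function call", OdcDefectType.INTERFACE),
    ("missing lock around shared data", OdcDefectType.TIMING_SERIALIZATION),
]


def type_distribution(faultload):
    """Fraction of the faultload falling into each ODC defect type."""
    counts = Counter(defect_type for _, defect_type in faultload)
    return {t.value: n / len(faultload) for t, n in counts.items()}


if __name__ == "__main__":
    for defect_type, share in type_distribution(FAULTLOAD).items():
        print(f"{defect_type:<22}{share:.0%}")
```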

Author: Zheng Wei

Functional Safety (SIL/ASIL) Certification Assessor

Functional Safety Engineer Qualification Course Authorized Instructor & Functional Safety Expert

ASPICE Provisional Assessor

National Registered Auditor

Member of the Technical Committee for Standardization of Industrial Process Measurement Control and Automation in China

Member of the Sub-Technical Committee on System and Functional Safety

Expert Member of the Drafting Working Group for National Functional Safety Standard GB/T20438

Leader in Organizing and Promoting the “TÜV Functional Safety Engineer” Course

About TÜV Nord

Functional Safety Engineer and Cyber (Information) Security Engineer Training

TÜV Nord Functional Safety and Cyber (Information) Security Engineer training enjoys excellent international recognition. This training is specifically designed for professionals in the fields of functional safety and cyber (information) security (such as process control, machinery safety, rail transportation, automotive safety, etc.). It covers functional safety requirements across multiple industries, including IEC 61508, IEC 61511, ISO 26262, IEC 62061, ISO 13849, etc., and cyber (information) security requirements across multiple industries, including IEC 62443, ISO 21434, etc.

TÜV Nord’s functional safety and cyber (information) security training series aims to develop professionals’ knowledge and practical capabilities in functional safety and cyber (information) security, to provide enterprises with complete solutions for safety personnel, and to help enterprises align with international functional safety and cyber (information) security technologies. The training examinations require participants to have project experience in the functional safety or cyber (information) security field and practical project-handling skills.

The TÜV Nord Functional Safety Engineer, Automotive Functional Safety Manager, and Cyber (Information) Security Engineer training courses planned for 2023 will run as scheduled. Register early to receive discounted prices and priority access to textbooks.

About TÜV Nord Functional Safety and Cyber (Information) Security Services

TÜV Nord’s functional safety and cyber (information) security services mainly cover certification, assessment, training, and related fields. In the Greater China region, these services currently include functional safety and cyber (information) security training, project technical assessments, functional safety and cyber (information) security management system certification, functional safety expert certification, and more.

TÜV Nord’s functional safety and cyber (information) security services span various industries, including aerospace, rail transportation, automotive electronics, nuclear power instrumentation and control systems, process automation safety instrumented systems (SIS), valves, actuators, industrial machinery, escalators, smart grids, etc.

Experts in TÜV Nord’s functional safety and cyber (information) security services have participated in the OPEN Alliance, the NA 052 DIN Automotive Engineering Standards Committee, ISO/TC 22/SC 33 working groups, the FlexRay Consortium in-vehicle network standards, AUTOSAR, SAE International, SOTIF, and other standards committees and technical organizations. They have repeatedly taken part in drafting international functional safety and cyber (information) security standards such as IEC 61508, ISO 26262, IEC 62061, ISO 13849, IEC 62443, ISO 21434, and ISO 21448, and bring many years of R&D experience with safety-related systems and a precise understanding of the standards. Well-known enterprises in various fields have chosen TÜV Nord as a partner and have highly praised its professionalism and sense of responsibility.

For related services, please contact:

Zheng Wei

Phone: 13402122657

WeChat: zhengwei_SIL

Email: [email protected]
