0x01 Basic Overview

Fault injection is an attack technique, often grouped with side-channel attacks, that introduces some form of disturbance or invalid state into a system in order to alter its behavior. In embedded hardware and other electronic devices, this disturbance can take several forms. Common methods include:

– Clock glitch injection (imposing erroneous clock edges on the input clock lines of the IC)

– Voltage fault injection (applying a voltage above or below the expected voltage to the IC power lines)

– Electromagnetic interference (introducing electromagnetic disturbances)

This article focuses on voltage fault injection, specifically the introduction of transient voltages outside the normal operating range of the target device’s power supply. These transient pulses or drops in input voltage (glitches) disturb the device’s operation and can produce specific effects. The most commonly sought effects are “corrupting” instructions or memory within the processor and skipping instructions. Previous studies have shown that these effects can be achieved predictably, and have offered some explanations of the underlying physical effects that may cause the various glitch-induced behaviors.

However, there is a gap in published research linking faults and their physical effects to specific state changes at the processor level (i.e., what exactly happens inside the processor at the moment a fault causes an instruction to be corrupted or skipped). This article aims to quantify and characterize the state of the processor before, during, and after fault injection, and to describe discrete changes in observable state such as general-purpose registers, control registers (e.g., $pc and $lr), and memory.

Special thanks to the folks at Toothless Consulting, whose excellent series of blog posts introduced me to fault injection and inspired this project. Thanks also to Chris Gerlinsky, whose research on embedded device security, particularly his talk on breaking Code Read Protection (CRP) on NXP LPC-family chips, has been a valuable resource for this project.

https://toothless.co/blog/bootloader-bypass-part1/

https://recon.cx/2017/brussels/talks/breaking_crp_on_nxp.html

0x02 Test Setup

The target device selected for testing is the NXP LPC1343, an ARM Cortex-M3 microcontroller. To control the target’s input voltage and coordinate glitches, a Digilent Arty A7 development board based on the Xilinx Artix-7 FPGA was used. Custom gateware was developed for the Arty board to control and trigger glitches based on various conditions. For the purposes of this article, the two main triggers are a GPIO line that toggles in step with certain device operations, and the SWD traffic corresponding to a “step” event. The source code for the FPGA gateware is available here:

https://github.com/Ethan-ks/glitcherPlatform

To switch between the standard voltage level (Vdd) and the glitch voltage level (Vglitch), a Maxim MAX4617 multiplexer IC was used. It can switch between inputs in as little as 10 ns, which makes it precise enough to generate well-timed glitch waveforms on the LPC1343 power rail.

[Figure: block diagram of the glitching setup]

As shown in the figure above, the Arty A7 monitors the “trigger” line, which, depending on the operating mode, is either the GPIO from the target or the SWD line between the target and the debugger. When the expected conditions are met, the A7 asserts its glitch output according to the configured waveform, switching the power multiplexer between Vdd and Vglitch; the multiplexer output feeds the target’s Vcore line. A Segger J-Link provides debug access to the target, and its SWD lines are also fed to the A7 for triggering.
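As a rough illustration of what a “waveform specification” could look like, the sketch below expands a list of glitch pulses into a per-tick select signal for the power mux. It assumes the 100 MHz A7 clock mentioned later in the article (10 ns per tick) and a hypothetical (delay_ns, width_ns) pulse format measured from the trigger edge; the real gateware’s configuration format may differ.

```python
# Sketch only: assumes a 100 MHz FPGA clock (10 ns per tick) and a
# hypothetical pulse list of (delay_ns, width_ns) measured from the trigger.
TICK_NS = 10


def ns_to_ticks(ns: int) -> int:
    """Round a duration in nanoseconds to the nearest whole FPGA tick (half up)."""
    return (ns + TICK_NS // 2) // TICK_NS


def glitch_select(pulses, total_ns):
    """Per-tick mux select: True = route Vglitch to Vcore, False = route Vdd."""
    n = ns_to_ticks(total_ns)
    select = [False] * n
    for delay_ns, width_ns in pulses:
        start = ns_to_ticks(delay_ns)
        # Never shorter than one tick; the MAX4617 switches in roughly 10 ns.
        stop = min(n, start + max(1, ns_to_ticks(width_ns)))
        for t in range(start, stop):
            select[t] = True
    return select


# One 20 ns glitch starting 30 ns after the trigger, in a 100 ns window.
print("".join("G" if s else "." for s in glitch_select([(30, 20)], 100)))  # ...GG.....
```

Working in whole ticks mirrors the hardware: the FPGA can only change the mux select on a clock edge, so any requested delay or width is quantized to 10 ns.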

To allow triggering on any SWD command, a barebones SWD receiver was implemented on the A7. The receiver parses the SWD transactions sniffed from the bus and outputs the deserialized headers and transaction data, which can then be compared against preconfigured target values. This allows the glitchOut line to be triggered on any SWD data (e.g., STEP and RESUME transactions), providing a precisely timed glitch for single-stepped instructions.
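The comparison step can be modeled in software as below. The `SwdTransaction` fields and `should_trigger` are hypothetical names, and restricting matches to access-port writes is an added assumption (DHCSR, covered in the SWD overview, is memory-mapped, so a debugger writes it through the MEM-AP); the actual receiver is FPGA gateware, not Python.

```python
from dataclasses import dataclass

STEP = 0xA05F000D    # DHCSR write values quoted in the article
RESUME = 0xA05F0001


@dataclass(frozen=True)
class SwdTransaction:
    """One deserialized SWD transaction as the sniffer might report it (hypothetical)."""
    ap_n_dp: bool  # True = access port (AP), False = debug port (DP)
    r_n_w: bool    # True = read, False = write
    addr: int      # register address from A[3:2]: 0x0, 0x4, 0x8 or 0xC
    data: int      # 32-bit data phase


def should_trigger(txn: SwdTransaction, targets: frozenset) -> bool:
    """Assert glitchOut only for AP writes whose data matches a configured target."""
    return txn.ap_n_dp and not txn.r_n_w and txn.data in targets


# Trigger on single-step requests only, not on RESUME.
targets = frozenset({STEP})
print(should_trigger(SwdTransaction(True, False, 0xC, STEP), targets))    # True
print(should_trigger(SwdTransaction(True, False, 0xC, RESUME), targets))  # False
```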


Before testing faults directly during single-step execution, it helps to observe faults and their effects during normal operation: this builds basic intuition and yields hypotheses to test later. To provide an environment for observing the results of faults of various forms and durations, the test program runs a simple loop that increments one variable and decrements another. In each iteration, each variable is checked against a known target value, and execution breaks out of the loop when either condition is met. After the loop, the values are checked against their expected values, and if they differ, they are sent to the attacker’s PC via UART.
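A behavioral model of this loop helps predict what a single lost operation should look like in the UART output. It assumes A counts up from 0 and B counts down from 16 (inferred from the 16 loop iterations and the “16-6 = 10” example discussed later); the `A=.. B=..` report format is hypothetical.

```python
def run_loop(skip_inc_at=None, skip_dec_at=None, n=16):
    """Model of the test loop: A counts up from 0, B counts down from n.

    skip_inc_at / skip_dec_at: 1-based iteration whose increment or decrement
    is lost, emulating a fault that skips one operation.
    """
    a, b = 0, n
    i = 0
    while True:
        i += 1
        if i != skip_inc_at:
            a += 1
        if i != skip_dec_at:
            b -= 1
        if a == n or b == 0:  # break when either target value is reached
            break
    return a, b


def uart_report(a, b, n=16):
    """Line the firmware would send, or None when both values are as expected."""
    if (a, b) == (n, 0):
        return None
    return f"A={a} B={b}"


print(uart_report(*run_loop()))                # None
print(uart_report(*run_loop(skip_inc_at=6)))  # A=15 B=0
print(uart_report(*run_loop(skip_dec_at=6)))  # A=16 B=1
```

Under these assumptions a lost increment ends the loop on B’s condition with A one short, and a lost decrement ends it on A’s condition with B one high.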

The Binary Ninja reverse engineering tool provides a visual representation of the compiled C code. Since its assembly listing is generated from the machine code after compilation and linking, we can be confident that it matches what the processor actually executes (ignoring microarchitectural details such as pipelining).

[Figure: Binary Ninja disassembly of the test loop]

Although simple, this environment provides many interesting targets for fault injection. The loop includes memory access instructions (LDR, STR), arithmetic operations (ADDS, SUBS), comparisons, and branching operations. Additionally, the pulse from PIO2_6 provides the trigger signal for the glitchOut from the FPGA, which can target different areas/instructions throughout the loop based on the delay applied to that signal. The execution can be tracked using shunt resistors and transmission line probes to observe the power consumption of the ARM core.

The following waveform shows the GPIO trigger line (blue) and the power trace from the LPC (purple). The GPIO line goes high for one cycle and then low, marking the start of the loop. This is followed by a pattern repeated 16 times, corresponding to the 16 iterations of the loop. The loop is bounded on either side by the power signature of the code that writes data to the UART and branches back to the start of the main loop.

[Figure: GPIO trigger line (blue) and LPC1343 power trace (purple)]

At this point we have:

1. A reference of the actual instructions being executed by the processor (disassembled via Binary Ninja)

2. A visual representation of that execution, which can be viewed in real-time as the processor executes (via power tracking)

3. A means of taking action within the system under test, allowing calibration based on the behavior of the processor (FPGA glitcher).

Using this information, the offset between the trigger and the glitch can be varied, and the timing can be (roughly) associated with a given instruction or group of instructions. For example, when a glitch is triggered at a certain point during the sixth repetition of the pattern on the power trace, that part of the trace appears to be cut short, and the values reported by the target over UART reflect some malfunction or corruption during the sixth iteration of the loop.

[Figure: power trace with a glitch injected during the sixth loop iteration]
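Once the loop’s start offset and per-iteration period have been measured on the power trace, mapping a glitch offset to an iteration (and back) is simple arithmetic. This sketch assumes both values are known in nanoseconds; the function names and the example numbers are hypothetical.

```python
def iteration_at(offset_ns, loop_start_ns, period_ns, n_iterations=16):
    """1-based loop iteration that a glitch at offset_ns (after the trigger) lands in.

    loop_start_ns and period_ns are measured from the power trace; returns
    None if the offset falls before or after the loop.
    """
    if offset_ns < loop_start_ns:
        return None
    k = (offset_ns - loop_start_ns) // period_ns + 1
    return k if k <= n_iterations else None


def offset_for(iteration, phase_ns, loop_start_ns, period_ns):
    """Inverse mapping: glitch offset for a given phase within a given iteration."""
    return loop_start_ns + (iteration - 1) * period_ns + phase_ns


# Illustrative numbers only: loop starts 200 ns after the trigger, 500 ns per iteration.
t = offset_for(6, 120, 200, 500)
print(t, iteration_at(t, 200, 500))  # 2820 6
```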

So far, the approach has followed traditional fault injection parameter search techniques: maximizing visibility into the system by using behaviors inherent to the device’s operation (here, the GPIO pulse) to find the most effective glitch timing and duration. This gives a rough understanding of the effects of a successfully injected fault (in the example above, we can assume that some operation during the sixth iteration of the loop was altered), but anything more specific is conjecture: the cause could be a skipped load instruction, memory corruption, a failed comparison, or something else.
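A parameter search of this kind comes down to iterating over a grid of glitch delays and widths. The sketch below builds such a grid; the name `sweep_grid` is hypothetical, and shuffling the order is a design choice not taken from the article, intended to spread slow drift (e.g., temperature) across all parameter combinations instead of one region of the grid.

```python
import itertools
import random


def sweep_grid(delays_ns, widths_ns, repeats=1, shuffle=True, seed=0):
    """List of (delay_ns, width_ns) glitch parameters, each repeated `repeats` times."""
    grid = [p for p in itertools.product(delays_ns, widths_ns) for _ in range(repeats)]
    if shuffle:
        # A seeded shuffle keeps runs reproducible while decorrelating
        # parameter order from slow drift in the setup.
        random.Random(seed).shuffle(grid)
    return grid


# e.g. delays 0-990 ns in 10 ns steps, widths 20-100 ns, 5 attempts each
grid = sweep_grid(range(0, 1000, 10), range(20, 101, 20), repeats=5)
print(len(grid))  # 2500
```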

To illustrate this, below is the UART output from the target after running the outer loop thousands of times, parsed, sorted, and counted. The glitch delay and duration were held constant, yet the resulting end-of-loop variable states vary widely. Some entries are easy to explain, such as the first and most common result: B holds the expected value after six iterations (16-6 = 10), but A is 16, suggesting that a skipped LDR or STR left a stale value of 16 in the register from a previous operation. Other results are difficult to explain, such as entries containing ASCII text or variables with erroneous values that seem unrelated to the number of loop iterations.

[Figure: parsed, sorted, and counted UART output from repeated glitch attempts]
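Producing a table like this from a UART log takes only a few lines. The `A=<n> B=<n>` line format below is a hypothetical stand-in for whatever the firmware actually prints; lines that do not match (such as the ASCII text mentioned above) are counted verbatim.

```python
import re
from collections import Counter

# Hypothetical UART line format; the firmware's real format is not shown.
LINE_RE = re.compile(r"A=(-?\d+)\s+B=(-?\d+)")


def tally(lines):
    """Count distinct (A, B) results, most common first; unparsed lines kept as text."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        key = (int(m.group(1)), int(m.group(2))) if m else line.strip()
        counts[key] += 1
    return counts.most_common()


log = ["A=16 B=10\r\n", "A=16 B=10\r\n", "A=15 B=0\r\n", "\x07ok?\r\n"]
for result, count in tally(log):
    print(count, result)  # most common result first
```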

In some applications of fault injection, this degree of ambiguity is acceptable, such as breaking infinite loops, as sometimes seen in secure boot bypass techniques. However, for more complex attacks where specific operations need to be disrupted in the correct manner, higher specificity is required, necessitating a finer understanding.

The next, novel part of this research is therefore a method for targeting single instructions with fault injection attacks, using debug interfaces such as SWD/JTAG to isolate and time individual instructions. Beyond its research value, the methodology may also have practical applications in certain not-uncommon device configurations, which are discussed in a later section.

0x03 Overview of the SWD Protocol

SWD is a debug protocol developed by ARM and used on many devices, including the Cortex-M3 core of the LPC1343 on the target board. From the ARM Debug Interface Architecture Specification ADIv5.0 to ADIv5.2:

https://developer.arm.com/documentation/ihi0031/latest/

The Arm SWD interface uses a single bidirectional data connection and a separate clock to transfer data synchronously. An operation on the wire consists of two or three phases: packet request, acknowledge response, and data transfer.
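The packet request phase is a fixed 8-bit header sent least-significant bit first: start (1), APnDP, RnW, A[2:3], parity, stop (0), and park (1), with even parity over APnDP, RnW, A2, and A3. A minimal decoder, with hypothetical function names, might look like this:

```python
def bits_lsb_first(byte):
    """Expand a request byte into the order its bits appear on SWDIO."""
    return [(byte >> i) & 1 for i in range(8)]


def parse_request(bits):
    """Decode an 8-bit SWD packet request (bits in wire order).

    Returns (ap_n_dp, r_n_w, addr) with addr in {0x0, 0x4, 0x8, 0xC};
    raises ValueError on a framing or parity error.
    """
    if len(bits) != 8:
        raise ValueError("request must be 8 bits")
    start, ap_n_dp, r_n_w, a2, a3, parity, stop, park = bits
    if (start, stop, park) != (1, 0, 1):
        raise ValueError("bad start/stop/park bit")
    # Even parity: APnDP, RnW, A2, A3 and the parity bit hold an even number of 1s.
    if (ap_n_dp + r_n_w + a2 + a3 + parity) % 2:
        raise ValueError("parity error")
    return bool(ap_n_dp), bool(r_n_w), (a3 << 3) | (a2 << 2)


print(parse_request(bits_lsb_first(0xA5)))  # (False, True, 0)
print(parse_request(bits_lsb_first(0xBB)))  # (True, False, 12)
```

The familiar 0xA5 request decodes as a DP read of address 0x0 (IDCODE), and 0xBB as an AP write to address 0xC, which is the DRW register on a MEM-AP.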

There is more to the protocol, of course, but for the purposes of this article we are really only interested in the data transfer, thanks to a convenient property of the Cortex-M3 debug registers: halting, stepping, and resuming execution are all managed by writing to the Debug Halting Control and Status Register (DHCSR). Writes to this register must carry the key 0xA05F in the upper 16 bits (otherwise they are ignored), and the low 4 bits control the debug state: [MASKINTS, STEP, HALT, DEBUGEN] from high to low. We can therefore track STEP and RESUME actions by looking for SWD write transactions with data 0xA05F000D (STEP) and 0xA05F0001 (RESUME).
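Decoding the two values against this bit layout confirms the mapping: 0xA05F000D sets DEBUGEN, STEP, and MASKINTS, while 0xA05F0001 sets only DEBUGEN. The helper names and the classification labels below are hypothetical, and only the four control bits named above are decoded.

```python
DBGKEY = 0xA05F
# Low DHCSR control bits, bit 0 upwards.
CONTROL_BITS = ("DEBUGEN", "HALT", "STEP", "MASKINTS")


def decode_dhcsr_write(value):
    """Set of control bits in a DHCSR write, or None if the key is wrong (write ignored)."""
    if ((value >> 16) & 0xFFFF) != DBGKEY:
        return None
    return {name for bit, name in enumerate(CONTROL_BITS) if value & (1 << bit)}


def classify(value):
    """Label a DHCSR write the way the trigger logic cares about it."""
    flags = decode_dhcsr_write(value)
    if flags is None:
        return None
    if "HALT" in flags:
        return "HALT"
    if "STEP" in flags:
        return "STEP"
    if "DEBUGEN" in flags:
        return "RESUME"
    return "OTHER"


print(sorted(decode_dhcsr_write(0xA05F000D)))  # ['DEBUGEN', 'MASKINTS', 'STEP']
print(classify(0xA05F000D), classify(0xA05F0001))  # STEP RESUME
```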

Because the protocol is bidirectional, triggering is not as simple as matching bit patterns: depending on whether a read or a write is in progress and which phase is being executed, data may be valid on either clock edge. The simplest solution turned out to be implementing half of the protocol, discarding the irrelevant parts and keeping only the data for comparison. Below is a Vivado ILA trace of the SWD implementation successfully parsing STEP transactions sniffed from the SWD line.

[Figure: Vivado ILA trace of a parsed SWD STEP transaction]
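One way to “implement half of the protocol” is to work from the phase layout. Assuming the sniffer already yields one correctly sampled bit per SWCLK cycle and the default one-cycle turnaround, a write’s data follows the request, a turnaround, the 3-bit ACK, and a second turnaround, while a read’s data follows the ACK directly. The function names are hypothetical:

```python
REQ_BITS, TRN_BITS, ACK_BITS, DATA_BITS = 8, 1, 3, 32
ACK_OK = [1, 0, 0]  # ACK 0b001, sent LSB first


def data_offset(r_n_w):
    """Bit index of the data field, counted from the request start bit.

    Write: request, turnaround, ACK, turnaround, WDATA, parity
    Read:  request, turnaround, ACK, RDATA, parity, turnaround
    """
    return REQ_BITS + TRN_BITS + ACK_BITS + (0 if r_n_w else TRN_BITS)


def extract_data(bits, r_n_w):
    """32-bit data of one sniffed transaction, or None if it is not a clean OK transfer."""
    ack_start = REQ_BITS + TRN_BITS
    if bits[ack_start:ack_start + ACK_BITS] != ACK_OK:
        return None  # WAIT/FAULT or a desynchronized capture: no data phase to trust
    off = data_offset(r_n_w)
    if len(bits) <= off + DATA_BITS:
        return None  # capture too short to hold data plus parity
    field = bits[off:off + DATA_BITS]
    if sum(field) % 2 != bits[off + DATA_BITS]:  # even parity over the data bits
        return None
    return sum(bit << i for i, bit in enumerate(field))  # data is sent LSB first


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


data = to_bits(0xA05F000D, 32)
write = to_bits(0xBB, 8) + [0] + ACK_OK + [0] + data + [sum(data) % 2]
print(hex(extract_data(write, r_n_w=False)))  # 0xa05f000d
```

The turnaround bits are not driven by either side, so their sampled values are simply ignored.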

0x04 Fault Injection

Thus, by single-stepping an instruction while the A7 sniffs the SWD line, it is possible to trigger a fault at (or within 10 ns of) the moment the debug logic on the target latches the data. Since the target needs several trailing SWCLK cycles to complete whatever operation the debug probe requested, there is considerable slack between the latching of the data and the actual execution of the instruction. In fact, the power trace clearly shows when processor activity begins after the SWD transaction completes.

As can be seen above, there is a delay of roughly 4 µs, an eternity at the A7’s 100 MHz clock. By delaying the glitch to various offsets within the “bump” corresponding to the instruction’s execution, we can finally do what we set out to do: glitch the single-stepping processor.
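For scale, at 100 MHz each FPGA clock period is 10 ns, so the roughly 4 µs window corresponds to about 400 distinct glitch-delay settings:

```latex
T_{\text{clk}} = \frac{1}{100\,\text{MHz}} = 10\,\text{ns},
\qquad
N \approx \frac{4\,\mu\text{s}}{10\,\text{ns}} = 400
```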

To collect more detailed results, a simple OpenOCD script was written to manage the behavior of the debugger/processor. The script has two modes: a “fast” mode, which single-steps as fast as the debugger can keep up and is used to find the correct timing and waveform for the fault, and a “slow” mode, which inspects the registers and stack before and after each fault event and highlights any unexpected behavior. Consider an interesting result in which the fault hits a register-load instruction in the innermost loop: LDR r3, [sp] should load the previous value of variable A into r3, which the next instruction then increments.
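The register half of the “slow” mode can be approximated through OpenOCD’s Tcl RPC server (port 6666 by default, with each command and reply terminated by the byte 0x1a). This is a sketch, not the author’s script: the register list is an assumption, and pulling the hex value out of the `reg <name>` reply assumes the output format of recent OpenOCD versions; whether `reg` returns its text over the RPC channel at all depends on the version, so verify both on your setup. The glitcher is assumed to fire on the STEP write that `step` places on the SWD bus.

```python
import re
import socket

TERMINATOR = b"\x1a"  # OpenOCD Tcl RPC framing byte
HEX_RE = re.compile(r"0x[0-9a-fA-F]+")
# Assumed register set to snapshot; OpenOCD names the Cortex-M status register xPSR.
REGS = ("r0", "r1", "r2", "r3", "sp", "lr", "pc", "xPSR")


class OpenOcdRpc:
    """Minimal client for OpenOCD's Tcl RPC server (default port 6666)."""

    def __init__(self, host="127.0.0.1", port=6666):
        self.sock = socket.create_connection((host, port))

    def cmd(self, command):
        self.sock.sendall(command.encode() + TERMINATOR)
        reply = b""
        while not reply.endswith(TERMINATOR):
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("OpenOCD closed the connection")
            reply += chunk
        return reply[:-1].decode(errors="replace")


def read_regs(rpc, names=REGS):
    """Snapshot registers, assuming each `reg <name>` reply ends with a hex value."""
    regs = {}
    for name in names:
        found = HEX_RE.findall(rpc.cmd(f"reg {name}"))
        regs[name] = int(found[-1], 16) if found else None
    return regs


def diff_regs(before, after):
    """Registers whose value changed, as {name: (before, after)}."""
    return {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}


def slow_step(rpc):
    """Snapshot, single-step (glitched by the FPGA), snapshot, and diff.

    pc advances on every normal step, so the interesting part is which
    other registers did or did not change.
    """
    before = read_regs(rpc)
    rpc.cmd("step")
    after = read_regs(rpc)
    return diff_regs(before, after)


# Usage (needs a running OpenOCD with the target connected):
#   rpc = OpenOcdRpc(); rpc.cmd("halt"); print(slow_step(rpc))
```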

We can see no changes, indicating that these operations did not happen or were not completed: a skipped instruction. This reliably produces a consistent off-by-one difference in the device’s UART output: because one of the increment/decrement operations is lost, A or B ends up one less or one more than expected.

Interestingly, this study shows that fault injection is not limited to instructions that access memory (LDR, STR, etc.); it can also affect arithmetic operations such as ADDS and CMP, and even branch instructions (although whether the instruction itself is corrupted, or whether the corruption occurs in the APSR flags that determine the branch, requires further study). In fact, while the success rate varies by instruction, no instruction tested for this article has proven immune to single-step glitching.

Here we see the CMP instruction that checks whether A matches the expected target, 0x10. xPSR is not updated, which means the zero flag is not set: the comparison is treated as a mismatch, so the values of A and B end up being sent over UART. Interestingly, r1 has been updated to 0x10, the same as the immediate value used in the original CMP. The machine code for CMP r3, #0x10 is 0x102b (as the bytes appear in memory). Considering possible explanations for the observed behavior, one might suspect that the instruction was corrupted into something like an LDR or MOVS that moved the value into r1.
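The encodings can be checked directly. In the 16-bit Thumb T1 encodings, CMP Rn, #imm8 is 0x2800 | Rn << 8 | imm8 and MOVS Rd, #imm8 is 0x2000 | Rd << 8 | imm8, so CMP r3, #0x10 is the halfword 0x2B10, stored little-endian as the bytes 10 2B. The sketch below (hypothetical helper names) shows that MOVS r1, #0x10 differs from it by only two bits. That is consistent with the MOVS hypothesis but does not prove the opcode was corrupted this way.

```python
def thumb_cmp_imm(rn, imm8):
    """16-bit Thumb CMP Rn, #imm8 (encoding T1: 00101 Rn imm8)."""
    return 0x2800 | (rn << 8) | imm8


def thumb_movs_imm(rd, imm8):
    """16-bit Thumb MOVS Rd, #imm8 (encoding T1: 00100 Rd imm8)."""
    return 0x2000 | (rd << 8) | imm8


def memory_bytes(halfword):
    """The halfword as it is stored in memory (little-endian)."""
    return halfword.to_bytes(2, "little")


def bit_flips(a, b):
    """Number of bits that differ between two encodings."""
    return bin(a ^ b).count("1")


cmp_op = thumb_cmp_imm(3, 0x10)    # CMP r3, #0x10
movs_op = thumb_movs_imm(1, 0x10)  # MOVS r1, #0x10
print(memory_bytes(cmp_op).hex(), hex(movs_op), bit_flips(cmp_op, movs_op))  # 102b 0x2110 2
```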

While this does not provide a definitive explanation of the observed behavior, it yields far more information than power trace analysis and similar techniques can provide.

0x05 Research Summary

The technique described here requires access to the device through a JTAG/SWD debugger. I recently read a great blog post about leveraging JTAG access:

https://labs.ioactive.com/2021/01/taping-stack-for-fun-and-profit.html

However, there is a very common embedded device configuration in which the research presented here may prove useful. Many devices, such as the STM32 series, implement a “high but not highest” security mode that allows limited debugging functionality but prevents reads and writes to certain memory areas, rendering ineffective many techniques that exploit open JTAG connections. This mode is often chosen over the more secure option of disabling debugging entirely, because the latter leaves no way to fix or update device firmware (without a custom bootloader), and many OEMs choose maintainability over security. In most such implementations, however, single-stepping is still allowed!

In such cases, a copy of the device firmware, a probing setup similar to the one described here, or a combination of both could make time-consuming and tedious attacks almost trivial, since they remove most of the calibration and timing parameterization that fault injection attacks typically require. Need to bypass secure boot on a partially locked device? Just step to the CMP that checks the return value of is_secureboot_enabled() and glitch it.

Further research is needed to properly characterize how well this method applies in real-world testing, but the initial results are promising. Future work may test it against more realistic device firmware, such as the secure boot scheme mentioned above.

More immediately, the second part of this series will continue to focus on understanding what happens inside integrated circuits during fault injection attacks, particularly complex ones such as CPUs. Over the past few months I have been building an 8-bit CPU from 74-series logic chips in my spare time. Once complete, it will be an ideal target for this research: the external clock can be controlled and stepped, and each module (oscillator, ALU, registers, etc.) is discrete and accessible with standard oscilloscope probes and other equipment.

References:

[1] J. Gratchoff, “Proving the wild jungle jump,” University of Amsterdam, Jul. 2015

[2] Y. Lu, “Injecting Software Vulnerabilities with Voltage Glitching,” Feb. 2019

[3] D. Nedospasov, “NXP LPC1343 Bootloader Bypass,” Aug. 2017, https://toothless.co/blog/bootloader-bypass-part1/

[4] C. Gerlinsky, “Breaking Code Read Protection on the NXP LPC-family Microcontrollers,” Jan. 2017, https://recon.cx/2017/brussels/talks/breaking_crp_on_nxp.html

[5] A. Barenghi, G. Bertoni, E. Parrinello, G. Pelosi, “Low Voltage Fault Attacks on the RSA Cryptosystem,” 2009

References and Sources:

https://labs.ioactive.com/2021/04/watch-your-step-research-into-concrete.html

Original source: 嘶吼专业版
