Understanding ARM64 Inline Hooking Techniques

/ Author’s Introduction /

This article is contributed by Second-hand Programmer and covers inline hooks in ARM64 assembly. We hope it is helpful, and we thank the author for the write-up.

Original article link:

https://mp.weixin.qq.com/s/-WNNaAyUetiP5UWC5Re0SQ

/ Introduction /

This discussion focuses solely on inline hooks under ARM64, providing a simple demo merely as a starting point. For those interested in more details, please seek additional resources and explore open-source projects.

Inline hooks in ARM64 are considerably more complex than in ARM32: several convenient instructions (such as LDM/STM and PUSH/POP) no longer exist, and the PC register cannot be read or written directly.

ARM64 processors are compatible with the ARM32 instruction set, so an ARM64 processor can execute ARM64, ARM32, and Thumb-2 (Thumb16 + Thumb32) code. Here we only discuss ARM64 mode.

/ Flowchart /

An inline hook intercepts a specified function by patching its code. The general process is illustrated in the following diagram:

First, the instructions at the head of the target function are replaced with code that sets the PC to the address of the hook function, so that execution jumps there.

The overwritten instructions are executed inside the hook function instead, corresponding to the red part in the diagram. Many issues can arise here (PC-relative instructions, for example, behave differently once moved), and those interested can find relevant information.

After the hook function has executed, the PC must be set to the address of the instruction that follows the overwritten ones, so that the original function continues from there.

Throughout this process, register values must be the same before and after the hook function runs. The hook may use a register only after saving its value, and the relocated instructions must see the register state the original function expected.

/ Solution Design /

Step 1 – Original Program Instrumentation

The design starts from one basic requirement: the instrumentation code must be able to jump. In ARM64 the PC cannot be read or written directly, so how do we change it? There are generally two ways to jump: relative addressing and direct addressing.

Relative addressing has a limited range. Our hook code lives in its own shared object, the target lives in another, and neither is loaded at a fixed address, so the distance between them varies from run to run. Relying on relative addressing is therefore unwise.
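To make the range limit concrete: the ARM64 B instruction carries a signed 26-bit offset counted in 4-byte words, so it reaches roughly 128 MiB in either direction. The sketch below encodes a B when the target is in range and reports failure otherwise. The helper name encode_b is made up for this illustration and is not part of the original demo.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: try to encode "B <to>" for a branch placed at <from>.
 * B holds a signed 26-bit offset in units of 4 bytes (about +/-128 MiB). */
static bool encode_b(uint64_t from, uint64_t to, uint32_t *insn)
{
    int64_t delta = (int64_t)(to - from);

    if (delta % 4 != 0)
        return false;                    /* instructions are 4-byte aligned */
    if (delta < -(INT64_C(1) << 27) || delta >= (INT64_C(1) << 27))
        return false;                    /* target is out of reach for B */

    /* Keep the low 26 bits of the word offset (two's complement). */
    *insn = UINT32_C(0x14000000)
          | ((uint32_t)((uint64_t)delta >> 2) & UINT32_C(0x03FFFFFF));
    return true;
}
```

With the two addresses printed later in this article (hook_func at 0x5d3d3099d4 and fopen at 0x7c1015a3d4), encode_b returns false, which is why the patch goes through a register instead.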

With direct addressing, ARM64 can use BR Xn to jump to the 64-bit address held in register Xn. So the plan is:

LDR X0, [TARGET_ADDRESS] 
BR X0

However, this overwrites the original value of X0, so we need to save X0 first:

STP X1, X0, [SP, #-0x10]
LDR X0, [TARGET_ADDRESS]
BR X0

The first instruction stores X1 and X0 just below the stack pointer. X1 is only there as padding: ARM64 has no PUSH instruction, and the stack is kept 16-byte aligned, so registers are saved in pairs.

Because X0 now lives on the stack, we have to load it back when we finally return to the original program:

STP X1, X0, [SP, #-0x10]
LDR X0, [TARGET_ADDRESS]     ; TARGET_ADDRESS is a dynamic value, more instructions will be needed after compilation to achieve this
BR X0
LDR X0, [SP, #-0x8]

This differs slightly from the flow sketched earlier: the hook function must finish before control returns to the last instruction above, which restores X0.

Note that the instrumentation takes at least 4 instructions (16 bytes), plus the space needed for the embedded target address. If the target function body is shorter than the patch, the hook will corrupt whatever follows it. This is a point worth optimizing.

Step 2 – Hook Program

We first need to save the values of all registers. After saving the register values, we can utilize these registers. The saving structure is as follows:

The corresponding instructions are as follows:

sub     sp, sp, #0x20          ; reserve 0x20 bytes (four 8-byte slots)
mrs     x0, NZCV               ; read the condition flags into x0
str     x0, [sp, #0x10]        ; save the flags at sp + 0x10, which is the PSR in the diagram
str     x30, [sp]              ; save x30 at sp, which is the X30 LR in the diagram
add     x30, sp, #0x20         ; x30 = sp + 0x20, i.e. the sp value on entry
str     x30, [sp, #0x8]        ; save it at sp + 0x8, which is the original sp value in the diagram
ldr     x0, [sp, #0x18]        ; reload x0 from original sp - 0x8, where the entry patch saved it
sub     sp, sp, #0xf0          ; reserve space for X0 - X29 (30 * 8 bytes)
stp     X0, X1, [SP]
stp     X2, X3, [SP,#0x10]
stp     X4, X5, [SP,#0x20]
stp     X6, X7, [SP,#0x30]
stp     X8, X9, [SP,#0x40]
stp     X10, X11, [SP,#0x50]
stp     X12, X13, [SP,#0x60]
stp     X14, X15, [SP,#0x70]
stp     X16, X17, [SP,#0x80]
stp     X18, X19, [SP,#0x90]
stp     X20, X21, [SP,#0xa0]
stp     X22, X23, [SP,#0xb0]
stp     X24, X25, [SP,#0xc0]
stp     X26, X27, [SP,#0xd0]
stp     X28, X29, [SP,#0xe0]

Since there are no LDM/STM instructions, we can only store a large number of registers to the stack one pair at a time.
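The frame built by the listing above can be pictured as a C struct laid out from the final SP upward. This is only a sketch for checking the offsets: the struct and field names are invented here, and the last field assumes the Step 1 entry patch that leaves X0 at original sp - 0x8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the save area, lowest address (final sp) first. */
struct saved_context {
    uint64_t x[30];        /* X0..X29, written by the stp sequence (0xf0 bytes) */
    uint64_t lr;           /* X30, from "str x30, [sp]"                         */
    uint64_t original_sp;  /* from "str x30, [sp, #0x8]"                        */
    uint64_t nzcv;         /* from "str x0, [sp, #0x10]"                        */
    uint64_t entry_x0;     /* slot at original sp - 0x8 (assumed Step 1 patch)  */
};

/* Compile-time checks that the struct matches the offsets in the listing. */
_Static_assert(sizeof(struct saved_context) == 0xf0 + 0x20, "frame size");
_Static_assert(offsetof(struct saved_context, lr) == 0xf0, "lr offset");
_Static_assert(offsetof(struct saved_context, nzcv) == 0x100, "nzcv offset");
```

The nzcv offset of 0x100 is the same value the restore sequence uses in ldr x0, [sp, #0x100].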

Next, we can execute the previously overwritten code; this will not be demonstrated here, but a demo will be written later.

Then we will restore the registers:

ldr     x0, [sp, #0x100]     ; load the saved flags (the PSR slot) into X0
msr     NZCV, x0             ; restore the value of the status register

ldp     X0, X1, [SP]         ; restore the values of X0 and X1
ldp     X2, X3, [SP,#0x10]
ldp     X4, X5, [SP,#0x20]
ldp     X6, X7, [SP,#0x30]
ldp     X8, X9, [SP,#0x40]
ldp     X10, X11, [SP,#0x50]
ldp     X12, X13, [SP,#0x60]
ldp     X14, X15, [SP,#0x70]
ldp     X16, X17, [SP,#0x80]
ldp     X18, X19, [SP,#0x90]
ldp     X20, X21, [SP,#0xa0]
ldp     X22, X23, [SP,#0xb0]
ldp     X24, X25, [SP,#0xc0]
ldp     X26, X27, [SP,#0xd0]
ldp     X28, X29, [SP,#0xe0]
add     sp, sp, #0xf0         ; set sp to point to X30 LR location

ldr     x30, [sp]             ; restore X30 register
add     sp, sp, #0x20         ; restore the sp value, origin_sp is unused

After restoring the registers, we jump back into the original function, landing on the last instruction of the patch so that X0 is restored before the original code continues:

ldr   x0, ret_addr
br    x0

Thus, a framework for a hook solution has been designed.

/ Example /

We will attempt to perform an inline hook on the fopen function in libc.so.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <dlfcn.h>

int main()
{
    void *handle = dlopen("libc.so", RTLD_NOW);
    void *hook_addr = dlsym(handle, "fopen");

    printf("hook_addr = %p\n", hook_addr);

    return 0;
}

The program uses dlopen and dlsym to look up the address of the fopen function:

sailfish:/data/local/tmp # ./inlinehook                                      
hook_addr = 0x724f5433d4

Run the compiled program to confirm that the address can be correctly obtained. If it is a 32-bit shared object, this address might be an odd number, indicating it is in thumb mode.
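For the 32-bit case mentioned above, the low bit of the address is a mode flag rather than part of the instruction address. A minimal sketch of how a hooking tool might separate the two (the helper names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* On 32-bit ARM, bit 0 of a function address selects Thumb state. */
static bool is_thumb_address(uint64_t addr)
{
    return (addr & 1u) != 0;
}

/* The instructions themselves start at the address with bit 0 cleared. */
static uint64_t code_address(uint64_t addr)
{
    return addr & ~UINT64_C(1);
}
```

On ARM64 there is no such flag, so the address returned by dlsym can be patched as-is.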

Next, we will write the hook function. It is important to note that compilers generate prolog and epilog code for general functions, and since we need complete control over the assembly instructions generated by the hook function, we need to use naked functions or write assembly directly.

void __attribute__((naked)) hook_func()
{

}

We need to overwrite the head instructions of the fopen function, which requires modifying the .text section. Since the .text section is not writable, we need to use the mprotect function to gain write permissions.

mprotect((void *)((uint64_t)hook_addr & 0xfffffffffffff000), 0x1000, PROT_WRITE | PROT_EXEC | PROT_READ);

We can check the /proc/pid/maps file to see if it has taken effect:

7454e64000-7454edc000 r-xp 00000000 103:12 946                           /system/lib64/libc.so
7454edc000-7454edd000 rwxp 00078000 103:12 946                           /system/lib64/libc.so
7454edd000-7454f2c000 r-xp 00079000 103:12 946                           /system/lib64/libc.so

It can be seen that indeed one segment’s permissions have changed to rwx.
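One thing the single-page call does not cover is a patch that straddles a page boundary. The sketch below computes the page-aligned range that covers an arbitrary byte range; the helper name covering_pages is an assumption for illustration, and the page size is passed in rather than hard-coded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the page-aligned range that covers
 * [addr, addr + len). page_size must be a power of two. */
static void covering_pages(uint64_t addr, size_t len, uint64_t page_size,
                           uint64_t *start, size_t *size)
{
    uint64_t mask = page_size - 1;
    uint64_t end  = (addr + len + mask) & ~mask;   /* round the end up */

    *start = addr & ~mask;                          /* round the start down */
    *size  = (size_t)(end - *start);
}
```

The results would be passed to mprotect as (void *)start and size, with the page size taken from sysconf(_SC_PAGESIZE). After the patch is written, the instruction cache generally has to be flushed for the modified range on ARM64, for example with the GCC/Clang builtin __builtin___clear_cache.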

Now we can start overwriting instructions. One annoyance shows up here. In ARM32 this is very simple, and a single LDR is enough:

LDR PC, [PC, #-4]
hook_func_addr

The first line loads the word that follows it into the PC, so the second line is not an instruction at all but the address of the hook function. That is all it takes to jump to the hook.

In ARM64 the PC cannot be used as an operand, so we have to load the address into a register and jump with the BR instruction:

LDR X0, [TARGET_ADDRESS]
BR X0
LDR X0, [SP, #-0x8]

The problem is that TARGET_ADDRESS is only known at run time, so it cannot be fixed at compile time. The workaround is to embed the address in the patch itself and load it relative to the PC:

STP X1, X0, [SP, #-0x10]   ; store x1 and x0 below sp
LDR X0, 8                  ; load the 8 bytes at pc + 8 into x0
BR X0                      ; jump to the address held in x0
ADDR(64)                   ; target address (data, never executed)
LDR X0, [SP, #-0x8]        ; restore x0 from sp - 0x8

ADDR is the address of the hook function; it is loaded into X0, and BR X0 jumps to it. The address occupies 8 bytes, so the whole patch takes up the space of 6 instructions (24 bytes).

LDR X0, 8 deserves a note. When this patch is assembled in IDA, the operand is treated as an offset relative to the base address of the shared object, not relative to the PC. To state explicitly that the load is PC-relative, write:

LDR X0, .+8

I saw an open-source project that used LDR X0, 8 directly, and I am unsure whether that is just a difference in assembler syntax.
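Whatever syntax the assembler accepts, the machine code is unambiguous: the 64-bit literal form of LDR stores a signed 19-bit word offset measured from the LDR instruction itself. The sketch below builds that encoding by hand; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical encoder for "LDR Xt, <label>" (64-bit literal load).
 * byte_offset is measured from the LDR instruction itself and must be a
 * multiple of 4; the field is a signed 19-bit word count (about +/-1 MiB). */
static uint32_t encode_ldr_literal_x(unsigned rt, int32_t byte_offset)
{
    uint32_t imm19 = ((uint32_t)byte_offset >> 2) & UINT32_C(0x7FFFF);
    return UINT32_C(0x58000000) | (imm19 << 5) | (rt & 31u);
}

/* Address the literal is read from, given where the LDR is placed. */
static uint64_t ldr_literal_source(uint64_t insn_addr, int32_t byte_offset)
{
    return insn_addr + (uint64_t)(int64_t)byte_offset;
}
```

encode_ldr_literal_x(0, 8) yields 0x58000040. Placed at offset 4 of the patch, that instruction reads its literal from offset 12, right after the BR.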

Use IDA to assemble these instructions into machine code, then overwrite the first 6 instruction slots (24 bytes) of the fopen function:

// Instructions are 4 bytes and use uint32_t; the address uses uint64_t

// STP X8, X0, [SP, #-0x60]  -> E8 03 3A A9
// (X8 goes to sp - 0x60, X0 goes to sp - 0x58)
(*(uint32_t *)(hook_addr + 0)) = 0xA93A03E8;

// LDR X0, 8 -> 40 00 00 58
(*(uint32_t *)(hook_addr + 4)) = 0x58000040;

// BR X0  -> 00 00 1f d6
(*(uint32_t *)(hook_addr + 8)) = 0xd61f0000;

// ADDR  -> the 64-bit address of hook_func (data, not an instruction)
(*(uint64_t *)(hook_addr + 12)) = (uint64_t)hook_func;

// The overwritten SUB SP, SP, #0x50 has already run inside the hook, so X0
// (saved at entry sp - 0x58) is now at sp - 0x58 + 0x50 = sp - 0x8
// LDR X0, [SP, #-0x8]  -> E0 83 5F F8
(*(uint32_t *)(hook_addr + 20)) = 0xF85F83E0;

Here are some details to note:

We stored X8 and X0 at sp - 0x60 (X8 at sp - 0x60, X0 at sp - 0x58) because the overwritten fopen instructions operate on sp. fopen's stack frame is 0x50 bytes, so the two values sit below anything its prologue writes and are not lost when the overwritten instructions run. Once SUB SP, SP, #0x50 has executed, the same slots are at sp - 0x10 and sp - 0x8.
X8 is saved because GCC uses X8 when generating the instructions that load the return address.
X0 is saved because we use the X0 register for the jump, so it must be restored afterwards.
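The stack-relative words in the patch can be cross-checked without IDA. The encoders below are a sketch written for this article (the names are hypothetical). Register number 31 stands for SP in these addressing forms, and a negative single-register offset uses the unscaled LDUR encoding, which assemblers pick automatically when LDR is written with such an offset.

```c
#include <stdint.h>

/* "STP Xt1, Xt2, [Xn|SP, #off]": 64-bit, signed offset, no writeback.
 * off must be a multiple of 8 in [-512, 504]; Xt1 goes to the lower address. */
static uint32_t encode_stp_x(unsigned rt1, unsigned rt2, unsigned rn, int off)
{
    uint32_t imm7 = ((uint32_t)off >> 3) & 0x7Fu;
    return UINT32_C(0xA9000000) | (imm7 << 15) | ((rt2 & 31u) << 10)
         | ((rn & 31u) << 5) | (rt1 & 31u);
}

/* "LDUR Xt, [Xn|SP, #off]": 64-bit load, unscaled signed 9-bit offset. */
static uint32_t encode_ldur_x(unsigned rt, unsigned rn, int off)
{
    uint32_t imm9 = (uint32_t)off & 0x1FFu;
    return UINT32_C(0xF8400000) | (imm9 << 12) | ((rn & 31u) << 5) | (rt & 31u);
}
```

encode_stp_x(8, 0, 31, -0x60) yields 0xA93A03E8, the word written at hook_addr + 0.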

Once the overwrite instructions are set, we can write the hook function:

void __attribute__((naked)) hook_func()
{

    // Get parameters
    asm("ldr x0, [sp, #-0x58]");
    asm("str x0,%0":"=m"(x0));

    // Execute the overwritten instructions
    // .text:00000000000783D4 FF 43 01 D1                   SUB             SP, SP, #0x50           ; Alternative name is 'fopen'
    // .text:00000000000783D8 F7 0B 00 F9                   STR             X23, [SP,#0x10]
    // .text:00000000000783DC F6 57 02 A9                   STP             X22, X21, [SP,#0x20]
    // .text:00000000000783E0 F4 4F 03 A9                   STP             X20, X19, [SP,#0x30]
    // .text:00000000000783E4 FD 7B 04 A9                   STP             X29, X30, [SP,#0x40]
    // .text:00000000000783E8 FD 03 01 91                   ADD             X29, SP, #0x40
    asm("SUB             SP, SP, #0x50");
    asm("STR             X23, [SP,#0x10]");
    asm("STP             X22, X21, [SP,#0x20]");
    asm("STP             X20, X19, [SP,#0x30]");
    asm("STP             X29, X30, [SP,#0x40]");
    asm("ADD             X29, SP, #0x40");

    // Jump to return address; this statement generates assembly that uses the x8 register
    asm("ldr x0, %0" : :"m"(hook_return_addr));
    // Restore x8: it was saved at entry sp - 0x60, which is sp - 0x10 now that
    // SUB SP, SP, #0x50 has run
    asm("ldr x8, [sp, #-0x10]");
    asm("br x0");
}

The hook function I wrote is relatively simple, mainly doing three things:

Get the value of X0, as X0 is the first parameter; we can assign X0 to a global variable and print it to check if the hook was successful.
Execute the overwritten instructions, which is straightforward; just copy the first six instructions of fopen.
After execution, we need to jump back to fopen to continue executing; we first calculate the instruction address, then use inline assembly to generate the corresponding instructions.
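The address arithmetic behind the third step can be written down explicitly. The constants below simply name the byte offsets of the 24-byte patch described earlier; the identifiers are invented for this sketch.

```c
#include <stdint.h>

/* Byte offsets inside the 24-byte patch written over the start of fopen. */
enum {
    PATCH_OFF_SAVE    = 0,   /* STP X8, X0, [SP, #-0x60]             */
    PATCH_OFF_LOAD    = 4,   /* LDR X0, 8                            */
    PATCH_OFF_BRANCH  = 8,   /* BR X0                                */
    PATCH_OFF_ADDR    = 12,  /* 8-byte address of the hook function  */
    PATCH_OFF_RESTORE = 20,  /* reload X0, then fall through         */
    PATCH_SIZE        = 24   /* six 4-byte slots                     */
};

/* Where the hook jumps back to: the X0-restoring instruction of the patch. */
static uint64_t hook_return_address(uint64_t hook_addr)
{
    return hook_addr + PATCH_OFF_RESTORE;
}

/* First original instruction that the patch leaves untouched. */
static uint64_t first_untouched_insn(uint64_t hook_addr)
{
    return hook_addr + PATCH_SIZE;
}
```

Using the file offsets from the disassembly above, a function starting at 0x783D4 is re-entered at 0x783E8, and its original code resumes at 0x783EC.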

Besides inline assembly, the hook can also be written directly in an assembly file. I am not very familiar with this approach, but here is a brief introduction. First, declare a global symbol in the assembly file:

.global _shellcode_start_s

Then define that symbol as a label:

_shellcode_start_s:

    sub     sp, sp, #0x20

In another C file, we can directly reference this variable using extern:

extern unsigned long _shellcode_start_s;

void *p_shellcode_start_s = &_shellcode_start_s;

In this way, we obtain the starting address of the hook function. Interested readers can check out relevant open-source projects on GitHub.

/ Source Code /

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <dlfcn.h>

static uint64_t hook_return_addr;
static uint64_t x0, x1;

void __attribute__((naked)) hook_func()
{

    // Get parameters
    asm("ldr x0, [sp, #-0x58]");
    asm("str x0,%0" : "=m"(x0));
    asm("str x1,%0" : "=m"(x1));

    // Execute the overwritten instructions
    // .text:00000000000783D4 FF 43 01 D1                   SUB             SP, SP, #0x50           ; Alternative name is 'fopen'
    // .text:00000000000783D8 F7 0B 00 F9                   STR             X23, [SP,#0x10]
    // .text:00000000000783DC F6 57 02 A9                   STP             X22, X21, [SP,#0x20]
    // .text:00000000000783E0 F4 4F 03 A9                   STP             X20, X19, [SP,#0x30]
    // .text:00000000000783E4 FD 7B 04 A9                   STP             X29, X30, [SP,#0x40]
    // .text:00000000000783E8 FD 03 01 91                   ADD             X29, SP, #0x40
    // .text:00000000000783EC 56 D0 3B D5                   MRS             X22, #3, c13, c0, #2
    // .text:00000000000783F0 C9 16 40 F9                   LDR             X9, [X22,#0x28]
    asm("SUB             SP, SP, #0x50");
    asm("STR             X23, [SP,#0x10]");
    asm("STP             X22, X21, [SP,#0x20]");
    asm("STP             X20, X19, [SP,#0x30]");
    asm("STP             X29, X30, [SP,#0x40]");
    asm("ADD             X29, SP, #0x40");

    // Jump to return address; this statement generates assembly that uses the x8 register
    asm("ldr x0, %0" : :"m"(hook_return_addr));
    // Restore x8: it was saved at entry sp - 0x60, which is sp - 0x10 now that
    // SUB SP, SP, #0x50 has run
    asm("ldr x8, [sp, #-0x10]");
    asm("br x0");
}

int main()
{
    void *handle = dlopen("libc.so", RTLD_NOW);
    void *hook_addr = dlsym(handle, "fopen");
    if (hook_addr != NULL)
    {
        hook_return_addr = (uint64_t)hook_addr + 20;
    }
    else
    {
        return -1;
    }

    printf("hook_addr = %p\n", hook_addr);

    // Make the 0x1000-byte page containing fopen writable while keeping it readable and executable
    mprotect((void *)((uint64_t)hook_addr & 0xfffffffffffff000), 0x1000, PROT_WRITE | PROT_EXEC | PROT_READ);

    // getchar();

    // Instructions are 4 bytes and use uint32_t; the address uses uint64_t

    // STP X8, X0, [SP, #-0x60]  -> E8 03 3A A9
    // (X8 goes to sp - 0x60, X0 goes to sp - 0x58)
    (*(uint32_t *)(hook_addr + 0)) = 0xA93A03E8;

    // LDR X0, 8 -> 40 00 00 58
    (*(uint32_t *)(hook_addr + 4)) = 0x58000040;

    // BR X0  -> 00 00 1f d6
    (*(uint32_t *)(hook_addr + 8)) = 0xd61f0000;

    // ADDR  -> the 64-bit address of hook_func (data, not an instruction)
    (*(uint64_t *)(hook_addr + 12)) = (uint64_t)hook_func;

    // The overwritten SUB SP, SP, #0x50 has already run inside the hook, so X0
    // (saved at entry sp - 0x58) is now at sp - 0x58 + 0x50 = sp - 0x8
    // LDR X0, [SP, #-0x8]  -> E0 83 5F F8
    (*(uint32_t *)(hook_addr + 20)) = 0xF85F83E0;

    printf("hook_func = %p\n", hook_func);

    getchar();

    FILE *fp = fopen("/data/local/tmp/android_server64", "rb");
    uint32_t data;
    fread(&data, 4, 1, fp);
    fclose(fp);
    printf("data = 0x%x\n", data);
    printf("x0 = %s\n", (const char *)x0);
    printf("x1 = %s\n", (const char *)x1);

    return 0;
}

Running the program outputs:

sailfish:/data/local/tmp # ./inlinehook                         
hook_addr = 0x7c1015a3d4
hook_func = 0x5d3d3099d4

data = 0x464c457f
x0 = /data/local/tmp/android_server64
x1 = rb

The output shows that the fopen function was hooked successfully:

data is the first 4 bytes of the opened file, which is the ELF magic number (0x7f 'E' 'L' 'F')
x0 and x1 are the arguments passed to fopen
