Understanding Position-Independent Code in ARM Architecture

1. Why Do We Need Position-Independent Code?

First, we need to understand the boot process of ARM boards.

1. Exynos 4412 Boot Process

  1. First, let’s take a look at the Exynos 4412 memory map. It shows that the iROM base address is 0x00000000 and the iRAM base address is 0x02020000.

Both of these memory blocks are inside the SoC.

  2. Check the Exynos 4412 booting sequence, which is described in Chapter 5.

The boot sequence figure in that chapter shows the boot process when the Exynos 4412 is powered on and reset, which is roughly as follows:

<1> Execute a piece of code in the internal ROM (iROM) (firmware provided by the manufacturer), which mainly initializes some basic system configurations, such as initial clock settings, stack, and boot mode (corresponding to the marker in the figure).

<2> The code in iROM copies the BL1 image to the internal SRAM from the corresponding storage medium based on the boot mode obtained in stage one (OM_STAT register). BL1 mainly completes the initialization of system clocks and some timing configurations for the memory controller. After completing these tasks, the OS image is copied into memory (corresponding to the markers in the figure).

<3> Jump to the OS to execute.

The internal SRAM is only 256KB, while a uboot image is generally larger than that, so the entire uboot image cannot be copied into SRAM. It is therefore speculated that the copying works like this: only the first part of uboot is copied as BL1. This part sets up the basic hardware operating environment and also copies the complete uboot image into DRAM. uboot then runs from DRAM and completes the copying and booting of the OS image.

In general, the address the code was linked for and the address it first runs from do not match, and relocating the program into DRAM must be handled by the programmer.

This leads to the concept of “position-independent code”, which refers to code that can execute regardless of the address space specified at link time; it is special code that can be loaded into any address space and executed.

Uboot is moved to DRAM, and then jumps to DRAM to continue executing the remaining code of uboot. Therefore, the code before the move must be position-independent and cannot use absolute addressing instructions, otherwise addressing errors will occur.
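The move itself can be sketched in C. This is only a minimal illustration under stated assumptions, not uboot's actual relocation code: `copy_image` and its parameters are hypothetical names, and on a host machine ordinary arrays would stand in for the load area and the DRAM destination.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `words` 32-bit words from where the image
 * was loaded (src) to where it was linked to run (dst).
 * It uses only its arguments and local variables, no globals, so it
 * does not rely on any link-time data address. */
void copy_image(uint32_t *dst, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}
```

After such a copy, the early code would jump to the destination, and from then on absolute addresses match the link address again.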

2. How to Implement Position-Independent Code?

1. What is ‘Compile Address’? What is ‘Run Address’?

“Compile Address:”

On 32-bit ARM (in ARM state), each instruction is 4 bytes, and instructions are stored and executed sequentially, 4 bytes at a time, as long as there are no jumps. During the build, every instruction is assigned an address. We call this the compile address (it is also commonly called the link address, because the linker fixes its final value).

“Run Address:”

This refers to the address where an instruction actually resides when it runs, which is determined by where the user places (programs) the code. For example, if an instruction’s compile address is 0x40008000 and the code is placed there, its run address is also 0x40008000. If the user instead programs this instruction at 0x60000000, then its run address is 0x60000000.

What happens when the compile address and the run address are different? Jumps that use absolute addresses go wrong: the build bakes the link-time target address into the code, and if the code actually sits at a different address, the jump does not land on the intended instruction.
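The mismatch is plain address arithmetic. The sketch below is illustrative only: `run_address` and `absolute_jump_ok` are hypothetical helper names, and the bases used in the usage note are the example values from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where an instruction really sits, given the base the image was
 * linked for and the base it was actually placed at. */
uint32_t run_address(uint32_t link_addr, uint32_t link_base, uint32_t run_base)
{
    return run_base + (link_addr - link_base);
}

/* An absolute jump always goes to the link-time address of its target.
 * It lands on the target only if the target really sits there. */
bool absolute_jump_ok(uint32_t target_link_addr, uint32_t link_base, uint32_t run_base)
{
    return run_address(target_link_addr, link_base, run_base) == target_link_addr;
}
```

With a link base of 0x40008000 and a run base of 0x60000000, `run_address(0x40008000, 0x40008000, 0x60000000)` gives 0x60000000, and `absolute_jump_ok` reports false for any target in the image.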

“C Language Compile Address:”

Compiled C code expects its compile address and its actual run address to be the same. Assembly code does not go through the C-to-assembly translation step, so the programmer decides exactly which addressing instructions are used and can write code that works at its actual run address. This is why every bootloader starts with a section of assembly code: at that point the compile address and the actual address of the initial code do not match. This section of code is position-independent and is used to jump to the run address.

2. Example

Implementing position-independent code mainly considers the following two aspects:

1. Position-independent function jumps
2. Position-independent constant access

Next, we will explain in detail through two examples.

Code

The linker script used to build the code, “map.lds”, is as follows:

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
/*OUTPUT_FORMAT("elf32-arm", "elf32-arm", "elf32-arm")*/
OUTPUT_ARCH(arm)
ENTRY(_start)
SECTIONS
{
    . = 0x40008000;
    . = ALIGN(4);
    .text :
    {
        gcd.o(.text)
        *(.text)
    }
    . = ALIGN(4);
    .rodata :
    { *(.rodata) }
    . = ALIGN(4);
    .data :
    { *(.data) }
    . = ALIGN(4);
    .bss :
    { *(.bss) }
}

As shown in map.lds, “0x40008000” is the link address.

Other source files are as follows: “gcd.s”

.text
.global _start
_start:
        ldr sp,=0x70000000      /*get stack top pointer*/
        bl func                 /* pc-relative call */
        ldr pc,=func            /* loads the absolute link address of func */
        b main
func:
        mov pc,lr               /* return to the caller */

“main.c”

/*
 * main.c
 *
 *  Created on: 2020-12-12
 *      Author: 一口Linux
 */
int aaaa=0;
int main(void)
{
    aaaa = 0x11;
    while(1);
    return 0;
}

“Makefile”

TARGET=gcd
TARGETC=main
all:
	arm-none-linux-gnueabi-gcc -O1 -g -c -o $(TARGETC).o $(TARGETC).c
	arm-none-linux-gnueabi-gcc -O1 -g -c -o $(TARGET).o $(TARGET).s
	arm-none-linux-gnueabi-gcc -O1 -g -S -o $(TARGETC).s $(TARGETC).c
	arm-none-linux-gnueabi-ld $(TARGETC).o $(TARGET).o -Tmap.lds -o $(TARGET).elf
	arm-none-linux-gnueabi-objcopy -O binary -S $(TARGET).elf $(TARGET).bin
	arm-none-linux-gnueabi-objdump -D $(TARGET).elf > $(TARGET).dis
clean:
	rm -rf *.o *.elf *.dis *.bin

Disassembly File “gcd.dis”

As shown in the gcd.dis listing (the line numbers below refer to that listing):

  1. The link address of _start is 0x40008000
  2. Line 9 corresponds to the instruction bl func
  3. Line 10 corresponds to the instruction ldr pc,=func
  4. The link address of func is 0x40008010
  5. The global variable aaaa is located in the bss segment at 0x4000802c
  6. Line 19 holds the machine code for the assignment statement aaaa = 0x11

If we copy the generated bin file to memory location 0x40008000, it will definitely work: bl func and ldr pc,=func can both jump to the func function, and line 19 can access the global variable aaaa.

If we copy this program to another address, will it run normally?

Assuming we copy it to address 0, the program’s run addresses start from 0, that is, _start corresponds to address 0, and main corresponds to 0x18.

After copying to address 0, the **content of the instructions (machine code)** in memory remains the same as before, while the value of pc follows the actual run address.

  1. First, look at bl func

The corresponding assembly code is line 9, and the machine code for this instruction is 0xeb000001. We have discussed the format of this machine code in “4. Learning ARM from Scratch – ARM Instructions, Shifts, Data Processing, BL, Machine Code”. The offset field of this machine code encodes an offset of one instruction relative to pc. Because of the three-stage pipeline, pc already points two instructions past bl when bl executes, so the jump ends up three instructions below bl, which is exactly the position of func. The offset is relative, so bl can still correctly find the func function at the new address.
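The arithmetic can be checked with a small decoder. This is a sketch written for this example rather than code from the article: `bl_target` is a hypothetical helper, and it assumes an ARM-state BL encoding, where bits [23:0] hold a signed word offset and pc reads as the instruction address plus 8.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the branch target of the ARM-state BL instruction `insn`
 * located at address `insn_addr`. */
uint32_t bl_target(uint32_t insn, uint32_t insn_addr)
{
    /* Bits [27:24] must be 1011 for BL. */
    assert((insn & 0x0f000000u) == 0x0b000000u);

    uint32_t offset = insn & 0x00ffffffu;   /* 24-bit word offset */
    if (offset & 0x00800000u) {
        offset |= 0xff000000u;              /* sign-extend to 32 bits */
    }

    /* pc reads as insn_addr + 8; the offset counts 4-byte words.
     * Unsigned wrap-around gives the right result for negative offsets. */
    return insn_addr + 8u + (offset << 2);
}
```

For 0xeb000001, the helper returns 0x40008010 when bl sits at 0x40008004 and 0x10 when it sits at 0x4. In both cases that is the address of func, because the same encoded offset is applied to whatever pc happens to be.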

  2. ldr pc,=func corresponds to assembly code line 10;

We can see that the instruction loads a word from the address pc + 4. The instruction sits at 0x8, so pc reads as 0x10 and pc + 4 is 0x14, which corresponds to line 15. The word stored there, 0x40008010, is written to pc.

However, our bin file is only 44 bytes in size, so at this time the memory at 0x40008010 does not contain any of the code we wrote. Therefore, ldr pc,=func cannot jump to func.
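The failure can be reproduced with two small checks. This is a sketch under stated assumptions: `literal_address` and `in_loaded_image` are hypothetical helpers, the 44-byte image size comes from the text, and pc is taken to read as the instruction address plus 8.

```c
#include <stdbool.h>
#include <stdint.h>

#define IMAGE_SIZE 44u   /* size of the bin file, taken from the text */

/* Address read by "ldr pc, [pc, #imm]" located at insn_addr. */
uint32_t literal_address(uint32_t insn_addr, uint32_t imm)
{
    return insn_addr + 8u + imm;
}

/* True if addr falls inside the image when it is loaded at run_base.
 * The unsigned subtraction wraps for addr < run_base, so one compare
 * covers both bounds. */
bool in_loaded_image(uint32_t addr, uint32_t run_base)
{
    return (addr - run_base) < IMAGE_SIZE;
}
```

`literal_address(0x8, 4)` gives 0x14, so the load itself still finds the literal, because the literal moved together with the code. The value stored there is still 0x40008010, though, and `in_loaded_image(0x40008010, 0x0)` is false, while `in_loaded_image(0x40008010, 0x40008000)` is true.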

  3. C code accesses the global variable aaaa

This corresponds to assembly code line 19.

We can see that the instruction loads a word from the address pc + 4, where pc + 4 is 0x28, corresponding to line 22. The word stored there is 0x4000802c, and the value in r2 is then written to that address. Because the program now runs from address 0, that address is no longer the memory of the global variable aaaa, so this instruction cannot find the memory of variable aaaa in the bss segment.

3. Summary

1. Position-Independent Code:

The CPU fetches instructions and data using relative addresses (for example, pc + 4). As long as the relative offsets do not change, the code can be fetched and run correctly, regardless of where it is placed in memory. The reason is that the code does not use absolute addresses; all addresses are relative.

2. Position-Dependent Code:

This code fetches and runs using absolute addresses, which requires the program to be stored at the address fixed during linking (specified in the Makefile with -Ttext xxx or in the linker script). That is, its addresses are tied to the position of the code. Examples: mov pc, #0xff; ldr pc,=0xffff.

3. Applications of Position-Independent Code:

1) Programs dynamically loaded into memory during runtime;
2) Programs loaded into memory after being combined with different programs in different scenarios (shared dynamic libraries);
3) Mapping between different addresses during runtime (such as bootloaders).

4. Conclusion

  1. Using “mov pc, xxx; ldr pc, xxx” is position-dependent code. These instructions put an absolute address into pc.
  2. Using “bl, b, adr, ldr” with a label is generally position-independent code, because these instructions address relative to pc.
  3. When position-independent code uses “b, bl” to call functions written in C, those functions must not use global variables, because the addresses of global variables in C are also generated based on the link address.
  4. For ldr, there is a significant difference between using = and not using =. “Without =: takes the value at that label, position-independent; with =: takes the address of that label, position-dependent”.

[Quiz] Why is the instruction for the reset exception in the uboot exception vector table b reset, while the other exceptions use the position-dependent code we discussed, ldr pc,XXXXXX?

The corresponding uboot exception vector table for ARM can be found in:

arch/arm/cpu/armv7/start.S
