Welcome to the series of articles “Introduction to ARM Assembly Programming.” This series is designed to lay the groundwork for the upcoming “ARM Exploit Development Tutorial” (in progress). Before we dive into writing shellcode and constructing ROP chains using ARM assembly, we need to first grasp some fundamental knowledge about ARM assembly.
We will cover the following topics step by step:
First article: Introduction to ARM Assembly
Second article: Data Types and Registers
Third article: ARM Instruction Set
Fourth article: Memory Read and Write
Fifth article: Advanced Memory Read and Write
Sixth article: Conditional Branching
Seventh article: Stacks and Functions
To execute the examples in this article, you need an ARM environment to work in. If you do not have an ARM device (such as a Raspberry Pi), you can create a virtual one with the QEMU emulator by following this tutorial (https://azeria-labs.com/emulate-raspberry-pi-with-qemu/). If you are not yet comfortable debugging programs with GDB, you can learn the basics in this tutorial (https://azeria-labs.com/debugging-with-gdb-introduction/). This series focuses on 32-bit ARM, and all examples are compiled for the ARMv6 instruction set.
Why Introduce ARM?
This tutorial is written for those interested in learning the basics of ARM assembly knowledge, especially those who want to write exploits on the ARM platform. You may have noticed that ARM processors are ubiquitous in your life. When I look around, I find that most of the devices around me use ARM processors rather than Intel. These devices include my phone, router, and the recently popular IoT devices. It can be said that ARM processors have become the most widely used CPU cores in the world. With this widespread use comes issues similar to those in the PC era; ARM devices are also vulnerable to attacks such as buffer overflows. Due to their widespread use and potential vulnerabilities, attacks targeting these devices will become increasingly common.
Currently, in the field of binary security, we have much deeper research on x86/x64 platforms than on ARM platforms, even though ARM assembly may be the easiest to learn among mainstream CPU instruction sets. So, why are there not more people focusing on ARM and studying ARM? Perhaps it is because most of the learning materials for exploit development are targeted at Intel platforms, with very few aimed at ARM platforms. For example, the famous Intel x86 exploit writing tutorial by Corelan Team (https://www.corelan.be/index.php/2009/07/19/exploit-writing-tutorial-part-1-stack-based-overflows/) has helped many interested in binary exploitation enter this field by learning and practicing the knowledge contained in this tutorial. If you are interested in exploit development on the x86 platform, the Corelan Team’s tutorial is a great starting point. In this tutorial, we will focus on the basics of ARM assembly and how to write exploits on the ARM platform.
ARM Processors vs. Intel Processors
There are many differences between ARM processors and Intel processors, the most significant being their instruction sets. Intel processors are CISC (Complex Instruction Set Computing) processors: they have a larger, richer instruction set, allow individual instructions to perform complex memory operations, support more complex operations and addressing modes, and have far fewer registers than ARM. CISC processors are generally used in general-purpose PCs, workstations, and servers.
ARM is a RISC (Reduced Instruction Set Computing) processor. It has a streamlined instruction set (around 100 instructions, or even fewer) and more general-purpose registers than a CISC processor. Unlike Intel processors, ARM instructions operate only on data in registers, and memory is accessed through a load/store architecture: only load and store instructions can touch memory. So if we want to increment a value stored at a memory address, we need at least three instructions: a load, an add, and a store. First we load the value from the given memory address into a register, then we increment the value in the register, and finally we store the result back to memory.
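As a minimal sketch of this load/modify/store pattern (GNU assembler syntax, assuming r0 already holds the memory address we want to update):

    ldr r1, [r0]      @ load the value at the address in r0 into r1
    add r1, r1, #1    @ increment the value in the register
    str r1, [r0]      @ store the result back to the same address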
There are pros and cons to a reduced instruction set. A significant advantage is that instructions can execute faster (RISC processors use pipelining to reduce the number of clock cycles each instruction takes). A disadvantage is that the smaller instruction set pushes complexity into software, in practice into the compiler, which must express complex operations as sequences of simple ones. Another important fact is that ARM code can run in one of two states, ARM mode and Thumb mode (these are instruction-set states, not privilege modes like x86's real and protected modes). Thumb instructions can be 2 or 4 bytes long (more details will be covered in the third article: ARM Instruction Set).
More differences between ARM and x86/x64 include:
- In ARM, most instructions can be executed conditionally (see the sketch after this list).
- Intel's x86/x64 series CPUs are little-endian.
- The ARM architecture was little-endian before ARMv3; since then, ARM processors have been bi-endian, providing a setting to switch between big-endian and little-endian.
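Here is a small sketch of conditional execution in ARM mode (GNU assembler syntax; the EQ and NE suffixes make an instruction execute only when the flags set by the preceding CMP match):

    cmp   r0, #5      @ compare r0 with 5 and set the condition flags
    addeq r1, r1, #1  @ executed only if r0 == 5 (EQ: equal)
    movne r1, #0      @ executed only if r0 != 5 (NE: not equal)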
In fact, not only are there differences between the ARM platform and Intel platform, but there are also many differences between different versions within the ARM platform itself. We strive to make this series of tutorials as general as possible so that you can have a comprehensive understanding of the ARM platform. Once you master the basics of ARM, it will be easier to learn about a specific version. The examples in this tutorial are created on a 32-bit ARMv6 (Raspberry Pi 1) platform, so the explanations related to the examples are targeted at this version.
We just mentioned that the ARM instruction set has several versions, which might be confusing. The following table shows the rough mapping between ARM processor families and instruction set versions:

ARM Processor Family | ARM Instruction Set Architecture
ARM7                 | ARMv4
ARM9                 | ARMv5
ARM11                | ARMv6
Cortex-A             | ARMv7-A
Cortex-R             | ARMv7-R
Cortex-M             | ARMv7-M
Writing ARM Assembly
Before we dive deeper into writing exploits for the ARM platform, we need to understand the basics of programming in ARM assembly. Why program in ARM assembly at all, when we have so many high-level and scripting languages? Because if you want to reverse engineer ARM programs and understand their execution flow, construct ROP chains, write your own ARM shellcode, or debug ARM programs, you need a working knowledge of ARM assembly.
To engage in reverse engineering and exploit development on the ARM platform, you do not need to know all the details of the ARM assembly language, but you should grasp the relevant core knowledge. This series of tutorials will introduce the necessary foundational knowledge, and if you want to learn more, you can visit the links listed at the end of this chapter.
So, what exactly is assembly language? Assembly language is a thin syntax layer over machine code, consisting of mnemonics that map to binary machine code instructions. Binary machine code is what the CPU actually understands. So why not write machine code directly? Frankly, because that would be a pain. Instead we use assembly language, which is far easier for humans to read and write. Of course, a computer cannot run assembly code directly; it needs machine code. We will use the assembler 'as' from the GNU Binutils toolset to convert assembly code into machine code: 'as' reads assembly source files with the '.s' suffix and outputs assembled binary object files.
The workflow is as follows: once you have written an assembly file with the '.s' suffix, assemble it with 'as' and then link it with 'ld':
$ as program.s -o program.o
$ ld program.o -o program
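If you want something concrete to feed through this pipeline, below is a minimal sketch of a 'program.s' that simply exits cleanly, assuming a 32-bit ARM Linux target where the exit system call is number 1 and the syscall number goes in r7:

.global _start

_start:
    mov r0, #0        @ exit status 0
    mov r7, #1        @ syscall number for exit on 32-bit ARM Linux
    svc #0            @ trap into the kernel to perform the syscall

After assembling and linking it as above, running './program' and then 'echo $?' should print 0.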
Diving Deeper into Assembly Language
In this section, let's start from the ground up and see how assembly language works. At the lowest level of a computer system, circuits transmit electrical signals. A signal is formed by switching the voltage between two levels, such as 0 volts (low, an "off" signal) and 5 volts (high, an "on" signal). Because the exact voltage values do not matter to the logic of the system, we abstract them as the numbers 0 and 1. Conveniently, 0 and 1 also form a binary number system. Building on this, we group sequences of these signals (sequences of 0s and 1s) so that each group represents a machine code instruction. Below is a machine code instruction (as it happens, this bit pattern is the ARM encoding of the MOV instruction shown further down):
1110 0001 1010 0000 0010 0000 0000 0001
So far, so good, but we quickly run into our first difficulty: machine code sequences are hard to remember. To solve this, we introduce mnemonics: short names assigned to machine code instructions, typically 2 to 4 characters long (a few mnemonics are longer). A mnemonic, combined with operands that follow its syntax rules, forms an assembly instruction, and program code written out of such instructions is called assembly program code. The collection of mnemonics and the rules for their operands (that is, the assembly instruction set) is called the computer's assembly language. Assembly language is therefore the lowest-level language humans use to write programs. Here is an example:
MOV R2, R1
Now that we know that assembly program code consists of assembly instructions, we need to convert it into the corresponding machine code. As mentioned above, for ARM assembly the GNU Binutils project provides the tool 'as' for this conversion. Using an assembler like 'as' to turn ARM assembly program code into ARM machine code is called assembling.

To summarize: computers can read and act on sequences of electrical signals, which we represent with 0s and 1s; this is machine code, and it is how we program the machine at the lowest level. Because these sequences are hard to remember, we name them, introducing mnemonics to represent instructions. The mnemonics and their operand syntax make up the assembly language, and we use an assembler to convert assembly program code into machine code, much as a compiler converts a high-level language into assembly code.
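If you want to see this mnemonic-to-machine-code mapping for yourself, GNU Binutils also ships a disassembler, objdump. A quick sketch (the exact output formatting varies across Binutils versions):

$ as program.s -o program.o
$ objdump -d program.o

For a source file containing MOV R2, R1, the disassembly prints the encoding e1a02001 next to the mnemonic, which is exactly the 32-bit pattern shown at the start of this section.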
Further Reading
- Whirlwind Tour of ARM Assembly. https://www.coranac.com/tonc/text/asm.htm
- ARM Assembler in Raspberry Pi. http://thinkingeek.com/arm-assembler-raspberry-pi/
- Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation by Bruce Dang, Alexandre Gazet, Elias Bachaalany, and Sébastien Josse.
- ARM Reference Manual. http://infocenter.arm.com/help/topic/com.arm.doc.dui0068b/index.html
- Assembler User Guide. http://www.keil.com/support/man/docs/armasm/default.htm

This article was translated by ljcnaix of the Kanxue translation team, from the original by Azeria Labs. Please credit the Kanxue community when reposting.