Assembly language is a low-level language used for electronic computers, microprocessors, microcontrollers, and other programmable devices, also known as symbolic language. In assembly language, mnemonics replace the operation codes of machine instructions, and symbolic addresses or labels replace the addresses of instructions or operands. Each device's assembly language corresponds to its own machine language instruction set, and assembly source is converted into machine instructions by the assembly process. Generally speaking, a specific assembly language corresponds to a specific machine instruction set and cannot be ported directly between platforms.
Many assemblers provide additional mechanisms that support program development, assembly control, and debugging. Assemblers that provide macros are also known as macro assemblers.
Assembly language is not as widely used as most other programming languages. In practice today, it is usually applied at the lowest levels, in direct hardware operations and in performance-critical optimization. Drivers, embedded operating systems, and real-time programs typically contain at least some assembly language.
I. Development History
To talk about the emergence of assembly language, we must first mention machine language. Machine language is a collection of machine instructions, and a machine instruction is a command that the machine can execute directly. The machine instructions of electronic computers are sequences of binary digits. The computer converts them into sequences of high and low voltage levels that drive its electronic components to perform calculations.
The computer mentioned above is a machine that can execute machine instructions and perform calculations. This is the concept of early computers. In a common PC, there is a chip that performs the functions of such a computer. This chip is what we commonly refer to as the CPU (Central Processing Unit). Because each microprocessor has a different hardware design and internal structure, it requires different voltage pulses to control its operation. Therefore, each microprocessor has its own machine instruction set, which is its machine language.
Early programming was done in machine language. Programmers punched program code composed of 0s and 1s into paper tape or cards, with 1 represented by a hole and 0 by no hole, and then fed the program into the computer through a tape reader or card reader. Machine language of this kind consists purely of 0s and 1s, which makes it complex, inconvenient to read and modify, and prone to errors. Programmers quickly realized how troublesome machine language was: it was difficult to read and remember, which held back the development of the entire industry. Assembly language was born as a result.
The main body of assembly language is assembly instructions. Assembly instructions differ from machine instructions only in how the instructions are written. An assembly instruction is a notation for a machine instruction that is easier to remember.
Operation: move the contents of register BX to AX
Machine instruction: 1000100111011000
Assembly instruction: mov ax,bx
After that, programmers began to write source programs using assembly instructions. However, since computers can only understand machine instructions, how can the computer execute programs written in assembly instructions? A translation program is needed to convert assembly instructions into machine instructions, and this program is called an assembler. Programmers write source programs in assembly language and then use an assembler to translate them into machine code, which is ultimately executed by the computer.
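The translation step can be sketched in a few lines. The following is a minimal illustration, not a real assembler: it handles only the register-to-register form of the 16-bit MOV instruction, and the function names `assemble_mov_r16_r16` and `to_bits` are made up for this example. It reproduces the bit pattern shown above for mov ax,bx.

```python
# Register numbers used for 16-bit registers in the x86 ModRM byte.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def assemble_mov_r16_r16(dst: str, src: str) -> bytes:
    """Encode `mov dst, src` for two 16-bit registers (opcode 0x89 form)."""
    opcode = 0x89  # MOV r/m16, r16
    # ModRM: mod=11 (register operand), reg=source, rm=destination.
    modrm = (0b11 << 6) | (REG16[src] << 3) | REG16[dst]
    return bytes([opcode, modrm])


def to_bits(code: bytes) -> str:
    """Render machine code as a string of 0s and 1s."""
    return "".join(f"{byte:08b}" for byte in code)


machine_code = assemble_mov_r16_r16("ax", "bx")
print(machine_code.hex())     # 89d8
print(to_bits(machine_code))  # 1000100111011000
```

The mnemonic and register names carry the same information as the two bytes; the assembler only changes the representation.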
II. Language Characteristics
Assembly language is a programming language that is directly oriented towards the processor. The processor operates under the control of instructions, and each instruction that the processor can recognize is called a machine instruction. Each processor has its own complete set of instructions that it can recognize, known as the instruction set. When executing instructions, the processor takes different actions based on different instructions to complete different functions, which can change its internal working state and control the operation state of other peripheral circuits.
Another characteristic of assembly language is that its operands are not abstract data values but registers or memory locations. In other words, it works directly with registers and memory, which is one reason well-written assembly code can run faster than code in other languages. However, this also makes programming more complex: because the data is stored in registers or memory, addressing methods are needed to find it. In the example above, we cannot simply use data as in a high-level language; we must work with the registers AX and BX that hold it. In high-level languages, addressing is handled by the compiler, while in assembly language it is handled by the programmer, which increases the complexity of programming and reduces the readability of the program.
Moreover, assembly language instructions are a symbolic representation of machine instructions, and different types of CPUs have different machine instruction systems, which means different assembly languages. Therefore, assembly language programs are closely tied to the machine. Apart from a certain degree of portability between different models within the same series of CPUs, assembly language programs cannot be ported between different types of machines (e.g., minicomputers and microcomputers). In other words, the universality and portability of assembly language programs are lower than those of high-level language programs.
Because of this “machine relevance,” programmers writing in assembly language can allocate the machine’s internal resources deliberately and keep them in optimal use. Programs written this way tend to have shorter code and faster execution. Assembly language is the programming language most directly tied to the hardware, and it can be the most efficient in terms of time and space. It is a compulsory course in computer application technology in higher education, playing an important role in training students to master programming techniques and become familiar with machine operation and program debugging.
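To make the point about addressing concrete, here is a small sketch that models memory as a flat array of bytes. The names `memory`, `ARRAY_BASE`, and `load16` are hypothetical and exist only for this illustration. A high-level language lets us write `values[2]`; at the assembly level the programmer has to compute base + index × scale + displacement and read the bytes at that address.

```python
import struct

# A flat, byte-addressable memory, as the processor sees it.
memory = bytearray(64)

ARRAY_BASE = 16            # hypothetical address where the array starts
values = [10, 20, 30, 40]  # four 16-bit words

# Store the words little-endian, the byte order used by x86.
struct.pack_into("<4H", memory, ARRAY_BASE, *values)


def load16(base: int, index: int, scale: int, disp: int = 0) -> int:
    """Load a 16-bit word from address base + index*scale + disp."""
    address = base + index * scale + disp
    (word,) = struct.unpack_from("<H", memory, address)
    return word


# High-level view: values[2]. Low-level view: explicit address arithmetic.
third = load16(ARRAY_BASE, index=2, scale=2)
print(third, values[2])
```

Choosing the wrong scale or base silently reads the wrong bytes, which is part of why this style of programming is harder to get right.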
Overall Characteristics
1. Machine Relevance
This is a low-level language oriented towards machines, usually specifically designed for a particular computer or series of computers. Since it is a symbolic representation of machine instructions, different machines have different assembly languages. Using assembly language can better leverage the characteristics of the machine to produce higher quality programs.
2. High Speed and High Efficiency
Assembly language retains the advantages of machine language, characterized by directness and simplicity, allowing effective access and control of various hardware devices such as disks, memory, CPU, I/O ports, etc., while occupying less memory and executing faster, making it an efficient programming language.
3. Complexity of Writing and Debugging
Due to direct control over hardware and the fact that even simple tasks require many assembly language statements, program design must cover all aspects, considering all potential issues and reasonably allocating and utilizing various software and hardware resources. This inevitably increases the burden on the programmer. Similarly, during program debugging, once there is a problem in running the program, it is difficult to identify.
Advantages
1. Since programs designed using assembly language are ultimately converted into machine instructions, they can maintain consistency with machine language, are direct and straightforward, and can access and control various hardware devices such as disks, memory, CPU, and I/O ports. Using assembly language allows access to all accessible software and hardware resources.
2. The target code is short, occupies less memory, and executes quickly, making it an efficient programming language often used in conjunction with high-level languages to improve program execution speed and efficiency, compensating for the shortcomings of high-level languages in hardware control, with very wide applications.
Disadvantages
1. Assembly language is machine-oriented and is considered a low-level language in the hierarchy of programming languages, usually designed for a particular computer or series of computers. Different processors have different assembly language syntax and different assemblers, and assembled programs cannot be executed on a different processor, so they lack portability;
2. It is difficult to understand the programming intent from assembly language code, making it poorly maintainable; even simple tasks require a large amount of assembly language code, making it easy to produce bugs and difficult to debug;
3. Using assembly language requires a deep understanding of a specific processor, and optimization can only be done for specific architectures and processors, resulting in low development efficiency, long cycles, and monotony.
III. Composition of the Language
Data Transfer Instructions
This part of the instructions includes general data transfer instructions MOV, conditional transfer instructions CMOVcc, stack operation instructions PUSH/PUSHA/PUSHAD/POP/POPA/POPAD, exchange, table-lookup, and byte-swap instructions XCHG/XLAT/BSWAP, address or segment descriptor selector transfer instructions LEA/LDS/LES/LFS/LGS/LSS, etc. Note that CMOVcc is not a single instruction but a family of instructions that decide whether to perform the specified transfer based on the status of certain bits in the EFLAGS register.
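As a rough illustration of the stack instructions, the sketch below models a 16-bit stack that grows toward lower addresses, which is how PUSH and POP behave on x86. The class `Stack16` and the register dictionary are assumptions made for this example, not part of any real tool.

```python
import struct


class Stack16:
    """Toy model of a 16-bit stack that grows toward lower addresses."""

    def __init__(self, size: int = 32) -> None:
        self.memory = bytearray(size)
        self.sp = size  # stack pointer starts just past the end (empty stack)

    def push(self, value: int) -> None:
        # PUSH: decrement SP by 2, then store the word at [SP].
        self.sp -= 2
        struct.pack_into("<H", self.memory, self.sp, value & 0xFFFF)

    def pop(self) -> int:
        # POP: load the word at [SP], then increment SP by 2.
        (value,) = struct.unpack_from("<H", self.memory, self.sp)
        self.sp += 2
        return value


stack = Stack16()
regs = {"ax": 0x1234, "bx": 0xABCD}

# push ax / push bx / pop ax / pop bx swaps the two registers,
# the same net effect as a single xchg ax,bx.
stack.push(regs["ax"])
stack.push(regs["bx"])
regs["ax"] = stack.pop()
regs["bx"] = stack.pop()
print({name: hex(value) for name, value in regs.items()})
```

The last value pushed is the first one popped, so the two registers end up exchanged and the stack pointer returns to where it started.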
Integer and Logic Operation Instructions
This part of the instructions is used for performing arithmetic and logic operations, including addition instructions ADD/ADC, subtraction instructions SUB/SBB, increment instructions INC, decrement instructions DEC, comparison operation instructions CMP, multiplication instructions MUL/IMUL, division instructions DIV/IDIV, sign extension instructions CBW/CWDE/CDQE, decimal adjustment instructions DAA/DAS/AAA/AAS, logic operation instructions NOT/AND/OR/XOR/TEST, etc.
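The pairing of ADD with ADC exists so that numbers wider than a register can be added piece by piece. The following sketch, with hypothetical helper names `add16` and `add32_via_16`, models adding two 32-bit values on a 16-bit machine: ADD handles the low words and produces a carry, and ADC adds the high words plus that carry.

```python
MASK16 = 0xFFFF


def add16(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two 16-bit values plus a carry; return (result, carry_out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def add32_via_16(x: int, y: int) -> tuple[int, int]:
    """Add two 32-bit values using only 16-bit additions."""
    low, carry = add16(x & MASK16, y & MASK16)   # ADD: low words
    high, carry = add16(x >> 16, y >> 16, carry)  # ADC: high words + carry
    return (high << 16) | low, carry


print(add32_via_16(0x0001FFFF, 0x00000001))  # (131072, 0), i.e. 0x00020000
print(add32_via_16(0xFFFFFFFF, 0x00000001))  # (0, 1): wrapped, carry set
```

SUB and SBB work the same way for subtraction, with a borrow in place of the carry.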
Shift Instructions
This part of the instructions is used to shift or rotate register or memory operands by a specified number of bit positions. This includes logical left shift instructions SHL, logical right shift instructions SHR, arithmetic left shift instructions SAL, arithmetic right shift instructions SAR, rotate left instructions ROL, and rotate right instructions ROR.
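The differences between these instructions are easiest to see on a concrete bit pattern. The sketch below models them on 8-bit values; the function names mirror the mnemonics but are otherwise just illustrative Python, and the flag effects of the real instructions are not modeled. SAL is omitted because on x86 it performs the same operation as SHL. Shift counts are assumed to be between 0 and 7.

```python
BITS = 8
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def shl(value: int, count: int) -> int:
    """Logical left shift: zeros enter from the right."""
    return (value << count) & MASK


def shr(value: int, count: int) -> int:
    """Logical right shift: zeros enter from the left."""
    return (value & MASK) >> count


def sar(value: int, count: int) -> int:
    """Arithmetic right shift: copies of the sign bit enter from the left."""
    value &= MASK
    signed = value - (1 << BITS) if value & SIGN else value
    return (signed >> count) & MASK


def rol(value: int, count: int) -> int:
    """Rotate left: bits leaving on the left re-enter on the right."""
    count %= BITS
    value &= MASK
    return ((value << count) | (value >> (BITS - count))) & MASK


def ror(value: int, count: int) -> int:
    """Rotate right: bits leaving on the right re-enter on the left."""
    count %= BITS
    value &= MASK
    return ((value >> count) | (value << (BITS - count))) & MASK


x = 0b10010110
for op in (shl, shr, sar, rol, ror):
    print(f"{op.__name__}: {op(x, 1):08b}")
```

For the input 10010110, SHR and SAR differ only in the top bit, and ROL differs from SHL only in the bit that wraps around.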
Bit Operation Instructions
This part of the instructions includes bit test instructions BT, bit test and set instructions BTS, bit test and reset instructions BTR, bit test and complement instructions BTC, bit scan forward instructions BSF, and bit scan reverse instructions BSR.
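What these instructions compute can be sketched with ordinary integer operations. The helpers below are illustrative models only: they return `None` for a zero input to the scan operations, which is a choice made for this sketch (on the real processor the destination is undefined in that case and the zero flag is set), and `bts` returns the previous bit value where the hardware would place it in the carry flag.

```python
from typing import Optional


def bsf(value: int) -> Optional[int]:
    """Bit scan forward: index of the lowest set bit, or None if value is 0."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1


def bsr(value: int) -> Optional[int]:
    """Bit scan reverse: index of the highest set bit, or None if value is 0."""
    if value == 0:
        return None
    return value.bit_length() - 1


def bts(value: int, bit: int) -> tuple[int, int]:
    """Bit test and set: return (new value, previous state of the bit)."""
    previous = (value >> bit) & 1
    return value | (1 << bit), previous


word = 0b00101000
print(bsf(word), bsr(word))  # 3 5
print(bts(0b1000, 1))        # (10, 0)
```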
Conditional Set Instructions
This is not a specific instruction but a cluster of instructions, including about 30 instructions used to set an 8-bit register or memory operand based on the status of certain bits in the EFLAGS register. For example, SETE/SETNE/SETGE, etc.
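The conditional instruction families all work the same way: an earlier instruction such as CMP sets status flags, and the condition code then tests a combination of those flags. The sketch below models a 16-bit CMP and a handful of condition codes; `cmp16`, `CONDITIONS`, and `setcc` are names invented for this example, and only four of the flags are modeled.

```python
def cmp16(a: int, b: int) -> dict[str, bool]:
    """Model CMP a, b on 16-bit operands: compute a - b and return the flags."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    return {
        "ZF": result == 0,                # zero flag: operands are equal
        "SF": bool(result & 0x8000),      # sign flag: top bit of the result
        "CF": a < b,                      # carry flag: unsigned borrow
        # Overflow flag: operand signs differ and the result's sign differs from a.
        "OF": bool((a ^ b) & (a ^ result) & 0x8000),
    }


# A few condition codes and the flag combinations they test.
CONDITIONS = {
    "e": lambda f: f["ZF"],             # equal
    "ne": lambda f: not f["ZF"],        # not equal
    "b": lambda f: f["CF"],             # below (unsigned less than)
    "l": lambda f: f["SF"] != f["OF"],  # less (signed less than)
    "ge": lambda f: f["SF"] == f["OF"], # greater or equal (signed)
}


def setcc(condition: str, flags: dict[str, bool]) -> int:
    """Model SETcc: produce 1 if the condition holds, otherwise 0."""
    return 1 if CONDITIONS[condition](flags) else 0


flags = cmp16(0xFFFF, 1)  # 0xFFFF is -1 when read as a signed 16-bit value
print(setcc("l", flags))  # 1: -1 is less than 1 as signed numbers
print(setcc("b", flags))  # 0: 65535 is not below 1 as unsigned numbers
```

The same bit pattern compares differently depending on whether a signed or an unsigned condition code is used, which is why both sets of conditions exist.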
Control Transfer Instructions
This part includes unconditional transfer instructions JMP, conditional transfer instructions Jcc/JCXZ, loop instructions LOOP/LOOPE/LOOPNE, procedure call instructions CALL, subroutine return instructions RET, and interrupt instructions INT n, INT3, INTO, IRET, etc. Note that Jcc is a family of many instructions that decide whether to transfer control based on the status of certain bits in the EFLAGS register; INT n is a software interrupt instruction, where n is a number between 0 and 255 that specifies the interrupt vector number.
String Operation Instructions
This part of the instructions is used to operate on data strings, including string transfer instructions MOVS, string comparison instructions CMPS, string scan instructions SCAS, string load instructions LODS, and string store instructions STOS. These instructions can optionally use the REP/REPE/REPZ/REPNE/REPNZ prefixes to repeat the operation.
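A repeat prefix turns one string instruction into a loop controlled by a count register. The sketch below models REP MOVSB and REP STOSB over a byte array, assuming the direction flag is clear so that the pointers move upward. The function names and the idea of passing registers as arguments and returning their final values are conventions of this example only.

```python
def rep_movsb(memory: bytearray, si: int, di: int, cx: int) -> tuple[int, int, int]:
    """Copy cx bytes from [si] to [di]; return the final (si, di, cx)."""
    while cx != 0:
        memory[di] = memory[si]  # MOVSB: move one byte
        si += 1                  # both pointers advance
        di += 1
        cx -= 1                  # REP: decrement the count and repeat
    return si, di, cx


def rep_stosb(memory: bytearray, di: int, al: int, cx: int) -> tuple[int, int]:
    """Fill cx bytes starting at [di] with the byte in al; return (di, cx)."""
    while cx != 0:
        memory[di] = al & 0xFF   # STOSB: store one byte
        di += 1
        cx -= 1
    return di, cx


memory = bytearray(b"hello" + bytes(11))
print(rep_movsb(memory, si=0, di=8, cx=5))  # (5, 13, 0)
print(bytes(memory[8:13]))                  # b'hello'
rep_stosb(memory, di=0, al=ord("*"), cx=3)
print(bytes(memory[:5]))                    # b'***lo'
```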
Input Output Instructions
This part of the instructions is used for exchanging data with peripheral devices, including port input instructions IN/INS and port output instructions OUT/OUTS.
High-Level Language Auxiliary Instructions
This part of the instructions provides convenience for high-level language compilers, including instructions for creating stack frames ENTER and releasing stack frames LEAVE.
Control and Privilege Instructions
This part includes no-operation instructions NOP, halt instructions HLT, wait instructions WAIT/MWAIT, escape instructions ESC, bus lock instructions LOCK, memory range check instructions BOUND, global descriptor table operation instructions LGDT/SGDT, interrupt descriptor table operation instructions LIDT/SIDT, local descriptor table operation instructions LLDT/SLDT, segment limit loading instructions LSL, descriptor access rights reading instructions LAR, task register operation instructions LTR/STR, requested privilege level adjustment instructions ARPL, task switch flag clearing instructions CLTS, control register and debug register data transfer instructions MOV, cache control instructions INVD/WBINVD/INVLPG, model-specific register reading and writing instructions RDMSR/WRMSR, processor information retrieval instructions CPUID, timestamp reading instructions RDTSC, etc.
Floating Point and Multimedia Instructions
This part of the instructions is used to accelerate floating-point operations and, through single instruction multiple data (SIMD and its SSEx extensions) instructions, multimedia data processing. The number of instructions in this part is very large and they cannot be listed one by one; please refer to the INTEL manual.
Virtual Machine Extension Instructions
This part of the instructions includes INVEPT/INVVPID/VMCALL/VMCLEAR/VMLAUNCH/VMRESUME/VMPTRLD/VMPTRST/VMREAD/VMWRITE/VMXOFF/VMXON, etc.
IV. Related Technologies
Assembler
A typical modern assembler builds target code by translating the mnemonic instructions of the instruction set into operation codes (OpCode) and resolving symbolic names into memory addresses and other entities. Symbolic references are an important feature of the assembler, because they save the tedious and time-consuming manual recalculation of addresses after a program is modified. In essence, assembly language writes machine code as readable text, and during assembly the assembler replaces that text with the corresponding obscure machine codes.
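Symbol resolution is commonly done in two passes, and the idea can be sketched briefly. The example below is a toy under stated assumptions: the instruction sizes in `SIZES` are made up rather than real x86 encodings, and `resolve` is a hypothetical function that only computes addresses and jump targets without emitting machine code. The first pass records the address of every label; the second pass replaces label operands with those addresses.

```python
JUMPS = {"jmp", "jnz"}
# Made-up instruction sizes in bytes, for illustration only.
SIZES = {"nop": 1, "dec": 1, "hlt": 1, "jmp": 3, "jnz": 3}


def resolve(lines: list[str]) -> tuple[dict[str, int], list[tuple[int, str, object]]]:
    """Return the symbol table and a list of (address, mnemonic, target)."""
    # Pass 1: walk the program and record the address of each label.
    symbols: dict[str, int] = {}
    address = 0
    for raw in lines:
        line = raw.strip()
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += SIZES[line.split()[0]]

    # Pass 2: replace label operands of jumps with the recorded addresses.
    resolved = []
    address = 0
    for raw in lines:
        line = raw.strip()
        if line.endswith(":"):
            continue
        mnemonic, *operands = line.split()
        target = symbols[operands[0]] if mnemonic in JUMPS else None
        resolved.append((address, mnemonic, target))
        address += SIZES[mnemonic]
    return symbols, resolved


program = ["start:", "nop", "loop:", "dec cx", "jnz loop", "jmp start", "hlt"]
symbols, code = resolve(program)
print(symbols)  # {'start': 0, 'loop': 1}

# Inserting an instruction moves `loop`; the jump target is recomputed automatically.
edited = program[:1] + ["nop"] + program[1:]
print(resolve(edited)[0])  # {'start': 0, 'loop': 2}
```

With hand-written machine code, the same edit would require finding and patching every jump that refers to the moved address.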
Compilation Environment
A symbolic program written in assembly language or another non-machine language is called the source program, and the role of the assembler is to translate the source program into the target program. The target program is a machine language program; once it is placed at its predetermined location in memory, it can be executed by the computer’s CPU.
Overall, the debugging environment for assembly is relatively limited, and there are very few excellent assemblers. The choice of assembler depends on the target processor and the specific system platform. Generally speaking, a good tool should be convenient to use: for example, it should format code automatically, highlight syntax, and integrate assembling, linking, and debugging into one environment.
For widely used personal computers, the assemblers available to choose from include MASM, NASM, TASM, GAS, and FASM, along with integrated environments such as RADASM, but most do not have debugging functions. If the goal is to learn assembly language, Easy Assembler, which has a complete integrated environment, is a very suitable choice for beginners.
V. Development Prospects
Assembly language is a mnemonic form of machine language, which makes it easier to read, write, debug, and modify than raw machine code. With clever design, skilled assembly programmers can produce code that runs faster and occupies less memory than code generated from high-level languages. However, that advantage in speed and size exists only relative to high-level languages and depends on careful design, and some high-level languages also achieve high execution efficiency after compilation, so the advantage is gradually weakening. Moreover, assembly language has obvious limitations when writing complex programs: it depends on a specific machine type, is not generic, and cannot be ported between machine types. It is often said that assembly language is a low-level language, but this does not mean it should be abandoned. On the contrary, in certain industries and fields assembly is indispensable, and it remains a language that low-level programmers must understand. However, the largest area of computing today is IT software, that is, what we commonly call application software programming. In the hands of skilled programmers, programs written in assembly language can run more efficiently than those written in other languages, but at the cost of more time spent on optimization. Without a solid foundation in computer principles and programming, using assembly may actually increase development difficulty and not be worth the effort. Comparing software development before and after 2010, the software industry has become market-oriented. Given the quality and cross-platform capabilities of high-level languages, a company cannot let a team write everything in assembly language and spend several times, or even dozens of times, as long. As long as the final result is not significantly worse than what assembly language would produce, it is better to complete the task in another language and so gain a competitive edge in a market economy.
However, to date, no programmer dares to assert that assembly language is unnecessary to learn. Assembly language is a machine-oriented programming language, and skilled assembly programmers have partially moved away from software development into industrial electronics programming. In fields with strict design requirements but small devices, such as 4-bit microcontrollers, whose capacity and computing power are limited, electronic engineers usually handle both circuit design and software control, with assembly as the main development language. C is used very little there, and electronic development engineers are in high demand. In some industrial companies, a core electronic engineer earns more than any other staff member, and the salary of an ordinary electronic engineer is said to be often ten times that of a programmer. This situation has arisen because, since the start of the 21st century, many people have learned assembly but very few have mastered it. It is harder to learn and use than high-level languages, and its range of application is small. Although it is simple, it is overly flexible, so those who learned a high-level language first find it harder to learn than those who start with assembly. Nevertheless, for anyone who wants a comprehensive understanding of microcomputer principles, assembly language is a compulsory language.
VI. Practical Applications
As modern software systems become increasingly large and complex, many encapsulated high-level languages such as C/C++, Pascal/Object Pascal have emerged. These new languages enable programmers to develop more simply and efficiently, allowing software developers to meet the demands of rapid software development. However, due to its complexity, the applicable field of assembly language is gradually diminishing. This does not mean assembly language has lost its usefulness. Because assembly is closer to machine language, it can operate directly on hardware, and programs generated have higher running speeds and occupy less memory compared to those written in other languages. Therefore, it is widely used in programs that require high timeliness, many core modules of large programs, and industrial control.
Moreover, although there are many programming languages to choose from, assembly remains a compulsory course for computer science students in universities to deepen their understanding of computer operating principles.
Historically, assembly language was one of the most popular programming languages. With the growth of software scale, and the resulting demands for software development speed and efficiency, high-level languages gradually replaced assembly language. However, even so, high-level languages cannot completely replace the role of assembly language. Take the Linux kernel as an example; although the vast majority of the code is written in C, there are still unavoidable instances where assembly code is used in some critical areas. This part of the code is closely related to hardware, where even C language may fall short, while assembly language can effectively maximize hardware performance.
First, most assembly language statements directly correspond to machine instructions, resulting in fast execution speed, high efficiency, and small code size, making it particularly useful in situations with limited memory capacity that require rapid and real-time responses, such as in instrumentation and industrial control devices.
Second, assembly language can be used in the core parts of system programs and those that frequently interact with system hardware. For example, core program segments of operating systems, initialization programs for I/O interface circuits, low-level driver programs for external devices, frequently called subroutines, dynamic link libraries, and certain high-level drawing programs, video game programs, etc.
Third, assembly language can be used for software encryption and decryption, analysis and prevention of computer viruses, as well as program debugging and error analysis in various aspects.
Finally, learning assembly language can deepen the understanding of computer principles and operating systems. By learning and using assembly language, one can sense, experience, and understand the logical functions of machines, laying a technical foundation for understanding the principles of various software systems; and a practical application foundation for mastering the principles of hardware systems.
VII. Classic Textbooks
There are many textbooks on assembly language, covering various processors, roughly totaling over a hundred. Among these textbooks, the following can be categorized as widely used:
x86 Processors
1. “x86 Assembly Language: From Real Mode to Protected Mode”, Li Zhong, Electronic Industry Press, 2013-1.
Based on INTEL x86 processors, the NASM assembler, and the BOCHS virtual machine. Assembly language is the language of the processor; in that sense, learning assembly language should mean programming directly against the hardware rather than using obscure DOS interrupts and API calls. This is an interesting book that does not spend time on tedious math problems. Instead, it teaches how to directly control hardware, display characters, read hard disk data, and control other hardware without relying on BIOS, DOS, Windows, Linux, or any other software support.
Today 32-bit and 64-bit systems are mainstream, while real mode and the DOS operating system have become history. Linux and Windows operate in protected mode. This book emphasizes 32-bit protected mode, and reading it greatly aids in understanding how modern computers and modern operating systems work.
2. “Assembly Language” (2nd Edition), Wang Shuang, Tsinghua University Press, 2013-4-1
Based on INTEL 8086 processors, the MASM assembler, and the DOS platform, this assembly textbook focuses entirely on the real mode of the 8086 processor and does not cover the commonly used 32-bit and 64-bit modes, but it is very popular due to its clarity.
3. “80X86 Assembly Language Programming Tutorial”, Yang Jiwen et al., Tsinghua University Press, 1999-3-1
Based on INTEL x86 processors and the MASM and TASM assemblers, it covers both 16-bit real mode and 32-bit protected mode, with a more detailed explanation of the latter.
4. “32-Bit Assembly Language Programming”, Qian Xiaojie, Mechanical Industry Press, 2011-8-1
An assembly textbook based on INTEL x86 processors, the MASM assembler, and the WINDOWS platform.
5. “16/32-Bit Microcomputer Principles, Assembly Language, and Interface Technology”, Qian Xiaojie, Chen Tao, Mechanical Industry Press, 2005-2-1
Based on INTEL x86 processors, discussing the basic principles of 16-bit microcomputers, assembly language, and interface technology, leading to relevant technologies of 32-bit microcomputer systems.
6. “Intel Assembly Language Programming” (5th Edition), (USA) Irvine, Electronic Industry Press, 2012-7-1
An assembly textbook based on INTEL x86 processors, the MASM assembler, and the DOS/WINDOWS platforms, covering both 16-bit real mode and 32-bit protected mode.
7. “The Art of Programming in Assembly Language” (2nd Edition), (USA) Hyde, Tsinghua University Press, 2011-12-1
Based on INTEL x86 processors, it uses the author’s own High Level Assembler (HLA) as a teaching tool, which brings some of the advantages and features of high-level languages to assembly programming.
8. “x86 PC Assembly Language, Design and Interface” (5th Edition), (USA) Mazidi, Khosh, Electronic Industry Press, 2011-1-1
Based on INTEL x86 processors, covering both 16-bit real mode and 32-bit protected mode, with some introduction to 64-bit.
ARM and Microcontrollers
1. “Assembly Language Programming – Based on ARM Architecture” (2nd Edition), Wen Quangang et al., Beihang University Press, 2010-8-1
Based on ARM architecture processors, this is an introductory textbook for learning embedded technology.
2. “Learning AVR Microcontrollers from Scratch”, Xu Yimin et al., Mechanical Industry Press, 2011-1-1
Covers an overview of microcontrollers, AVR microcontroller development tools, C for AVR microcontrollers, the basic structure of the ATmega16 microcontroller, the AVR instruction system and assembly system, etc.
3. “51 Microcontroller Simulation Practical Tutorial Based on Multisim10”, Nie Dian, Ding Wei, Electronic Industry Press, 2010-2-1
Describes the main functions of NI Multisim 10 in microcontroller simulation.
4. “PIC18 Microcontroller: Architecture, Programming, and Interface Design”, (USA) Berry, Tsinghua University Press, 2009-4-1
Microcontrollers are widely used in automotive, home appliances, industrial control, medical equipment, and many other fields. This book takes Microchip’s PIC18 series microcontrollers as an example, comprehensively explaining how to program microcontrollers using both C language and assembly language.
5. “CASL Assembly Language Programming”, Zhao Lihui, China Electric Power Press, 2002-10-1
CASL assembly language is a required content for the senior programmer level in the Chinese Computer Software Professional Technology Qualification and Level Examination. This book is a monograph on CASL assembly language programming.