Introduction to Assembly Language

Friends with a bit of computer knowledge must know that computers only recognize 0s and 1s. Back in the day, to write a program, one had to use 0s and 1s, haha, cool right? The admiration for programmers likely originated from that time. Later, people found it very inconvenient to write programs using just 0s and 1s; not only was it hard to write, but looking back at it later, it would be difficult to understand. For these reasons, assembly language was created.

Assembly language uses mnemonics to replace various combinations of 0s and 1s, which represent different instructions, making it much more convenient to some extent (an old cow: it’s so much easier!)(a newbie: it’s not convenient at all, I can’t understand it). However, assembly language also has its inconveniences; it’s still not easy to write and maintain, especially as people gradually needed to write larger programs. In this context, high-level languages like Basic, Pascal, C, C++, etc., were invented. The emergence of these languages significantly reduced the difficulty of program development (an old cow: it has become so much easier that I can write programs with my knees!)(a newbie: it’s still just as difficult). Programs that once took a long time to develop in assembly can now be completed quickly and easily. Especially in recent years, the widespread popularity of visual programming has diminished the mystique of programmers, and terms like ‘Coder’ are now everywhere. The most unfortunate one is assembly, which overnight became a low-level language, a language that is looked down upon, like a laborer who eats garlic and doesn’t brush his teeth, or a scoundrel who fills up his car without paying, or an Icelandic person who spits on the bus, etc. (Assembly: Boohoo… I can’t go on living).

However, assembly still has its inherent advantages because it corresponds directly to the CPU’s internal instructions. Therefore, in some special cases, assembly must be used, such as accessing hardware ports or writing viruses. Moreover, the executable files generated are extremely efficient and very small, making it quite enjoyable to write small programs, haha. Writing a keygen in assembly is also quite easy; you don’t have to struggle with how to restore it to a familiar language. After saying so much, let’s get to the point (the audience faints a number of times): since computers only recognize 0s and 1s, all files stored on a computer are also stored in binary form, including executable files.

So, you just need to find a hex editor like UltraEdit to open the executable file directly and view it, haha, if you can understand it. You will find that everything you see is hexadecimal values (every 4 binary bits convert to one hexadecimal digit). This is the actual content of the executable file, which of course includes its code. (an old cow: how nostalgic!)(a newbie: stupid cow, shut up, I’m getting dizzy). Haha, at this moment, do you feel a bit confused looking at these things?
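
To give a feel for what a hex editor shows, here is a small Python sketch that prints bytes roughly the way such a tool does. The function name hex_dump and the sample bytes are made up for illustration; only the leading "MZ" corresponds to how DOS/Windows executables really begin.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as: offset, hex column, printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Each byte becomes two hex digits (4 bits per digit).
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# "MZ" opens a DOS/Windows executable header; the other bytes are invented.
sample = b"MZ\x90\x00\x03\x00\x00\x00"
print(hex_dump(sample))
```

To look at a real file, you could pass hex_dump the result of reading that file in binary mode.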

These things look like a foreign language, and no one can analyze a program in this form, so software has been developed to convert these hexadecimal values into the corresponding assembly code. This conversion is called disassembly, and analyzing a program this way is what is known as reverse analysis.

Haha, clever you must be thinking now, if you find the part of the software that calculates the registration code and analyze it to understand its calculation method, then you wouldn’t have to register the software through payment, right? Of course, you can also restore this calculation process to any programming language you are familiar with. The compiled program is called a keygen, and its function is to calculate the registration code for a specific software. (Haha, don’t you often see such descriptions in software? “Prohibiting the creation and distribution of the registration key and cracking programs for this software; prohibiting reverse engineering of this software, such as disassembly, decompilation, etc.”)

We can understand the author’s feelings in doing this; after all, they have put so much effort into their software, so I hope you don’t learn cracking just because you can’t afford the registration fee.

In general, the introduction above is a bit too idealistic. The analysis methods mentioned above are called static analysis. Common tools for this type of analysis include W32DASM, IDA, and HIEW. Static analysis, as the name suggests, is to analyze the software by only viewing its disassembled code. Generally, if you just want to crack software, static analysis is sufficient. However, to truly understand the registration algorithm, dynamic analysis is usually required, which means using a debugger to analyze while executing the program.

I have said so much because I want to tell you how important assembly is. I don’t expect you to be proficient, but you should at least be able to read it; otherwise, how can you talk about analysis? Although some friends have managed to crack a few programs without knowing any assembly, isn’t that a bit tragic? Do you want to spend your whole life just cracking software?

In fact, you don’t need to fear assembly; it looks scary, but it’s actually quite similar to those properties and methods of controls that you usually memorize. You are already managing so many MFC controls; how many assembly commands are there? Moreover, assembly is not only useful when cracking software; it is also very useful in many other areas. Therefore, I believe it is a duty to master assembly: you just need to believe that it is not difficult.

Let me first tell you about the composition of the CPU: the task of the CPU is to execute the sequence of instructions stored in memory. For this purpose, in addition to completing arithmetic and logic operations, it also needs to handle data transfers between the CPU, memory, and I/O. Early CPU chips only included the arithmetic unit and the control unit. In recent years, to better match the speed of memory with that of the arithmetic unit, a cache memory has been built into the chip (do you know why P4 is much more expensive than P4 Celeron?). (A hard object flies over, voiceover: why are you talking about this, we don’t need to design a CPU.) Why are you in such a hurry? Since assembly is relatively “low-level”, it operates directly on the hardware. You can’t use variables as casually as in VB; if you don’t know a little about how the CPU is organized internally, how will you read assembly code later? (A voice again: important things should be said quickly!)

Aside from the cache memory, the components can generally be divided into three parts:

1. Arithmetic Logic Unit (ALU) is used for arithmetic and logic operations. This part is not very relevant to us; we don’t need to care about it.

2. Control logic. Similarly, not very relevant to us.

3. Working registers. This is the most important part; they play a crucial role in the computer. Each register is equivalent to a storage unit in the arithmetic unit, but its access speed is incredibly fast, much faster than memory. It is used to store various information needed or obtained during calculations, including operand addresses, operands, and intermediate results.

Next, we will specifically introduce these registers.

Before the introduction, it’s necessary to mention some basic knowledge. Do you know what 32 bits mean? It means the registers are 32 bits wide (dizzy~~ that’s equivalent to saying nothing). In the CPU, one binary digit is one bit, and eight bits make one byte. Memory is organized in bytes, and each byte is assigned a unique storage address, known as its physical address; memory is accessed through this address. What can eight binary bits express? They can express every ASCII code, meaning one memory unit can store one English letter or digit. Chinese characters need a wider encoding; in the two-byte encodings commonly used for them, one Chinese character takes two memory units. Sixteen bits equal two bytes and are called a word, which is not difficult to understand, right? Of course, beyond sixteen bits there are thirty-two bits and sixty-four bits: thirty-two bits are called a double word, and sixty-four bits a quad word. The CPUs we use today are all thirty-two-bit, unless you are using a 286 or earlier model. Naturally, the registers in the CPU are also thirty-two bits wide, meaning one register can hold 32 0s or 1s (except the segment registers, which are sixteen bits).
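
If you want to check these sizes yourself, the following Python sketch counts the bytes taken by one English letter and one Chinese character, and prints the largest value each width can hold. The choice of GBK and UTF-16 as the two-byte encodings is my assumption for the example; other encodings give other sizes.

```python
# One ASCII character fits in a single byte (one memory unit).
ascii_bytes = "A".encode("ascii")

# In GBK, a common encoding on Chinese Windows, this character takes two bytes.
gbk_bytes = "中".encode("gbk")

# In UTF-16 (little-endian, no byte-order mark) it also takes two bytes.
utf16_bytes = "中".encode("utf-16-le")

print(len(ascii_bytes), len(gbk_bytes), len(utf16_bytes))

# Width in bits of each unit mentioned in the text, and its largest unsigned value.
UNITS = {"byte": 8, "word": 16, "double word": 32, "quad word": 64}
for name, bits in UNITS.items():
    print(f"{name:12} {bits:2} bits  max = {(1 << bits) - 1:X}H")
```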

Generally speaking, you need to master sixteen registers, and I will introduce them one by one:

First, let’s introduce the general-purpose registers. There are a total of eight: EAX, EBX, ECX, EDX, ESP, EBP, EDI, and ESI. Among them, EAX-EDX are known as data registers. In addition to accessing all 32 bits at once, you can also access their low sixteen bits directly (I told you they are 32 bits, right?); the high sixteen bits have no name of their own and can only be reached through the full register. The low sixteen bits are named by removing the E in front, meaning the low sixteen bits of EAX are AX. Moreover, the low sixteen bits can also be accessed eight bits at a time, meaning AX can be further divided into AH (high eight bits) and AL (low eight bits). The other three registers can be inferred by yourself. In this way, you can handle various situations. If you want to operate on eight-bit data, you can use MOV AL, (eight-bit data) or MOV AH, (eight-bit data). For sixteen-bit data, you can use MOV AX, (sixteen-bit data), and for thirty-two bits, MOV EAX, (thirty-two-bit data). Perhaps what I said is still unclear to you; that’s okay, take it slow. I’ll roughly draw a diagram for you, although it’s not very pretty:

 31               16 15        8 7         0
+-------------------+-----------+-----------+
| high sixteen bits |    AH     |    AL     |
+-------------------+-----------+-----------+
                    |<--------- AX -------->|
|<------------------ EAX ------------------>|

Do you understand? If you don’t understand, that’s okay; just understand as much as you can.
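
If the layout is still fuzzy, here is a small Python sketch that models EAX as a plain 32-bit number and picks out AX, AH, and AL with masks and shifts. The helper names write_al and write_ax are hypothetical, and this is only a model of the layout, not how the CPU is implemented.

```python
eax = 0x12345678          # an arbitrary 32-bit example value

ax = eax & 0xFFFF         # low sixteen bits of EAX
ah = (ax >> 8) & 0xFF     # high eight bits of AX
al = ax & 0xFF            # low eight bits of AX


def write_al(eax: int, value: int) -> int:
    """Model of MOV AL, value: only the lowest 8 bits of EAX change."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


def write_ax(eax: int, value: int) -> int:
    """Model of MOV AX, value: only the lowest 16 bits of EAX change."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)


print(f"EAX={eax:08X} AX={ax:04X} AH={ah:02X} AL={al:02X}")
print(f"after MOV AL, 0FFh: EAX={write_al(eax, 0xFF):08X}")
print(f"after MOV AX, 0ABCDh: EAX={write_ax(eax, 0xABCD):08X}")
```

Notice that writing AL or AX leaves the upper bits of EAX untouched, which is why the same register can hold an 8-bit, 16-bit, or 32-bit value depending on which name you use.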

These four registers are mainly used to temporarily store operands, results, or other information needed during calculations. The other four registers, ESP, EBP, EDI, and ESI, cannot be split into eight-bit halves: apart from the full 32 bits, only their low sixteen bits can be accessed, as the words SP, BP, DI, and SI. Their main purpose is to provide offset addresses during memory addressing. Therefore, they can be called pointer or index registers. By the way, since the 386, all eight general-purpose registers can be used to hold memory addresses. (Here’s a little knowledge for you: have you seen the form [EBX] while cracking? This means that EBX holds a memory address, and what is actually accessed is the value stored at that address.)

Among these registers, ESP is called the stack pointer register. The stack is a very important concept; it is a storage area that works on a “last in, first out” basis. It must live in the stack segment, so its segment address is stored in the SS register. It has only one opening that serves as both entrance and exit, so there is only one stack pointer register. The content of ESP always points to the current top of the stack. You may still find it unclear, so let me give you an example: you know laborers building houses, right? Suppose there are two laborers: laborer A is laying bricks on the ground, and laborer B is passing bricks to laborer A. Next to laborer A is a pile of bricks that laborer B has carried over from a distance. Laborer B simply puts each new load on top of the pile, and laborer A takes bricks from the top as he needs them, so the last brick put down is the first one used. The stack works like this: its base is at a high address, and whenever data is pushed onto it, the data is stored towards lower addresses. The corresponding instruction is PUSH. Whenever data is pushed, ESP decreases by the size of that data; in short, it always points to the last data pushed onto the stack. Later, if you want to use the data pushed onto the stack, you retrieve it with the POP instruction, and after executing POP, ESP increases by the corresponding data size.
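
The brick pile can be sketched in a few lines of Python. This is only a model under my own assumptions: the class name Stack32, the starting address 0012FFC0h, and the fixed 4-byte item size are all chosen for illustration.

```python
class Stack32:
    """A toy model of a stack that grows towards lower addresses."""

    def __init__(self, top: int = 0x0012FFC0):
        self.esp = top        # ESP always points at the current top of the stack
        self.memory = {}      # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4                       # PUSH first lowers ESP by 4 bytes...
        self.memory[self.esp] = value & 0xFFFFFFFF   # ...then stores the value there

    def pop(self) -> int:
        value = self.memory[self.esp]       # POP reads the value at the top...
        self.esp += 4                       # ...then raises ESP by 4 bytes
        return value


stack = Stack32()
stack.push(0x11111111)
stack.push(0x22222222)
print(f"ESP after two pushes: {stack.esp:08X}")
first_out = stack.pop()       # the last value pushed comes out first
second_out = stack.pop()
print(f"{first_out:08X} {second_out:08X} ESP={stack.esp:08X}")
```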

Especially now that we are in the Win32 system, the importance of the stack cannot be ignored. The data used by an API function is passed through the stack: the arguments are first pushed onto the stack, and then a CALL is made to the API function. The function reads these arguments from the stack (normally by their offset from ESP or EBP rather than by popping them one by one) and then performs its operations; when a Win32 API function returns, it removes its arguments from the stack itself. You will understand the importance of this point later. Many programs that compare registration codes push both the true registration code and the false one you entered onto the stack before a critical CALL, and then compare them once the CALL is made. Therefore, as long as you find the critical CALL, you can set a breakpoint at the push instructions to view the true registration code. The specific content will be detailed later; this chapter will not discuss it further.
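
Here is a rough Python model of what the stack looks like around such a CALL, assuming the stdcall convention used by Win32 API functions (arguments pushed right to left, the called function cleaning up). All addresses below are hypothetical values picked for the example.

```python
memory = {}            # address -> 32-bit value
esp = 0x0012FFC0       # hypothetical starting stack pointer


def push(value: int) -> None:
    """Model of PUSH for 4-byte values."""
    global esp
    esp -= 4
    memory[esp] = value


# Caller side: push the arguments right to left, then CALL.
push(0x00403010)       # 2nd argument: hypothetical address of the code you typed
push(0x00403000)       # 1st argument: hypothetical address of the true code
push(0x00401025)       # CALL pushes the return address (hypothetical)

# Callee side: arguments are read by offset from ESP, not popped.
arg1 = memory[esp + 4]
arg2 = memory[esp + 8]

# RET 8: take the return address, then discard the 8 bytes of arguments.
return_address = memory[esp]
esp += 4 + 8

print(f"arg1={arg1:08X} arg2={arg2:08X} ESP back at {esp:08X}")
```

This is why a breakpoint on the PUSH instructions just before the critical CALL lets you see the values being handed over.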

Additionally, there’s EBP, known as the base pointer register. It can work with the stack segment register SS to determine the address of a storage unit in the stack. ESP holds the offset of the top of the stack within the stack segment, while EBP can serve as a base address in the stack area for accessing information in the stack. ESI (source index register) and EDI (destination index register) are generally used with the data segment register DS to determine the address of a storage unit in the data segment. In string processing instructions, ESI and EDI serve as the implicit source and destination index registers, with ESI used with DS and EDI used with the extra segment ES, and they are incremented or decremented automatically after each string operation, which makes them very convenient for indexing. For now, it’s okay if you don’t understand.

Next, let’s introduce the special registers, haha, does this name scare you? It sounds quite professional. The so-called special registers are two: EIP and FLAGS.

First, let’s talk about EIP. EIP can be considered the most important of all registers. It stands for the instruction pointer register, and it is used to store the offset address in the code segment. During the program’s execution, it always points to the address of the next instruction. It works with the segment register CS to determine the physical address of the next instruction. When this address is sent to memory, the controller can fetch the next instruction to be executed. Once the controller retrieves this instruction, it immediately modifies the content of EIP to always point to the address of the next instruction. It can be seen that the computer uses the EIP register to control the execution flow of the instruction sequence. Those jump instructions achieve their purpose by modifying the value of EIP.
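
To see how EIP drives execution, consider this toy Python loop. The three-instruction "machine" and its addresses are invented for the example (only the instruction lengths match real x86: NOP and HLT are one byte, a short JMP is two); a real CPU is of course far more involved.

```python
# address -> (mnemonic, operand, length in bytes); a made-up mini program
program = {
    0x1000: ("NOP", None, 1),
    0x1001: ("JMP", 0x1004, 2),   # short jump over the next instruction
    0x1003: ("NOP", None, 1),     # never executed
    0x1004: ("HLT", None, 1),
}


def run(program: dict, eip: int) -> list:
    """Return the list of addresses executed, in order."""
    trace = []
    while True:
        mnemonic, operand, length = program[eip]   # fetch the instruction at EIP
        trace.append(eip)
        eip += length              # EIP now points to the next instruction
        if mnemonic == "JMP":
            eip = operand          # a jump simply overwrites EIP
        elif mnemonic == "HLT":
            return trace


trace = run(program, 0x1000)
print([f"{address:04X}" for address in trace])
```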

Next, let’s talk about FLAGS, the flag register (EFLAGS in its 32-bit form), also known as the program status word (PSW). This register stores condition flags, control flags, and system flags.

Actually, we don’t need to understand it too deeply; you only need to know how it works. Let me give you an example:

CMP EAX, EBX     ; subtract EBX from EAX (only the flags change; EAX keeps its value)
JNZ 00470395     ; if the two are not equal, jump to 00470395

These two instructions are quite simple; they subtract the number in the EBX register from the number in the EAX register to check whether the two numbers are equal. The result of the subtraction is thrown away, but after the CMP instruction executes, the ZF (zero flag) of FLAGS is set accordingly: if the result is 0, meaning the two are equal, ZF is set to 1; otherwise it is set to 0. JNZ then jumps only when ZF is 0. There are also OF (overflow flag), SF (sign flag), CF (carry flag), AF (auxiliary carry flag), PF (parity flag), etc.
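
As a rough model of what CMP does to the flags, here is a Python sketch. The function names cmp_flags and jnz_taken are hypothetical, and only four of the flags are modeled.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Model of CMP a, b: compute a - b and report the flags, not the result."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": int(result == 0),               # set when the operands are equal
        "CF": int(a < b),                     # set when an unsigned borrow occurs
        "SF": int(bool(result & sign)),       # copy of the result's top bit
        # signed overflow: operands differ in sign and the result's sign differs from a
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),
    }


def jnz_taken(flags: dict) -> bool:
    """JNZ jumps when ZF is 0, i.e. when the compared values differ."""
    return flags["ZF"] == 0


equal = cmp_flags(0x1234, 0x1234)
different = cmp_flags(3, 5)
print(equal, jnz_taken(equal))
print(different, jnz_taken(different))
```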

You don’t need to understand all of this right now; just knowing how to use the corresponding jump instructions is enough.

The last thing to introduce is the segment registers (who just mentioned Sakura? Definitely not me). There are a total of six of them: CS (code segment), DS (data segment), ES (extra segment), SS (stack segment), FS, and GS (the latter two are also extra segments). Actually, in the Win32 environment, segment registers are no longer as important as they were in DOS, so just knowing about them is enough.

After all this talk, I believe you have a rough understanding of the CPU, right? What? You still don’t understand anything? Haha, don’t be discouraged; please believe that it’s my fault for not explaining clearly. You can refer to some books. I always feel that having a book on assembly language at your desk is very necessary. The one I have is the Tsinghua edition of “80×86 Assembly Language Programming” edited by Shen Meiming, priced at 46 yuan.

Next, let’s talk about some commonly used assembly instructions. (Considering that there are already relevant posts, I will only pick out some of the most commonly used instructions that you need to master; for more content, please refer to the book.)

CMP A, B: Compare A and B. A and B can be two registers, or a register and a memory address, but not two memory addresses. This instruction is very common; many programs that compare registration codes use it.
MOV A, B: Copy the value of B into A. A and B can be two registers, or a register and a memory address, but not two memory addresses.
XOR A, A: XOR operation, mainly used to clear A to 0.
LEA: Load address. For example, LEA DX, string loads the offset address of string into the DX register.
PUSH: Push onto the stack.
POP: Pop from the stack.
ADD: Addition. Format: ADD DST, SRC. Operation: (DST)<-(SRC)+(DST).
SUB: Subtraction. Format: SUB DST, SRC. Operation: (DST)<-(DST)-(SRC).
MUL: Unsigned multiplication. Format: MUL SRC. Operation:
    byte operation: (AX)<-(AL)*(SRC)
    word operation: (DX, AX)<-(AX)*(SRC)
    double word operation: (EDX, EAX)<-(EAX)*(SRC)
DIV: Unsigned division. Format: DIV SRC. Operation:
    byte operation: the 16-bit dividend is in AX and the 8-bit divisor is the source operand; the 8-bit quotient goes to AL and the 8-bit remainder to AH. That is: (AL)<-(AX)/(SRC) quotient, (AH)<-(AX)/(SRC) remainder.
    word operation: the 32-bit dividend is in DX, AX (DX is the high word) and the 16-bit divisor is the source operand; the 16-bit quotient goes to AX and the 16-bit remainder to DX. That is: (AX)<-(DX, AX)/(SRC) quotient, (DX)<-(DX, AX)/(SRC) remainder.
    double word operation: the 64-bit dividend is in EDX, EAX (EDX is the high double word) and the 32-bit divisor is the source operand; the 32-bit quotient goes to EAX and the 32-bit remainder to EDX. That is: (EAX)<-(EDX, EAX)/(SRC) quotient, (EDX)<-(EDX, EAX)/(SRC) remainder.
NOP: No operation; can be used to erase the corresponding statement, so, haha…
CALL: Call a subroutine; you can think of it as a procedure in high-level languages.

Control transfer instructions:

JE or JZ: Jump if equal.
JNE or JNZ: Jump if not equal.
JMP: Unconditional jump.
JB: Jump if below (less than, for unsigned numbers).
JA: Jump if above (greater than, for unsigned numbers).
JG: Jump if greater than (for signed numbers).
JGE: Jump if greater than or equal to (for signed numbers).
JL: Jump if less than (for signed numbers).
JLE: Jump if less than or equal to (for signed numbers).

In general, the above are the relatively common instructions that need to be mastered, but there are definitely more instructions you need to learn; I hope you can explore the others in your own time by looking at relevant tutorials.
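
The double word forms of MUL and DIV are the ones you will meet most often, so here is a hedged Python sketch of how the register pair EDX, EAX is used. The function names mul32 and div32 are hypothetical, and raising a Python exception merely stands in for the divide error the CPU would signal.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax: int, src: int) -> tuple:
    """Model of MUL SRC (double word): returns the new (EDX, EAX)."""
    product = (eax & MASK32) * (src & MASK32)     # up to 64 bits wide
    return (product >> 32) & MASK32, product & MASK32


def div32(edx: int, eax: int, src: int) -> tuple:
    """Model of DIV SRC (double word): returns (quotient in EAX, remainder in EDX)."""
    src &= MASK32
    if src == 0:
        raise ZeroDivisionError("the CPU would signal a divide error")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)   # EDX is the high half
    quotient, remainder = divmod(dividend, src)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder


print([f"{v:08X}" for v in mul32(0xFFFFFFFF, 2)])   # high half lands in EDX
print(div32(0, 100, 7))                              # quotient and remainder
```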

I forgot to mention this earlier, so now I will also cover number system conversions.

First, let’s talk about converting binary to decimal: multiply each binary digit by its corresponding weight and add the results to get the decimal number. For example:

10100 = 2 to the 4th power + 2 to the 2nd power, which is the decimal number 20.
11000 = 2 to the 4th power + 2 to the 3rd power, which is the decimal number 24.

Next, let’s talk about how to convert a decimal number to a binary number. I’m not sure how many methods there are for this, but I will only mention the simplest one, division: continuously divide the integer part of the decimal number by 2 and note the remainder until the quotient is 0. For example, N=34D (to explain, you may see a letter added at the end of some numbers; this letter indicates the number system, where D represents decimal, B represents binary, O represents octal, and H represents hexadecimal).

34/2=17 (a0=0)
17/2=8 (a1=1)
8/2=4 (a2=0)
4/2=2 (a3=0)
2/2=1 (a4=0)
1/2=0 (a5=1)

So N=34D=100010B. For the fractional part of the decimal number being converted, you should continuously multiply by 2 and note the integer part until the fractional part of the result is 0.

The conversion between hexadecimal and binary/decimal numbers is quite simple; you just need to convert them using the corresponding values. The base of hexadecimal numbers is 16, with 16 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. Among them, A represents 10 in decimal, and so on. Their relationships with binary and decimal numbers are as follows:

0H=0D=0000B, 1H=1D=0001B, 2H=2D=0010B, 3H=3D=0011B,
4H=4D=0100B, 5H=5D=0101B, 6H=6D=0110B, 7H=7D=0111B,
8H=8D=1000B, 9H=9D=1001B, AH=10D=1010B, BH=11D=1011B,
CH=12D=1100B, DH=13D=1101B, EH=14D=1110B, FH=15D=1111B.

Therefore, to convert binary to hexadecimal, you just need to group every four bits from low to high and write each group directly as a hexadecimal digit. For example: 1000 1010 0011 0101 = 8 A 3 5.

To convert hexadecimal to binary, you just need to represent each digit with four binary digits. For example: A B 1 0 = 1010 1011 0001 0000.

Finally, the conversion between hexadecimal and decimal. Converting hexadecimal to decimal is done by summing the products of each hexadecimal digit with its corresponding weight. For example:

N=BF3CH = 11*16^3 + 15*16^2 + 3*16^1 + 12*16^0 = 11*4096 + 15*256 + 3*16 + 12*1 = 48956D.

Converting decimal to hexadecimal again uses the simple division method: continuously divide the integer part of the decimal number by 16 and note the remainder until the quotient is 0. For example, N=48956D:

48956/16=3059 (a0=12)
3059/16=191 (a1=3)
191/16=11 (a2=15)
11/16=0 (a3=11)

So N=48956D=BF3CH.
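
If you would like to check the worked examples, the two methods (repeated division, and summing digits times weights) can be sketched in Python as follows. The function names to_base and from_base are hypothetical, and only non-negative integers are handled.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n: int, base: int) -> str:
    """Repeated division: collect remainders until the quotient is 0."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, base)
        remainders.append(DIGITS[r])      # a0 first, then a1, a2, ...
    return "".join(reversed(remainders))  # the last remainder is the top digit


def from_base(text: str, base: int) -> int:
    """Sum of each digit multiplied by its weight (base to the power of its position)."""
    return sum(
        DIGITS.index(ch) * base ** position
        for position, ch in enumerate(reversed(text.upper()))
    )


print(to_base(34, 2), to_base(48956, 16))
print(from_base("10100", 2), from_base("BF3C", 16))
# Binary to hexadecimal, going through the value: 1000 1010 0011 0101 -> 8A35
print(to_base(from_base("1000101000110101", 2), 16))
```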

After the above introduction, I don’t know if you understood anything. If you did, please check the books, read the parts that I didn’t cover, and go over the parts I did cover a few more times. If you didn’t understand anything at all, then you need to read the books even more. If you carefully read the earlier introduction to the CPU, understand the concept of registers, and then grasp the assembly instructions mentioned afterwards, you can get started. If you study seriously and memorize carefully, you will find that it’s not as difficult as you imagine. In a week, you can roughly master it, though mastering it here only means you can at least read assembly code. If you truly want to learn it well, then read the rest as well and write some small programs to practice. Of course, mastering assembly is not something that can be done in a day, two days, or even a month or two, but as long as you have perseverance, what can’t you accomplish? The CPU was made by humans, and instructions are just part of it. If they can create a CPU, you should not be afraid of being unable to learn to use it!

Post-Class FAQ

Q: I previously learned 8086/8088 and also wrote programs under DOS, can I do it?
A: Absolutely, compared to 8086/8088, the current CPU has not added many new basic instructions. You just need to understand the changes in the registers and supplement your knowledge of Windows programming. Moreover, since you have written programs in assembly under DOS, you must be quite familiar with debuggers like Debug, so you have an inherent advantage.

Q: Assembly is not a problem for me, but why can’t I get started?
A: Haha, there are quite a few veterans like that; they are quite skilled in assembly but get stuck for lack of experience. Many people were like this back then, including me. When I saw a CALL, I just followed it in, haha, and I ended up following quite a few API calls. So for these experts, you just need to practice more and master some analysis techniques.

Q: I haven’t learned programming, can I learn assembly?
A: Generally speaking, yes. However, I hope that learning assembly does not make you lose confidence in learning other high-level languages. 🙂

Answering netizens’ questions

Q: Can registers be used arbitrarily, are there any restrictions? Can those variables be placed in any register when writing a program?

A: Haha, I will now answer the question from the friend above. Registers have their usage mechanisms, and each register has a clear division of labor. For example, the data registers (EAX-EDX) are general-purpose registers, meaning any data can be stored in them. However, aside from that, they can also be used for their specific purposes. For instance, EAX can be used as an accumulator, which means it is the main register for arithmetic operations. In multiplication and division instructions, it is specified to store operands. For example, in multiplication, you can use AL, AX, or EAX to hold the multiplicand, while AX or DX:AX or EAX or EDX:EAX can be used to store the final product. EBX is often used as a base register when calculating memory addresses. ECX is commonly used to store count values, such as in shift instructions where it holds the shift amount, and it serves as an implicit counter in loops and string processing instructions.

Finally, we are left with the last of the four kings, Li Ming, who has been relatively low-key recently… (don’t hit me, I’ll go hit the wall instead) We are left with EDX. It is generally used in double-word operations, where DX and AX are combined to hold one double-word number (do you remember what a double word is? For example, if you have the binary number 01101000110101000100100111010001, you can store the high sixteen bits 0110100011010100 in DX and the low sixteen bits 0100100111010001 in AX. This number is written as DX:AX). Of course, you can also store this whole number in EDX by itself. In the same way, EDX:EAX can be used to store a 64-bit number; you can infer this.
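
The DX:AX split above is easy to verify with a few lines of Python; the variable names simply mirror the registers, and nothing here touches a real CPU.

```python
value = 0b01101000110101000100100111010001   # the 32-bit number from the text

dx = (value >> 16) & 0xFFFF    # high sixteen bits go to DX
ax = value & 0xFFFF            # low sixteen bits go to AX

combined = (dx << 16) | ax     # reading DX:AX back as one double word

print(f"DX={dx:04X} AX={ax:04X} DX:AX={combined:08X}")

# The same idea one size up: EDX:EAX holds a 64-bit number.
edx, eax = 0x00000001, 0x00000002
print(f"EDX:EAX={(edx << 32) | eax:016X}")
```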

As for ESP, EBP, EDI, and ESI, I have already introduced them roughly, so I won’t discuss them here.

Of course, there are other restrictions, but since we are only looking at the assembly code of the program (the code is already written, so it shouldn’t contain errors), we don’t need to master them. If you’re interested, you can look at relevant books.

Also, regarding your last question, “Can those variables be placed in any register when writing a program?” I don’t quite understand what you mean. I think you might have misunderstood some key points; the term variable usually appears in high-level languages. When you write a program in a high-level language, you don’t need to understand those registers; they are unrelated to high-level languages. However, ultimately, high-level languages also convert the programs you write into operations on registers and internal memory.

<End of Chapter>
