Assembly Language Day 01

0x00

This series shares daily study notes to help everyone learn assembly language. Why learn assembly? In red-blue confrontations, our tools are often detected and removed by AV/EDR products, so we need to counter them with evasion techniques, and learning evasion has to start from the basics. Later I may also share notes on C++, PE file structure, and some reverse engineering topics.

0x01

1. Principles of CPU Interaction with Registers

Overview of Registers

1. The 8086 CPU has 14 registers, named AX, BX, CX, DX, SI, DI, SP, BP, IP, CS, SS, DS, ES, and PSW. Among them, AX, BX, CX, and DX are used to store general data and are called general-purpose registers.

2. All registers in the 8086 CPU are 16 bits, capable of storing two bytes, i.e., 1 word, with a maximum data value of 2^16 – 1.

3. Logical structure of general-purpose registers: a 16-bit register such as AX has its bits numbered 15 (most significant) down to 0 (least significant).

4. A 16-bit register can store one 16-bit value.

Data: 2000

Storage in register AX:

[Figure: the 16 bits of AX holding this value]

5. In the generation of CPUs before the 8086, all registers were 8 bits wide. To stay compatible with programs written for them, each of the four general-purpose registers can be used as two independent 8-bit registers.

AX can be divided into AH and AL, BX into BH and BL, CX into CH and CL, and DX into DH and DL. In each pair, the H register is the high byte (bits 15 to 8) and the L register is the low byte (bits 7 to 0).

AH and AL, like the other pairs, can each be used independently as an 8-bit register.
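The split of AX into AH and AL can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic, not something from the original notes: the helper names `split_word` and `join_bytes` and the value 4E20H are made up for the example.

```python
def split_word(value):
    """Return (high_byte, low_byte) of a 16-bit value, like AH and AL for AX."""
    value &= 0xFFFF
    return value >> 8, value & 0xFF


def join_bytes(high, low):
    """Rebuild the 16-bit value from its high and low bytes."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ax = 0x4E20                      # illustrative value, not taken from the article
ah, al = split_word(ax)
print(f"AX={ax:04X}H  AH={ah:02X}H  AL={al:02X}H")

# An 8-bit addition on AL alone: the carry out of bit 7 is dropped and
# AH is left untouched, because AL behaves as an independent 8-bit register.
al = (al + 0xF0) & 0xFF
ax = join_bytes(ah, al)
print(f"after adding F0H to AL: AX={ax:04X}H")
```

The second print shows why "independent" matters: the result of the 8-bit addition does not spill over into AH.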

Storage of Words in Registers

For compatibility reasons, the 8086 CPU can handle the following two sizes of data at once:

      • Byte, 1 byte = 8 bits, can be stored in an 8-bit register.
      • Word, a word consists of two bytes, namely the high byte and the low byte.

A word can be stored in a 16-bit register, with the high byte and low byte naturally residing in the high 8 bits and low 8 bits of the register. 1 word = 2 bytes = 16 bits.

Some Assembly Instructions

Assembly Instruction    Operation Controlled by CPU                       High-level Language Syntax Description
mov ax,18               Load 18 into AX                                   AX=18
mov ah,20               Load 20 into AH                                   AH=20
add ax,8                Add 8 to the value in register AX                 AX=AX+8
mov ax,bx               Load data from register BX into register AX       AX=BX
add ax,bx               Add contents of AX and BX, result stored in AX    AX=AX+BX

Note:

1. Assembly language is case insensitive!

2. When transferring data or performing operations, ensure that the bit sizes of the two operands are the same!
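To make the instructions and the two notes above concrete, here is a minimal Python model of the four general-purpose registers. It is a sketch under stated assumptions, not how an assembler works internally: the class name `Regs`, its methods, and the sample values are all hypothetical. It accepts register names in any case and refuses operands whose sizes differ.

```python
REG16 = ("ax", "bx", "cx", "dx")
REG8 = ("ah", "al", "bh", "bl", "ch", "cl", "dh", "dl")


class Regs:
    """Toy model of AX, BX, CX, DX and their 8-bit halves (hypothetical helper)."""

    def __init__(self):
        self.words = {name: 0 for name in REG16}

    def _width(self, name):
        name = name.lower()              # register names are case insensitive
        if name in REG16:
            return 16
        if name in REG8:
            return 8
        raise ValueError(f"unknown register {name!r}")

    def get(self, name):
        name = name.lower()
        if self._width(name) == 16:
            return self.words[name]
        word = self.words[name[0] + "x"]
        return word >> 8 if name[1] == "h" else word & 0xFF

    def set(self, name, value):
        name = name.lower()
        if self._width(name) == 16:
            self.words[name] = value & 0xFFFF
            return
        full = name[0] + "x"
        word = self.words[full]
        if name[1] == "h":
            self.words[full] = ((value & 0xFF) << 8) | (word & 0x00FF)
        else:
            self.words[full] = (word & 0xFF00) | (value & 0xFF)

    def _source(self, dst, src):
        width = self._width(dst)
        if isinstance(src, int):         # immediate: must fit the destination
            if not 0 <= src < (1 << width):
                raise ValueError("immediate does not fit the destination")
            return src
        if self._width(src) != width:    # note 2: operand sizes must match
            raise ValueError("operand sizes differ")
        return self.get(src)

    def mov(self, dst, src):
        self.set(dst, self._source(dst, src))

    def add(self, dst, src):
        value = self._source(dst, src)
        self.set(dst, self.get(dst) + value)


cpu = Regs()
cpu.mov("AX", 18)        # AX = 18
cpu.add("ax", 8)         # AX = AX + 8
cpu.mov("bx", 4)
cpu.add("ax", "bx")      # AX = AX + BX
cpu.mov("ah", 20)        # only the high byte of AX changes
print(cpu.get("ax"), cpu.get("ah"), cpu.get("al"))
```

A call such as `cpu.mov("ax", "bl")` raises `ValueError` in this model, which mirrors the rule that a 16-bit register and an 8-bit register cannot be mixed in one instruction.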

Physical Address

1. When the CPU accesses memory units, it must provide the address of the memory unit. All memory units form a one-dimensional linear space.

2. In summary, a 16-bit structure describes the following characteristics of a CPU:

1) The arithmetic unit can process a maximum of 16 bits of data at a time;

2) The maximum width of registers is 16 bits;

3) The pathway between registers and the arithmetic unit is 16 bits.

3. The 8086 has a 20-bit address bus, capable of transmitting 20-bit addresses, with an addressing capacity of 1M. However, the internal structure of the 8086 is 16 bits: if an address were simply sent out as it is held internally, only 16 bits could be transmitted, giving an addressing capacity of only 64K. The 8086 therefore combines two 16-bit addresses inside the CPU to form one 20-bit physical address.

4. The method for the address adder to generate a physical address: Physical Address = Segment Address * 16 + Offset Address.

Method for the 8086 CPU to Provide Physical Addresses

1. Relevant components in the CPU provide two 16-bit addresses, one called the segment address and the other the offset address;

2. The segment address and offset address are sent through an internal bus to a component called the address adder;

3. The address adder combines the two 16-bit addresses into a single 20-bit address.
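What the address adder does can be sketched as a small Python function. The function name `physical_address` and the sample values are chosen for illustration; the final mask reflects the fact that the 8086 has only 20 address lines, so a sum larger than FFFFFH wraps around.

```python
def physical_address(segment, offset):
    """Combine two 16-bit addresses into one 20-bit physical address."""
    if not 0 <= segment <= 0xFFFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("segment and offset must each fit in 16 bits")
    # Multiplying by 16 is the same as shifting left by 4 bits
    # (one hexadecimal digit).
    return ((segment << 4) + offset) & 0xFFFFF


# Illustrative values: 1230H * 16 = 12300H, plus 00C8H gives 123C8H.
print(f"{physical_address(0x1230, 0x00C8):05X}H")
```

Shifting the segment address by one hexadecimal digit is why a segment address of 1230H becomes the base address 12300H.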


Concept of Segments

1. Misconception: Memory is divided into segments, each with a segment address. This is wrong.

2. In fact: Memory is not segmented; the segmentation comes from the CPU. The 8086 CPU provides the physical address of memory units using the formula “(Segment Address * 16) + Offset Address = Physical Address”, which makes it appear as if memory is managed in a segmented manner.

3. In practice, several contiguous memory unit addresses can be viewed as a segment, using Segment Address * 16 as the actual address (base address) of the segment, and using the Offset Address to locate memory units within the segment.

4. Data in memory unit 21F60H can be described in two ways for the 8086 PC:

1) Data exists in memory unit 2000:1F60;

2) Data exists in memory unit 1F60 of segment 2000.

Two points to note:

1. Segment Address * 16 is always a multiple of 16, so the starting address of a segment must also be a multiple of 16;

2. The Offset Address is 16 bits. The addressing capacity of a 16-bit address is 64KB, so the maximum length of a segment is 64KB.

Memory Unit Address

1. When the CPU accesses memory units, it must provide the physical address of the memory unit;

2. The 8086 CPU internally forms the final physical address by shifting the segment address left by 4 bits (multiplying it by 16) and adding the offset address.

Consider two questions:

1. Observing the addresses in the figure, what do readers notice?

[Figure: several different segment and offset addresses that produce one physical address]

Conclusion: The CPU can form the same physical address using different segment and offset addresses.
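This conclusion is easy to check with a short Python sketch. The segment and offset pairs below are sample values picked for this example (they are not necessarily the ones in the original figure), and the helper names are hypothetical.

```python
def physical_address(segment, offset):
    """Physical Address = Segment Address * 16 + Offset Address (20 bits)."""
    return (segment * 16 + offset) & 0xFFFFF


def aliases(target):
    """Yield every (segment, offset) pair that reaches target without wraparound."""
    for segment in range(0x10000):
        offset = target - segment * 16
        if 0 <= offset <= 0xFFFF:
            yield segment, offset


# Sample pairs chosen for illustration; all of them address unit 21F60H.
samples = [(0x2000, 0x1F60), (0x2100, 0x0F60), (0x21F0, 0x0060),
           (0x21F6, 0x0000), (0x1F00, 0x2F60)]
for segment, offset in samples:
    print(f"{segment:04X}:{offset:04X} -> {physical_address(segment, offset):05X}H")

count = sum(1 for _ in aliases(0x21F60))
print("number of segment:offset pairs for 21F60H:", count)
```

Under these assumptions the count comes out as 4096, because moving the segment address up by 1 can always be compensated by moving the offset down by 16.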

2. If a segment address is given, how many memory units can be located by varying only the offset address?

Conclusion: The Offset Address is 16 bits, with a range of 0~FFFFH, so using only the Offset Address for addressing can locate a maximum of 64K memory units.

For example: Given a segment address of 1000H, using the offset address for addressing, the CPU’s addressing range is 10000H~1FFFFH.
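The range reachable from one segment address can be computed the same way. In this sketch the helper name `segment_range` is hypothetical, and it ignores the wraparound that occurs for segment addresses above F000H.

```python
def segment_range(segment):
    """Return (first, last) physical address reachable with offsets 0~FFFFH."""
    base = segment * 16          # the segment always starts at a multiple of 16
    return base, base + 0xFFFF


first, last = segment_range(0x1000)
print(f"{first:05X}H~{last:05X}H, {last - first + 1} units")
```

The number of units printed is 65536, which is the 64K limit stated in the conclusion above.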

Segment Registers

1. Segment registers provide segment addresses; the 8086 CPU has 4 segment registers: CS (Code Segment Register), DS (Data Segment Register), SS (Stack Segment Register), and ES (Extra Segment Register).

2. When the 8086 CPU needs to access memory, these 4 segment registers provide the segment addresses of the memory units.

3. CS and IP are the most critical registers in the 8086 CPU, indicating the address of the instruction the CPU is currently reading.

CS is the Code Segment Register, and IP is the Instruction Pointer Register (usually stores the offset address).

4. The mov instruction cannot be used to set the values of CS and IP; the 8086 CPU does not provide such functionality.

CS and IP

CS and IP together indicate the address of the instruction the CPU is currently reading. CS is the code segment register, and IP is the instruction pointer register.

The working process of the 8086 CPU is briefly described as follows:

      • Read the instruction from the memory unit pointed to by CS:IP, and the read instruction enters the instruction buffer.
      • IP = IP + length of the read instruction, thus pointing to the next instruction.
      • Execute the instruction, return to step 1, and repeat this process.
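The three steps above can be sketched as a toy fetch-execute loop in Python. This is a deliberately tiny model, not an emulator: it only understands the three machine instructions used in the Debug section of this article (b80100, b90200, 01c8), and the function name `run` and the choice of 1000:0000 as the load address are assumptions made for the example.

```python
def run(memory, cs, ip, steps):
    """Execute `steps` instructions starting at CS:IP; return (ip, regs) per step."""
    regs = {"ax": 0, "cx": 0}
    trace = []
    for _ in range(steps):
        addr = (cs * 16 + ip) & 0xFFFFF           # CS:IP -> physical address
        opcode = memory[addr]
        if opcode in (0xB8, 0xB9):                # mov ax,imm16 / mov cx,imm16
            length = 3
        elif opcode == 0x01 and memory[addr + 1] == 0xC8:   # add ax,cx
            length = 2
        else:
            raise ValueError(f"instruction at {addr:05X}H is not modelled")

        instruction = memory[addr:addr + length]  # step 1: read into the buffer
        ip = (ip + length) & 0xFFFF               # step 2: IP points to the next one

        # step 3: execute the buffered instruction
        if opcode == 0xB8:
            regs["ax"] = instruction[1] | (instruction[2] << 8)
        elif opcode == 0xB9:
            regs["cx"] = instruction[1] | (instruction[2] << 8)
        else:
            regs["ax"] = (regs["ax"] + regs["cx"]) & 0xFFFF
        trace.append((ip, dict(regs)))
    return trace


memory = bytearray(1 << 20)                       # 1M of memory, all zero
memory[0x10000:0x10008] = bytes.fromhex("b80100b9020001c8")
trace = run(memory, cs=0x1000, ip=0x0000, steps=3)
for ip, regs in trace:
    print(f"IP={ip:04X}H  AX={regs['ax']:04X}H  CX={regs['cx']:04X}H")
```

Note that IP is advanced before the instruction is executed, which is exactly the order described in the list.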

After powering on or resetting the 8086 CPU, CS = FFFFH, IP = 0000H, and the CPU starts executing from FFFF0H.

Now let’s introduce the complete workflow:

CS holds the segment address.

IP holds the offset address.

The CPU treats the contents pointed to by CS:IP as instructions to execute.

The process of the CPU reading instructions is as follows:

The CPU takes the segment address from CS and the offset address from IP, and the address adder computes the physical address of the memory unit. The input-output control circuit sends that address out over the 20-bit address bus, the instruction is read back from memory, and finally the instruction is executed.

After the first instruction has been read, IP has advanced by its length, and the same process begins for the second instruction.

IP changes with every instruction that is read.

After all instructions are executed, CS:IP holds the address of the next instruction to be read.

Modifying CS and IP

In the CPU, the programmer can only read and write to registers using instructions, but the mov instruction cannot be used to set the values of CS and IP; transfer instructions (jmp) must be used.

If you want to modify the contents of CS and IP simultaneously, you can use an instruction of the form jmp segment address:offset address, for example:

jmp 2AE3:3        After execution CS=2AE3H, IP=0003H   Read instruction from 2AE33H
jmp 3:0B16        After execution CS=0003H, IP=0B16H   Read instruction from 00B46H

If you only want to modify the content of IP, you can use jmp followed by a legal register, which sets IP to the value held in that register.

jmp ax    Before execution ax=1000h    cs=2000h   ip=0003h
          After execution ax=1000h    cs=2000h   ip=1000h
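The effect of both jmp forms on CS and IP can be checked with a small Python sketch. The function names `jmp_far`, `jmp_reg` and `next_fetch` and the dictionary used as CPU state are hypothetical, introduced only for this illustration.

```python
def jmp_far(state, segment, offset):
    """jmp segment:offset sets both CS and IP."""
    return {**state, "cs": segment & 0xFFFF, "ip": offset & 0xFFFF}


def jmp_reg(state, reg):
    """jmp <16-bit register> copies the register into IP; CS is unchanged."""
    return {**state, "ip": state[reg]}


def next_fetch(state):
    """Physical address the CPU reads the next instruction from."""
    return (state["cs"] * 16 + state["ip"]) & 0xFFFFF


state = {"ax": 0x1000, "cs": 0x2000, "ip": 0x0003}

after_far = jmp_far(state, 0x2AE3, 0x0003)
print(f"jmp 2AE3:3 -> {next_fetch(after_far):05X}H")

after_far2 = jmp_far(state, 0x0003, 0x0B16)
print(f"jmp 3:0B16 -> {next_fetch(after_far2):05X}H")

after_reg = jmp_reg(state, "ax")
print(f"jmp ax     -> cs={after_reg['cs']:04X}h ip={after_reg['ip']:04X}h")
```

The printed addresses match the examples above: 2AE33H for the first jump and 00B46H for the second, while jmp ax leaves CS at 2000h.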

Brief Description of the Interaction Process of the 8086 PC

1. After powering on or resetting the 8086 CPU (i.e., when the CPU first starts interacting), CS and IP are set to CS=FFFFH, IP=0000H.

2. When the 8086 PC starts up, the CPU reads and executes instructions from memory unit FFFF0H.

3. The instruction in unit FFFF0H is the first instruction executed after the 8086 PC is powered on.

Experiment: Use the Debug command to check the production date of your motherboard’s ROM

Tip: The production date of the ROM is in several units of memory from FFFF0H to FFFFFH.

Using Debug

Debug is a debugging tool for real-mode (8086 mode) programs that is provided by both DOS and Windows. It lets you view the contents of the CPU registers, inspect memory, and trace program execution at the machine-code level.

Debug Functions Used

      • R command to view and change the contents of CPU registers
      • D command to view the contents of memory
      • E command to rewrite the contents of memory
      • U command to translate machine instructions in memory into assembly instructions
      • T command to execute a machine instruction
      • A command to write a machine instruction in memory in assembly instruction format

R Command

First, after entering debug, type r to view the contents of all registers.

To modify a register, type r followed by its name, then enter the new value at the prompt. Debug reads numbers as hexadecimal, so the following sets ax to 0200H:

r ax
200

D Command

The D command views memory contents. Give it a location in segment address:offset address form:

d 1000:0    View contents at 1000:0
d 1000:0 9  View contents from 1000:0 to 1000:9

Using the D command will output three parts:

The middle part shows the contents of 128 memory units starting from the specified address, output in hexadecimal format, with each line starting from an address that is a multiple of 16, outputting a maximum of 16 units of content per line, with a “-” in the middle for easier viewing.

The left side shows the starting position of each line.

The right side shows the ASCII characters corresponding to the data in each memory unit; if there is no corresponding character, it is represented by a “.”.
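The three parts of a D command line can be imitated with a short Python function. This is only an approximation of Debug's layout for illustration: the function name `dump_row`, the exact spacing, and the choice of 20H~7EH as the printable range are assumptions.

```python
def dump_row(segment, offset, data):
    """Format up to 16 bytes as 'address  hex bytes   ASCII', in the style of D."""
    cells = [f"{b:02X}" for b in data[:16]]
    middle = " ".join(cells[:8])
    if len(cells) > 8:
        middle += "-" + " ".join(cells[8:])       # the "-" after the 8th unit
    # Units without a printable ASCII character are shown as "."
    text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data[:16])
    return f"{segment:04X}:{offset:04X}  {middle:<47}   {text}"


print(dump_row(0x1000, 0x0000, bytes(range(0x41, 0x51))))
print(dump_row(0x1000, 0x0010, bytes([0x00, 0x01, 0x61])))
```

The left part is the starting address of the line, the middle part is the hexadecimal content, and the right part is the ASCII view, matching the description above.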

E Command

The E command can rewrite the contents of memory, for example, to change the contents of 1000:0~1000:9 to 0~9, you can use e 1000:0 0 1 2 3 4 5 6 7 8 9 to do so.

You can also modify one address at a time using the following steps:

        • Input e 1000:10, press Enter
        • Debug displays the original content of unit 1000:0010
        • Input new data to modify the current memory unit, or input nothing to leave it unchanged.
        • Press space to finish the current unit and move automatically to the next unit.
        • After all modifications are complete, press Enter to finish.

You can use the E command to write characters, for example, e 1000:0 1 'a' 2 'b' writes the value 1, the ASCII code of a, the value 2, and the ASCII code of b.

You can also write character strings, for example, e 1000:0 1 "a+b" 2 "c++" writes the value 1, the ASCII codes of a+b, the value 2, and the ASCII codes of c++.
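To see which bytes such an E command leaves in memory, the conversion can be sketched in Python. The helper name `e_bytes` is hypothetical; it simply treats numbers as single bytes and characters or strings as their ASCII codes.

```python
def e_bytes(*items):
    """Return the byte values an E command would write for the given items."""
    out = []
    for item in items:
        if isinstance(item, int):
            if not 0 <= item <= 0xFF:
                raise ValueError("each numeric item must fit in one byte")
            out.append(item)
        else:
            out.extend(item.encode("ascii"))      # one byte per character
    return out


chars = e_bytes(1, "a", 2, "b")
strings = e_bytes(1, "a+b", 2, "c++")
print(" ".join(f"{b:02X}" for b in chars))
print(" ".join(f"{b:02X}" for b in strings))
```

Viewing the same memory afterwards with the D command would show these hexadecimal values in the middle part and the letters in the right part.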

You can also write machine code with the E command, then use the U command to see what it means and the T command to execute it. For example, take these three instructions:

Machine Code    Corresponding Assembly Instruction
b80100          mov ax,0001
b90200          mov cx,0002
01c8            add ax,cx

You can use e 1000:0 b8 01 00 b9 02 00 01 c8 to complete this.

U Command

The U command can view the assembly instructions corresponding to machine code in memory, for example, u 1000:0 to view.

The output of the U command is divided into three parts:

        • The address of each machine instruction
        • The machine instruction
        • The assembly instruction corresponding to the machine instruction
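What the U command does can be imitated for a very small subset of instructions. The sketch below only decodes `mov` of a 16-bit immediate into a 16-bit register (opcodes B8H~BFH) and register-to-register `add` (opcode 01H), which is enough for the three machine instructions written with the E command earlier; the function name `unassemble` and the column spacing are assumptions for illustration.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]   # 8086 register order


def unassemble(code, segment, offset):
    """Return lines of 'address machine-code assembly' for a tiny opcode subset."""
    lines = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF and i + 2 < len(code):
            length = 3
            imm = code[i + 1] | (code[i + 2] << 8)          # low byte comes first
            text = f"MOV     {REG16[op - 0xB8]},{imm:04X}"
        elif op == 0x01 and i + 1 < len(code) and code[i + 1] >= 0xC0:
            length = 2
            modrm = code[i + 1]
            dst = REG16[modrm & 7]                          # r/m field
            src = REG16[(modrm >> 3) & 7]                   # reg field
            text = f"ADD     {dst},{src}"
        else:
            raise ValueError(f"byte {op:02X} at offset {i} is not modelled")
        raw = code[i:i + length].hex().upper()
        lines.append(f"{segment:04X}:{offset + i:04X} {raw:<14}{text}")
        i += length
    return lines


listing = unassemble(bytes.fromhex("b80100b9020001c8"), 0x1000, 0x0000)
for line in listing:
    print(line)
```

Each output line has the same three parts as the U command: the address, the machine instruction, and the corresponding assembly instruction.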

T Command

The T command can execute one or more instructions; simply using the T command can execute the instruction pointed to by CS:IP.

        • First, use the E command to write machine code to the target memory unit.
        • Use the R command to check the status of the CPU registers and modify CS:IP to point to the target address.
        • The T command executes the written instruction, and debug displays the status of the registers after execution.

A Command

The A command writes machine instructions into memory in assembly instruction format. Debug prompts with the address of each new instruction; pressing Enter on an empty line ends the input. Each of the mov instructions below occupies 3 bytes, so the prompts advance by 3.

a 1000:0
1000:0000  mov ax,1
1000:0003  mov bx,2
1000:0006  mov cx,3
1000:0009

Debug Installation

Debug installation tutorial: https://www.cnblogs.com/zhaijiahui/p/10148698.html

ASMT00ls: https://www.yuque.com/desktop/auth?url=https%3A%2F%2Fwww.yuque.com%2Fattachments%2Fyuque%2F0%2F2025%2Frar%2F46595978%2F1750241004893-8fdee8b4-4083-46df-8686-f099c4123c37.rar%3Ffrom%3Dhttps%253A%252F%252Fwww.yuque.com%252Fyizhiyiyudebuoumao%252Fysr4tp%252Figgn8qdeqy6uva82&token=encrypted%7CsN7SGO0v2US-KGseTh0jmbaSJcNPvME0PlAAwRnmkzVp4VWiKtkHcF6KwSoCcloW0VQg4zr7lhUV26fc9hQx1mxpC9GdJGTzeQYZDmATPKfFvuwk9MIZq5253NilWxZqpbC2we4c-kt6DQIwTNeZ631iDSEPerjQetlq7x0Nf9QtYyjgwX_cpD30rMRZaQnpdmeHxmoxQGeXUdkBf_MhXzvHH8UBVjtbpbxYvx-mMraHwazElOi6yS1n040dA9SFKKPIFfPIslzGICAy33L7uR2WXQL-pMhhBEP-895BGZU%3D

Or use a Windows 2003 virtual machine and directly enter debug in cmd.


