A Comprehensive Guide to Assembly Language

Part1 Definition of Content

1.1 Definition of Data Segment

Assembly language programs are written in segments, generally defining data in the data segment and the program in the code segment. The syntax for defining a segment is as follows:

segment_name  SEGMENT
...(content of the segment)...
segment_name  ENDS

Notes:

  1. Assembly language does not distinguish between uppercase and lowercase letters in instructions, register names, and identifiers (text inside quotes keeps its case);

  2. In assembly language, a line can only contain one statement;

  3. The name of the segment must start with a letter or underscore, be meaningful, and not conflict with reserved words;

  4. In assembly language, a comment begins with a semicolon (the half-width English character, not the full-width one) and runs to the end of the line;

  5. One segment cannot define another segment within it, meaning segments are independent of each other.

1.2 Definition of Data

Defining data means allocating storage units for the given values and storing them in the data segment in a standard format. The data definition pseudo-instructions include DB, DW, DD, DQ, and DT.

1.2.1 Define Byte Data DB

Below is a segment of assembly code:

DATA  SEGMENT
X     DB   -1,255,'A',3+2,?
      DB   "ABC",0FFH,11001010B
Y     DB   3 DUP(?)
DATA  ENDS

The following explains the above code segment:

  1. Definition of Variables: X and Y are variable names, indicating that the programmer has defined two variables X and Y. Unlike in high-level languages, a variable name in assembly language actually represents the address of the first data item that follows it; the remaining items defined after it are reached through that name plus an offset;

  2. Definition of Byte Data: DB indicates that the defined data type is byte type. DB can be used to define integers (positive or negative, written in decimal, hexadecimal, or binary) as well as characters;

  3. Evaluated Expression: Simple expressions that the assembler can evaluate may appear when defining data. For example, DB 3+2 is equivalent to DB 5;

  4. Definition of Unknown Values: A question mark indicates a value that is temporarily undetermined; the unit is generally filled with 0;

  5. Definition of Multiple Characters: Multiple characters enclosed in double quotes can appear, and they are stored one per byte in order;

  6. Repeated Definition of Same Data: DUP defines the same data repeatedly. The syntax is count DUP(value); in the code above, 3 DUP(?) reserves three bytes of undetermined value;

  7. Defining Across Lines: If there is too much data for one line, the definition can continue on a new line. The variable name does not need to be rewritten, but the DB pseudo-instruction does.
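
As an illustration of the points above, the same data segment can be written with one item per line and annotated with the byte each item produces. The offsets assume that DATA starts at offset 0, and the units marked as undetermined are often 00H in practice, although that depends on the assembler and loader.

```asm
; Annotated copy of the data segment above: offset and stored byte for each item.
DATA  SEGMENT
X     DB   -1          ; offset 0000H: FFH (two's complement of 1)
      DB   255         ; offset 0001H: FFH (same bit pattern as -1)
      DB   'A'         ; offset 0002H: 41H (ASCII code of 'A')
      DB   3+2         ; offset 0003H: 05H (evaluated at assembly time)
      DB   ?           ; offset 0004H: undetermined value
      DB   "ABC"       ; offsets 0005H-0007H: 41H, 42H, 43H
      DB   0FFH        ; offset 0008H: FFH
      DB   11001010B   ; offset 0009H: CAH
Y     DB   3 DUP(?)    ; offsets 000AH-000CH: three undetermined bytes
DATA  ENDS
      END
```

Because -1 and 255 produce the same byte, it is the instructions that later use the data, not the definition, that decide whether a byte is treated as signed or unsigned.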

1.2.2 Define Word Data DW

Word data is 16 bits. Simply change DB in the byte definition syntax above to DW.

1.2.3 Define Double Word Data DD

Double word data is 32 bits. Simply change DB in the byte definition syntax above to DD. Note that the high byte of the data is stored in the higher address unit and the low byte in the lower address unit; the same holds for word data.
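
A small sketch of this byte order follows. The variable names WVAL and DVAL are made up for illustration, and the comments list the bytes from the lowest address upward.

```asm
; Byte order of multi-byte data: the low byte goes to the lower address.
DATA  SEGMENT
WVAL  DW   1234H       ; stored as 34H, 12H
DVAL  DD   12345678H   ; stored as 78H, 56H, 34H, 12H
DATA  ENDS
      END
```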

1.2.4 Define Quad Word and Ten Byte Data DQ DT

Quad word data is 64 bits and ten-byte data is 80 bits. Simply change the DB pseudo-instruction to DQ or DT, respectively.

Part2 Data Transfer

2.1 Format of Instruction Statements

Each instruction statement corresponds to one machine instruction, and the general format is as follows:

[Label:]  Opcode [Operands] [;Comment]

Syntax Explanation:

  1. Label refers to the name given by the programmer for this instruction statement. Most instruction statements do not need a label; only some special instruction statements require it;

  2. Opcode specifies the operation type of this instruction, and all opcodes are reserved words;

  3. An instruction can take 0 to 3 operands, although most 8086 instructions take at most two. Multiple operands are separated by commas; the rightmost operand is the source operand, and the leftmost operand is the destination operand.

2.2 Classification of Operands

Operands can be classified into three types: register operands, immediate operands, and memory operands. Regarding register operands, note that register IP and FLAGS cannot appear as operands in instructions; regarding immediate operands, note that immediate operands cannot be used as destination operands. Below, we focus on memory operands and first introduce two basic concepts:

  1. Memory operands indicate access to a memory unit, which requires both the segment base address and the offset address;

  2. In most cases, the instruction automatically uses the content of the DS register as the segment base address of the operand. Therefore, when writing assembly language source programs, the first thing to do is to load the data segment base address into the DS register.

Now that we have set the segment base address, we just need the offset address to locate the correct memory unit. There are two methods to provide the offset address: direct and indirect. The direct method refers to directly writing the offset address of the memory unit in the instruction, while the indirect method involves loading the offset address into a register in advance and using the value in that register to locate the memory unit when needed.

(1) Direct Method Syntax:

MOV  Destination_Register,  Variable_Name[+Byte_Offset]

This statement uses the content of the DS register as the segment base address and the variable's offset within the data segment, plus the optional byte offset, as the offset address, then copies the value of that memory unit into the destination register.

(2) Indirect Method Syntax:

MOV  Indirect_Address_Register,  OFFSET Variable_Name
(Later, when the memory unit at that offset address is needed)
MOV  Destination_Register,     [Indirect_Address_Register]

Syntax Explanation:

  1. OFFSET is a reserved word that indicates extracting the offset address of the following variable;

  2. Indirect address registers can only be one of BX, BP, SI, DI. Unless otherwise specified, using BX, SI, and DI automatically uses the content of DS as the segment base address, while using BP automatically uses the value of SS as the segment base address;

  3. The square brackets around the register mean the memory unit whose offset address is held in that register. Without the brackets, MOV copies the value of the register itself rather than the content of the memory unit.
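
The two methods can be compared side by side in a short program. This is a sketch for a MASM-style 16-bit assembler; the data values are chosen only for illustration, and the square brackets around BX mark it as holding the offset address of the memory unit to read.

```asm
; Direct and indirect access to the same byte array.
DATA  SEGMENT
X     DB   10H, 20H, 30H, 40H
DATA  ENDS

CODE  SEGMENT
      ASSUME CS:CODE, DS:DATA
START: MOV  AX, DATA
       MOV  DS, AX          ; DS = segment base address of DATA
       ; Direct method: the offset is written in the instruction
       MOV  AL, X           ; AL = 10H (byte at offset of X)
       MOV  AH, X+2         ; AH = 30H (byte at offset of X plus 2)
       ; Indirect method: the offset is held in a register
       MOV  BX, OFFSET X    ; BX = offset address of X
       MOV  CL, [BX]        ; CL = 10H (byte at DS:BX)
       INC  BX              ; move to the next byte
       MOV  CH, [BX]        ; CH = 20H
       MOV  AX, 4C00H
       INT  21H             ; return to DOS
CODE  ENDS
      END  START
```

The indirect method is the natural choice when the same instruction has to visit many units in turn, since only the register needs to change.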

2.3 Definition of Program Segment

The general format of a program segment is as follows:

CODE SEGMENT
  ASSUME  CS:CODE, DS:DATA
START: MOV  AX, DATA
       MOV  DS, AX
       ...(other instruction parts)...
       MOV  AX, 4C00H
       INT  21H
CODE ENDS
       END  START

Syntax Interpretation:

  1. The first two instructions of the program are used to load the data segment register DS. After entering the program, the code segment register CS’s value is automatically set by the operating system as the segment base address of the code segment, while the data segment’s base address needs to be manually loaded into DS by the programmer;

  2. The ASSUME pseudo-instruction tells the assembler which segment register corresponds to each segment. In the above code, CS is associated with the CODE segment and DS with the DATA segment. ASSUME does not load any register itself, which is why DS still has to be loaded by the first two instructions;

  3. INT 21H calls the service program numbered 21H provided by the operating system. The type of service is determined by the function number in AH; in this example, 4CH means returning to the operating system. The value in AL is called the return code, with 00H indicating a normal return;

  4. The END pseudo-instruction marks the end of the entire program. Any code written below the END statement will not be assembled. The label after END indicates the entry address of the program, which is where execution begins.

2.4 Basic Transfer Instructions

Basic transfer instructions are the most frequently used instructions and need to be mastered. The format is as follows:

MOV  Destination_Operand, Source_Operand

Syntax Explanation:

  1. The types of source operand and destination operand must be the same. If they are not the same, a forced type conversion must be used first. The syntax for forced type conversion can be seen below;

  2. Source operand and destination operand cannot both be memory operands, nor can they both be segment registers;

  3. Destination operand cannot be an immediate number;

  4. The code segment base address register CS cannot be a destination operand;

  5. When using an immediate number as a source operand, the immediate number will be extended according to the type of destination operand.

Forced Type Conversion Syntax (use with caution):

Data_Type  PTR  Variable_Name     ;Data_Type is BYTE, WORD, DWORD, etc.
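
The rules above can be seen in a short sketch. The variable names W1 and W2 are hypothetical, and the program assumes a MASM-style 16-bit assembler. It also shows why the program skeleton loads DS through AX: an immediate number cannot be moved into a segment register directly.

```asm
; Legal forms of MOV, plus two uses of PTR.
DATA  SEGMENT
W1    DW   0
W2    DW   5678H
DATA  ENDS

CODE  SEGMENT
      ASSUME CS:CODE, DS:DATA
START: MOV  AX, DATA          ; immediate number -> general register
       MOV  DS, AX            ; general register -> segment register
       MOV  AX, W2            ; memory -> register, AX = 5678H
       MOV  W1, AX            ; register -> memory (memory to memory takes two steps)
       MOV  AL, BYTE PTR W2   ; treat the word W2 as a byte: AL = 78H (its low byte)
       MOV  BX, OFFSET W1
       MOV  WORD PTR [BX], 1  ; PTR states the size, since neither operand implies one
       MOV  AX, 4C00H
       INT  21H               ; return to DOS
CODE  ENDS
      END  START
```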

Part3 Stack

3.1 Definition of Stack

The stack is also part of the memory used by the user, for storing temporary data and other information. The syntax for defining a stack segment is as follows:

Stack_Name  SEGMENT  STACK
       (stack content)
Stack_Name  ENDS

Syntax Explanation:

  1. The only difference between stack definition and general segment definition is the use of STACK;

  2. When loading the program, the system automatically places the segment base address of the stack segment into the SS register and the number of bytes in the stack into the SP register;

  3. Contents in stack segments are allocated and used starting from larger addresses;

  4. For the 8086 CPU, only 2-byte data can be pushed or popped from the stack.

3.2 Methods for Using Stack

Common stack-related instructions include PUSH, POP, PUSHF, and POPF, with the following syntax:

PUSH  Source_Operand         ;Push specified operand onto stack for protection
POP   Destination_Operand    ;Restore top operand from stack to specified location
PUSHF                        ;Push flag register content onto stack for protection
POPF                         ;Pop flag register from stack for restoration
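
A minimal sketch of a stack segment in use follows, assuming a MASM-style 16-bit assembler. SSEG is only an illustrative segment name, and the 128-byte size is arbitrary. Because the last value pushed is the first one popped, popping in the same order as pushing swaps the two registers.

```asm
; Define a stack and use PUSH/POP to exchange AX and BX.
SSEG  SEGMENT STACK
      DW   64 DUP(?)         ; 128 bytes reserved for the stack
SSEG  ENDS

CODE  SEGMENT
      ASSUME CS:CODE, SS:SSEG
START: MOV  AX, 1111H
       MOV  BX, 2222H
       PUSH AX               ; SP decreases by 2, 1111H is stored at SS:SP
       PUSH BX               ; SP decreases by 2, 2222H is now on top
       POP  AX               ; AX = 2222H, SP increases by 2
       POP  BX               ; BX = 1111H, SP is back to its initial value
       MOV  AX, 4C00H
       INT  21H              ; return to DOS
CODE  ENDS
      END  START
```

To protect registers rather than swap them, the POP instructions are written in the reverse order of the PUSH instructions.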

Part4 Common Operand Expressions

4.1 Symbol Definition Pseudo-Instruction

Symbol definition is similar to the #define preprocessor directive in C and is used for equivalent replacement of symbols. The syntax for symbol definition is as follows:

Symbol_Name   EQU   Expression

Syntax Explanation:

  1. During assembly, the symbol name defined by EQU is replaced with the corresponding expression;

  2. Symbol names defined by EQU cannot be redefined.

Another way to define a symbol is to use the “=” symbol, with the specific syntax as follows:

Symbol_Name   =   Constant_Expression

Syntax Explanation:

When defining symbols with an equal sign, only constant expressions can be used. Unlike a symbol defined by EQU, a symbol defined with “=” can be redefined later in the program.
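
The two forms might be used as in the sketch below. All symbol and variable names here are hypothetical. In MASM-style assemblers a symbol defined with “=” may be given a new value later, which the sketch relies on for IDX.

```asm
; Symbols are replaced by their values during assembly and occupy no memory.
N_ITEMS EQU  10             ; fixed value
BUFLEN  EQU  N_ITEMS*2      ; expression based on another symbol: 20
IDX     =    1
IDX     =    IDX+1          ; redefinition with "=": IDX is now 2

DATA  SEGMENT
BUF   DB   BUFLEN DUP(0)    ; 20 bytes, all 0
LAST  DB   IDX              ; one byte with value 02H
DATA  ENDS
      END
```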

4.2 Get Segment Base Address

The SEG operator obtains the segment base address of an address expression, with the specific method as follows:

SEG Address_Expression
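
As a sketch, SEG can be paired with OFFSET to obtain both halves of a variable's address. The variable MSG is made up for illustration; SEG MSG yields the same value as writing the segment name DATA.

```asm
; Load DS with SEG, then reach the variable through its offset.
DATA  SEGMENT
MSG   DB   'OK'
DATA  ENDS

CODE  SEGMENT
      ASSUME CS:CODE, DS:DATA
START: MOV  AX, SEG MSG      ; segment base address of the segment containing MSG
       MOV  DS, AX
       MOV  BX, OFFSET MSG   ; offset address of MSG within that segment
       MOV  AL, [BX]         ; AL = 4FH, the ASCII code of 'O'
       MOV  AX, 4C00H
       INT  21H              ; return to DOS
CODE  ENDS
      END  START
```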

Part5 Arithmetic Operations

5.1 Addition Instruction

To add two operands, the ADD instruction should be used, with the instruction syntax as follows:

ADD   Destination_Operand, Source_Operand

Syntax Explanation:

  1. This instruction adds the source operand to the destination operand, with the result stored in the destination operand;

  2. After the ADD instruction is executed, the CPU’s status flags are updated according to the result.

Additionally, there is an INC instruction that increments the operand, with the syntax as follows:

INC   Operand

Syntax Explanation:

  1. The increment operation updates the CPU’s status flags in the same way as ADD, except that it leaves the carry flag CF unchanged;

  2. Increment instructions are often used to modify counters and memory pointer values.

5.2 Subtraction Instruction

The use of subtraction instructions is symmetrical to addition instructions. ADD in addition corresponds to SUB in subtraction; INC in addition corresponds to DEC in subtraction.
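
The four instructions can be tried together in a short sketch. The variable names and values are hypothetical, and the comments give the decimal results.

```asm
; ADD, SUB, INC, and DEC on word data.
DATA  SEGMENT
NUM1  DW   1000
NUM2  DW   250
TOTAL DW   ?
DIFF  DW   ?
DATA  ENDS

CODE  SEGMENT
      ASSUME CS:CODE, DS:DATA
START: MOV  AX, DATA
       MOV  DS, AX
       MOV  AX, NUM1
       ADD  AX, NUM2      ; AX = 1000 + 250 = 1250
       MOV  TOTAL, AX
       MOV  AX, NUM1
       SUB  AX, NUM2      ; AX = 1000 - 250 = 750
       MOV  DIFF, AX
       INC  TOTAL         ; TOTAL = 1251
       DEC  DIFF          ; DIFF = 749
       MOV  AX, 4C00H
       INT  21H           ; return to DOS
CODE  ENDS
      END  START
```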

5.3 Multiplication and Division Instructions

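The multiplication instruction is MUL, and the division instruction is DIV; both operate on unsigned numbers. Unlike ADD and SUB, each takes a single operand (a register or memory unit, not an immediate number), and the other operand is implied. For a byte operand, MUL multiplies AL by the operand and stores the product in AX, while DIV divides AX by the operand, leaving the quotient in AL and the remainder in AH. For a word operand, MUL multiplies AX by the operand and stores the product in DX:AX, while DIV divides DX:AX by the operand, leaving the quotient in AX and the remainder in DX. Since multiplication and division are used less frequently, they are not covered in more detail here.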
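
One difference from ADD and SUB worth showing is that MUL and DIV name only one operand, with the other taken from AL, AX, or DX:AX. The sketch below uses made-up values and covers both the byte and the word forms.

```asm
; Unsigned multiplication and division with implied operands.
CODE  SEGMENT
      ASSUME CS:CODE
START: ; Byte form
       MOV  AL, 12
       MOV  BL, 10
       MUL  BL            ; AX = AL * BL = 120
       MOV  BL, 7
       DIV  BL            ; AX / BL: AL = 17 (quotient), AH = 1 (remainder)
       ; Word form
       MOV  AX, 1000
       MOV  BX, 300
       MUL  BX            ; DX:AX = 300000: DX = 0004H, AX = 93E0H
       DIV  BX            ; DX:AX / BX: AX = 1000 (quotient), DX = 0 (remainder)
       MOV  AX, 4C00H
       INT  21H           ; return to DOS
CODE  ENDS
      END  START
```

If the quotient does not fit in AL (byte form) or AX (word form), or the divisor is 0, the CPU raises a divide error, so the dividend should be prepared with that in mind.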

Part6 Looping

The syntax for loop instructions is as follows:

LOOP   Label

Syntax Explanation:

  1. The number of loops is determined by the value in the CX register, hence CX is also called the counter. Each time LOOP executes, it first decreases CX by 1 and then jumps to the label if CX is not 0; when CX reaches 0, the loop terminates and execution continues with the next statement;

  2. The CX register must be loaded before the loop begins. If CX is 0 on entry, the loop body runs 65536 times rather than zero times;

  3. Each successful loop returns to the statement at the label.
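
A minimal sketch of a counted loop follows: it adds the numbers from 10 down to 1. The label name AGAIN is arbitrary.

```asm
; Sum 10 + 9 + ... + 1 using CX as the counter.
CODE  SEGMENT
      ASSUME CS:CODE
START: MOV  AX, 0         ; running total
       MOV  CX, 10        ; loop count, loaded before the loop begins
AGAIN: ADD  AX, CX        ; add the current counter value
       LOOP AGAIN         ; CX = CX - 1, jump to AGAIN while CX is not 0
       ; here AX = 55 (0037H) and CX = 0
       MOV  AX, 4C00H
       INT  21H           ; return to DOS
CODE  ENDS
      END  START
```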

Part7 Logical Operations

Logical operations include AND, OR, XOR, and NOT. AND, OR, and XOR take two operands, while NOT takes only one, with the usage syntax as follows:

Logical_Operation_Opcode   Destination_Operand, Source_Operand
NOT   Operand

Usage Conditions:

  1. AND instruction is mainly used to selectively clear bits of the operand;

  2. OR instruction is mainly used to selectively set bits of the operand;

  3. XOR instruction is mainly used to selectively invert bits of the operand;

  4. NOT instruction is mainly used to invert the entire operand.
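
The four uses listed above can be traced on a single byte. The starting value and the masks in this sketch are chosen only for illustration.

```asm
; Clear, set, invert selected bits, then invert the whole byte.
CODE  SEGMENT
      ASSUME CS:CODE
START: MOV  AL, 10110101B
       AND  AL, 00001111B   ; clear the high 4 bits:  AL = 00000101B
       OR   AL, 00110000B   ; set bits 4 and 5:       AL = 00110101B (35H)
       XOR  AL, 00000001B   ; invert bit 0:           AL = 00110100B (34H)
       NOT  AL              ; invert every bit:       AL = 11001011B (CBH)
       MOV  AX, 4C00H
       INT  21H             ; return to DOS
CODE  ENDS
      END  START
```

In each mask, a 0 bit leaves the corresponding bit alone for OR and XOR, while a 1 bit leaves it alone for AND.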

Part8 Interrupt Calls

All DOS system function calls are implemented via the software interrupt instruction INT 21H. The interrupt service program behind INT 21H provides over 90 sub-functions, each identified by a number known as the function number.

Method for DOS system function calls:

MOV AH, Function_Number   ;Place function number into register AH
......
(Place the entry parameters required by the function in other registers)
......
INT 21H     ;Call DOS system function

Common Functions:

8.1 Keyboard Input Single Character

Function number 1. The input character is displayed on the screen at the same time, and its ASCII code is stored in AL.

MOV AH, 01H
INT 21H

8.2 Screen Display Single Character

Function number 2. Displays the character stored in the DL register on the screen.

MOV AH, 02H
MOV DL, Character_To_Display
INT 21H

8.3 Screen Display String

Function number 9. Displays on the monitor a string whose offset address is stored in the DX register, with DS holding its segment base address. The string must end with ‘$’, which is not itself displayed.

MOV AH, 09H
MOV DX, Address_Of_String_To_Display
INT 21H

8.4 Return to DOS

Function number 4CH. Exits the program normally and returns to DOS, passing the value in AL as the return code.

MOV AH, 4CH
INT 21H
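
The four functions above can be combined into one small program. This is a sketch for a MASM-style 16-bit assembler running under DOS; the message text and the name MSG are made up for illustration.

```asm
; Print a message, read one key, print the key again, then exit.
DATA  SEGMENT
MSG   DB   'Press a key: ', '$'   ; '$' marks the end of the string
DATA  ENDS

CODE  SEGMENT
      ASSUME CS:CODE, DS:DATA
START: MOV  AX, DATA
       MOV  DS, AX            ; function 9 expects the string at DS:DX
       MOV  AH, 09H
       MOV  DX, OFFSET MSG
       INT  21H               ; display the string
       MOV  AH, 01H
       INT  21H               ; wait for a key; it is echoed and returned in AL
       MOV  DL, AL            ; copy the character for function 2
       MOV  AH, 02H
       INT  21H               ; display the same character a second time
       MOV  AX, 4C00H         ; AH = 4CH, AL = return code 00H
       INT  21H               ; return to DOS
CODE  ENDS
      END  START
```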

Part9 Definition and Calling of Subroutines

9.1 Define Subroutine

Subroutine_Name PROC
...
       RET   ;Indicate return from subroutine
Subroutine_Name ENDP ;Indicate end of subroutine definition

9.2 Call Subroutine

CALL Subroutine_Name
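
A short sketch of a subroutine that is defined once and called twice follows. The name PUTCH, the choice of DL for passing the character, and the stack segment SSEG are all assumptions made for the example. A stack is defined because CALL and RET use it to hold the return address.

```asm
; A subroutine that displays the character in DL.
SSEG  SEGMENT STACK
      DW   64 DUP(?)        ; stack for return addresses and saved registers
SSEG  ENDS

CODE  SEGMENT
      ASSUME CS:CODE, SS:SSEG
START: MOV  DL, 'A'
       CALL PUTCH           ; displays A
       MOV  DL, 'B'
       CALL PUTCH           ; displays B
       MOV  AX, 4C00H
       INT  21H             ; return to DOS

PUTCH  PROC
       PUSH AX              ; protect AX, which the subroutine changes
       MOV  AH, 02H
       INT  21H             ; DOS function 2: display the character in DL
       POP  AX              ; restore AX
       RET                  ; return to the statement after CALL
PUTCH  ENDP
CODE  ENDS
      END  START
```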

Part10 Read and Write Ports in Interfaces

MOV DX, Port_Address
......
(Place the data to transmit in AL and initialize other registers)
......
OUT DX, AL      ;Transmit the data in AL to the port
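
Writing and reading might look like the sketch below. The port address 0300H is purely hypothetical and must be replaced by the address of a real device; on the 8086 the data always travels through AL (for a byte) or AX (for a word).

```asm
; Write a byte to a port, then read a byte back from it.
PORT_ADDR EQU 0300H          ; hypothetical port address

CODE  SEGMENT
      ASSUME CS:CODE
START: MOV  DX, PORT_ADDR    ; port address goes in DX
       MOV  AL, 55H          ; data to transmit
       OUT  DX, AL           ; write AL to the port
       IN   AL, DX           ; read a byte from the port into AL
       MOV  AX, 4C00H
       INT  21H              ; return to DOS
CODE  ENDS
      END  START
```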

Part11 Empty Instruction Delay

NOP executes an empty instruction that performs no operation. When a short delay is needed between instructions, NOP can be inserted.

NOP

Part12 Selection Structure

12.1 CMP Instruction

The CMP instruction format is as follows:

CMP Destination_Operand, Source_Operand

Syntax Explanation:

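  1. CMP is used to compare the values of two operands of the same type;

  2. The instruction does not modify either operand; it only sets the flags, as if the source operand had been subtracted from the destination operand;

  3. CMP is usually followed by one of the conditional jump instructions below, where Before is the destination operand and After is the source operand.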

Instruction   Condition                 Meaning
JGE           Before >= After           Jump if greater or equal
JG            Before >  After           Jump if greater
JLE           Before <= After           Jump if less or equal
JL            Before <  After           Jump if less
JNE           Before not equal After    Jump if not equal
JE            Before equal After        Jump if equal
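
A selection structure built from CMP and a conditional jump can be sketched as follows: the program stores the larger of two numbers. The variable and label names are hypothetical. JGE, JG, JLE, and JL treat the operands as signed numbers, which is why -5 counts as smaller than 3 here.

```asm
; Store the larger of NUM1 and NUM2 in LARGER.
DATA  SEGMENT
NUM1   DW   -5
NUM2   DW   3
LARGER DW   ?
DATA  ENDS

CODE  SEGMENT
      ASSUME CS:CODE, DS:DATA
START: MOV  AX, DATA
       MOV  DS, AX
       MOV  AX, NUM1
       CMP  AX, NUM2       ; set flags from AX - NUM2; operands unchanged
       JGE  KEEP           ; AX >= NUM2 (signed): keep AX
       MOV  AX, NUM2       ; otherwise take NUM2
KEEP:  MOV  LARGER, AX     ; LARGER = 3
       MOV  AX, 4C00H
       INT  21H            ; return to DOS
CODE  ENDS
      END  START
```
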

The article has been authorized for reprint by the author; copyright belongs to the original author. If there is any infringement, it is unrelated to this account; please contact for deletion.

Source: https://blog.csdn.net/hanmo22357/article/details/127883179
