Instruction Format and Basic Syntax of Assembly Language


Currently, there are two different standards for the instruction format of assembly language: assemblers on Windows, such as MASM and NASM, generally follow Intel-style syntax, whereas assemblers on Unix/Linux generally follow AT&T-style syntax.

1. General Format of Assembly Language Statements

[Name[:]] Opcode [First Operand][,Second Operand] ;Comment

The number of operands for the opcode in assembly language can be 0, 1, or 2; when there are 2 operands, the statement can have two different formats:

The format for Intel-style assembly language statements on Windows is:

[Name[:]] Opcode Destination Operand DST, Source Operand SRC ;Comment

The format for AT&T-style assembly language statements on Unix/Linux is as follows (note that the GNU assembler introduces x86 comments with “#” rather than “;”):

[Name[:]] Opcode Source Operand SRC, Destination Operand DST #Comment

For example:

CYCLE: ADD AX,02H

The “Name” is not required in every statement. When a statement has one, in most cases it represents the address of the first storage unit occupied by the items that follow it (both the segment address and the offset address within that segment). In the instruction above, CYCLE is the name of the statement, and it represents the first address in memory at which the machine code of the ADD instruction is stored.

The separator between the “Name” and the opcode can be a colon “:” or a space. When separated by a colon, the name is a label; when separated by a space, the name may be either a label or a variable. When the opcode has multiple operands, adjacent operands must be separated by a comma “,”. There must be a space between the opcode and the operands. Comments must start with a semicolon “;”.
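The statement format above can be illustrated with a small Python sketch that splits one Intel-style statement into its four parts. The function name `parse_statement` is hypothetical, and the sketch makes simplifying assumptions: it only recognizes names followed by a colon, and it does not handle “;” or “,” inside string constants.

```python
import re

def parse_statement(line):
    """Split one statement into (name, opcode, operands, comment).

    Simplified model: only colon-terminated names are recognized, and
    string constants containing ';' or ',' are not supported.
    """
    code, _, comment = line.partition(";")
    match = re.match(r"\s*(?:(\w+):)?\s*(\w+)?\s*(.*)", code)
    name, opcode, rest = match.group(1), match.group(2), match.group(3).strip()
    operands = [op.strip() for op in rest.split(",")] if rest else []
    return name, opcode, operands, comment.strip()

print(parse_statement("CYCLE: ADD AX,02H ;add 2 to AX"))
# ('CYCLE', 'ADD', ['AX', '02H'], 'add 2 to AX')
```

In Intel-style syntax the first element of the operand list is the destination; in AT&T-style syntax the same list would be written in the opposite order.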

2. Elements of a Statement

1. Constants:

Constants in assembly language can be integers or strings. Integer constants can be written in binary, octal, decimal, or hexadecimal, and a suffix distinguishes the bases:

B: Binary; O: Octal; D: Decimal; H: Hexadecimal;

When a numeric value has no suffix, it is assumed to be a decimal number;
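As a rough illustration of the suffix rule, the following Python sketch converts such a literal to an integer. The helper name `parse_constant` is an assumption made for this example; a real assembler applies further rules that are not modelled here.

```python
def parse_constant(text):
    """Convert a suffixed integer literal such as '0FH' or '1010B' to an int."""
    bases = {"B": 2, "O": 8, "D": 10, "H": 16}
    text = text.strip().upper()
    if text[-1] in bases:
        # The last character selects the base; the rest are the digits.
        return int(text[:-1], bases[text[-1]])
    return int(text, 10)  # no suffix: decimal

print(parse_constant("0FH"), parse_constant("1010B"), parse_constant("25"))
# 15 10 25
```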

String constants are a sequence of characters enclosed in a pair of single quotes ('');

2. Expressions:

Composed of operands and operators;

Arithmetic operators: +, -, *, /, MOD, etc.; MOD yields the remainder of dividing the first operand by the second;

Logical operators: AND (logical AND), OR (logical OR), NOT (logical NOT), XOR (logical XOR);

Note: The logical operators are also the opcodes of the logical instructions; they act as operators only when they appear in the operand part of an instruction; for example:

AND AL,0CH AND 0FH ;The first AND is the opcode, the second AND is the operator;

Relational operators: EQ (equal), NE (not equal), LT (less than), GT (greater than), LE (less than or equal to), GE (greater than or equal to);

Expressions in assembly language cannot form statements on their own, they can only be part of a statement;

Note: Expressions in a statement are not evaluated when the statement is executed, but when the source program is assembled and linked. The value of every expression, and therefore of every identifier it contains, must be determinable during assembly or linking;
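The difference between an operator (evaluated by the assembler) and an opcode (executed by the CPU) can be modelled in Python for the statement AND AL,0CH AND 0FH. This is only a sketch: `and_instruction` is a hypothetical stand-in for the machine instruction, and the starting value of AL is assumed.

```python
# Assembly time: the assembler folds the operand expression 0CH AND 0FH.
operand = 0x0C & 0x0F

def and_instruction(al, immediate):
    """Run time: the AND opcode combines AL with the already-folded immediate."""
    return al & immediate & 0xFF

al = 0xFF                      # assumed content of AL before the instruction
al = and_instruction(al, operand)
print(hex(operand), hex(al))   # 0xc 0xc
```

Only the single value 0CH ends up in the machine code; the expression itself leaves no trace in the assembled program.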

3. Labels:

Labels are the names of instructions represented by identifiers, used to indicate the position (address) of the corresponding instruction;

Labels have three attributes: segment address, offset address, and type;

The segment address and offset address attributes of a label refer to the segment address and offset address of the segment where the corresponding instruction is located;

There are two types of labels: NEAR and FAR; defining a label as NEAR means it is used within the segment, while defining it as FAR means it can be used between segments;

Definition of a label: Add an identifier and a colon “:” before the opcode;

For example: START: PUSH DS

In this statement, START is the label being defined; it represents the address of the PUSH instruction. Labels can therefore be used as operands of program transfer instructions (i.e., as the address to transfer to). Labels can also be defined with pseudo-instructions, for example the LABEL pseudo-instruction and the procedure definition pseudo-instruction;

4. Variables:

Like high-level languages, not all operands are constants; assembly language also has its own variables, the values of which can be changed during program execution;

A. Defining variables: In assembly language, variables are defined using pseudo-instructions; the format for defining variables is as follows:

VariableName DB Expression ;Define byte variable, also known as single-byte variable (1 continuous byte), DB –> BYTE

VariableName DW Expression ;Define word variable, also known as double-byte variable (2 continuous bytes), DW –> WORD

VariableName DD Expression ;Define doubleword variable, also known as four-byte variable (4 continuous bytes), DD –> DWORD

VariableName DF Expression ;Define six-byte variable (6 continuous bytes), DF –> FWORD

VariableName DQ Expression ;Define quadword variable, also known as eight-byte variable (8 continuous bytes), DQ –> QWORD

VariableName DT Expression ;Define ten-byte variable (10 continuous bytes), DT –> TBYTE

Where VariableName is a valid identifier; it cannot be followed by a colon “:”, only by spaces. VariableName is optional. The type of the variable is determined by the keyword DB, DW, DD, DF, DQ, or DT;

The “Expression” in the variable definition statement is used to initialize the variable, and can have the following situations:

(1). One or more constants or expressions; when there are multiple constants or expressions, they must be separated by commas; e.g., DATA1–DATA4;

(2). Strings enclosed in single quotes;

For byte-type (DB) variables, each element is 1 byte in size and holds the ASCII code of one character; a whole string can be given inside a single pair of single quotes, which is equivalent to defining a character array, e.g., DATA5;

For word-type (DW) variables, each element is 2 bytes in size and its value cannot exceed 2 characters. With 2 characters, the first character is stored in the high byte and the second character in the low byte; with 1 character, its ASCII code is stored in the low byte and the high byte is 00, e.g., DATA6;

For doubleword-type (DD) variables, each element is 4 bytes in size and its value cannot exceed 2 characters. The characters are stored in the lowest 2 bytes in the same way as for DW, and a single character is stored in the lowest byte;

For quadword-type (DQ) variables, each element is 8 bytes in size and its value cannot exceed 2 characters. The characters are again stored in the lowest 2 bytes in the same way as for DW, and a single character is stored in the lowest byte;

(3). A question mark “?” indicates that the value of the variable is undefined: storage space is reserved for the variable, but no initial value is stored in it; e.g., DATA7, DATA8

(4). Repetition method; the format is: RepeatCount DUP(Expression). The value of the expression is stored repeatedly in the memory area of the variable, and the number of repetitions is given by RepeatCount, which is equivalent to defining an array; e.g., DATA9, DATA10

Examples of defining variables:

DATA1 DB 20H ;1-byte variable

DATA2 DW 0204H,1000H ;2-byte variable

DATA3 DB (-1*3),(15/3) ;1-byte variable

DATA4 DD 123456H ;4-byte variable

DATA5 DB '0123' ;String variable, equivalent to a character array

DATA6 DW 'AB','C','D' ;String variable, equivalent to a string array;

DATA7 DB ? ;1-byte variable, uninitialized

DATA8 DD ? ;4-byte variable, uninitialized

DATA9 DB 5 DUP(0) ;1-byte variable, initialized with 5 zeros, equivalent to an array with 5 DB-type elements

DATA10 DW 3 DUP(?) ;2-byte variable, uninitialized, equivalent to an array with 3 DW-type elements

The pseudo-instruction in a variable definition statement stores the values of the expressions, in order, into the memory area that starts at the address represented by the variable name; each value occupies the number of bytes that corresponds to the type of the variable;
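To make the storage order concrete, here is a minimal Python sketch of the bytes that some of the definitions above would place in memory. It assumes the little-endian byte order of x86 processors (low byte at the low address); the helper `define` is hypothetical and handles numeric values only.

```python
def define(directive, *values):
    """Return the bytes a DB/DW/DD/DQ definition would occupy (little-endian)."""
    size = {"DB": 1, "DW": 2, "DD": 4, "DQ": 8}[directive]
    out = bytearray()
    for value in values:
        # Wrap negative values to their two's-complement form of this size.
        out += (value % (1 << (8 * size))).to_bytes(size, "little")
    return bytes(out)

data2 = define("DW", 0x0204, 0x1000)   # DATA2 DW 0204H,1000H
data3 = define("DB", -1 * 3, 15 // 3)  # DATA3 DB (-1*3),(15/3)
data4 = define("DD", 0x123456)         # DATA4 DD 123456H

print(data2.hex(" "))  # 04 02 00 10
print(data3.hex(" "))  # fd 05
print(data4.hex(" "))  # 56 34 12 00
```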

Summary: The variable name actually represents the effective address (offset address) of the memory area corresponding to that variable in the memory segment; high address refers to relatively larger address values, while low address refers to relatively smaller address values, high and low addresses are relative;

5. Attributes of Variables:

(1). Introduction to Attributes

A variable has the following attributes:

A. Segment Address (SEG): The segment address of the variable;

B. Offset Address (OFFSET): The offset address within the segment where the variable is located;

C. Type (TYPE): The type of the variable defines the number of memory bytes occupied by each variable. The number of memory bytes occupied by variables defined by DB, DW, DD, DQ, DT types are 1, 2, 4, 8, 10 respectively; typically, variables defined by DB, DW, DD types are referred to as BYTE type, WORD type, DWORD type variables;

Common identifier type value list:

Identifier Type | Byte Variable | Word Variable | Double Word Variable | Near Label (NEAR) | Far Label (FAR)
TYPE Value | 1 | 2 | 4 | -1 | -2

D. Length (LENGTH): The number of variables defined by a variable name at the time of definition; in variable definitions containing the DUP operator, the number of variables defined by the variable name is the repetition count in the definition format; in other various variable definitions, the number of variables defined by each variable name is 1;

E. Size (SIZE): The total number of bytes allocated for all variables with the same variable name in the variable definition statement, its value is the product of the type of the variable and the length;

Where the segment address, offset address, and type attributes are the main attributes of the variable, while length and size attributes are auxiliary attributes of the variable;
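The relationship between TYPE, LENGTH, and SIZE described above can be checked with a short Python sketch. The function `attributes` is hypothetical; it follows the rule stated here that LENGTH is the DUP count, or 1 when DUP is not used.

```python
TYPE_BYTES = {"DB": 1, "DW": 2, "DD": 4, "DF": 6, "DQ": 8, "DT": 10}

def attributes(directive, dup_count=1):
    """Return the TYPE, LENGTH and SIZE attributes of a variable definition."""
    type_value = TYPE_BYTES[directive]
    return {"TYPE": type_value, "LENGTH": dup_count, "SIZE": type_value * dup_count}

print(attributes("DB", 5))  # DATA9 DB 5 DUP(0)
# {'TYPE': 1, 'LENGTH': 5, 'SIZE': 5}
print(attributes("DW", 3))  # DATA10 DW 3 DUP(?)
# {'TYPE': 2, 'LENGTH': 3, 'SIZE': 6}
```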

(2). Attribute Operators:

Operator | Expression | Meaning
SEG | SEG Variable Name or Label | Retrieve the segment address of the variable name or label
OFFSET | OFFSET Variable Name or Label | Retrieve the offset address of the variable name or label within the segment
TYPE | TYPE Variable Name or Label | Retrieve the type of the variable name or label (the number of bytes occupied by the variable)
LENGTH | LENGTH Variable Name | Retrieve the length of the variable
SIZE | SIZE Variable Name | Retrieve the size of the variable

These operators cannot form statements on their own, they can only be part of an expression, and the evaluation of the expressions is also completed during assembly;

6. Forced Type Conversion Operator PTR

Format: Data Type PTR Address Expression

The “Data Type” in the format can be BYTE, WORD, DWORD, NEAR, FAR; the first three types are types of variables, while the last two types are types of labels; the expression in the format can be a variable, label, or other address expression;

The function of the PTR operator is to redefine the type of defined variables or labels, its scope is only within the current statement; for example:

DATA1 DW 02H

MOV BYTE PTR DATA1,AL

In this instruction, the type of DATA1 is converted to BYTE type, and then the content of AL is stored in the lowest byte of DATA1; the scope is only within this MOV statement, after this statement, DATA1 remains DW type, i.e., the original type of DATA1 has not been modified;
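The effect of MOV BYTE PTR DATA1,AL can be modelled in Python with a two-byte buffer. This is a sketch under assumptions: the content of AL (7FH) is invented for the illustration, and little-endian storage is assumed.

```python
# DATA1 DW 02H occupies two bytes: 02 00 (low byte first).
memory = bytearray((0x0002).to_bytes(2, "little"))
al = 0x7F                       # assumed content of AL

memory[0] = al                  # BYTE PTR: only the byte at DATA1 is written

data1 = int.from_bytes(memory, "little")  # DATA1 read back as a word
print(hex(data1))               # 0x7f
```

The high byte of DATA1 is untouched, which is why the word read back is 007FH rather than a value with a modified high byte.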

7. Composite Data Types:

In addition to the repeated data types defined by DUP, assembly language also has structure types, union types, and record types, similar to C/C++ languages;

(1). Structure Type:

A. Type Definition Format:

StructureTypeName STRUC [Alignment Type][,NONUNIQUE]

Field1 Type1 Exp1

Field2 Type2 Exp2

……

FieldN TypeN ExpN

StructureTypeName ENDS

Note: Field names in a structure are optional. If field names are present, they must be unique, and each named field can be accessed directly by its name; a field without a name can only be accessed by its offset within the structure;

Alignment Type: Defines the byte alignment boundary for each field. The value must be a power of 2 (1, 2, 4, 8, or 16 bytes); alignment works like the alignment of structure fields in C/C++;

NONUNIQUE: Requires that fields in the structure be accessed only by their fully qualified names;

For example:

PERSON STRUC

NO DD ? ;Named field, offset 0

NAME DB 10 DUP(?) ;Named field, offset 4

DB 1 ;Unnamed field, offset 14

PERSON ENDS
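The offsets in the comments above follow from adding up the sizes of the preceding fields. A minimal Python sketch, assuming the fields are packed without padding (alignment 1) and using a hypothetical helper `layout`:

```python
def layout(fields):
    """Compute byte offsets for (name, size) fields packed without padding."""
    offsets, position = [], 0
    for name, size in fields:
        offsets.append((name, position))
        position += size
    return offsets, position

# NO DD ?, NAME DB 10 DUP(?), unnamed DB 1
person = [("NO", 4), ("NAME", 10), (None, 1)]
offsets, total_size = layout(person)
print(offsets)     # [('NO', 0), ('NAME', 4), (None, 14)]
print(total_size)  # 15
```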

B. Definition of Structure Type Variables:

[VariableName] StructureTypeName <[FieldValueList]>

Field values in the field value list must be separated by commas, and their order and types should match the fields specified in the structure definition. If a field keeps the default value specified in the structure definition, its position is left empty and only the comma is written; if all fields use their default values, the field value list can be omitted, but the pair of angle brackets “<>” must be retained;

For example:

Per1 PERSON <> ;All fields use default values

Per2 PERSON ;All fields are reinitialized

Per3 PERSON ;The second field uses the default value;

C. Referencing Fields of Structure Type:

Format: StructureVariableName.FieldName

This referencing method is completely consistent with the referencing method in high-level languages; additionally, offsets can also be used to access a certain field;

Method 1: Direct reference using the field name

MOV AL,Per3.NAME ;NAME is a byte field, so an 8-bit register is used

Method 2: Reference using the offset of the field in the structure

LEA SI,Per3 ;Get the effective address of the memory block corresponding to variable Per3

MOV AL,[SI+4] ;Register relative addressing, 4 is the offset of field NAME

(2). Union Type:

A. Type Definition Format:

[UnionTypeName] UNION [Alignment Type][,NONUNIQUE]

Field1 Type1 Exp1

Field2 Type2 Exp2

……

FieldN TypeN ExpN

[UnionTypeName] ENDS

Note: The fields in the union type overlap with each other, i.e., the same storage unit corresponds to multiple fields of different types, and the offset of each field in the union type is 0; the number of bytes occupied by the union type is the maximum number of bytes occupied by its fields, i.e., the number of bytes occupied by the union is the number of bytes occupied by the field with the largest number of bytes among all fields in this union;

Alignment Type: Can specify the byte alignment boundary for each field in the union using 1, 2, 4, 8, or 16 bytes, the default alignment boundary is 1 byte; pseudo-instructions ALIGN or EVEN can be used to redefine the boundary, and command line options /Zp can also be used to define boundaries;

NONUNIQUE: Requires that fields in the union type must be accessed using their full names;

For example:

DATE UNION

YEAR DW 2010 ;A word field, because 2010 does not fit in one byte

MONTH DB 07

DAY DB 04

DATE ENDS

B. Definition of Union Type Variables:

Union type variables can only be initialized using the data type of the first field; for example:

DATE1 DATE ;Define a union type variable DATE1, and initialize using the data type of the first field

DATE2 DATE ;Initialization error, can only be initialized using the data type of the first field;

C. Referencing Fields of Union Type:

Format: UnionTypeVariableName.FieldName

For example:

MOV DATE1.YEAR,2012 ;Assign value to the field of union type variable

MOV AL,DATE1.MONTH ;AL=0DCH, the low byte of 2012 (07DCH), because MONTH shares its storage with YEAR

MOV BX,DATE1.YEAR ;BX=2012

MOV DATE1.MONTH,08 ;Overwrites the shared low byte, so the value of YEAR changes as well
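Because all fields of a union start at offset 0, writing one field changes what the others read. The following Python sketch models this with a two-byte buffer; it assumes that YEAR is a word field, that MONTH is a byte field, and that storage is little-endian. The helper names are hypothetical.

```python
storage = bytearray(2)  # the union is as large as its largest field (one word)

def write_year(value):
    storage[0:2] = value.to_bytes(2, "little")   # word field at offset 0

def read_month():
    return storage[0]                            # byte field, also at offset 0

write_year(2012)                 # 2012 = 07DCH, stored as DC 07
print(hex(read_month()))         # 0xdc

storage[0] = 8                   # writing MONTH replaces the low byte of YEAR
year = int.from_bytes(storage, "little")
print(year)                      # 1800, i.e. 0708H
```

This is why a union is suited to viewing one piece of data in several ways rather than to holding independent values such as a year, a month, and a day.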

(3). Record Type:

A. Type Definition Format:

In assembly language, the record type is different from the record type in high-level languages; in assembly language, the record type is convenient for accessing data by binary bits; the definition of the record type requires another keyword RECORD, formatted as follows:

RecordName RECORD Field[,Field,…]

Where “Field” represents: FieldName: Width[= InitialValueExpression]

Note: RecordName is the name of the record type. A record type can consist of multiple fields, and adjacent fields are separated by commas. Each field has three attributes: field name, width, and initial value.

The “width” of a field is the number of binary bits it occupies. It must be a constant, and the sum of the widths of all fields cannot exceed 16. If the sum of the widths is greater than 8, the system automatically allocates 2 bytes for the record type; otherwise it allocates only 1 byte.

The last field of the record type is placed at the lowest bits of the allocated space, and the remaining fields are assigned bits “from right to left”; any unused bits on the left are automatically filled with 0.

The initial value expression gives the default value of the field. If the initial value exceeds the range that the field can represent, an error message is generated during assembly. If a field has no initial value expression, its initial value is 0;

Example 1:

COLOR RECORD BLINK:1,BACK:3=0,INTENSE:1=1,FORE:3

The binary bit distribution of this COLOR type is as follows: BLINK occupies bit 7, BACK bits 6 to 4, INTENSE bit 3, and FORE bits 2 to 0.

The widths of the fields in this type are 1, 3, 1, and 3, so the record occupies 8 binary bits, and the system allocates 1 byte for it;

Example 2:

FLOAT RECORD DSIGN:1,DATA:8,ESIGN:1,EXP:4

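The binary bit distribution of this FLOAT type is as follows: DSIGN occupies bit 13, DATA bits 12 to 5, ESIGN bit 4, and EXP bits 3 to 0; the two unused bits 15 and 14 are filled with 0.

The total width of this type is 14 binary bits, so the system allocates 2 bytes of space for it;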
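The “right to left” bit assignment for the two record types above can be reproduced with a short Python sketch. The helpers `record_layout` and `bytes_allocated` are hypothetical names; they only implement the rules stated in this section.

```python
def record_layout(fields):
    """fields: (name, width) pairs in declaration order.

    Returns ({name: position of the field's lowest bit}, total width).
    The last declared field starts at bit 0; earlier fields sit to its left.
    """
    shifts, position = {}, 0
    for name, width in reversed(fields):
        shifts[name] = position
        position += width
    return shifts, position

def bytes_allocated(total_bits):
    return 1 if total_bits <= 8 else 2

COLOR = [("BLINK", 1), ("BACK", 3), ("INTENSE", 1), ("FORE", 3)]
FLOAT = [("DSIGN", 1), ("DATA", 8), ("ESIGN", 1), ("EXP", 4)]

color_shifts, color_bits = record_layout(COLOR)
float_shifts, float_bits = record_layout(FLOAT)
print(color_shifts, bytes_allocated(color_bits))
print(float_shifts, bytes_allocated(float_bits))

# Default value of COLOR: only INTENSE has a non-zero initial value (1).
color_defaults = {"BLINK": 0, "BACK": 0, "INTENSE": 1, "FORE": 0}
default_value = sum(color_defaults[name] << shift for name, shift in color_shifts.items())
print(f"{default_value:02X}H")  # 08H
```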

B. Definition of Record Type Variables:

[VariableName] RecordTypeName <[FieldValueList]>

Note: The variable name is the name of the record type variable. It can be omitted, but then that memory unit cannot be accessed by a symbolic name. The field value list assigns initial values to the fields; adjacent field values are separated by commas, and their order and size should follow the order and widths specified in the record type definition. If a field keeps its default value, its position is left empty and only the comma is written; if all fields use default values, the field value list can be omitted, but the pair of angle brackets “<>” must be retained;

For example:

COLOR1 COLOR , ,

FLOAT1 FLOAT ,

C. Referencing Fields of Record Type:

Format: RecordTypeVariableName.FieldName

For example: MOV AL,COLOR1.FORE

D. Special Operators for Record Types:

The operators WIDTH and MASK are special operators for record types, which can be used to obtain different attributes of the record type;

WIDTH: Used to return the number of binary bits of the record or its field, i.e., the width of the record type or record type field; the writing format is as follows:

WIDTH RecordName or WIDTH RecordFieldName

For example: If the record type is COLOR, then WIDTH COLOR has a value of 8, WIDTH BACK has a value of 3, WIDTH BLINK has a value of 1;

MASK: Returns an 8-bit or 16-bit binary number, where the corresponding bits used by the specified record or field have a value of 1, otherwise, the value is 0; the writing format is as follows:

MASK RecordName or MASK RecordFieldName

For example: If the record type is FLOAT, then MASK EXP has a value of 000FH, MASK DATA has a value of 1FE0H, MASK DSIGN has a value of 2000H;

Record Field: A record field name can itself be used as an operand; it returns the number of bits by which the field must be shifted right to bring it to the lowest bit of the record, i.e., the bit position of the lowest bit of that field within the record;

For example: If the record type is FLOAT, then:

MOV CL,EXP is equivalent to MOV CL,0

MOV CL,DATA is equivalent to MOV CL,5
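The values returned by WIDTH, MASK, and a bare field name for the FLOAT record can be recomputed in Python. This is a sketch with a hypothetical helper `field_info`; the record value 1234H used at the end is an arbitrary assumption chosen to show how a mask and a shift count are typically combined to extract one field.

```python
FLOAT = [("DSIGN", 1), ("DATA", 8), ("ESIGN", 1), ("EXP", 4)]

def field_info(fields, target):
    """Return (width, shift, mask) of one record field."""
    shift = 0
    for name, width in reversed(fields):   # the last field starts at bit 0
        if name == target:
            return width, shift, ((1 << width) - 1) << shift
        shift += width
    raise KeyError(target)

for name in ("EXP", "DATA", "DSIGN"):
    width, shift, mask = field_info(FLOAT, name)
    print(name, width, shift, f"{mask:04X}H")
# EXP 4 0 000FH
# DATA 8 5 1FE0H
# DSIGN 1 13 2000H

# Extracting DATA from a record value: AND with the mask, then shift right.
value = 0x1234
_, shift, mask = field_info(FLOAT, "DATA")
data = (value & mask) >> shift
print(hex(data))  # 0x91
```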

(4). Type Redefinition:

Knowing a certain data type, the programmer can define an alias or pointer type for that data type. The pseudo-instruction to express this definition is TYPEDEF, the definition format is as follows:

NewDataTypeName TYPEDEF [Distance][PTR] KnownDataType

Where “Distance” can be NEAR, FAR, PROC, etc.;

For example:

CHAR TYPEDEF BYTE ;Define another alias CHAR for BYTE type, in C++ it is: typedef BYTE CHAR

PCHAR TYPEDEF PTR CHAR ;Define a character pointer data type PCHAR, in C++ it is: typedef CHAR* PCHAR, i.e., typedef char* PCHAR

Then, the following variable definitions are valid:

CH1 CHAR 'ABCDEF' ;Define a string variable of type CHAR

PCH1 PCHAR CH1 ;Define a pointer variable that points to the string CH1

This function is similar to the typedef statement in C++;

8. Operators in Expressions:

HIGH (High 8 bits), LOW (Low 8 bits)

SEG (Segment Address), OFFSET (Offset), TYPE (Data Type), LENGTH (Variable Length), SIZE (Variable Capacity)

WIDTH (Width of Record/Record Field), MASK (Mask Bits of Record/Record Field), etc.;

Among them, HIGH and LOW are used to select the high 8 bits and low 8 bits of the result of the expression, the usage format is as follows:

HIGH Expression

LOW Expression
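In Python terms, HIGH and LOW correspond to a shift and a mask on a 16-bit value. The function names below are chosen for this sketch only.

```python
def high(value):
    """Upper 8 bits of a 16-bit value, like HIGH Expression."""
    return (value >> 8) & 0xFF

def low(value):
    """Lower 8 bits of a 16-bit value, like LOW Expression."""
    return value & 0xFF

print(hex(high(0x1234)), hex(low(0x1234)))  # 0x12 0x34
```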

9. Operator and Operator Precedence:

Precedence, from highest to lowest (operators on the same line have the same precedence):

LENGTH, SIZE, WIDTH, MASK, (), [], . (for structure fields), <> (for record types)

PTR, SEG, OFFSET, TYPE, THIS, : (for segment override prefix)

*, /, MOD, SHL, SHR

HIGH, LOW

+, -

EQ, NE, LT, LE, GT, GE

NOT

AND

OR, XOR

SHORT

10. Address Expressions:

Address expressions are expressions that calculate the addresses of memory units, which can consist of labels, variable names, and base or index registers enclosed in square brackets “[]”; the result of the calculation represents the address of a memory unit, not the value in that memory unit;

Note: In assembly language, calculations of address values are done in bytes, not based on the size of data types; for example:

W1 DW 1234H,5678H

Then, the word stored at address W1+1 is 7812H, not 5678H: W1+1 is the address of the byte immediately after W1, whereas W1+2 is the address two bytes after W1, where 5678H is stored;
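The byte-based address arithmetic can be verified with a small Python model of the four bytes that W1 DW 1234H,5678H occupies, assuming little-endian storage. The helper `word_at` is hypothetical.

```python
# W1 DW 1234H,5678H is stored as the bytes 34 12 78 56.
memory = (0x1234).to_bytes(2, "little") + (0x5678).to_bytes(2, "little")

def word_at(offset):
    """Read the 16-bit word that starts 'offset' bytes after W1."""
    return int.from_bytes(memory[offset:offset + 2], "little")

print(hex(word_at(0)), hex(word_at(1)), hex(word_at(2)))
# 0x1234 0x7812 0x5678
```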

11. Symbol Definition Statements:

Programs often use constants or numeric expressions written directly in instructions. When such a value has to change, every occurrence must be modified one by one, which increases the maintenance workload, and the meaning of each constant or expression is easily forgotten. Assembly language therefore provides a way to define a symbolic name for a constant or expression; once defined, the symbolic name can be used directly in instructions. This is similar to defining constants with the #define macro directive in C, and to defining constants with the const keyword in C++;

(1). Equivalence Statement EQU

General Format:

SymbolName EQU Expression

Function: The symbol name on the left represents the expression on the right;

Note: An equivalence statement does not allocate storage space for the symbol name. The symbol name must be unique: it cannot duplicate another symbol name and cannot be redefined. Wherever the symbol name appears in the program, it is replaced by the expression;

(2). Using Symbol Names to Represent Constants or Expressions

After defining a constant or expression as a symbolic name with a specific meaning, it can be used in the program to represent that constant or expression; for example:

NUMBER EQU 100 ;Assign a symbolic name for the length of the buffer

BUFF_LEN EQU NUMBER+2

CR EQU 13 ;Define a symbolic name for the ASCII code of the “carriage return” character

LN EQU 10 ;Define a symbolic name for the ASCII code of the “newline” character

(3). Using Symbol Names to Represent Strings

For example:

GREETING EQU 'How are you!'

(4). Using Symbol Names to Represent Keywords or Opcodes

For example:

MOVE EQU MOV ;Assign another symbolic name MOVE for the opcode MOV

COUNTER EQU CX ;Assign a symbolic name “counter” for register CX

12. Equal Sign Statement

Assembly language provides a method to define symbolic constants using the equal sign “=”, i.e., symbolic names can represent constants; general format is as follows:

SymbolName = Expression

The numeric expression should be computable during assembly, it cannot contain forward references to symbol names; symbolic names defined using equal sign statements can be redefined; this can be seen as an assignment statement in high-level languages, which can be assigned multiple times, this is different from EQU; for example:

ABC = 10 + 200*5 ;The value of ABC is 1010

ABC1 = 5*ABC + 21 ;The value of ABC1 is 5071

COUNT = 1 ;The value of COUNT is 1

COUNT = 2*COUNT + 1 ;The value of COUNT is 3

Note: When defining symbolic names using pseudo-instructions “=” and “EQU”, wherever the symbolic name appears in the program, it is replaced by the constant or expression on the right;
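The difference between “=” and EQU can be sketched as a tiny symbol table in Python. The class `SymbolTable` and its method names are hypothetical; the sketch only models the redefinition rules described above, not how an assembler is actually implemented.

```python
class SymbolTable:
    """Minimal model: '=' symbols may be redefined, EQU symbols may not."""

    def __init__(self):
        self.values = {}
        self.fixed = set()   # names defined with EQU

    def equ(self, name, value):
        if name in self.values:
            raise ValueError(f"{name} is already defined")
        self.values[name] = value
        self.fixed.add(name)

    def assign(self, name, value):   # the "=" statement
        if name in self.fixed:
            raise ValueError(f"{name} was defined with EQU")
        self.values[name] = value

symbols = SymbolTable()
symbols.assign("COUNT", 1)                                # COUNT = 1
symbols.assign("COUNT", 2 * symbols.values["COUNT"] + 1)  # COUNT = 2*COUNT + 1
symbols.equ("NUMBER", 100)                                # NUMBER EQU 100

try:
    symbols.equ("NUMBER", 200)    # a second EQU for the same name is rejected
    redefined = True
except ValueError:
    redefined = False

print(symbols.values["COUNT"], symbols.values["NUMBER"], redefined)
# 3 100 False
```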

13. Label Definition Statement

This statement defines a specified symbolic name, which has the same segment address and offset address as the corresponding storage unit immediately following it, but the type of this symbolic name is newly specified;

The general format of the LABEL statement is as follows:

SymbolName LABEL DataType

Common data types include: BYTE, WORD, DWORD, structure type, record type, NEAR, FAR;

Among them, the first five types are variable types, and the last two types are label types; if the “DataType” in the format is one of the first five types, then “SymbolName” is a variable name; if the “DataType” in the format is one of the last two types, then “SymbolName” is a label name; variable names and label names both have segment address and offset address attributes;

For example:

WBUFFER LABEL WORD

BUFFER DB 200 DUP(0)

In this LABEL definition statement, WBUFFER and BUFFER have exactly the same segment address and offset address, but their data types are different, the purpose is to use two different types of operations to access the same memory area;

Note: The pseudo-instruction itself does not occupy memory space;
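The idea of two names with different types for the same memory can be modelled in Python: one buffer is written as a word and read back as bytes. This sketch assumes little-endian storage, and the value 1234H is invented for the illustration.

```python
import struct

buffer = bytearray(200)                    # BUFFER DB 200 DUP(0)

# Word-sized store through the WORD view (what WBUFFER would allow).
struct.pack_into("<H", buffer, 0, 0x1234)

# Byte-sized reads through the BYTE view (what BUFFER allows).
print(hex(buffer[0]), hex(buffer[1]))      # 0x34 0x12
```

Both views refer to the same 200 bytes; no extra storage is created for the second name.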
