This series will explain the book “Assembly Language”. This section covers Chapter 4 – The First Complete Assembly Program.

We can finally write our first complete program. So far, we have only typed a few instructions into Debug and executed them there. From now on, we will write a complete assembly language source program, compile and link it into an executable file (such as an *.exe file), and run it in the operating system.

Overview of This Section
1. The process from writing a source program to execution
2. The components of the source program
3. The complete development process using DOSBox
4. Execution of .exe
5. Who loads the executable file into memory and executes it?
6. Experiment – Tracing the program execution process

1. The Complete Process from Writing to Executing a File

Step 1:

Write the source program in a text editor and save it. The result is a text file that stores the source program.

Step 2:

Compile and link the source file.

Use the compiler to compile the source program in the source file, generating an object file; then use the linker to link the object file, producing an executable file that can be run directly in the operating system.

The executable file contains two parts:

Contents of the Executable File	Explanation
1. Program	Machine code (translated from the assembly instructions in the source program) and data (the data defined in the source program)
2. Related descriptive information	For example: the size of the program, how much memory space it will occupy, etc.

This step will produce an executable file that runs in the operating system.

Step 3:

Execute the program (instructions) in the executable file.

As noted in Step 2, the executable file contains not only machine code and data but also related descriptive information. Only the machine code is executed in this step. So how does it get executed?

Following the descriptive information in the executable file, the operating system loads the machine code and data into memory and performs the related initialization (for example, setting CS:IP to point to the first instruction to be executed). The CPU then executes the program.

2. Components of the Source Program

First, here is the complete program:
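The listing itself is not reproduced here, so what follows is only a minimal sketch of what such a program might look like. It is built from the fragments quoted later in this section (assume cs:codesg, the codesg segment, and the final mov ax,4c00H / int 21H). The three arithmetic instructions in the middle and their values are assumptions chosen purely for illustration.

```asm
; Minimal sketch of a complete MASM-style program (illustrative values only).
assume cs:codesg        ; pseudoinstruction: associate CS with the segment codesg

codesg segment          ; pseudoinstruction: the segment named codesg starts here

    mov ax,0123H        ; assembly instruction: AX = 0123H
    mov bx,0456H        ; assembly instruction: BX = 0456H
    add ax,bx           ; assembly instruction: AX = AX + BX = 0579H

    mov ax,4c00H        ; program return: AH = 4CH, AL = 00H
    int 21H             ; hand control back to DOS

codesg ends             ; pseudoinstruction: the segment codesg ends here

end                     ; pseudoinstruction: end of the source program
```

The comments mark which lines are pseudoinstructions and which are assembly instructions; the name codesg is the label.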

Next, we will explain each component one by one:

Components of the Source Program
Pseudoinstructions
Assembly instructions
Labels

1. Pseudoinstructions

An assembly source program contains two kinds of instructions: assembly instructions and pseudoinstructions. They differ as follows:

Instruction	Explanation
Assembly instructions	Have corresponding machine code; they are translated into machine instructions and ultimately executed by the CPU
Pseudoinstructions	Have no corresponding machine code; they are carried out by the compiler, which does its work according to them, and are never executed by the CPU

In the above program, there are three types of pseudoinstructions:

1. Segment definition pseudoinstructions:

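segment and ends are a pair of pseudoinstructions, and they are required in any assembly program that will be processed by the compiler. Together they define a segment: segment marks where the segment begins, and ends marks where it ends.

An assembly source program is made up of one or more segments, which hold code or data or serve as stack space. Each segment needs a name to identify it, which is why this pair is essential. The general form is:

segment_name segment

segment_name ends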

2. Assume pseudoinstruction:

In the source program it appears as:

assume cs:codesg

The word means “to assume”. This pseudoinstruction assumes that a certain segment register is associated with a segment defined in the program using segment…ends. With this association stated, the compiler can, when necessary, link the segment register with that specific segment.

Here, it associates the segment codesg, which serves as the code segment, with the segment register CS.

3. End pseudoinstruction:

This is the last pseudoinstruction in the assembly source program. It tells the compiler where the source program ends.

end is the termination marker of an assembly program. When the compiler encounters the pseudoinstruction end while compiling, it stops compiling the source program. So when we finish writing a program, we must put the pseudoinstruction end at the end of it. Otherwise, the compiler will not know where the program ends. Note that end is not the same as ends: ends closes a segment and is always paired with segment.
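To make the difference between pseudoinstructions and assembly instructions concrete, here is a small sketch that notes, line by line, whether any machine code is produced. The program does nothing except return to DOS. The byte values in the comments are the standard 8086 encodings of these two instructions. The remark about text after end reflects how MASM is generally understood to behave and should be treated as an assumption for other assemblers.

```asm
assume cs:codesg    ; pseudoinstruction: no machine code is generated
codesg segment      ; pseudoinstruction: no machine code is generated
    mov ax,4c00H    ; assembly instruction: machine code B8 00 4C
    int 21H         ; assembly instruction: machine code CD 21
codesg ends         ; pseudoinstruction: no machine code is generated
end                 ; pseudoinstruction: the compiler stops processing here
; Text placed after end is not compiled.
```

Only the two assembly instructions end up in the executable file as machine code (five bytes in total); the other four lines exist only to guide the compiler.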

2. Assembly Instructions in the Source Program

These are the instructions in the source program that have corresponding machine code, such as mov ax,4c00H and int 21H. They are translated into machine code and ultimately executed by the CPU.

3. Labels

In addition to assembly instructions and pseudoinstructions, an assembly source program also contains labels, such as codesg.

A label refers to an address. For example, codesg appears in front of segment and serves as the name of that segment. The compiler and linker will ultimately turn this segment name into a segment address.

Larger programs are usually split into several segments, so labels are needed to tell the segments apart.
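As a rough illustration of a segment name standing for a segment address, the sketch below uses codesg as an operand. The use of BX here is an arbitrary choice for demonstration, and the actual value depends on where DOS loads the program.

```asm
assume cs:codesg
codesg segment
    mov bx,codesg   ; BX = segment address that the name codesg stands for
                    ; (the value is only known once the program is loaded)
    mov ax,4c00H    ; program return
    int 21H
codesg ends
end
```

Because codesg is the only segment in this sketch and holds the code, stepping through it in Debug should show BX ending up with the same value as CS.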

3. Related to Program Termination

Note that we also have the following two instructions:

mov ax,4c00H
int 21H

We have not discussed these two instructions yet. Under DOS, they implement program return. After a program finishes, it must hand control of the CPU back to the program that let it run; we call this process program return. How does a program return? By placing a return sequence at the end of the program.

The two instructions above are exactly that return sequence.
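To see what the value 4c00H is doing, it can help to load the two halves of AX separately. The sketch below is an equivalent way to write the return sequence, not a replacement for the form used above. Under DOS, int 21H with AH = 4CH is the function that terminates the program, and AL carries the return code.

```asm
assume cs:codesg
codesg segment
    ; The next two instructions together have the same effect as mov ax,4c00H.
    mov ah,4cH      ; AH = 4CH: DOS function "terminate program"
    mov al,00H      ; AL = 00H: return code handed back to the caller
    int 21H         ; call DOS; control returns to the program that loaded us
codesg ends
end
```

Writing mov ax,4c00H simply sets both halves in one instruction, which is why it is the form usually seen.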

4. The Complete Development Process Using DOSBox

Previously, we only executed a few instructions at a time in the Debug program in DOSBox. Now we will go through the complete process: writing the source, compiling, linking, and executing.

For details, please refer to this article: The Complete Development Process Using DOSBox.

5. Who Loads the Program from the Executable File into Memory and Makes It Run?

Clearly, for the executable file P1 to run, there must be a running program P2 that loads P1 from the executable file into memory and hands over control of the CPU to it, allowing P1 to run; when P1 finishes running, it should return control of the CPU to the program P2 that allowed it to run.

In DOS, this program is command.com, the command interpreter, which is the shell of the DOS operating system. DOSBox emulates a DOS environment and provides such a shell as well.

An operating system is a large and complex software system made up of many functional modules. Any general-purpose operating system provides a program called a shell, through which users (operators) operate the computer system.

The process of running a program is as follows:

(1) When we run 1.exe directly in DOS, it is the currently running command (the shell) that loads the program in 1.exe into memory;

(2) command sets the CPU’s CS:IP to point to the first instruction of the program (the entry point of the program), allowing the program to run;

(3) After the program finishes running, it returns to command, and the CPU continues running command, displaying the drive prompt.

Small Insight

So what actually runs is not the executable file itself, but the machine code that the loading program has copied from the executable file into memory.

6. Tracing the Program Execution Process

This is what we usually call debugging: running the program step by step, or up to a breakpoint, and observing how it executes. For details, please refer to this article: Tracing the Program Execution Process in DOSBox.

Conclusion

Alright, in this section we discussed the complete assembly program from writing to running and debugging. Let’s review:

Review of Knowledge from Writing to Running and Debugging a Complete Assembly Program
1. The process from writing a source program to execution
2. The components of the source program.
3. What are pseudoinstructions?
4. What pseudoinstruction associates program segments with registers?
5. The complete development process using DOSBox
6. Execution of .exe
7. Who loads the executable file into memory and executes it?
8. When DOSBox loads an executable file into memory, how is memory laid out, and what do the cx and ds registers hold?
9. Why is the address of the first instruction of the program said to be ds+10H:0000?
10. When debugging a program, provide the call chain of the program.

If you cannot explain each item in this table, then review the above content carefully!

Answers can be found in the comments section, and everyone is welcome to discuss~

About Xiao Xi

😉 Hehe, I am Xiao Xi, focusing on C language, Linux kernel, and cloud computing.

Xiao Xi believes: it is best to speed up gradually, and the foundation is always worth 85% of the effort. My articles are all about simple foundational knowledge. If you like this style:

Feel free to follow, comment, and share, so more friends can see it~~~🙈

What do you want to see in the next issue? Leave a message in the comments section! See you next time!
