How Assembly Language Is Translated to Machine Language

A computer is built from logic gates and other electronic components. Each hardware architecture has its own instruction set, and each instruction can be written as a mnemonic; this mnemonic notation is called assembly language. In the early days, programmers translated assembly into machine code by hand and then entered that machine code into the computer to run it. Later, computers could translate the mnemonics automatically. How does a computer perform this automatic translation, and what does the process look like? (I know that compilers exist.)

Setting macro assemblers aside, it works roughly like this:

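// Illustrative table for an invented encoding, not a real instruction set
map<string, string> mnemonicToOpcode = {
   {"nop", "90"},      // opcode emitted for nop
   {"mov", "56"},      // opcode emitted for mov
   {"$1", "00$1"},     // plain operand: prefix 00 (direct value access)
   {"[$1]", "01$1"},   // bracketed operand: prefix 01 (indirect access)
   ...
};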

Then you can write an assembly program:

nop
nop
mov ax,bx
mov [ax],bx

The assembler then walks through the program and applies that map as simple string replacement. When it sees 'nop' it emits hexadecimal 90; when it sees 'mov' it emits hexadecimal 56. The 'ax' and 'bx' that follow in the instruction are replaced with the registers' numbers. Whether an operand is wrapped in brackets tells the assembler whether it is a pointer access or a direct value access, so a matching indicator has to be added: in the toy encoding I invented here, 00 means direct value access and 01 means indirect access through an address.
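The replacement described above can be sketched in a few lines of Python. This is only an illustration of the toy encoding from this answer, not real x86 machine code: the opcode values come from the table above, while the register numbers in REGISTERS and the function names are assumptions made up for the example.

```python
import re

# All encodings below are invented for illustration; they mirror the toy
# table in the text and are NOT real x86 machine code.
OPCODES = {"nop": "90", "mov": "56"}
REGISTERS = {"ax": "00", "bx": "03"}  # hypothetical register numbers


def encode_operand(operand):
    """Encode one operand as a mode indicator plus a register number."""
    match = re.fullmatch(r"\[(\w+)\]", operand)
    if match:
        # Brackets mean indirect access: indicator 01.
        return "01" + REGISTERS[match.group(1)]
    # No brackets means direct value access: indicator 00.
    return "00" + REGISTERS[operand]


def assemble_line(line):
    """Translate one line of the toy assembly into a hex string."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return OPCODES[mnemonic] + "".join(encode_operand(op) for op in operands)


def assemble(source):
    """Translate a whole program, skipping blank lines."""
    return [assemble_line(line) for line in source.splitlines() if line.strip()]


program = """
nop
nop
mov ax,bx
mov [ax],bx
"""
for encoded in assemble(program):
    print(encoded)
# prints:
# 90
# 90
# 5600000003
# 5601000003
```

Every step is a dictionary lookup or a pattern match on the operand text, which is the sense in which the translation is "just string replacement". Supporting another CPU or another syntax would mostly mean swapping the tables.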

Note that different assemblers accept different mnemonic syntaxes, and you are free to define your own conventions. Instruction sets also differ between CPUs, so you have to study each one's encoding rules and derive the string replacement rules from them. But in short, it is just string replacement, nothing special.

Of course, different systems specify different executable file formats. Linux and Windows use different formats, and even under DOS, .com files and .exe files were laid out differently.

In general, you fill in a data structure called the 'file header' according to the format's rules and then place the first instruction (in machine code) at the correct position. You may also need to do some extra work to support 'relocation', and to calculate the relative addresses of labels and substitute them into the instructions that refer to them.
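The label calculation is commonly done in two passes: first record where each label lands, then emit the code with labels replaced by offsets. Below is a minimal sketch under stated assumptions. The opcode values, the one-byte relative jump, and the 8-byte 'TOY0' header are all invented for this example and do not correspond to any real CPU or executable format. Relocation records are not shown.

```python
import struct

# Invented toy encoding: nop is one byte, jmp is an opcode byte followed by
# a signed one-byte displacement. None of this is a real instruction set.
OPCODES = {"nop": 0x90, "jmp": 0xE0}  # hypothetical values
INSTRUCTION_SIZE = {"nop": 1, "jmp": 2}


def assemble(source):
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]

    # Pass 1: walk the program and record the byte offset of each label.
    labels, offset = {}, 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = offset
        else:
            offset += INSTRUCTION_SIZE[line.split()[0]]

    # Pass 2: emit bytes, replacing each label with a displacement measured
    # from the end of the jump instruction that refers to it.
    code = bytearray()
    for line in lines:
        if line.endswith(":"):
            continue
        parts = line.split()
        code.append(OPCODES[parts[0]])
        if parts[0] == "jmp":
            relative = labels[parts[1]] - (len(code) + 1)
            code += struct.pack("b", relative)  # signed 8-bit displacement
    return bytes(code), labels


def add_header(code, entry_offset):
    # Hypothetical 8-byte header: magic, entry offset, code size (little-endian).
    return struct.pack("<4sHH", b"TOY0", entry_offset, len(code)) + code


program = """
start:
nop
jmp end
nop
end:
nop
jmp start
"""
code, labels = assemble(program)
image = add_header(code, labels["start"])
print(code.hex())   # prints 90e0019090e0f9
print(len(image))   # prints 15
```

The forward reference 'jmp end' is why two passes are needed: when the assembler first reaches that line, it has not yet seen where 'end' is.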

But ultimately, translating assembly into machine language is basically just string replacement; there is nothing mysterious about it. Compiler theory is powerful when compiling a high-level language down to assembly, but an assembler has no need for it.
