Basic Tutorial Series on Assembly Language

Machine Word Length

Machine word length refers to the number of bits of data that the CPU can process in a single operation. It usually equals the width of the CPU's general-purpose registers, and often the width of its data bus as well; for the 8086 it is 16 bits. For historical reasons, x86 terminology still takes the 8086 as its reference: in x86 assembly a "word" means 16 bits, even on 32-bit and 64-bit x86 processors, where a 32-bit quantity is called a doubleword and a 64-bit quantity a quadword. The MIPS series appeared later and was 32-bit from the outset, so in MIPS a word is 32 bits.

Byte Order

Multi-byte data can be stored in memory in one of two byte orders: Little Endian, where the least significant byte is stored at the lowest address, and Big Endian, where the most significant byte comes first. Little Endian is the one you meet most often on everyday machines: all x86 CPUs use it. MIPS CPUs are bi-endian (they can run in either byte order), so a given MIPS system may use either one. Big Endian is less common on desktop hardware and is traditionally associated with PowerPC CPUs. Socket programmers will also run into it, because the byte order used by Internet protocols (network byte order) is Big Endian.

Unsigned and Signed Numbers

Students who have learned C know that integer types are divided into unsigned types and signed types. An n-bit unsigned type has the range [0, 2ⁿ-1], where n is the number of bits in the type (for example, short is 16 bits, and int is 32 bits on a 32-bit CPU). An n-bit signed type has the range [-2ⁿ⁻¹, 2ⁿ⁻¹-1]. At the storage level, however, the computer makes no distinction between the two: a value is just a pattern of n bits, and the same addition and subtraction instructions work on both. The difference in behavior comes from how a higher level interprets those bits. In C the interpretation is chosen by the declared type or the printf format; in assembly it is chosen by picking the signed or unsigned variant of instructions that care about it, such as comparisons and conditional jumps, multiplication (x86 IMUL vs. MUL), and division (IDIV vs. DIV).

You can check this with an experiment in C that prints the same bit pattern in two ways:

1. printf("%d", -1);
2. printf("%u", (unsigned)-1);

Both arguments have the same 32-bit pattern (0xFFFFFFFF), yet with a 32-bit int the first prints -1 and the second prints 4294967295. The only thing that changed is the interpretation applied at the higher level.

My recommendation: unless you are quite sure of what you are doing, avoid unsigned types in C, and in particular do not choose unsigned merely because you believe a value can never be negative. Doing so can lead to serious bugs. Consider the following code:

1. Assuming that array indices will not be negative:

for(unsigned i=10; i >= 0; --i) arr1[i] = arr2[i];

2. Assuming that the size of a type will not be negative:

unsigned x=sizeof(int);
for(int i=0; x-i >= 0; ++i) ...;

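Neither loop ever terminates normally, because an unsigned value is always greater than or equal to 0, so the loop condition can never become false. In the first loop, decrementing i past 0 wraps it around to UINT_MAX, which then indexes far outside both arrays. In the second loop the problem is hidden by an implicit conversion: in the expression x-i, the usual arithmetic conversions convert the int i to unsigned, so the difference is unsigned and x-i >= 0 is always true.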

Floating Point Number Storage

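The storage of floating point numbers in memory is illustrated in the figure at right. s is the sign bit (0 for positive, 1 for negative), exp is the exponent field, and frac is the fraction field. For normalized numbers the significand is 1.frac, a value in [1.0, 2.0); its leading 1 is implicit and is not stored. In single precision (float) the three fields are 1, 8, and 23 bits wide; in double precision (double) they are 1, 11, and 52 bits wide.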

The value stored in exp is not the exponent itself: the actual exponent is E = exp - bias, where the bias is 127 for single precision and 1023 for double precision. A normalized number therefore has the value (-1)^s × 1.frac × 2^E.

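Because the significand must lie in [1.0, 2.0), a number is normalized by moving its binary point and adjusting the exponent to compensate. For example, if a number's binary representation is 111.0011, moving the binary point two places to the left and adding 2 to the exponent expresses it as 1.110011 × 2². The exponent E is then 2, and the stored frac bits are 110011 followed by zeros.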
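Carrying the 111.0011 example through to a complete single-precision encoding, assuming the number is positive and stored as a float (bias 127, 8 exponent bits, 23 fraction bits):

```latex
\begin{aligned}
111.0011_2 &= 1.110011_2 \times 2^{2} = 7.1875_{10} \\
s &= 0, \qquad E = 2, \qquad \text{exp} = E + 127 = 129 = 10000001_2 \\
\text{frac} &= 110011\,\underbrace{0\cdots0}_{17\ \text{zeros}} \quad (23\ \text{bits}) \\
\text{bits} &= 0\ \ 10000001\ \ 11001100000000000000000 = \mathtt{0x40E60000}
\end{aligned}
```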

[Figure: example of converting a number to floating point representation]

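The above discusses the normalized representation, which is used whenever exp is neither all 0s nor all 1s. The two remaining exp patterns are reserved for special cases:

1. +0 and -0: exp is all 0s and frac is all 0s (the sign bit s distinguishes the two);

2. Denormalized (subnormal) numbers: exp is all 0s and frac is not all 0s. The significand is 0.frac instead of 1.frac, and the exponent is fixed at 1 - bias, which fills the gap between 0 and the smallest normalized number;

3. +∞ and -∞: exp is all 1s and frac is all 0s;

4. Not a number (NaN): exp is all 1s and frac is not all 0s.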

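Note: The order of s, exp, and frac is carefully chosen. Because the exponent sits above the fraction and is stored with a bias rather than as a signed number, two non-negative floats (NaN excluded) compare the same way as their bit patterns compare when read as unsigned integers, so no decoding is needed. Negative numbers need a small adjustment, because the sign is stored separately from the magnitude: among negatives, a larger bit pattern means a larger magnitude and therefore a smaller value.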

Regarding rounding: in decimal, floating point values round to the nearest representable result as usual, except when the discarded part (the reference digits) is exactly half a unit. For example, when rounding 12.235000… (all remaining digits 0) to two decimal places, the reference digits are exactly 5000…, and the rule is round-half-to-even: the result is chosen so that its last kept digit is even. Thus 12.235000… and 12.245000… both round to 12.24, whose last digit 4 is even. Binary works the same way: when the reference digits are exactly 1000…, round to even, that is, to the neighbor whose last kept bit is 0. This round-to-nearest, ties-to-even rule is the default rounding mode in IEEE 754.

For example, rounding the following binary numbers to two places after the binary point:

1. 10.00011  -->  reference digit 011 < 100...  -->  directly discard  -->  10.00
2. 10.00110  -->  reference digit 110 > 100...  -->  directly carry  -->  10.01
3. 10.11100  -->  reference digit 100 = 100...  -->  round to even  -->  11.00
4. 10.10100  -->  reference digit 100 = 100...  -->  round to even  -->  10.10
