Understanding Instruction Sets: The Significance of CPU Architectures in Computing

Source: Chip Theory

Abstract: Earlier articles from the WeChat account “Chip Theory” noted that chips can be classified in many ways and come in varieties as numerous as the stars in the night sky. This article narrows the scope to high-end general-purpose chips, specifically CPUs. Even CPUs come in many types and form various “schools,” a kind of “martial arts world” of CPU chips in which instruction sets and microarchitectures serve as the symbols of each school. What are CPU instruction sets and microarchitectures, and why are they the symbols of these schools? This article attempts to explain in simple language.

CPU is the abbreviation for Central Processing Unit, a high-end general-purpose chip. It directs the components of computers and smart devices so that they work together efficiently, serving as the control center, or brain, of the system.

The components of a computer need the CPU to command them, and the components inside the CPU also need coordination. That commander is the program: it issues commands, called instructions, that make the CPU’s internal circuits work in harmony. For instance, to calculate A+B->C, two load instructions retrieve A and B from memory and send them to the arithmetic unit; one addition instruction then has the arithmetic unit perform the addition; finally, one store instruction saves C back to memory. This example uses three types of instructions: load, addition, and store.
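The A+B->C sequence can be sketched as a toy program. This is a minimal illustration, not any real CPU’s instruction set: the opcode names `LOAD`, `ADD`, and `STORE`, the register names, and the `run` function are all assumptions made for the example.

```python
# Toy load/store machine: a hypothetical illustration, not a real instruction set.

def run(program, memory):
    """Execute a list of (opcode, operands...) tuples against a memory dict."""
    registers = {}
    for instruction in program:
        opcode = instruction[0]
        if opcode == "LOAD":        # LOAD reg, addr: copy memory into a register
            _, reg, addr = instruction
            registers[reg] = memory[addr]
        elif opcode == "ADD":       # ADD dst, src1, src2: add two registers
            _, dst, src1, src2 = instruction
            registers[dst] = registers[src1] + registers[src2]
        elif opcode == "STORE":     # STORE reg, addr: copy a register into memory
            _, reg, addr = instruction
            memory[addr] = registers[reg]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory

memory = {"A": 2, "B": 3, "C": 0}
program = [
    ("LOAD", "r1", "A"),        # fetch A
    ("LOAD", "r2", "B"),        # fetch B
    ("ADD", "r3", "r1", "r2"),  # compute A + B
    ("STORE", "r3", "C"),       # write the result to C
]
result = run(program, memory)
print(result["C"])  # 5
```

Four instructions of three kinds are enough for this calculation; a real instruction set adds many more kinds for comparison, branching, and control.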

In reality, a general-purpose CPU must perform many kinds of calculation, reasoning, judgment, and control tasks, and it may have dozens to hundreds of instruction types. The collection of all instructions for a CPU is called the CPU’s instruction set. The instruction set determines how the CPU operates and the hardware architecture that corresponds to it; this hardware architecture is called the CPU’s microarchitecture. Together, the instruction set and microarchitecture form the core intellectual property that a company builds when it creates a new CPU. The instruction set is the top-level design specification for the CPU, while the microarchitecture is the physical implementation of this specification, which can vary in many ways. Generally, the term CPU architecture encompasses both the CPU instruction set and the microarchitecture.
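One way to picture the split between specification and implementation is the sketch below. The instruction semantics stay fixed, while two hypothetical “microarchitectures” differ only in how many cycles each instruction costs. The opcode names, the core names, and every cycle count are invented for illustration.

```python
# Hypothetical sketch: one instruction-set "specification", two "microarchitectures".
# All cycle costs are invented for illustration only.

PROGRAM = [("LOAD", "r1", "A"), ("LOAD", "r2", "B"),
           ("ADD", "r3", "r1", "r2"), ("STORE", "r3", "C")]

def execute(program, memory, cycle_cost):
    """Run the program. The semantics are fixed by the instruction set;
    cycle_cost models one particular hardware implementation."""
    memory = dict(memory)   # work on a copy so both runs start from the same state
    regs, cycles = {}, 0
    for op, *args in program:
        if op == "LOAD":
            regs[args[0]] = memory[args[1]]
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":
            memory[args[1]] = regs[args[0]]
        else:
            raise ValueError(f"unknown opcode: {op}")
        cycles += cycle_cost[op]
    return memory, cycles

SIMPLE_CORE = {"LOAD": 4, "ADD": 2, "STORE": 4}  # assumed costs
FAST_CORE = {"LOAD": 2, "ADD": 1, "STORE": 2}    # assumed costs

start = {"A": 2, "B": 3, "C": 0}
mem_simple, cycles_simple = execute(PROGRAM, start, SIMPLE_CORE)
mem_fast, cycles_fast = execute(PROGRAM, start, FAST_CORE)
print(mem_simple == mem_fast, cycles_simple, cycles_fast)  # True 14 7
```

Both cores produce the same memory contents, so software cannot tell them apart by results, only by speed. That is the sense in which one instruction set can have many microarchitectures.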

Figure 1. A new instruction set and microarchitecture can establish a new school in the CPU chip martial arts world.

Figure 1 illustrates the relationship between instruction sets, microarchitectures, CPU chips, and the user groups of the chips. It can be seen that developing a new instruction set and microarchitecture is equivalent to founding a new school in the CPU chip martial arts world. Specifically, if Company A develops a new CPU, it means creating a new instruction set and microarchitecture, which not only forms user group A for Company A’s CPU chips but also allows Company A to license other companies to develop CPU chips, forming corresponding user groups B, C, D, etc. A new instruction set and microarchitecture can develop a family of CPU chips and create a cluster of chip users, much like establishing a school in the martial arts world. If the instruction set and microarchitecture are well designed, the CPU’s performance will be good, leading to more users and followers, thus making the school flourish.

Figure 2. The instruction set is the link between software and hardware.

Figure 2 shows that the instruction set is the link between software and hardware. Application programs and operating system code alike are ultimately compiled, according to the instruction set specification, into machine code that the CPU chip can execute. The instructions in machine code make the CPU’s internal components work together efficiently to realize the overall function of the CPU, which in turn enables the CPU to control the entire system. The instruction set is the law that system software and hardware engineers must adhere to: only by following it can software engineers write software that runs on different models of systems, and only by following it can hardware engineers build systems that run existing application software.
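The “law” that both sides follow can be pictured as a shared encoding table. In the minimal sketch below, the software side assembles instructions into bytes and the hardware side decodes them. The 4-byte format and the opcode numbers are assumptions for illustration and do not correspond to any real instruction set.

```python
# Hypothetical 4-byte instruction format: [opcode, operand1, operand2, operand3].
# The opcode numbers below are invented for illustration.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}
MNEMONICS = {code: name for name, code in OPCODES.items()}

def assemble(instructions):
    """Software side: turn (mnemonic, a, b, c) tuples into machine code bytes."""
    code = bytearray()
    for mnemonic, a, b, c in instructions:
        code += bytes([OPCODES[mnemonic], a, b, c])
    return bytes(code)

def decode(machine_code):
    """Hardware side: split the byte stream back into instructions."""
    if len(machine_code) % 4 != 0:
        raise ValueError("machine code length must be a multiple of 4")
    return [
        (MNEMONICS[machine_code[i]], *machine_code[i + 1:i + 4])
        for i in range(0, len(machine_code), 4)
    ]

source = [
    ("LOAD", 1, 0x10, 0),   # r1 <- memory[0x10]
    ("LOAD", 2, 0x11, 0),   # r2 <- memory[0x11]
    ("ADD", 3, 1, 2),       # r3 <- r1 + r2
    ("STORE", 3, 0x12, 0),  # memory[0x12] <- r3
]
machine_code = assemble(source)
print(machine_code.hex())              # 01011000010211000203010203031200
print(decode(machine_code) == source)  # True
```

The round trip works only because `assemble` and `decode` share the same table. If either side changed an opcode number on its own, existing programs would stop running, which is why the instruction set has to be treated as fixed.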

If these concepts and their relationships still seem abstract, let us look at three representative examples of CPU instruction sets and microarchitectures. The first is the MCS-51 instruction set, representing microcontrollers or microcontroller units (MCUs). The second is the x86 instruction set, representing complex instruction set computers (CISC). The third is the ARM instruction set, representing reduced instruction set computers (RISC). These examples are cited for demonstration rather than research, to give readers a macro impression of the symbols of schools in the CPU chip martial arts world: instruction sets and microarchitectures.
Why showcase an MCU instruction set alongside CPU instruction sets? Let us first understand the relationship among MCUs, CPUs, and SoCs. MCU stands for Micro Controller Unit, a type of mid-to-low-end general-purpose chip used primarily to control small and medium electronic products and systems. SoC stands for System on Chip. It is a specialized chip aimed at a specific application area, and its processing capability may be low-end, mid-range, or high-end. It is difficult to draw absolute boundaries among MCU, CPU, and SoC, but after studying the characteristics of the three, you can clearly distinguish them.
The common feature of the three is that they are all control centers for smart electronic products and systems, and they all have instruction sets and microarchitectures to follow. The differences are: MCUs generally have 4-bit, 8-bit, and 16-bit microarchitectures, with not very high operating frequencies. They handle more control tasks and fewer computational and information processing tasks. Compared to CPUs, they integrate some external interfaces and functional components on a single chip. CPUs generally have 16-bit, 32-bit, and 64-bit microarchitectures, with very high operating frequencies, focusing more on processing speed and computational capability, and rarely integrating external interfaces and functional components on a single chip. SoCs emphasize the system integration level of a single chip, with various width microarchitectures, operating frequencies, and integration levels possible, containing multiple external interfaces and functional components on a single chip.
The difference between MCUs and CPUs is in processing capability, with MCUs being smaller and more control-oriented while CPUs are larger and more computation-oriented. SoCs can vary in processing capability; if more external interfaces and functional components are integrated into an MCU or CPU chip, it becomes an SoC chip.
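To give a feel for what bit width means in practice, the sketch below adds two 16-bit numbers using only 8-bit additions, which is roughly what a narrow datapath has to do for wider data. The function names are hypothetical and the code is purely illustrative.

```python
# Sketch: adding two 16-bit numbers using only 8-bit additions,
# as a narrow (8-bit) datapath would have to. Purely illustrative.

def add8(a, b, carry_in=0):
    """One 8-bit addition: returns (8-bit result, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def add16_on_8bit(x, y):
    """Add two 16-bit values byte by byte, low byte first."""
    low, carry = add8(x & 0xFF, y & 0xFF)
    high, carry = add8(x >> 8, y >> 8, carry)   # the carry links the two steps
    return (high << 8) | low, carry

print(add8(200, 100))                 # (44, 1): the result wrapped around 256
print(add16_on_8bit(0x12FF, 0x0001))  # (4864, 0), i.e. 0x1300
```

A wider processor can do the same addition in a single step, which is one reason wider microarchitectures suit computation-heavy work while narrow ones remain adequate for control tasks.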
1. Three Representatives of Instruction Sets and Microarchitectures
Based on the above reasons, and since ARM CPUs are typically embedded in SoC chips, this article selects MCS-51 as the representative instruction set for MCUs, x86 as both the representative for CISC instruction sets and CPU instruction sets, and ARM as both the representative for RISC instruction sets and SoC instruction sets; the following briefly introduces and showcases them.
1. MCS-51 Instruction Set and Microarchitecture
The MCS-51 instruction set was developed by Intel around 1980 and includes five major categories of instructions: data transfer, bit manipulation, logical operations and shifts, arithmetic operations, and control transfers, totaling 111 instructions (Figure 3). The corresponding microarchitecture of the MCS-51 instruction set is shown in Figure 4.
By adding, removing, or altering hardware in this microarchitecture, Intel successively developed a series of microcontroller chips compatible with the MCS-51 instruction set, totaling 16 models (Figure 5). Intel also opened the MCS-51 instruction set and microarchitecture to many other companies, allowing them to produce compatible microcontrollers. These companies include ATMEL, PHILIPS, NXP, OKI, and several companies from Japan, Taiwan, and the domestic market. As a result, MCS-51 has grown into a large family of microcontrollers that is widely used around the world.

Figure 3. MCS-51 Microcontroller Instruction Set (Source: Wuyou Document)

Figure 4. MCS-51 Microcontroller Internal Microarchitecture (Source: Baidu Encyclopedia)

Figure 5. MCS-51 Microcontroller Family (Source: Reference Material 5)

2. x86 Instruction Set and Microarchitecture
In 1978, Intel developed a 16-bit CPU named i8086, along with a matching mathematical coprocessor, the i8087. The two chips used mutually compatible instruction sets; together with the instructions added for mathematical calculations such as logarithms, exponentials, and trigonometric functions, they formed what is commonly known today as the x86 instruction set. Over the following 40 years, Intel successively developed the i80286, i80386, i80486, Pentium series, Core series, and other CPU models. In developing these chips, Intel kept the x86 instruction set to maintain backward compatibility with existing software, and expanded it with extensions for the 286, 386, 486, Pentium, Pentium II, and so on. As the saying goes, “New for three years, old for three years, patched and mended for another three years.” The x86 instruction set is a model of continuous evolution and growing complexity, and thus a true complex instruction set (CISC).
The x86 instruction set includes over 190 instructions across 12 major categories: data transfer, logical operations, shift operations, program control, arithmetic operations, string operations, processor control, and the extensions for 286, 386, 486, Pentium, and Pentium II (Figure 7). Figure 8 illustrates the microarchitectures of Intel Core and AMD K8 CPUs.
Intel and AMD are the two largest developers of x86 instruction set CPU chips in the world. Intel has hundreds of CPU models in the x86 series, and AMD has over 80 compatible x86 CPU models, so the x86 CPU chip family is large indeed (Figure 9). Both companies’ CPU chips are continuously iterated and upgraded in competition with each other, producing a remarkable trajectory of CPU product development.

Figure 7. x86 CPU Instruction Set (Source: Reference Material 1)

Figure 8. x86 CPU Microarchitectures Examples (Source: Reference Material 3)

Figure 9. Intel and AMD x86 CPU Chip Models Chart (Source: Compiled from Network Data)

3. ARM Instruction Set and Microarchitecture
ARM was established in 1990 and is the world’s leading provider of CPU intellectual property (IP). Over 95% of smartphones and tablets worldwide use ARM architecture processors. ARM does not manufacture or sell CPU chips itself; instead, it sells and licenses a series of CPU IP based on the ARM architecture to other companies. ARM licensing generally falls into three types: architecture licensing, core licensing, and usage licensing; this article covers only the first two. Architecture licensing demands a highly capable design team and carries a high licensing fee, making it suitable only for large companies. Smaller companies typically purchase core licenses.
Architecture licensing (also known as instruction set licensing) lets licensees design and manufacture ARM processors at the architecture level. Working from the entire instruction set and microarchitecture, they can modify the ARM architecture, even trimming or expanding the ARM instruction set, to achieve a better fit, higher performance, lower power consumption, and lower cost. Companies that hold ARM architecture licenses include Qualcomm, Apple, Samsung, Microsoft, and HiSilicon, among others.
Core licensing (also referred to as scheme licensing) allows users to apply the ARM core (IP core) they have purchased to their designed chips, but users are not allowed to modify the ARM core. Many companies hold core licenses, including Texas Instruments, Broadcom, Freescale, Fujitsu, and many domestic small and medium chip design companies, to name just a few.
Figure 10 displays the ARM instruction set. It includes eight major categories: branch instructions, data processing, multiply-accumulate instructions, PSR access, load/store instructions, data exchange, shift instructions, and coprocessor instructions, totaling 50 instructions (16 ARM instructions, 18 Thumb instructions, and 16 Thumb-2 instructions). In addition, there are 15 control pseudo-instructions, for a total of 65. Figure 11 shows one of ARM’s licensable core microarchitectures, the ARM Cortex-A9.
More than 1,500 companies globally have been authorized by ARM to develop and produce processors based on ARM architecture and SoC chips that include ARM cores. Figure 12 is a logo map of users, tool vendors, and partners utilizing ARM architecture processor technology. The ARM CPU school dominates the mobile communication field and is penetrating into other areas, including IoT, desktop computers, and servers, causing great concern for Intel, the leader of the x86 CPU school.

Figure 10. ARM CPU Instruction Set and Function Description (Source: Free Document Network)

Figure 11. ARM Cortex A9 Microarchitecture and Single-Core Interface (Source: Reference Material 11)

Figure 12. The logo map of users and partners utilizing ARM architecture processor technology (Source: Reference Material 11)

2. Instruction Set Schools in the CPU Chip Martial Arts World
1. MCU Class Instruction Sets
MCUs, as a reduced version of CPUs and SoCs, also have instruction sets and microarchitectures and serve as the control center for smart electronic products and systems. They can therefore be considered part of the CPU chip martial arts world, so let us see which MCU instruction set schools exist.
The instruction sets in this school include: Zilog’s Z80 instruction set, Intel’s MCS-51 instruction set, Microchip’s PIC instruction set, ATMEL’s AVR instruction set, TI’s MSP430 instruction set, Motorola’s 68K, ARM’s ARM-Thumb, and so on.
2. CISC Class Instruction Sets
CISC Class Instruction Sets are also known as Complex Instruction Sets. CISC refers to Complex Instruction Set Computer. In CISC instruction processors, the instructions are executed sequentially, and the operations within each instruction are also executed sequentially. The advantage of sequential execution is simple control, but the utilization of the computer’s various components is not high, and execution speed is relatively slow.
The instruction sets in this school include: Intel’s x86 instruction set (x86, IA-32, x86-64, etc.), AMD’s compatible x86 instruction set (x86, AMD64, etc.), and VIA’s compatible x86 instruction set (x86, AIS, etc.). Intel’s IA-64 is not an x86 instruction set and is covered under the EPIC class below.
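The cost of strictly sequential execution described above can be estimated with a back-of-the-envelope model. The sketch below compares it with an idealized overlapped (pipelined) execution. The instruction count and the number of steps per instruction are assumed values, and the model ignores the stalls and hazards that real processors face.

```python
# Idealized cycle counts; real processors have stalls and hazards that this ignores.

def sequential_cycles(num_instructions, stages):
    """Each instruction finishes all of its steps before the next one starts."""
    return num_instructions * stages

def pipelined_cycles(num_instructions, stages):
    """Steps overlap: once the first instruction has filled the pipeline,
    one instruction completes per cycle."""
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1)

n, k = 100, 5   # assumed: 100 instructions, 5 steps per instruction
print(sequential_cycles(n, k))  # 500
print(pipelined_cycles(n, k))   # 104
```

In the sequential case each hardware unit sits idle for four out of every five cycles, which is the low utilization the text refers to.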
3. RISC Class Instruction Sets
RISC Class Instruction Sets are also known as Reduced Instruction Sets. RISC refers to Reduced Instruction Set Computer. It grew out of CISC instruction systems and the observation that CISC instructions are used very unevenly: the most commonly used instructions make up only about 20% of the instruction set, yet they account for about 80% of the instructions in programs. A complex instruction system inevitably makes the microprocessor more complex, and instruction decoding and execution become complicated and time-consuming, which slows the computer down. RISC-type CPUs emerged in the 1980s. Compared to CISC-type CPUs, they not only simplify the instruction set but also adopt superscalar and super-pipelined structures, significantly increasing parallel processing capability. RISC instruction sets are the development direction for high-performance CPUs. RISC instruction formats are unified, with fewer instruction types and simpler addressing modes than complex instruction sets, so processing speed improves significantly.
The instruction sets in this school include: DEC’s Alpha instruction set, MIPS’s MIPS instruction set, Sun’s SPARC instruction set, IBM’s PowerPC instruction set developed jointly with Apple and Motorola, IBM’s POWER server CPU instruction set, ARM’s ARM32 and ARM64 instruction sets, and open-source RISC-V instruction sets, among others.
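Why a unified instruction format simplifies decoding can be seen in a small sketch. Both encodings below are hypothetical: in the fixed format every instruction is 4 bytes, and in the variable format the first byte of each instruction gives its total length. Real encodings are far more involved.

```python
# Hypothetical encodings, for illustration only.
# Fixed format: every instruction is 4 bytes.
# Variable format: the first byte of each instruction gives its total length.

def fixed_boundaries(code, width=4):
    """Instruction start offsets are known without looking at the contents."""
    return list(range(0, len(code), width))

def variable_boundaries(code):
    """Each start offset depends on decoding the previous instruction's length."""
    offsets, position = [], 0
    while position < len(code):
        length = code[position]
        if length == 0:
            raise ValueError("invalid instruction length 0")
        offsets.append(position)
        position += length
    return offsets

fixed_code = bytes(12)                                     # three 4-byte instructions
variable_code = bytes([2, 0, 5, 0, 0, 0, 0, 1, 3, 0, 0])   # lengths 2, 5, 1, 3
print(fixed_boundaries(fixed_code))        # [0, 4, 8]
print(variable_boundaries(variable_code))  # [0, 2, 7, 8]
```

With a fixed width, the start of every instruction can be computed independently, so several instructions can be located and decoded at once. With variable lengths, each boundary has to be found in order, which makes the decoding hardware more complex.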
4. EPIC Class Instruction Sets
EPIC Class Instruction Sets are also known as Explicitly Parallel Instruction Sets. There is much debate about whether EPIC is an heir to RISC and CISC systems. The EPIC system design of CPUs is a significant step for Intel processors toward the RISC system. The Intel server CPU that employs EPIC technology is the Itanium (code-named Merced), which is a 64-bit processor and the first in the IA-64 series.

Figure 13. Classification of Popular CPU Instruction Sets Globally

3. Achieving the Dream of Autonomous Control of CPUs
To develop autonomous and controllable domestic CPUs, the first step is to solve the issue of autonomy in instruction sets and microarchitectures, followed by the software ecosystem and production issues. Figure 14 lists the main domestic CPU instruction sets and their technological sources, which serve as the foundation for developing autonomous and controllable domestic CPUs. These resources have gained significant attention in the industry. It is hoped that the government can plan and coordinate at the top level, increase financial support, and realize the dream of autonomous and controllable domestic CPUs through joint efforts from the industry.

Figure 14. Main Domestic CPU Instruction Sets and Technological Sources (Source: Compiled from Reference Material 9)

How can domestic CPU chips be made autonomous and controllable? The recognized pathways can be summarized as follows: first, purchase popular CPU architecture licenses to develop domestic CPU products; second, make good use of the open-source CPU instruction set RISC-V to pursue the self-strengthening path for domestic CPUs; third, leverage existing controllable CPU architectures, increasing investment and research efforts to fill the gaps in domestic CPUs; fourth, establish a new CPU architecture to pursue the self-reliant development path for domestic CPUs. The ultimate goal is to develop domestic CPUs that are safe, autonomous, and controllable, and the methods can be diverse; reaching it does not necessarily mean starting from scratch, nor does it require establishing a new CPU school.
1. Purchase popular CPU architecture licenses to develop domestic CPU products.
Currently, most domestic MCU, CPU, and SoC chip R&D follows this path. This approach has been quite successful in the past, with domestic chip design and sales achieving double-digit growth for many years, and in 2020, it is expected to exceed 380 billion yuan. Most companies adopt this method of licensing, incorporating popular foreign CPU architectures into their self-developed CPU chips, including several SoC chips developed by Huawei HiSilicon, which are based on ARM architecture. Since the US-China technology war, many have realized that this is a path to developing autonomous but uncontrollable CPUs.
2. Make good use of the open-source CPU instruction set RISC-V to pursue the self-strengthening path for domestic CPUs.
US pressure on China’s chip industry has led the industry to place its hopes on developing autonomous and controllable domestic CPUs based on the open-source instruction set RISC-V, which may be the most promising path at present. The reason is that the RISC-V architecture is relatively mature, performs well, and has been validated in numerous commercial applications. In addition, a certain amount of talent and technology has already accumulated domestically. Many believe that RISC-V has the potential to change the current dominance of ARM and x86 in the CPU chip martial arts world, posing a significant challenge to ARM’s advantageous position in the consumer and IoT embedded CPU markets. RISC-V represents the dawn of autonomous and controllable domestic CPUs. Its only shortcoming at present is that its software ecosystem is not yet fully developed, which will take effort from across the industry.
3. Leverage existing controllable CPU architectures, increasing investment and research efforts to fill in the gaps in domestic CPUs.
Over the years, some domestic companies have inherited the rights to older foreign CPU architectures, others have obtained permanent licenses for relatively mature foreign CPU architectures, and some have created their own CPU and GPU architectures. Examples include Alibaba’s Pingtouge (which acquired Zhongtian Micro), Suzhou Guoxin, Zhongsheng Hongxin, and Shenzhen Zhongwei. Huawei HiSilicon has also acquired a permanent license for the ARM v8 architecture, which provides a good foundation for developing autonomous and controllable domestic CPUs. It is recommended that the government and enterprises increase investment and research efforts on these CPU architectures, innovate and develop them, continuously improve the software ecosystem, and seize the opportunity of domestic replacement to achieve iterative upgrades and technological advancement of domestic CPU products.
4. Establish a new CPU architecture to pursue the self-reliant development path for domestic CPUs.
This means designing a new CPU and establishing a new instruction set and microarchitecture. It is the most challenging path, and the difficulty shows in two main aspects: first, whether the newly designed CPU can offer advanced performance and functionality, efficiency, and economy; second, how to quickly establish a software ecosystem for the new CPU, including assembly and high-level language programming tools, system development and verification tools, and so on. Building a software ecosystem requires time, market presence, and the collaborative efforts of research teams and users. Moreover, establishing a new school in the CPU industry and making it flourish is no easy task. Many people are therefore skeptical about this path.
Conclusion:
1. Instruction sets are the link between software and hardware. The software in electronic products, whether written in a high-level language or in assembly, is ultimately translated into a series of instructions that command the hardware components to work in coordination; the hardware exists to accomplish the tasks assigned by the instructions.
2. Instruction sets are the laws that software and hardware engineers must follow. Only by adhering to these laws can the developed software run compatibly on different models of hardware.
3. Instruction sets are the symbols of schools in the CPU chip martial arts world; company names and product names are merely their aliases. Only when instruction sets are the same can products be interchangeable. Only when everyone recognizes an instruction set and is willing to invest in that school can its software ecosystem be continuously improved, leading to more users and a more prosperous school.
4. The most challenging aspect of developing autonomous and controllable domestic CPUs is building the software ecosystem for them. If we view the CPU instruction set as the symbol of a school, building the software ecosystem is akin to fostering the atmosphere of the school and expanding its influence.
