Understanding Multi-Core CPUs and SoC Chips and Their Working Principles

Source: Chip Theory

Introduction: Today's CPUs and SoCs almost always integrate multiple CPU cores in a single chip, giving us what are commonly called 4-core, 8-core, or higher core-count CPU and SoC chips. Why take this approach? How do multiple CPU cores work together? Are more cores always better? With these questions in mind, I consulted a number of sources, studied the relevant concepts and key points, and edited them into this article, attempting to answer these technical questions in plain language. The article serves as a learning record for myself and, I hope, a useful reference for readers. Corrections and discussion of any inaccuracies are welcome.

To explain what multi-core CPU or SoC chips are, we must first start with the CPU core. CPU stands for Central Processing Unit; it performs control and information processing and serves as the control center of computers and smart devices. If we strip a traditional CPU chip of its packaging and auxiliary circuits (pin interface circuits, power circuits, clock circuits, and so on) and keep only the core circuitry that performs the control and information-processing functions, what remains is the CPU core. A CPU core is essentially a fully independent processor: it can fetch instructions from memory and execute the control and computation tasks those instructions specify.

If a CPU core and its related auxiliary circuits are packaged in one chip, this chip is a traditional single-core CPU chip, referred to as a single-core CPU. If multiple CPU cores and their related auxiliary circuits are packaged in one chip, this chip is a multi-core CPU chip, referred to as a multi-core CPU. Of course, multi-core CPU chips will contain more auxiliary circuits to solve communication and coordination issues between multiple CPU cores.

If other functional components and interface circuits are integrated into a multi-core CPU chip so that it forms a complete system, the chip becomes a multi-core SoC (System on Chip) chip, referred to as a multi-core SoC. Informally, an SoC is often loosely called a CPU as well.

Figure 1 uses ARM single-core and multi-core CPUs as examples. The red dashed boxes mark the individual CPU cores. Figure 1a is a schematic diagram of the ARM Cortex-A8 single-core CPU chip based on the ARMv7 architecture. Figures 1b and 1c are schematic diagrams of ARM Cortex-A9 MPCore chips: 2-core and 4-core CPU chips built from 2 and 4 Cortex-A9 cores, respectively.

Figure 1. Schematic Diagram of ARM Single-Core and Multi-Core CPU Chips

The ARM Cortex-A8 was the first application processor based on the then-new ARMv7 architecture, exploiting the higher performance, power efficiency, and code density of Thumb-2 technology, but it is a single-core design. The first company licensed for the Cortex-A8 was Texas Instruments, followed by Freescale, Matsushita, and Samsung [3]. Application examples of the Cortex-A8 include the MYS-S5PV210 development board, the TI OMAP3 series, the Apple A4 (iPhone 4), the Samsung S5PC110 (Samsung I9000), the Rockchip RK2918, the MediaTek MT6575, and others. Qualcomm's MSM8255, MSM7230, and similar parts can also be considered derived products.

The ARM Cortex-A9 MPCore belongs to the Cortex-A series and is likewise based on the ARMv7-A architecture, offering configurations of 1 to 4 CPU cores. Most of the 4-core CPUs seen at the time belonged to the Cortex-A9 series. Application examples of the Cortex-A9 include the Texas Instruments OMAP 4430/4460, Tegra 2, Tegra 3, Newland NS115, Rockchip RK3066, MediaTek MT6577, Samsung Exynos 4210 and 4412, Huawei K3V2, and others. Qualcomm's APQ8064 and MSM8960 and Apple's A6 and A6X can be seen as improved designs based on the A9 architecture [6].

1. Development History of Multi-Core CPUs

The original motivation for multi-core CPUs comes from the simple principle that "many hands make light work." From this perspective, back when chip integration was low, the Intel 8086 CPU paired with the 8087 coprocessor can be considered a prototype of the multi-core CPU: multiple chips collaborating to form one processing complex, with considerable technology devoted to solving cooperation and coordination between CPU and coprocessor.

Today chip integration is very high, with several or even dozens of CPU cores on a single chip, yet this still cannot satisfy supercomputing, which harnesses thousands of high-performance CPU chips working together in a supercomputer — in effect a multi-core CPU cluster both inside and outside each chip.

From the outside, a CPU chip is just a chip, but opening the package may reveal a single die, or multiple dies packaged together as a Multi-Chip Module (MCM), as shown in Figure 2b. From a software perspective, however, the packaging form is irrelevant. What matters, inside or outside the chip, is the number of CPU cores, which determines the system's parallel computing and processing capability; the cores' clock frequency and the communication methods between them determine the system's processing speed.

Figure 2. Schematic Diagram of Single Die Packaging, Multi-Die MCM Packaging, and Multi-Chip Systems (Source: Reference Material 14)

Moreover, today's desktop CPUs and mobile SoCs also integrate many Graphics Processing Unit (GPU) cores, AI processing cores, and the like. Should these also count as "cores" of multi-core CPUs and SoCs? In a broad sense, I believe they should.

Reviewing its development, the multi-core CPU story can be roughly divided into five scenarios: 1. the prototype phase; 2. single-chip single-core; 3. single-chip multi-core; 4. multi-die single-package (MCM); 5. multi-chip multi-core systems. These stages did not necessarily unfold in this order; there were overlapping periods and reversals. The second and third scenarios generally apply to CPU chips for desktop computers, smartphones, and other mobile terminals, while the fourth and fifth apply to server and supercomputer CPU chips. This article is limited in scope and focuses on the third scenario, single-chip multi-core, where the CPU operates in Chip Multi-Processor (CMP) mode.

From 1971 to 2004, the single-core CPU walked a solitary path. In 1971 Intel launched the world's first CPU chip, the 4004; in 2004 it released the hyper-threaded Pentium 4 series, a span of 33 years. During this period CPU chips developed along the trajectory predicted by Moore's Law — integration doubling steadily, clock frequencies climbing, transistor counts rising rapidly — a path of continuous iteration and upgrading of single-core CPUs.

However, when the huge increase in transistor count drove power consumption sharply upward, chip heat became unbearable and reliability suffered greatly. The development of single-core CPUs seemed to have reached a dead end. Gordon Moore, the proposer of Moore's Law, also sensed that the path of "shrinking size" and "frequency first" was nearing its end: in April 2005 he publicly stated that Moore's Law, which had led the chip industry for nearly 40 years, would fail within 10 to 20 years.

In fact, as early as the late 1990s, many industry professionals called for multi-core CPUs realized through CMP technology to replace single-threaded single-core CPUs. High-end server manufacturers such as IBM, HP, and Sun successively launched multi-core server CPUs. However, due to the high prices and narrow application scope of server CPU chips, they did not attract widespread attention from the public.

In early 2005, AMD led in 64-bit CPU chips, ahead of Intel in the stability and compatibility of 64-bit computing, prompting Intel to take up "multi-core" as the weapon for its "empire strike back." In April 2005, Intel hastily launched the simply packaged dual-core Pentium D and Pentium Extreme Edition 840. Shortly thereafter, AMD also released dual-core Opteron and Athlon 64 X2 CPU chips [9].

2006 is considered the year of the multi-core CPU. On July 23 of that year, Intel released CPUs based on the Core microarchitecture. In November, Intel launched the Xeon 5300 series and the Core 2 dual-core and quad-core Extreme Edition series for servers, workstations, and high-end PCs. Compared with the previous generation of desktop CPUs, the Core 2 dual-core improved performance by 40% while reducing power consumption by 40%.

In response to Intel, on July 24, AMD announced a significant price reduction for its dual-core Athlon 64 X2 processor. Both CPU giants emphasized energy-saving effects when promoting multi-core CPUs. The low-voltage version of Intel’s quad-core Xeon CPU had a power consumption of only 50 watts, while AMD’s “Barcelona” quad-core CPU did not exceed 95 watts. According to Intel’s senior vice president Pat Gelsinger, Moore’s Law still has vitality because “the evolution from single-core to dual-core to multi-core may be the fastest period of CPU chip performance improvement since the advent of Moore’s Law” [9].

CPU technology develops faster than software, and software support for multi-core CPUs has lagged behind. Without operating system support, the performance advantages of multi-core CPUs cannot be fully realized. For instance, under the same conditions running Windows 7, the difference in experience between a 4-core and an 8-core CPU is not significant, because Windows 7 was not optimized for 8-core CPUs. After Windows 10 was released, however, 8-core CPUs felt noticeably faster than 4-core ones, thanks to Microsoft's multi-core optimizations in Windows 10 — optimizations Microsoft says will continue.

At the time of writing, the server CPU with the most cores is the Intel Xeon Platinum 9282, with 56 cores and 112 threads and as many as 5903 solder balls, estimated to cost around $40,000; AMD's Epyc 7H12 has 64 cores and 128 threads with a 280W thermal design power. Both CPUs require liquid cooling. Among desktop CPUs, the Intel Core i9-7980XE Extreme Edition has 18 cores and 36 threads, a 165W thermal design power, and a $1999 price; AMD's Ryzen 9 5950X has 16 cores and 32 threads, a 105W thermal design power, and a price of 6049 yuan. The mobile SoCs with the most cores include Apple's M1, the Kirin 9000, and the Qualcomm Snapdragon 888. Multi-core CPUs and SoCs seem to have become the trend — but are more cores always better? Setting other factors aside, on technology and integration grounds alone, some even predict that by 2050 people may be using CPU chips with 1024 cores [17].

2. Examples of Multi-Core CPUs and SoC Chips

The example chips are: 1. a multi-core server CPU chip: the Intel Xeon W-3175X; 2. a multi-core desktop PC CPU chip: the Intel Core i7-980X; 3. a multi-core smartphone SoC chip: the Huawei Kirin 9000/E; 4. a multi-core ARM-architecture PC chip: the Apple M1; 5. a multi-core x86-compatible AI CPU chip: the VIA CHA; 6. a domestic multi-core server CPU chip: the Phytium Tengyun S2500.

1. Intel Xeon W-3175X: Launched by Intel in 2018, the Xeon W-3175X server CPU chip is manufactured on a 14nm process, with 28 cores and 56 threads, a 3.1 to 4.3GHz frequency range, a 38.5MB level 3 cache, and support for six-channel DDR4-2666 ECC memory up to 512GB, in an LGA3647 package paired with the C621 chipset. Its price reaches $2999, equivalent to over 20,000 yuan.

The chip adopts a new 6×6 mesh architecture, with I/O along the top and memory channels at mid-height on both sides, and up to 28 CPU cores. The L2 cache is quadrupled (each core's cache upgraded from 256KB to 1MB), while the shared L3 cache is reduced but used more efficiently [11].

Figure 3. 28-Core Architecture Diagram of Intel Xeon W-3175X (Source: Reference Material 11)

The Xeon W-3175X is the top configuration of this architecture, with all 28 CPU cores enabled. The cost, however, is extremely high power consumption and heat: nominal thermal design power is 255W, measured power at stock frequency easily reaches 380W, and overclocked it exceeds 500W. For daily use, high-end water cooling is a must. When Intel released the Xeon W-3175X it specifically recommended the Asetek 690LX-PN all-in-one water cooler, at the time the only cooling solution designed for the W-3175X. Asetek claims a maximum cooling capacity of 500W, so short of extreme overclocking this solution should adequately handle the W-3175X's cooling [13]. It is priced at $399.99, equivalent to 2680 yuan.

Figure 4. Asetek 690LX-PN Water Cooling Solution (Source: Reference Material 13)

2. Intel Core i7-980X: The Core i7-980X uses a 32nm process and the Gulftown core, and was the first 6-core CPU Intel launched for the desktop PC market. Compared with the earlier Bloomfield-core CPUs it has a more advanced process, more cores, and a larger cache while maintaining good backward compatibility; strictly speaking it is also a derivative of the Nehalem microarchitecture. Notably, it retains the LGA 1366 socket, so users who had bought an X58 motherboard could support the new 32nm 6-core CPU with a simple BIOS upgrade and enjoy the experience of 6 cores and 12 threads. It has all the features of Intel's high-end desktop CPUs, including hyper-threading, turbo boost, a tri-channel DDR3 memory controller, and a level 3 cache; in particular, the i7-980X is unlocked, letting overclockers push its limits more easily [5].

Figure 5. 6-Core Architecture Diagram of Intel Core i7-980X (Source: Reference Material 5)

3. Huawei Kirin 9000: The Huawei Mate 40 phone is equipped with the self-developed Kirin 9000, at its launch the most powerful 5G multi-core SoC chip. As the figure below shows, the chip integrates 8 CPU cores, 3 NPU cores, and a 24-core GPU; it is manufactured on a 5nm process and integrates 15.3 billion transistors, outperforming MediaTek's strongest 5G mobile chip in performance tests. Unfortunately, U.S. sanctions have blocked Huawei's production channels for high-end smartphone chips, which may make the flagship Mate 40 series a limited edition.

Figure 6. Multi-Core Architecture Diagram of Huawei’s Strongest 5G Smartphone Chip Kirin 9000

4. Apple M1: The image below shows the layout of Apple's first self-developed 8-core SoC chip for Mac computers. It features 4 high-efficiency Icestorm cores, 4 high-performance Firestorm cores, and an 8-core GPU. The chip is manufactured on a 5nm process and integrates 16 billion transistors. Apple started a new naming scheme with this processor series, calling it the Apple M1.

Figure 7. Layout of Apple’s Self-Developed Mac Computer SoC Chip Apple M1 (Source: Reference Material 19)

5. VIA CHA: VIA's latest x86-based AI processor is an 8-core SoC manufactured on TSMC's 16nm process with a die area under 195 square millimeters. Internally it adopts a ring-bus design and integrates eight x86 CPU cores, 16MB of shared level 3 cache, a four-channel DDR4-3200 memory controller, a PCIe 3.0 controller (44 lanes), a southbridge, and I/O functions, making it a complete SoC. It is provisionally named CHA.

Figure 8. VIA's 8-Core AI Processor: Chip Photo (Left) and Layout (Right) (Source: Reference Material 15)

6. Phytium Tengyun S2500: Reportedly, domestic CPU manufacturer Phytium released a multi-core CPU chip for server applications, the Tengyun S2500, in July. The chip is manufactured on a 16nm process with a die area of 400 mm², configurable with up to 64 FTC663-architecture CPU cores at 2.0 to 2.2GHz, a 64MB level 3 cache, and support for eight-channel DDR4 memory. Four direct-connection ports provide 800Gbps of bandwidth and support 2- to 8-socket systems, so a single system can offer 128 to 512 CPU cores, with a thermal design power of 150W. At Phytium's 2020 ecological partner conference, 15 domestic manufacturers including Great Wall, Inspur, Tongfang, Shuguang, and ZTE also released multi-socket server products based on the Tengyun S2500, marking promising breakthroughs in building a software ecosystem.

Figure 9. Phytium's 64-Core CPU Chip, the Tengyun S2500 (Source: Internet Image)

3. Why Use Multiple Cores?

Let us first look at the problem from the perspective of task processing. Call the work the CPU handles "tasks." Early CPUs had only one core and could only "focus on one thing": handle one task, finish it, then move to the next. This is known as serial single-task processing. It suited the DOS era, when the only goal for a CPU was to maximize processing speed. With the arrival of Windows came the demand for multi-tasking: the CPU had to handle multiple tasks at once, which is known as time-sharing multi-task processing. In this period the goal was not only to maximize processing speed but also to handle as many tasks concurrently as possible. In fact, this "juggling" approach divides CPU time among multiple tasks: macroscopically the CPU handles more tasks, but the processing of any individual task may slow down.
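As a toy sketch of time-sharing (a hypothetical example, not from the article): two tasks appear to run "simultaneously," but the scheduler interleaves them on shared CPU time, and both still complete.

```python
import threading

# Two "tasks" that each count to 100,000. On a single core the OS
# time-slices between them; neither blocks the other, and both finish.
def task(name, results, n=100_000):
    total = 0
    for _ in range(n):
        total += 1
    results[name] = total

results = {}
threads = [threading.Thread(target=task, args=(f"task{i}", results))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # both tasks completed despite sharing CPU time
```

Each task makes progress even though, at any instant, only one may actually hold the CPU — which is exactly the macro-level gain and micro-level slowdown described above.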

To achieve more tasks and faster processing speeds, it is natural for people to consider integrating multiple CPU cores in a chip, adopting a “multi-core multi-task” approach, hence the demand for multi-core CPUs, which is particularly urgent in server CPU applications.
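One standard way to reason about whether more cores always help (a textbook result, not from the article) is Amdahl's law: if a fraction s of a task is inherently serial, the speedup from N cores is bounded by 1 / (s + (1 − s)/N).

```python
# Amdahl's law: speedup(N) = 1 / (s + (1 - s) / N),
# where s is the fraction of work that cannot be parallelized.
def amdahl_speedup(n_cores, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# With just 10% serial work, even unlimited cores cap the speedup below 10x.
for n in (2, 4, 8, 64):
    print(n, "cores ->", round(amdahl_speedup(n, 0.10), 2), "x")
```

This is one reason more cores are not automatically better: the payoff depends on how parallelizable the workload actually is.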

We can also view the problem from the perspective of raising the CPU clock frequency to accelerate processing. Whether "single-task," "multi-task," or "multi-core multi-task," a higher clock frequency speeds up processing and finishes tasks sooner. The history of CPU development has therefore also been a history of ever-rising clock frequencies riding advances in chip technology, from the early MHz level to today's GHz level — roughly a thousand-fold increase. Single-core or multi-core, clock frequency remains a key indicator when selecting a CPU chip.

For a long time, as Intel and AMD CPUs became faster, the performance and speed of software on x86 operating systems naturally improved, allowing system manufacturers to benefit from overall performance improvements with minor software adjustments.

However, as chip technology advanced along Moore's Law, rising integration and transistor density and climbing clock frequencies directly drove CPU chip power sharply upward, and heat dissipation became an insurmountable obstacle. It has been estimated that every 1GHz increase in CPU clock frequency adds about 25 watts of power, and once chip power exceeds 150 watts, conventional air cooling can no longer cope. The 3.4GHz Pentium 4 Extreme Edition that Intel launched around 2003 had a maximum power consumption of 135 watts, earning it the nickname "electric stove" — some even joked about frying eggs on it. The current Xeon W-3175X server chip has a nominal power of 255W, measures 380W at stock frequency, and exceeds 500W overclocked, necessitating high-end water cooling.
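The paragraph's estimates can be turned into a back-of-envelope sketch. All numbers here are illustrative only: the ~25 W/GHz slope and 150 W air-cooling limit come from the article, while the baseline power is an assumed hypothetical value.

```python
BASE_POWER_W = 60       # assumed baseline power at 1 GHz (hypothetical)
WATTS_PER_GHZ = 25      # the article's estimate per extra GHz
AIR_COOLING_LIMIT_W = 150  # the article's practical air-cooling limit

def power_at(freq_ghz, base_freq_ghz=1.0):
    """Crude linear power model from the article's rule of thumb."""
    return BASE_POWER_W + WATTS_PER_GHZ * (freq_ghz - base_freq_ghz)

def needs_liquid_cooling(freq_ghz):
    return power_at(freq_ghz) > AIR_COOLING_LIMIT_W

print(power_at(3.4))             # rough power estimate at 3.4 GHz
print(needs_liquid_cooling(5.0)) # pushing frequency soon exceeds air cooling
```

Even with these crude assumptions, the model shows why "just clock it higher" stops working: a few extra GHz pushes the chip past what air cooling can remove.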

Thus runaway power consumption limits further increases in CPU frequency. The diagram below shows the trend of CPU power density over time: after the Intel Pentium, rising transistor density and clock frequency drove power density up sharply, heading toward levels comparable to the sun's surface.

Figure 10. Trend of CPU Power Density Over Time (Source: Professor Wei Shaojun’s Lecture)

In summary, multi-task processing capability and speed are the two main objectives of CPU chip design. Since raising the clock frequency to gain speed is constrained by power limits, multi-core CPU chips became the necessary path to resolve this contradiction. Today, multi-core CPUs and SoCs are the mainstream of processor chip development.

4. Technologies Used in Multi-Core CPUs

Compared to single-core CPUs, multi-core CPUs face significant challenges in architecture, software, power consumption, and security design, but they also contain great potential. This article refers to the reference materials attached and provides a brief introduction to the technologies used in multi-core CPUs.

1. Hyper-Threading Technology

A traditional CPU core has one processing unit (PU) and one architectural state (AS) and can handle only one software thread at a time. A CPU core with Hyper-Threading (HT) technology contains one PU and two AS that share it. When software runs on the core, each AS interfaces with one software thread and dispatches that thread's work to the relevant units within the PU; two AS can thus service two software threads.

Using a factory analogy: the PU is the production floor, with several machines; an AS is an order manager, who can handle only one work order at a time; software threads are the work orders. If the factory has only one manager, it can process only one order at a time, leaving some machines idle. With two managers, it can take two orders at once and assign their work to different machines.

A hyper-threaded CPU core therefore costs only slightly more silicon yet appears as two logical cores, handling two software threads simultaneously and improving throughput by roughly 40%. This is why CPU advertisements describe a multi-core chip as having N cores and 2×N threads — the benefit of hyper-threading. Without it, the chip's parameters could only read N cores and N threads. The diagram below illustrates the difference between a 2-core CPU without and with hyper-threading.
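You can observe the "logical core" view directly from software. A small sketch: Python's `os.cpu_count()` reports logical cores, so on an N-core chip with hyper-threading enabled it typically returns 2×N (detecting physical cores requires a third-party library such as psutil, or platform-specific queries).

```python
import os

# os.cpu_count() returns the number of *logical* CPUs the OS exposes.
# With hyper-threading on, this is usually twice the physical core count.
logical = os.cpu_count()
print(f"logical cores visible to the OS: {logical}")
```

This is the same count the operating system's scheduler sees when dispatching threads.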

Figure 11. Illustration of Hyper-Threading in Multi-Core CPUs (Source: Reference Material 20)

2. Core Structure Research

Multi-core CPU structures divide into homogeneous multi-core, where all CPU cores on the chip have the same structure, and heterogeneous multi-core, where the cores differ. Researching how to implement core structures for different application scenarios is crucial to overall CPU performance: the core's structure directly influences the chip's area, power consumption, and performance, and how well traditional CPU achievements are inherited and extended directly affects multi-core performance and development cycles. The instruction set matters too: whether the cores use the same or different instruction sets, and whether each can run an operating system, are critical considerations for designers.

3. Cache Design Technology

The speed gap between the CPU and main memory is a prominent problem for multi-core CPUs, so multi-level caches are needed to bridge it. Possible organizations include shared level 1 cache, shared level 2 cache, and shared main memory. Multi-core CPUs generally adopt a shared level 2 cache: each CPU core has its own private level 1 cache, while all cores share the level 2 cache.

Cache structure design bears directly on overall system performance. In multi-core CPUs, whether shared or private caches are superior, whether to build multi-level caches on-chip, and how many levels to build all significantly affect the chip's size, power, layout, performance, and efficiency, and require careful study. In addition, the consistency (coherence) problems that multi-level caches introduce must be addressed.

4. Inter-Core Communication Technology

Multi-core CPUs execute programs simultaneously across cores and may require data sharing and synchronization between cores, thus the hardware structure must support communication between CPU cores. An efficient communication mechanism is crucial for the high performance of multi-core CPUs. The mainstream on-chip efficient communication mechanisms are of two types: one is based on a bus-shared cache structure, and the other is based on an on-chip interconnect structure.

In the bus-shared-cache structure, the CPU cores share a level 2 or level 3 cache that stores frequently used data and communicate over an inter-core bus. Its advantages are a simple structure and fast communication; its disadvantage is poor scalability. In the on-chip interconnect structure, each CPU core has its own processing units and caches, and the cores are connected by crossbar switches or an on-chip network, communicating via messages. Its advantages are good scalability and guaranteed data bandwidth; its disadvantages are a more complex hardware structure and substantial software changes.

5. Bus Design Technology

In traditional CPUs, cache misses and memory accesses hurt CPU execution efficiency, and the efficiency of the bus interface unit (BIU) determines how badly. In multi-core CPUs, when several cores request memory at once, or several cores miss in their private caches simultaneously, the BIU's arbitration of these requests and the efficiency with which it converts them into external memory accesses determine the overall performance of the multi-core system.

6. Operating Systems for Multi-Core CPUs

For multi-core CPUs, optimizing the operating system's task scheduling is the key to execution efficiency. Scheduling algorithms divide into global-queue and local-queue scheduling. In global-queue scheduling, the OS maintains one global ready queue; when any CPU core goes idle, the OS picks a ready task from the global queue to run on it, which yields high core utilization. In local-queue scheduling, the OS maintains a ready queue per core; an idle core picks a task from its own queue, which helps improve that core's cache hit rate. Most multi-core operating systems adopt global-queue scheduling.
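Global-queue scheduling can be sketched in a few lines (a deliberately simplified model; a real OS scheduler is far more elaborate): all ready tasks wait in one shared queue, and whichever "core" goes idle pulls the next one.

```python
import queue
import threading

# One global queue of ready tasks, shared by all workers ("cores").
tasks = queue.Queue()
for i in range(8):
    tasks.put(i)

done = []
done_lock = threading.Lock()

def core_worker():
    # An idle core repeatedly grabs the next ready task from the
    # global queue until no work remains.
    while True:
        try:
            t = tasks.get_nowait()
        except queue.Empty:
            return
        with done_lock:
            done.append(t)

workers = [threading.Thread(target=core_worker) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sorted(done))  # all 8 tasks completed across 4 workers
```

No core sits idle while work remains — the high-utilization property the article attributes to global-queue scheduling.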

Multi-core CPU interrupt handling differs significantly from single-core CPUs. CPU cores need to communicate and coordinate through interrupts, so the local interrupt controller of CPU cores and the global interrupt controller that arbitrates interrupts among all CPU cores need to be encapsulated within the chip.

In addition, a multi-core CPU's operating system is a multi-tasking system in which tasks compete for shared resources, so the system must provide synchronization and mutual exclusion mechanisms. The traditional single-core solutions are not sufficient in the multi-core case; hardware-provided "read-modify-write" atomic primitives or other synchronization and mutual-exclusion mechanisms are required.
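The mutual-exclusion requirement can be illustrated at the software level (a hypothetical sketch): two threads performing read-modify-write on a shared counter can lose updates unless a mutex serializes the critical section.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # The lock makes the read-modify-write sequence effectively
        # atomic, so no increment is lost even under contention.
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(50_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 100000
```

At the hardware level, the lock itself is ultimately built on the atomic read-modify-write primitives the paragraph mentions (e.g. compare-and-swap instructions).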

7. Low Power Design Technology

Every two to three years, the density of CPU transistors and power density doubles. Low power and thermal optimization design have become focal points in multi-core CPU design. Considerations must be made at multiple levels, including operating system level, algorithm level, structural level, and circuit level. The effects of implementations at each level differ, with higher abstraction levels leading to more significant reductions in power consumption and temperature.

8. Reliability and Security Design Technology

In today’s information society, the applications of CPUs are ubiquitous, raising higher requirements for CPU reliability and security. On one hand, the complexity of multi-core CPUs increases, and low voltage, high frequency, and high temperature pose challenges to maintaining safe chip operation. On the other hand, external malicious attacks are increasing in number and sophistication, making high reliability and security design technology increasingly important.

5. How Do Multi-Core CPUs Work?

To understand how multi-core CPUs work, we need to look at application programs, the operating system, and the CPU cores together. The Windows operating system, as the task scheduler, allocates the hardware resource — CPU cores — to application programs in units of processes and threads. A process typically corresponds to one application program, though one application may spawn multiple processes that together carry out its execution.

When an application program is not executing it is "static"; once the user launches it, the operating system takes it over and it becomes "dynamic." The operating system manages each launched program as a process. A process can therefore be seen as an "executing program" together with the basic resources the operating system has allocated to it.

A process is further subdivided into threads, and only threads can obtain the right to run on a CPU core from the operating system. A process containing a single thread is a single-threaded program; a process containing multiple threads is a multi-threaded program.

To obtain CPU time, a program's threads must enter the operating system's thread queues and wait to be scheduled, after which they are allocated execution time on a specific CPU core. How the operating system allocates CPU cores is a very complex process that cannot be detailed in a short article. Below we illustrate how the threads of a program's process are dispatched to CPU cores, first for a single-core CPU and then for a 4-core CPU [7].
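A process can peek at one piece of this machinery: which cores the scheduler is allowed to run it on. A small sketch — `os.sched_getaffinity` is Linux-only, so the call is guarded for portability.

```python
import os

# On Linux, ask the scheduler which CPU cores this process may run on.
# On platforms without the call (e.g. Windows, macOS), report that.
if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)  # 0 = the current process
    print(f"this process may be scheduled on cores: {sorted(allowed)}")
else:
    allowed = None
    print("CPU affinity query not available on this platform")
```

By default the affinity set contains every logical core, which is why the OS is free to dispatch a ready thread to whichever core falls idle.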

If the CPU is single-core and does not use hyper-threading, the thread queue feeds a single core, so only one thread can be selected to execute at a time. With hyper-threading, the single physical core presents two logical cores, doubling the number of threads that can be dispatched from the queue. As shown in the diagrams.


Figure 12. Illustration of Thread Scheduling Execution of Application Programs on Single-Core CPUs (No Hyper-Threading)


Figure 13. Illustration of Thread Scheduling Execution of Application Programs on Single-Core CPUs (With Hyper-Threading)

If the CPU is 4-core and does not use hyper-threading, four threads can be dispatched from the queue at a time, one per core. With hyper-threading, the four physical cores present eight logical cores, doubling the number of threads that can execute simultaneously. As shown in the diagrams.


Figure 14. Illustration of Thread Scheduling Execution of Application Programs on 4-Core CPUs (No Hyper-Threading)


Figure 15. Illustration of Thread Scheduling Execution of Application Programs on 4-Core CPUs (With Hyper-Threading)
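The logical-core counts shown in the figures can be checked directly from software. A minimal Python query (the reported number depends on the machine; with hyper-threading it is typically twice the physical core count):

```python
import os

# os.cpu_count() reports logical cores, i.e. what the OS scheduler sees.
# On a hyper-threaded 4-core CPU this is typically 8.
logical = os.cpu_count()
print(f"logical cores visible to the scheduler: {logical}")
```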

From the perspective of a multi-core CPU, each core continuously receives software threads from the operating system and executes them, following the program’s instructions to complete the specified tasks. This may involve memory, arithmetic units, input/output components, and communication with other cores to transfer data; on completion, results must be reported back. These activities form a series of events that are coordinated through the interrupt-handling components. The hardware scheduling modes of multi-core CPUs fall roughly into three types [8][18].

1. Symmetric Multi-Processing (SMP) is the most commonly used mode. In SMP, a single operating system manages all CPU cores as equals and distributes workloads among them. Most operating systems support SMP, including Linux, Windows, and VxWorks. This mode is typically used with homogeneous multi-core CPUs, since implementing SMP across heterogeneous cores with differing structures is more complex.

2. Asymmetric Multi-Processing (AMP) means the cores run different tasks independently: each core may run a different operating system, a different version of an operating system, or a bare-metal program, with one dominant CPU core controlling the subordinate cores and the system as a whole. This mode is common in heterogeneous multi-core CPUs, such as MCU + DSP or MCU + FPGA combinations, though homogeneous multi-core CPUs can use it as well.

3. Bound Multi-Processing (BMP) is essentially the same as SMP, except that developers can bind certain tasks so they execute only on designated CPU cores.
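A BMP-style binding can be approximated from user space with CPU affinity. The sketch below uses Python’s `os.sched_setaffinity`, which is Linux-specific (one possible mechanism, not the only one); it pins the calling process so it may run only on core 0:

```python
import os

# Linux-only: os.sched_setaffinity restricts which cores may run a process,
# mirroring BMP's "this task executes only on these cores" idea.
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, {0})           # pid 0 means "the calling process"
    allowed = os.sched_getaffinity(0)
    print(f"process is now bound to cores: {allowed}")
else:
    print("CPU affinity API not available on this platform")
```

The same effect is available from the shell via `taskset`, and real-time operating systems expose equivalent binding primitives to developers.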

The above is only a brief theoretical introduction. The hardware scheduling principles and implementation details of multi-core CPUs are largely proprietary; deeper technical specifics would have to come from vendors such as Intel or AMD.

6. Perspectives on Multi-Core CPUs

Is it true that the more cores in a multi-core CPU, the better? Likewise, is it true that the more CPU chips in a multi-CPU system, the better? All else being equal, does hyper-threading always provide an advantage? The answer is: not necessarily. One must distinguish the context in which these technologies are applied; blanket statements are impossible.

First, multi-core CPUs and multi-CPU systems require synchronization and scheduling, which cost time and computing power. The gain in processing capability from adding cores or chips must be weighed against these overheads: if the gains outweigh the losses and the added cost is acceptable, the solution is feasible; otherwise it is not worth pursuing. Evaluating a system should therefore consider not just the number of CPU cores but also differences in operating systems, scheduling algorithms, application characteristics, and driver programs, since these factors jointly determine processing speed. Below are some points of discussion drawn from various articles.

1. More CPU cores do not necessarily mean faster execution. A thread may have to wait for other threads or processes to finish before it can continue; while waiting, even when its turn in the queue arrives, it may have to yield its execution slot and keep waiting, letting later threads run on the CPU. For that particular program this is a slowdown, but from the system’s perspective it at least frees a core for other threads. Multi-core CPUs undoubtedly accelerate batches of processes, but for a specific process or certain types of programs they may not be the fastest option.

2. A smartphone must deliver an excellent user experience, and that does not depend on CPU performance alone. Besides the number of CPU cores, the baseband chip that determines communication quality, the GPU, and performance in gaming and VR applications all play a role; it is overall system performance that truly matters.

3. MediaTek launched a 10-core, tri-cluster smartphone SoC in 2015, followed by the 10-core, quad-cluster Helio X30, using the multi-cluster approach to reduce power consumption. MediaTek’s technological strength in multi-core SoCs is unquestionable, yet the Snapdragon 820 that Qualcomm released at the end of 2015 had only four cores, and Apple smartphones had used SoC chips with only two cores even earlier. These examples show that the significance of multi-core CPUs or SoCs for smartphones cannot be stated in absolute terms; a system-level analysis is needed to draw accurate conclusions.
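The “more cores is not always faster” argument in point 1 can be quantified with Amdahl’s law, a standard model used here as an illustration rather than something taken from the source: if a fraction p of a program’s work is parallelizable, then n cores give an ideal speedup of 1 / ((1 − p) + p/n).

```python
def amdahl_speedup(p, n):
    """Ideal speedup on n cores when fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% parallel work, quadrupling cores from 4 to 16
# yields nowhere near a 4x gain:
print(round(amdahl_speedup(0.9, 4), 2))   # 3.08
print(round(amdahl_speedup(0.9, 16), 2))  # 6.4
```

The serial fraction (1 − p), which includes the synchronization and scheduling overhead discussed above, caps the achievable speedup no matter how many cores are added.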

Conclusion: Multi-core CPUs and SoCs are designed to meet the increasing demands for processing power and speed from entire systems. When single-core CPUs advance along Moore’s Law but face limitations due to chip power constraints, the choice of multi-core CPUs becomes a breakthrough route. Multi-core CPUs drive the updates and upgrades of operating systems, which in turn determine the performance of multi-core CPUs. The challenges of multi-core CPU technology lie in information transfer, data synchronization, and task scheduling among cores. The performance of a system cannot be evaluated solely based on the number of CPU cores; factors such as operating systems, scheduling algorithms, applications, and driver programs must also be considered. Multi-core CPU technology and 3D chip technologies like FinFET can be seen as two key technologies extending the life of Moore’s Law.

References:

1. Popular Science China, Multi-Core Processors, Baidu Baike: https://baike.baidu.com

2. Anonymous, Learning ARM through Diagrams, KanZhun: https://www.kanzhun.com

3. Cortex-A8, Sogou Encyclopedia: https://baike.sogou.com/v54973111.htm

4. IT168 Longxing Tianxia, Flowing Golden Years: A Review of Intel Desktop Processor History, Sina: http://tech.sina.com.cn/n/2006-08-02/133156397.shtml, 2006.8.2

5. Anonymous, Global First 6-Core i7-980X Launch Detailed Test, Fast Technology: https://news.mydrivers.com/1/158/158379_1.htm, 2010.3.11

6. Anonymous, From ARM7, ARM9 to Cortex-A7, A8, A9, A12, A15 to Cortex-A53, A57, Electronic Products World: http://www.eepw.com.cn/article/215182_5.htm, 2014.1.6

7. Ada_today, How do Multi-Core and Single-Core CPUs Work? CSDN Blog: https://blog.csdn.net/u014414429/article/details/24875421/, 2014.5.2

8. zamely, Basics of Multi-Core Processors SMP & AMP & BMP, Cnblogs: https://www.cnblogs.com/zamely/p/4334979.html, 2015.3.14

9. Weight V4216, Development History of Multi-Core Processors, Baidu Zhidao: https://zhidao.baidu.com/question/435213422142183884.html, 2016.5.14

10. Anonymous, Source from TechNews, Multi-Core Processor Development Faces Bottlenecks, MediaTek is Under Siege, Electronic Products World: http://www.eepw.com.cn/article/201607/294583.htm, 2016.7.27

11. Qishilu, 8 Charts to Quickly View New Xeon: Skylake Architecture, Appearance, and Model, Sohu: https://www.sohu.com/a/156424490_281404, 2017.7.12

12. Shangfangwen Q, Intel Xeon W-3175X Power Consumption Test: 28 Cores, Overclocking Exceeds 500W, Fast Technology: http://viewpoint.mydrivers.com/1/613/613590.htm, 2019.1.31

13. Driver Home, Asetek Releases 500W All-in-One Water Cooler: Exclusive for 28-Core W-3175X, Baidu: https://baijiahao.baidu.com/s?id=1624176225325117756&wfr=spider&for=pc, 2019.1.31

14. Old Wolf, What is the Difference Between Multi-Core CPUs and Multiple CPUs? Zhihu: https://www.zhihu.com/question/20998226, 2019.6.9

15. Driver Home, VIA x86 AI Processor Architecture and Performance Announced: Comparable to Intel’s 32-Core, Fast Point Report: https://kuaibao.qq.com/s/20191212A009EG00, 2019.12.12

16. allway2, Development of Multi-Core Processors, CSDN Blog: https://blog.csdn.net/allway2/article/details/103614463, 2019.12.19

17. Fast Technology, How Far Can Multi-Core Processors Go? By 2050, 1024-Core CPUs May Be Used, Baidu: https://baijiahao.baidu.com/s?id=1658158800729500154&wfr=spider&for=pc, 2020.2.10

18. tccxy_, Classification and Operating Methods of Multi-Core Processors, CSDN Blog: https://blog.csdn.net/juewukun4112/article/details/105537832, 2020.4.15

19. Anandtech, 16 Billion Transistors! Crushing Intel! Detailed Analysis and Review of Apple’s First Mac Processor! WeChat Official Account [EETOP], 2020.11.11

20. Tian Mengjie, What is Hyper-Threading Technology? Zhongguancun Online: https://m.zol.com.cn/article/2737340.html, 2012.2.17

21. Shangfangwen Q, The Strongest Domestic CPU is Here! 128 Cores, 16 Channels of DDR5, WeChat Official Account [Hardware World], 2020.12.29

Editor: Summer Solstice

