Understanding Multi-core Processors in ARM Architecture

18. Multi-core processors

So far, we have treated ARM processor cores as a single entity. However, most Cortex-A series processors can include up to four processing cores. Multi-core systems have the potential to provide higher performance because more processing units (cores) can be utilized. This allows multiple tasks to be executed simultaneously, potentially reducing the time required to complete a specified task.

Multi-processing can be defined as the simultaneous execution of two or more instruction sequences on a single device that contains two or more cores. The concept has been studied for decades and has seen widespread commercial application in the last 15 years. Today, multi-processing is widely adopted both in general-purpose application processors and in areas more traditionally defined as embedded systems.

The overall energy consumption of a multi-core system can be significantly lower than that of a system based on a single processor core. Multiple cores can complete execution faster, so parts of the system can be powered down for longer periods. Alternatively, a system with multiple cores may be able to run at a lower frequency than a single core would need to achieve the same throughput. A lower-power silicon process or a lower supply voltage can result in lower power consumption and reduced energy use. Most existing systems do not allow the frequency of each core to be changed independently. However, each core can be dynamically clock-gated to achieve additional power and energy savings.
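
The frequency argument can be made concrete with the usual first-order model of dynamic CMOS power, P = C·V²·f. The sketch below is illustrative only: the capacitance, voltage, and frequency values are made-up assumptions rather than figures for any ARM device, leakage power is ignored, and the workload is assumed to split perfectly across two cores.

```python
def dynamic_power(capacitance, voltage, frequency_hz):
    """First-order dynamic power model: P = C * V^2 * f, in watts."""
    return capacitance * voltage ** 2 * frequency_hz

# Hypothetical numbers chosen only for illustration.
C_EFF = 1e-9  # effective switched capacitance per core, in farads

# One core at 1 GHz needs the higher supply voltage (assumed 1.2 V).
single = dynamic_power(C_EFF, 1.2, 1.0e9)

# Two cores at 500 MHz each can run at a lower voltage (assumed 0.9 V).
dual = 2 * dynamic_power(C_EFF, 0.9, 0.5e9)

print(f"single core: {single:.2f} W")
print(f"dual core:   {dual:.2f} W")
print(f"ratio:       {dual / single:.2f}")
```

Under these assumptions the two slower cores deliver the same nominal cycle count per second for a little over half the dynamic power, because voltage enters the model squared.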

Multi-core systems also add flexibility and scalability to system design. A system with one or two cores can be scaled up for more performance by adding cores, without redesigning the entire system or making significant changes to the software. Having multiple cores also allows more options in system configuration. For example, a system might use separate cores, one handling hard real-time requirements and another running applications that require high, sustained performance. These can be integrated into a single multi-processor system.

Multi-core devices may also be more responsive than single-core devices. When interrupts are distributed across multiple cores, there will be multiple cores available to respond to interrupts, and each core will have fewer interrupts to handle. Multi-core systems can also allow an important background process to run alongside an important but unrelated foreground process.

Multi-core systems can also extract more performance from high-latency memory systems (e.g., DDR memory) by allowing the memory controller to queue and optimize memory requests. Cores working on coherent (shared) data can benefit from fewer cache line fills and evictions. Where data is not shared, the L2 cache can still improve utilization for shared areas of memory (including file caches), shared libraries, and kernel code. However, performance can decline if the number of cores is increased without a corresponding increase in memory bandwidth.
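
The point about memory bandwidth can be illustrated with a deliberately crude model in which every core demands the same bandwidth and the memory system caps the total. The function name and all numbers below are hypothetical; real systems are also shaped by caches, latency, and access patterns.

```python
def aggregate_throughput(cores, per_core_demand_gbps, memory_bandwidth_gbps):
    """Return (total GB/s delivered, GB/s per core) under a simple bandwidth cap."""
    demanded = cores * per_core_demand_gbps
    delivered = min(demanded, memory_bandwidth_gbps)
    return delivered, delivered / cores

# Assumed: each core wants 1.5 GB/s and the memory system supplies 4.0 GB/s.
for n in (1, 2, 4):
    total, per_core = aggregate_throughput(n, 1.5, 4.0)
    print(f"{n} core(s): total {total:.2f} GB/s, per core {per_core:.2f} GB/s")
```

In this model, going from two to four cores raises total throughput only up to the cap, so each core receives less bandwidth than it asks for.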

In the past, many software applications were run in single-core environments. Some operating systems provided time-slicing support, giving the illusion of multiple processes or tasks running simultaneously. It is important to clearly understand the difference between multi-threading (e.g., POSIX threads or Java) and multi-processing. Multi-threaded applications can run on a single core, but it is only in multi-processing that threads can truly execute in parallel.

Moving multi-threaded software from a single-core system to a multi-core system may expose program errors that were not revealed when running on a single core, and it may cause very rare errors to trigger very frequently. However, it will not cause correctly written multi-threaded programs to behave abnormally; it will simply expose previously unnoticed bugs.
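
A typical example of such a latent bug is an unprotected read-modify-write on shared data. The following sketch uses Python's `threading` module purely to show the shape of the problem and its fix; the function name and counts are hypothetical, and because CPython normally serializes bytecode execution, the unprotected variant may well happen to produce the right answer here, just as a buggy program may appear correct on a single core.

```python
import threading

def run_counter(num_threads, increments, use_lock):
    """Increment a shared counter from several threads; return the final value."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # The lock makes the read-modify-write atomic with respect
                # to the other threads.
                with lock:
                    counter["value"] += 1
            else:
                # Unprotected read-modify-write: updates may be lost if two
                # threads interleave between the read and the write.
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

expected = 4 * 10_000
print("with lock:   ", run_counter(4, 10_000, use_lock=True), "expected", expected)
print("without lock:", run_counter(4, 10_000, use_lock=False), "(may be lower)")
```

Only the locked variant is guaranteed to reach the expected total; the unlocked one is correct only by luck of scheduling.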

18.1 Multi-processing ARM systems

From the early history of the ARM architecture, ARM processors were likely to be used in systems that included other processors. This often means a heterogeneous system, which may include an ARM processor and a separate DSP (digital signal processor). Such systems run different software on different cores, and each processor may have different permissions and views of memory. Many widely used ARM systems, such as TI’s OMAP series or Freescale’s i.MX series, are typical examples of such systems.

We can distinguish between the following types of systems:

A processor with a single core, such as the Cortex-A8 processor.
A multi-core processor that contains multiple cores capable of independently executing instructions. System designers, or an operating system that abstracts the underlying resources, can view it as a single unit or cluster.
Multiple clusters (as shown above), each containing multiple cores.

ARM was one of the first companies to introduce multi-core processors into the SoC (system-on-chip) market, launching the ARM11 MPCore processor as early as 2004. All processors described in this book (except for the Cortex-A8 processor) are examples of such multi-core systems. An ARM multi-core processor can contain one to four cores. Each core can be individually configured to participate (or not participate) in the data cache coherency management scheme. The snoop control unit (SCU) inside the processor automatically maintains level 1 data cache coherency between the cores within a cluster, without software intervention.

ARM multi-core processors include an integrated interrupt controller. Multiple external interrupt sources can be independently configured to target one or more individual processor cores. In addition, each core can use software-triggered interrupts to signal or broadcast an interrupt to another core or to a group of cores in the system. These mechanisms enable the operating system to share and distribute interrupts across all cores and to coordinate activity using a low-overhead signaling mechanism.
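
One common way to express per-interrupt targeting is a bitmask with one bit per core. The following is a software model only: the interrupt IDs, the routing table, and the function names are hypothetical, and the code does not reproduce the register interface of the interrupt controller.

```python
NUM_CORES = 4  # assumption: a four-core cluster

def cores_in_mask(mask):
    """Return the list of core indices whose bit is set in a target mask."""
    return [core for core in range(NUM_CORES) if mask & (1 << core)]

def all_but_self(sender):
    """Target mask for a software interrupt broadcast to every other core."""
    return ((1 << NUM_CORES) - 1) & ~(1 << sender)

# Hypothetical routing table: interrupt ID -> bitmask of target cores.
irq_targets = {
    34: 0b0001,  # e.g. a peripheral handled only by core 0
    35: 0b0110,  # e.g. a device that core 1 or core 2 may service
}

for irq, mask in irq_targets.items():
    print(f"IRQ {irq} -> cores {cores_in_mask(mask)}")

print("software interrupt from core 2 ->", cores_in_mask(all_but_self(2)))
```

Changing a single mask is enough to move an interrupt source to a different core or to spread it across several, which is the kind of flexibility an operating system relies on when distributing interrupt load.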

The Cortex-A MPCore processor also provides hardware mechanisms to accelerate operating system kernel operations, such as system-wide cache and TLB (Translation Lookaside Buffer) maintenance operations. (This feature is not present in the ARM11 MPCore.)

Each Cortex-A series multi-core processor has the following features:

Configurable from 1 to 4 cores (at design time).
Level 1 data cache coherency.
Integrated interrupt controller.
Local timers and watchdogs.
Optional Accelerator Coherency Port (ACP).

The following diagram shows the structure of the Cortex-A9 MPCore processor, although this general description also applies to other multi-core processors.

18.2 Symmetric multi-processing

Symmetric multi-processing (SMP) is a software architecture that dynamically determines the roles of individual processors. Each core in the cluster has the same view of memory and shared hardware. Any application, process, or task can run on any core, and the operating system scheduler can dynamically migrate tasks between cores to achieve optimal system load.
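
The idea that any task may be placed on any core can be sketched with a toy load balancer. This is not how a real SMP scheduler works (real schedulers also consider priorities, cache affinity, and power state); the task names and costs are hypothetical, and the greedy largest-first policy is chosen only because it is short.

```python
import heapq

def balance(task_costs, num_cores):
    """Greedily assign each task, largest first, to the least-loaded core."""
    heap = [(0, core) for core in range(num_cores)]  # (current load, core index)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    for name, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, core = heapq.heappop(heap)  # core with the smallest load so far
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))
    loads = {core: sum(task_costs[t] for t in names)
             for core, names in assignment.items()}
    return assignment, loads

# Hypothetical tasks with made-up relative costs.
tasks = {"ui": 5, "audio": 3, "net": 4, "log": 1, "codec": 6}
assignment, loads = balance(tasks, 2)
print(assignment)
print(loads)
```

Because every core has the same view of memory and hardware, the assignment is free to change whenever the load changes, which is what distinguishes SMP from a design that statically dedicates cores to roles.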

We assume readers have a basic understanding of how operating systems work, but we briefly review some operating system terminology here. An application running under the operating system is called a process. A process performs many operations by calling the system library, which provides certain functions itself and also acts as a wrapper for system calls into the kernel. Each process has associated resources, including a stack, a heap, and constant data areas, as well as attributes such as scheduling priority. The kernel’s view of a process is called a task.

To describe SMP operations, we will use the term
