Understanding Automotive System-on-Chip (SoC): ARM’s Business Model and CPU Microarchitecture Overview

The core components of an automotive system-on-chip (SoC) include the CPU, GPU, AI accelerators, and on-chip buses and interconnects. Today, automotive SoC CPUs are mainly based on the ARM, x86, or RISC-V architecture, and ARM holds the vast majority of the market.

[Figure: ARM's business model. Image source: Internet]

This is ARM's business model as of 2013. It has not changed significantly since, except that the update cadence has accelerated greatly, to roughly one new architecture per year. ARM's main semiconductor partners are the foundries, and the two depend on each other; currently, only TSMC and Samsung can keep pace with ARM.

ARM licensing is divided into four levels.

[Figure: ARM's four licensing levels. Image source: Internet]

Licensing fees increase from level 1 to level 4, as do the depth of customization and the development difficulty. For an out-of-the-box solution, choose POP IP: ARM has already optimized it in every respect, leaving no room for modification, so a foundry can use it almost directly to fabricate chips. For differentiated performance and experience that the reference designs cannot provide, architecture licensing is the only option. The instruction set is implemented in the architecture as physical hardware (transistors and registers) rather than as software. Therefore, an architecture license must include an instruction-set license; the two cannot be separated. Licenses below the architecture level, however, do not require a separate instruction-set license.

[Figure: Comparison of three types of IP. Image source: Internet]

Level 2, Artisan Physical IP, is sometimes simply called IP. After obtaining ARM's processor core license (the CPU design source code in Verilog HDL, i.e., the soft core described above), manufacturers combine it with their own peripherals and memory and use EDA tools to carry out the hard-core (physical) design. An IP core license covers not only the processor but also ARM's reference peripherals, memory controllers, and so on, which gives manufacturers some flexibility.

BoC (Built on Arm Cortex Technology) is a licensing model created in 2016 specifically for Qualcomm, also known as Cortex licensing.

[Figure: BoC licensing compared with architecture licensing. Image source: Internet]

The figure above illustrates the difference between BoC and architecture licensing; BoC is reportedly used only by Qualcomm. Architecture licensing is the highest level: the most difficult, the most expensive, and the most flexible. Apple makes the fullest use of its architecture license, with the most significant modifications, especially to the frontend and the instruction window, both of which have been substantially reworked. Other manufacturers have also made modifications, but to a much smaller extent, and their frontends are almost unchanged.

[Figure. Image source: Internet]

Microarchitecture refers to a specific processor core design, such as the Cortex-A57 or Cortex-A78, that implements an instruction-set architecture. A microarchitecture consists of three parts: the frontend, the execution engine, and the memory subsystem. Its critical parameters include decode width, cache size, number of ALUs, dispatch ports, and the instruction cache.

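This section discusses only decode width. Decode width is the number of instructions the frontend can decode per clock cycle, and it is the parameter with the greatest influence on CPU computing power. As a rough approximation, it sets the ceiling on instructions per cycle (IPC), i.e., how many instructions can be completed per cycle; the IPC actually achieved is usually lower.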
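
A rough way to see why decode width matters: throughput in instructions per second is IPC multiplied by clock frequency, and decode width caps the IPC the frontend can supply. The sketch below assumes a toy model in which achieved IPC is simply the smaller of the decode width and a hypothetical `workload_ipc` limit standing in for everything else (dependencies, cache misses, mispredictions); the function name and all numbers are illustrative only.

```python
def instructions_per_second(decode_width: int, clock_ghz: float, workload_ipc: float) -> float:
    """Toy throughput model: sustained IPC is capped by the decode width.

    workload_ipc is a hypothetical stand-in for every other limit
    (data dependencies, cache misses, branch mispredictions, ...).
    """
    ipc = min(decode_width, workload_ipc)  # the frontend cannot supply more than decode_width per cycle
    return ipc * clock_ghz * 1e9           # instructions per cycle x cycles per second


if __name__ == "__main__":
    for width in (4, 6, 8):
        rate = instructions_per_second(width, clock_ghz=3.0, workload_ipc=5.0)
        print(f"{width}-wide decode at 3.0 GHz: {rate / 1e9:.1f} billion instructions/s")
```

With these made-up numbers, widening from 4 to 6 raises throughput, but widening to 8 does not, because the workload rather than the decoder becomes the bottleneck.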

[Figure. Data source: Internet]

Increasing decode width is very difficult; it cannot simply be raised at will. Roughly speaking, each increase in width raises system complexity by about 15% and die area, and therefore cost, by about 15-20%. If decode width were increased on its own, costs would rise and manufacturers would have little incentive to upgrade. ARM's approach is therefore to pair each new design with TSMC's and Samsung's advanced processes, using higher transistor density to offset the larger die area and cost. As a result, each increase in decode width depends on a more advanced manufacturing process; otherwise, the cost increase would be substantial. ARM also weighs commercial considerations, making small upgrades once a year so that each year leaves room for further improvement. An 8-wide decoder is currently the practical limit. Apple has fully implemented 8-wide decode, but the downside is that it must use TSMC's most advanced process; even so, Apple still uses ARM's instruction set.
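
The cost trade-off above is simple arithmetic. The sketch below takes the text's figure of roughly 15-20% more die area per width increase, assumes those increases compound multiplicatively, and divides by a hypothetical density improvement from a newer process node. The function name and the 1.3x density value in the usage lines are invented for illustration, not TSMC or Samsung figures.

```python
def relative_core_area(width_steps: int, growth_per_step: float, density_gain: float = 1.0) -> float:
    """Core area relative to the current design (1.0).

    width_steps:     number of decode-width increases.
    growth_per_step: fractional area growth per increase (the text cites about 0.15-0.20);
                     assumed here to compound multiplicatively.
    density_gain:    hypothetical transistor-density improvement of the target process
                     node relative to the current one (1.0 = same node).
    """
    area = (1.0 + growth_per_step) ** width_steps  # more decode logic -> more area
    return area / density_gain                      # a denser node packs the same logic into less area


if __name__ == "__main__":
    for growth in (0.15, 0.20):
        same_node = relative_core_area(1, growth)
        denser_node = relative_core_area(1, growth, density_gain=1.3)  # 1.3x is an assumed figure
        print(f"+{growth:.0%} per step: same node {same_node:.2f}x, denser node {denser_node:.2f}x")
```

Under these assumptions, one width increase on the same node makes the core 15-20% larger, while the same increase on a sufficiently denser node leaves it smaller than before, which is the reasoning behind tying decode-width upgrades to new processes.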

There are also differences between RISC and CISC. Widening the decoder is harder for CISC, but one CISC decode slot is roughly equivalent to 1.2-1.5 RISC decode slots, because a single CISC instruction often does more work. Intel has the design capability to compete with Apple, but its manufacturing process lags TSMC's. CISC instructions have variable lengths, whereas RISC instructions have a fixed length. With a fixed length, a block of fetched code can be split directly into, say, eight instructions and sent to eight decoders in parallel. CISC cannot do this, because the decoder does not know where one instruction ends and the next begins until it has worked out each instruction's length. This makes the CISC decoding frontend much more complex than the RISC one. That said, some RISC instruction sets now also include variable-length instructions. For a complex operation, CISC can often complete it with a single instruction, while RISC, with its fixed-length instructions, must break it into several, like a bus that has to stop at every station, which can be slower. In other words, RISC depends on optimizing the instruction set together with the operating system and software: it is hardware designed with software at its core. CISC is the opposite: hardware-centric, developed to serve all types of software.
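
To illustrate why fixed-length instructions are easier to decode in parallel, the sketch below finds instruction start offsets in a byte stream under two encodings. The 4-byte fixed length matches AArch64; the variable-length rule (low 4 bits of the first byte give the length) is a made-up toy, far simpler than real x86 length decoding. Function names are illustrative.

```python
from typing import List

FIXED_LEN = 4  # bytes per instruction in the fixed-length case (AArch64 uses 4-byte instructions)


def fixed_length_boundaries(code: bytes) -> List[int]:
    """Start offsets of complete instructions in a fixed-length encoding.

    Every offset is known in advance (0, 4, 8, ...), so each slice of a fetch
    block can go to its own decoder at the same time.
    """
    usable = len(code) - len(code) % FIXED_LEN  # ignore a trailing partial instruction
    return list(range(0, usable, FIXED_LEN))


def variable_length_boundaries(code: bytes) -> List[int]:
    """Start offsets in a hypothetical toy variable-length encoding.

    Toy rule (not real x86): the low 4 bits of an instruction's first byte give
    its total length in bytes (1-15). The start of instruction k+1 is unknown
    until instruction k's length has been decoded, so the scan is sequential.
    """
    starts: List[int] = []
    pos = 0
    while pos < len(code):
        length = code[pos] & 0x0F
        if length == 0 or pos + length > len(code):
            break  # malformed or truncated instruction: stop scanning
        starts.append(pos)
        pos += length  # the next start depends on this instruction's length
    return starts


if __name__ == "__main__":
    print(fixed_length_boundaries(bytes(32)))
    print(variable_length_boundaries(bytes([0x02, 0xAA, 0x03, 0xBB, 0xCC, 0x01, 0x04, 0, 0, 0])))
```

For a 32-byte fetch block, the fixed-length version yields eight start offsets at once, matching the eight-decoder example in the text; the variable-length version has to walk the bytes one instruction at a time.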

To understand microarchitecture, we start with the simplified model shown below.

[Figure: Typical simplified CPU architecture. Data source: Internet]

Program control: Controls the order of instruction execution in the program.

Operation control: Generates control signals required for instruction execution to control the operation of execution components.

Timing control: Controls the start and duration of each operation control signal.

Data processing: Performs calculations on data and transmits it between relevant components.

Interrupt handling: Responds promptly to internal exceptions and external interrupt requests.

[Figure: Simple instruction flow. Data source: Internet]

The execution of an instruction typically involves the following four steps:

Fetch: The controller in the CPU takes the address of the next instruction and issues a request to memory; memory returns the instruction stored at that address, the controller places it in the instruction register, and the controller then updates the address of the next instruction.

Decode: The controller parses the instruction and converts it into a series of control signals (specifying where to fetch the operands, which operation to perform, where to store the result, and so on).

Execute: Following the control signals produced in the decode step, the CPU reads the operands from memory or from its general-purpose registers and performs the operation specified by the instruction.

Write-back: As the instruction requires, the result from the arithmetic unit is stored in a general-purpose register.
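
The four steps can be made concrete with a toy interpreter. The instruction set (LOAD, ADD, STORE, HALT), the tuple encoding, and the function name `run` are all hypothetical, chosen only to show where fetch, decode, execute, and write-back happen; a real CPU does this in hardware and usually overlaps the steps in a pipeline.

```python
def run(memory, num_regs=4, max_steps=1000):
    """Toy von Neumann machine: instructions and data share one memory list.

    Hypothetical instruction formats (tuples):
      ("LOAD", rd, addr)       rd <- memory[addr]
      ("ADD", rd, rs1, rs2)    rd <- rs1 + rs2
      ("STORE", rs, addr)      memory[addr] <- rs
      ("HALT",)
    """
    regs = [0] * num_regs
    pc = 0  # address of the next instruction
    for _ in range(max_steps):
        # Fetch: read the instruction at PC, then point PC at the next one.
        ir = memory[pc]
        pc += 1

        # Decode: split the instruction into an operation and its operands.
        op, *args = ir
        if op == "HALT":
            break

        # Execute: read operands from registers or memory and compute.
        if op == "LOAD":
            rd, addr = args
            result = memory[addr]
        elif op == "ADD":
            rd, rs1, rs2 = args
            result = regs[rs1] + regs[rs2]
        elif op == "STORE":
            rs, addr = args
            memory[addr] = regs[rs]  # result goes to memory, so no register write-back
            continue
        else:
            raise ValueError(f"unknown opcode {op!r}")

        # Write-back: put the result into the destination register.
        regs[rd] = result
    return regs


if __name__ == "__main__":
    mem = [
        ("LOAD", 0, 5),    # r0 <- mem[5]
        ("LOAD", 1, 6),    # r1 <- mem[6]
        ("ADD", 2, 0, 1),  # r2 <- r0 + r1
        ("STORE", 2, 7),   # mem[7] <- r2
        ("HALT",),
        20, 22, 0,         # data at addresses 5, 6, 7
    ]
    print(run(mem), mem[7])
```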

Register: A register is a small, fast storage area inside the CPU that temporarily holds instructions, data, and addresses, including the operands and results of calculations. Physically, a register is a common type of sequential logic circuit that contains only storage elements, built from latches or flip-flops. Since each latch or flip-flop stores one bit of binary data, N of them form an N-bit register. CPU registers include general-purpose, special-purpose, and control registers. Their capacity is small, but reads and writes are very fast, so data moves between registers very quickly.
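
A behavioral model (not a circuit) shows the two properties described above: each flip-flop holds one bit, and N of them sharing a clock form an N-bit register whose contents change only on a clock edge. Rising-edge triggering and the class names are assumptions for illustration; latches, by contrast, are level-sensitive.

```python
class DFlipFlop:
    """Edge-triggered D flip-flop: stores one bit, updated only on a rising clock edge."""

    def __init__(self) -> None:
        self.q = 0          # stored bit (output Q)
        self._last_clk = 0  # previous clock level, used to detect edges

    def tick(self, clk: int, d: int) -> int:
        if clk == 1 and self._last_clk == 0:  # rising edge: capture input D
            self.q = d & 1
        self._last_clk = clk
        return self.q


class Register:
    """N-bit register built from N D flip-flops that share one clock."""

    def __init__(self, n_bits: int) -> None:
        self.bits = [DFlipFlop() for _ in range(n_bits)]

    def tick(self, clk: int, value: int) -> int:
        out = 0
        for i, ff in enumerate(self.bits):
            # Flip-flop i stores bit i of the input value.
            out |= ff.tick(clk, (value >> i) & 1) << i
        return out


if __name__ == "__main__":
    r = Register(8)
    print(r.tick(0, 0xA5))  # clock low: nothing captured yet
    print(r.tick(1, 0xA5))  # rising edge: 0xA5 is stored
    print(r.tick(1, 0x3C))  # clock still high: input ignored
```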

Typically, designers describe a chip down to the register-transfer level (RTL), and EDA tools generate the rest (the gate-level netlist and physical layout) largely automatically.

[Figure. Image source: Internet]

IR (Instruction Register): Holds the instruction currently being executed

PC (Program Counter): Holds the address of the next instruction and auto-increments to point to the following one, keeping track of where the program is

MAR (Memory Address Register): Stores the address of the currently accessed memory unit

MDR (Memory Data Register): Stores the content of the currently accessed (read/write) memory unit
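
To show how these four registers cooperate, the sketch below writes an instruction fetch and a memory write as sequences of register transfers. It assumes a word-addressed memory where each instruction occupies one word, so the PC advances by 1 (with byte addressing and 4-byte instructions it would advance by 4). The class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CPURegisters:
    pc: int = 0   # Program Counter: address of the next instruction
    ir: int = 0   # Instruction Register: the instruction currently being executed
    mar: int = 0  # Memory Address Register: address of the memory unit being accessed
    mdr: int = 0  # Memory Data Register: data read from or written to that unit
    trace: List[str] = field(default_factory=list)  # log of register transfers


def fetch(r: CPURegisters, memory: List[int]) -> None:
    """Instruction fetch expressed as register transfers."""
    r.mar = r.pc
    r.trace.append("MAR <- PC")
    r.mdr = memory[r.mar]
    r.trace.append("MDR <- M[MAR]")
    r.ir = r.mdr
    r.trace.append("IR <- MDR")
    r.pc += 1  # word-addressed memory assumed: one instruction per word
    r.trace.append("PC <- PC + 1")


def store(r: CPURegisters, memory: List[int], addr: int, value: int) -> None:
    """Memory write: the address goes through MAR, the data through MDR."""
    r.mar = addr
    r.mdr = value
    memory[r.mar] = r.mdr
    r.trace.append("M[MAR] <- MDR")


if __name__ == "__main__":
    regs = CPURegisters()
    mem = [0x1111, 0x2222, 0x0000]
    fetch(regs, mem)
    print(hex(regs.ir), regs.pc, regs.trace)
```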

[Figure. Image source: Internet]

These control signals, synchronized to the continuous clock pulses, coordinate the actions of the various components within the CPU.

[Figure. Image source: Internet]

The ALU (Arithmetic Logic Unit) performs arithmetic and logic operations. Two operands, Y and X, enter the ALU through ports A and B, respectively, and the result is sent to output port Z. X, Y, and Z are registers the ALU uses to hold data temporarily, and F is a flag register that records the status of the result, such as whether a carry or an overflow occurred. The data the ALU operates on ultimately comes from memory, but fetching it from memory for every operation would be far too slow. Frequently used values are therefore loaded from memory in advance and kept in temporary storage inside the arithmetic unit; these storage elements are the general-purpose registers.
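
A behavioral sketch of the ALU described above: operands X and Y go in, result Z comes out, and a small flag set F records carry, overflow, zero, and negative. The 8-bit width, the operation names, clearing carry and overflow for logic operations, and the ARM-style meaning of carry on subtraction (C = 1 means no borrow) are all assumptions for illustration.

```python
from typing import Dict, Tuple

WIDTH = 8
MASK = (1 << WIDTH) - 1   # 0xFF
SIGN = 1 << (WIDTH - 1)   # 0x80, the sign bit in two's complement


def alu(op: str, x: int, y: int) -> Tuple[int, Dict[str, int]]:
    """Toy 8-bit ALU: returns result Z and status flags F.

    Flags: C = carry (for SUB, 1 means "no borrow", as on ARM),
           V = signed overflow, Z = result is zero, N = result is negative.
    """
    x &= MASK
    y &= MASK
    if op == "ADD":
        raw = x + y
        z = raw & MASK
        carry = int(raw > MASK)
        # Overflow if X and Y share a sign that differs from the result's sign.
        overflow = int(((x ^ z) & (y ^ z) & SIGN) != 0)
    elif op == "SUB":
        raw = x - y
        z = raw & MASK
        carry = int(x >= y)
        # Overflow if X and Y have different signs and the result's sign differs from X's.
        overflow = int(((x ^ y) & (x ^ z) & SIGN) != 0)
    elif op == "AND":
        z, carry, overflow = x & y, 0, 0  # C and V simply cleared here
    elif op == "OR":
        z, carry, overflow = x | y, 0, 0
    elif op == "XOR":
        z, carry, overflow = x ^ y, 0, 0
    else:
        raise ValueError(f"unsupported op {op!r}")
    flags = {"C": carry, "V": overflow, "Z": int(z == 0), "N": int(bool(z & SIGN))}
    return z, flags


if __name__ == "__main__":
    print(alu("ADD", 0xFF, 0x01))  # unsigned wrap-around sets the carry flag
    print(alu("ADD", 0x7F, 0x01))  # 127 + 1 overflows the signed range
```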

Arithmetic operations include addition, subtraction, multiplication, division, square root, squaring, and reciprocal; logic operations include AND, OR, NOT, and XOR. Arithmetic operations can be built up from addition: subtraction is addition of the two's complement, and multiplication is repeated shifting and adding, so addition is the fundamental operation. Logic operations, by contrast, apply a Boolean function to each bit. Computers use binary, and when two bits are added, the sum bit is produced by an XOR gate and the carry bit by an AND gate. These two gates form a simple adder. Because it produces a carry out but cannot accept a carry in from the lower-order bit, it is called a half adder. Adding a second half adder to absorb the incoming carry, with an OR gate merging the two carry outputs, gives a full adder.
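
The half-adder and full-adder construction above can be written directly with Python's bitwise operators standing in for gates (`^` for XOR, `&` for AND, `|` for OR). This models the logic only, not timing or circuitry; the function names are illustrative.

```python
from typing import Tuple


def half_adder(a: int, b: int) -> Tuple[int, int]:
    """Adds two bits. Returns (sum, carry_out); there is no carry input."""
    s = a ^ b  # XOR gate produces the sum bit
    c = a & b  # AND gate produces the carry bit
    return s, c


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Adds two bits plus the carry coming in from the lower-order bit."""
    s1, c1 = half_adder(a, b)         # first half adder: a + b
    s, c2 = half_adder(s1, carry_in)  # second half adder: absorb carry_in
    return s, c1 | c2                 # OR gate merges the two possible carries


if __name__ == "__main__":
    print("a b cin | sum cout")
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                s, cout = full_adder(a, b, cin)
                print(f"{a} {b}  {cin}  |  {s}   {cout}")
```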

[Figure. Image source: Internet]

Once a one-bit full adder is available, an 8-bit adder can be built simply by chaining eight full adders, with each carry-out feeding the next bit's carry-in; treated as a single unit, this is called an 8-bit adder.

Combining a one-bit full adder with a function generator yields a fully functional one-bit ALU slice that can perform logic operations as well as arithmetic.
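
Both constructions can be sketched in a few lines. The ripple-carry adder chains full adders as described; for the ALU slice, a simple operation selector stands in for the function generator, and the operation names and function names are hypothetical.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, width: int = 8) -> Tuple[int, int]:
    """Chains `width` full adders; each stage's carry-out is the next stage's carry-in."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # carry is the carry-out of the most significant bit


def alu_bit(op: str, a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit ALU slice: the op selector plays the role of the function generator."""
    if op == "ADD":
        return full_adder(a, b, cin)
    if op == "AND":
        return a & b, 0
    if op == "OR":
        return a | b, 0
    if op == "XOR":
        return a ^ b, 0
    raise ValueError(f"unsupported op {op!r}")


def alu_nbit(op: str, x: int, y: int, width: int = 8) -> int:
    """A width-bit ALU made by chaining one-bit slices, just like the adder."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = alu_bit(op, (x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result


if __name__ == "__main__":
    print(ripple_carry_add(200, 100))  # 300 does not fit in 8 bits: low byte plus carry-out
    print(alu_nbit("AND", 0b1100, 0b1010))
```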

The next installment will cover the CPU's cache system, instruction-level parallelism, superscalar execution, and superpipelining.

Related Reading

Understanding Automotive System-on-Chip (SoC) Series Part 2: Automotive Chip Industry and Supply Chain

Understanding Automotive System-on-Chip (SoC) Series Part 1: Overview of Automotive System-on-Chip and AEC-Q100 Automotive Standards

