Summary of RISC-V Microarchitecture (1) – Introduction to Pipeline (Differences with ARM)

Summary of RISC-V Microarchitecture (1) - Introduction to Pipeline (Differences with ARM)A brief block diagram of a certain RISC-V Level 1

The storage architecture of RISC-V (from inside to outside):

1)Reg

2)TCM (Tightly Coupled Memory, divided into I/D) / LM (Local Memory, divided into I/D)

3)TLB lookup (Translation Lookaside Buffer, a fast cache of recent address mappings)

– TLB hit: avoids Page Table Lookup

– TLB miss: perform Page Table Lookup, cache result for later

4)MMU (Memory Management Unit, includes Page Table Lookup)

5)L1 cache (I-cache, D-cache)

6)L2 cache (Way Prediction Table, WPT)

7)DDR

Summary of RISC-V Microarchitecture (1) - Introduction to Pipeline (Differences with ARM)Traditional pipeline

Predict: Prediction (not detailed yet)

Fetch: Request instruction fetch from memory

Decode: Instruction decode & register read

Execute: Execute operation or calculate address

Memory: Request memory read or write

Writeback: Write result (either from execute or memory) back to register

In practice, RISC-V may slightly adjust based on the traditional pipeline, such as adding instruction prefetching before the fetch stage or splitting a stage into multiple stages to increase frequency and achieve better performance.

Summary of RISC-V Microarchitecture (1) - Introduction to Pipeline (Differences with ARM)A certain RISC-V actual pipeline

Differences with ARM in pipeline:

There are significant differences in design philosophy and implementation flexibility. The following is a detailed comparison:

1. Typical pipeline stages and functions of RISC-V

As an open-source instruction set architecture (ISA), the pipeline design of RISC-V varies by implementation, but common stages include:

Basic pipeline (5-stage classic RISC)

1.1Fetch (IF, Instruction Fetch)

Read instruction from instruction memory and update the program counter (PC).

1.2Decode (ID, Instruction Decode)

Parse instruction, read from register file, identify operation type (e.g., arithmetic, branch, etc.).

RISC-V’s instruction format is regular, and the decoding complexity is low.

1.3Execute (EX, Execute)

Execute arithmetic logic operations (ALU), address calculation, or branch judgment.

1.4Memory Access (MEM)

Access data memory (Load/Store operations), or bypass (for non-memory instructions).

1.5Write Back (WB)

Write results back to the register file.

Advanced pipeline extensions

Multi-issue: Supports superscalar (e.g., dual-issue, out-of-order execution).

Branch prediction: Dynamic or static prediction to reduce pipeline stalls.

Pipeline depth: Can be up to 10+ stages (e.g., high-performance implementations) or simplified to 3 stages (embedded scenarios).

Characteristics

Modular design: Supports optional extensions (e.g., multiplication, division, vector instructions), and the pipeline can be adjusted accordingly.

No microcode: Directly executes instructions without a complex decoding layer.

2. ARM microarchitecture pipeline design

As a commercial architecture, ARM’s pipeline design is highly optimized and diverse, with typical representatives:

Classic pipeline (e.g., Cortex-A7/A53)

8-15 stage pipelines: Deeper pipelines to increase frequency (e.g., Cortex-A15).

Complex branch prediction: Multi-level predictors (e.g., TAGE, Loop Buffer).

Register renaming: Supports out-of-order execution (e.g., Cortex-A7x series).

Microcode support: Complex instructions (e.g., NEON/SVE) may be decomposed into micro-operations.

Modern ARM designs (e.g., Cortex-X/A7xx)

Superscalar + out-of-order execution: Typically 4-6 issue width.

Memory prefetch: Aggressive data prefetch logic.

Adaptive clock gating: Dynamically adjusts pipeline power consumption.

3. Key differences between RISC-V and ARM microarchitectures

Feature

RISC-V

ARM

Flexibility

Open-source, customizable pipeline (e.g., 3-stage/10+ stage)

Closed-source, fixed microarchitecture (e.g., Cortex-A78)

Instruction Decoding

Fixed-length instructions,

simple decoding

Supports Thumb-2 (variable-length instructions), more complex decoding

Microcode Layer

None, directly executes instructions

Complex instructions may use microcode (e.g., SVE2)

Branch Prediction

Implementation dependent (can be simple or complex)

Highly optimized predictors (hardware patented)

Memory Model

Loose, implementation dependent

Strict consistency (e.g., ARMv8 MMU specification)

Toolchain Ecosystem

Basic, community dependent

Mature (e.g., DS-5, Keil)

Typical Applications

From embedded to high-performance (custom)

Mobile/server (standardized IP cores)

4. Conclusion

RISC-V: The advantage lies in flexibility, suitable for customized scenarios (e.g., IoT, dedicated accelerators), with pipeline designs ranging from minimal to high performance.

ARM: The advantage lies in proven efficient designs (e.g., branch prediction, memory systems), suitable for commercial scenarios with strict performance and power requirements (e.g., smartphones).

Example comparison:

A RISC-V open-source implementation (e.g., BOOM) may adopt a 6-stage out-of-order pipeline, while a similarly performing ARM Cortex-A55 uses an 8-stage pipeline + out-of-order execution, but the latter may be more thoroughly optimized at the same process.

Leave a Comment