How Does The Highest Performance RISC-V Processor Compare To Arm?

Author: Huang Yefeng

Original from EET Electronic Engineering Magazine

Processors based on the RISC-V architecture are becoming commonplace in everyday electronic products. Established MCU manufacturers have begun to embrace RISC-V, such as Tsinghua Unigroup, which we interviewed previously, and companies like GigaDevice have launched RISC-V product lines.

The trend also shows up in mature products. At last year’s China IC Leaders Summit, in a conversation with Silicon Valley’s analog and digital industry, we learned that a very mature TCON chip has quietly incorporated small RISC-V cores. Companies like Western Digital have also started to adopt RISC-V cores across their entire product line.

This has made us increasingly interested in the microarchitecture of processors built on the RISC-V instruction set. In October of last year, SiFive released its first out-of-order RISC-V CPU core, the U8 series processor IP. In its promotional material, SiFive described the U8 series as the highest performance RISC-V core IP currently available (although it appears to have launched later than the XuanTie 910), “based on superscalar out-of-order pipelines, with configurable pipeline depth and issue width.”

SiFive is quite active in licensing RISC-V microarchitecture IP, and its 7 and 8 series IP cores offer a useful window onto the current state of RISC-V. Microarchitecture analysis may not reveal much about the characteristics of the instruction set itself, but comparing these cores with Arm is an opportunity to understand the RISC-V ecosystem.

How Does The Highest Performance RISC-V Core Compare To Arm?

First, it is necessary to clarify how SiFive positions its different series. In an online conference titled “Embedding Intelligence Everywhere with SiFive 7 Series Core IP,” SiFive summarized its product lines. The E series cores focus on 32-bit embedded use cases; the 64-bit S cores target scenarios with greater computational needs; and the U series cores are positioned as the highest performance, targeting high-end computing.

Sliced by number instead, SiFive’s 2 series is the smallest and most efficient processor IP among RISC-V processors; the 3 and 5 series are more widely deployed in multi-core configurations and in scenarios that require strong real-time processing; and the 7 and 8 series, as mentioned above, focus on high performance. Combining a letter and a number names a product: the E3 provides 32-bit performance for mid-range embedded applications, the S7 is clearly focused on performance, and the U8 offers scalable, configurable high-performance cores with, according to SiFive, the highest performance per watt.
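The letter-plus-number naming can be sketched as a small lookup. This is only an illustration: the descriptions paraphrase the summary above, the function name `describe_core` is hypothetical, and the trailing digit in names such as U84 is not explained in the article, so the sketch ignores it.

```python
# Descriptions paraphrase the article; the lookup itself is illustrative only.
SERIES_LETTERS = {
    "E": "32-bit embedded",
    "S": "64-bit, greater computational needs",
    "U": "highest performance, high-end computing",
}
SERIES_NUMBERS = {
    "2": "smallest and most efficient",
    "3": "multi-core and real-time scenarios",
    "5": "multi-core and real-time scenarios",
    "7": "high performance",
    "8": "high performance",
}


def describe_core(name: str) -> str:
    """Describe a core name such as 'E3', 'S7' or 'U84'.

    Only the letter and the first digit are interpreted. Any trailing
    digit (the '4' in 'U84') is not explained in the article and is
    ignored here.
    """
    if len(name) < 2:
        raise ValueError(f"unknown core name: {name!r}")
    letter, number = name[0].upper(), name[1]
    if letter not in SERIES_LETTERS or number not in SERIES_NUMBERS:
        raise ValueError(f"unknown core name: {name!r}")
    return f"{name}: {SERIES_LETTERS[letter]}; {SERIES_NUMBERS[number]}"


if __name__ == "__main__":
    for core in ("E3", "S7", "U84"):
        print(describe_core(core))
```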

Before the U8 was released, SiFive’s high-performance products were primarily the U5 and U7. Both are in-order architectures that target Arm’s low-end and microcontroller cores, and they fall short on more complex workloads and demanding scenarios. The U8 series fills this gap. SiFive claimed that the U8 would greatly strengthen the viability of SiFive and RISC-V in the end-product ecosystem.

The U8 series currently comprises two main cores, the U84 and the U87. According to SiFive’s own data, the U84 core delivers 3.1 times the performance of the U74, with an IPC improvement of 2.3 times and a maximum frequency increase of 1.4 times. In SiFive’s comparison, the U84 delivers 5.3 times the performance of the U54 on the same process; once process differences are factored in, a 7nm U84 core achieves 7.2 times the performance of a 28nm U54 core. These figures suggest that RISC-V’s performance potential was still largely untapped in recent years, which is why the gap in performance and efficiency over the lower-end products is so large.

These figures are SPECint2006 scores for the U84 IP running on an FPGA platform.
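As a rough sanity check on these ratios, the sketch below uses the textbook approximation that single-thread performance scales with IPC times clock frequency. The function name is hypothetical, and the derived ratios assume that all of SiFive's figures were measured under comparable conditions, which the article does not confirm.

```python
def combined_speedup(ipc_gain: float, freq_gain: float) -> float:
    """Ideal speedup if the IPC gain and the clock gain compound fully."""
    return ipc_gain * freq_gain


# Figures quoted from SiFive in the article.
REPORTED_U84_VS_U74 = 3.1
U84_VS_U54_SAME_PROCESS = 5.3
U84_7NM_VS_U54_28NM = 7.2

# Upper bound from 2.3x IPC and 1.4x peak frequency: about 3.22x, slightly
# above the reported 3.1x, as expected when the peak clock is not fully realized.
ideal_u84_vs_u74 = combined_speedup(2.3, 1.4)

# Ratios implied by the quoted numbers (assumption: comparable conditions).
implied_u74_vs_u54 = U84_VS_U54_SAME_PROCESS / REPORTED_U84_VS_U74   # about 1.71x
implied_process_gain = U84_7NM_VS_U54_28NM / U84_VS_U54_SAME_PROCESS  # about 1.36x

if __name__ == "__main__":
    print(f"ideal U84 vs U74:   {ideal_u84_vs_u74:.2f}x")
    print(f"implied U74 vs U54: {implied_u74_vs_u54:.2f}x")
    print(f"implied 28nm->7nm:  {implied_process_gain:.2f}x")
```

Read this way, most of the 7.2 times figure comes from the microarchitecture, with the move from 28nm to 7nm contributing roughly a further third.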

In press releases, SiFive has named the Arm Cortex-A72 as the U8’s direct competitor, saying that the U84 offers performance comparable to the Cortex-A72 core while holding an advantage in area efficiency and performance per watt. Of course, the A72 is already an architecture several years old.

On the same 7nm process, each U84 core occupies 0.28mm², and a cluster of four cores plus a 2MB L2 cache occupies 2.63mm². By comparison, each Cortex-A55 small core in Huawei’s Kirin 980, paired with a 128KB L2 cache, occupies 0.36mm². Given that the A72 delivers more than double the performance of the A55, the SiFive U84 evidently does well on PPA (power, performance, area).

It is important to emphasize that the U8 series, as IP, is highly configurable and scalable when targeting specific chip products. The U84 is a standard IP configuration, so the discussion here is based solely on SiFive’s standard IP, and actual products will differ.
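The area figures can be broken down with simple arithmetic. This is a sketch under stated assumptions: the constant names are made up for illustration, and the per-core comparison is not strictly like-for-like, because the A55 figure includes a 128KB L2 slice while the article does not say what the 0.28mm² U84 figure includes.

```python
# Areas in mm^2, as quoted in the article (7nm process).
U84_CORE_MM2 = 0.28
U84_CLUSTER_MM2 = 2.63       # four U84 cores plus 2MB L2
A55_CORE_WITH_L2_MM2 = 0.36  # Kirin 980 A55 core plus 128KB L2

CORES_PER_CLUSTER = 4

# Area taken by the four cores alone: 1.12 mm^2.
cores_area = CORES_PER_CLUSTER * U84_CORE_MM2

# What remains for the 2MB L2 and any shared logic: about 1.51 mm^2.
non_core_area = U84_CLUSTER_MM2 - cores_area

# Fraction of the cluster occupied by cores: about 43%.
core_share = cores_area / U84_CLUSTER_MM2

# U84 core area relative to the A55-plus-L2 figure: about 0.78.
u84_vs_a55_area = U84_CORE_MM2 / A55_CORE_WITH_L2_MM2

if __name__ == "__main__":
    print(f"cores: {cores_area:.2f} mm^2, rest of cluster: {non_core_area:.2f} mm^2")
    print(f"core share of cluster: {core_share:.0%}")
    print(f"U84 core / A55+L2 area: {u84_vs_a55_area:.2f}")
```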

7 Series and 8 Series Microarchitecture

Many people may not expect RISC-V, as an instruction set, to compete with Arm in the high-performance field, since RISC-V’s primary market today lies elsewhere: IoT products are less sensitive to fragmentation and do not demand as much processor performance as mobile phones, and in many cases RISC-V appears in the form of microcontrollers. In our view, however, high performance is still an important way to demonstrate the technical capabilities of the RISC-V camp.

Documentation on RISC-V is certainly far less abundant than in the Arm world. Commercially available RISC-V processors, whether in MCUs or as controllers inside other hardware, come with few public technical details, and the level of detail is not comparable to what is widely available for Arm. Even so, understanding the microarchitecture of SiFive’s 7 and 8 series processors from limited information is still valuable for understanding the RISC-V ecosystem.

As previously mentioned, the U8 series is SiFive’s first out-of-order core. The U84 has a 12-stage pipeline and three execution units in the back end, which is still a fairly traditional out-of-order design. The register file design is quite distinctive.

At the front end, the core’s instruction fetch unit fetches 16 bytes from the L1 cache per cycle and places them into the fetch queue. RISC-V has variable-length instruction encoding; assuming an average of 32 bits per instruction, this corresponds to four instructions per cycle. Accordingly, the U8’s decoder is also 4-wide, and it passes decoded instructions to the instruction queue.
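To make the "16 bytes is about four instructions" estimate concrete, here is a minimal sketch that counts instructions in one fetch window using the standard RISC-V length encoding: a 16-bit parcel whose two lowest bits are `11` starts a 32-bit instruction, and anything else is a 16-bit compressed instruction. The function name is hypothetical, longer encodings (48 bits and up) are ignored, and nothing here describes how the U8's fetch unit is actually built.

```python
def count_instructions(window: bytes) -> int:
    """Count whole instructions in a fetch window of little-endian bytes.

    Simplification: only 16-bit (compressed) and 32-bit instructions are
    recognized; longer RISC-V encodings are not handled.
    """
    offset = 0
    count = 0
    while offset + 2 <= len(window):
        low_byte = window[offset]  # holds bits [7:0] of the instruction
        length = 4 if (low_byte & 0b11) == 0b11 else 2
        if offset + length > len(window):
            break  # the instruction straddles the end of the window
        count += 1
        offset += length
    return count


# Example encodings: addi x0, x0, 0 (32-bit nop) and c.nop (16-bit nop).
NOP_32 = bytes.fromhex("13000000")
NOP_16 = bytes.fromhex("0100")

if __name__ == "__main__":
    print(count_instructions(NOP_32 * 4))               # all 32-bit
    print(count_instructions(NOP_16 * 8))               # all compressed
    print(count_instructions(NOP_32 * 2 + NOP_16 * 4))  # mixed
```

With only 32-bit instructions the 16-byte window holds four; with compressed code it can hold up to eight, so a 4-wide decoder can be outpaced by fetch on dense code.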

Further along, the instruction queue can issue three instructions per cycle to the rename stage, which is narrower than the fetch width. AnandTech commented on this point: making the fetch stage wider than issue helps the front end keep up with the back end after a branch misprediction, but a design in which the decode width exceeds the issue width had not been seen before. This may be an architectural balancing choice, or preparation for wider-issue designs in future U8 series IP. According to SiFive’s official introduction, the number of issue queues should be configurable and extendable by chip designers.
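The intuition behind a front end that is wider than issue can be shown with a toy queue model. This is a sketch under loud assumptions, not SiFive's design: the function name, the queue capacity of 16, and the bubble cycles are all hypothetical, and a real misprediction would also flush the queue, which this model does not attempt to capture. It only shows that surplus fetch bandwidth builds a backlog that keeps issue busy while the front end delivers nothing.

```python
def simulate_issue(cycles, fetch_width, issue_width,
                   queue_capacity=16, bubble_cycles=frozenset()):
    """Return the number of instructions issued in each cycle.

    Toy model: each cycle the front end adds `fetch_width` instructions to a
    queue (nothing during a bubble cycle), then up to `issue_width` leave it.
    """
    occupancy = 0
    issued_per_cycle = []
    for cycle in range(cycles):
        fetched = 0 if cycle in bubble_cycles else fetch_width
        occupancy = min(queue_capacity, occupancy + fetched)
        issued = min(issue_width, occupancy)
        occupancy -= issued
        issued_per_cycle.append(issued)
    return issued_per_cycle


if __name__ == "__main__":
    bubbles = {4, 5}  # hypothetical two-cycle front-end stall
    wide = simulate_issue(7, fetch_width=4, issue_width=3, bubble_cycles=bubbles)
    narrow = simulate_issue(7, fetch_width=3, issue_width=3, bubble_cycles=bubbles)
    print("4-wide fetch:", wide, "total", sum(wide))
    print("3-wide fetch:", narrow, "total", sum(narrow))
```

In this model the 4-wide front end banks one extra instruction per cycle, so part of the stall is hidden, while the matched 3-wide front end issues nothing during the bubble.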

The rename stage is fairly conventional, comprising a reorder buffer and three dispatch engines. Instructions then enter the execution back end.

For the back end, SiFive has only disclosed the integer execution units, which comprise three execution pipelines. Each has its own issue queue feeding one of three ALU pipelines. One pipeline is a conventional ALU, one shares its port with the branch unit, and the third is more complex and can perform integer multiplication and division.

The U84 does not yet support SIMD or vector instructions, seemingly because the corresponding instruction extension is not yet fully ready. SiFive explained that this part will be ready by the end of this year, and the U87 may have the capability; so far we have seen no update on this on SiFive’s official website.

Scalability

At the SoC/MCU level, SiFive adopts a heterogeneous design called “Mix+Match,” in which big and small cores share an L2 cache, configurable up to a maximum of nine cores. A cluster can combine U8, U7, S2, and other cores.

[Figure: diagram of a SiFive core cluster. Source: WikiChip]

The earlier 7 series adopted an 8+1 design, which appears similar, although the way cores are combined may differ. The diagram made by WikiChip shows this structure more clearly: cores, cache, and other components form a cluster. Custom instructions (Custom Instruction Extensions) are also supported, allowing each core to implement specific instructions that accelerate certain workloads, a capability shared by many RISC-V microarchitectures today.

The 7 series introduction mentioned that, through TileLink, up to 64 such clusters could be placed on a single chip. The U8 series core IP introduction mentioned only using TileLink for connecting third-party accelerator IP and core-to-core communication, or ChipLink for chip-to-chip communication. There are few details about the memory subsystem, but SiFive mentioned providing high-bandwidth memory interface IP, which helps with demands like HBM2E+, although this work seems to be ongoing.

The 7 series cores have an optional Fast I/O (FIO) port connected directly to the core, serving as a low-latency interface between the core and large SRAM or third-party accelerators. The FIO port is also connected to the main core complex bus, so other cores can access the SRAM or accelerators as well. It is unclear how the FIO port relates to SiFive’s TileLink technology.

Regarding scalability, the U8 series has several features: (1) support for different process nodes; (2) a configurable out-of-order design: as mentioned earlier, the standard U84 core has a 12-stage pipeline and three issue ports, but the pipeline depth, issue queue count, and so on can be adjusted for different applications; (3) cross-issue capability from integer units to floating-point units; (4) a “composable cache” for real-time operation support.

As previously mentioned, the SiFive 7 and 8 series may not be fully representative of today’s RISC-V ecosystem, but as SiFive stated, these high-performance core IPs are valuable for expanding its boundaries. For now, the Arm ecosystem still holds a significant advantage in performance and technology in the mid-to-high-end market.

Additionally, RISC-V is an important part of the IoT and embedded fields and is claiming a growing place in MCU products. For instance, GigaDevice’s launch of the world’s first general-purpose RISC-V MCU (based on the Bumblebee core) is an important part of building the RISC-V ecosystem.

Reference Sources:

[1] Incredibly Scalable High-Performance RISC-V Core IP – SiFive

https://www.sifive.com/blog/incredibly-scalable-high-performance-risc-v-core-ip

[2] SiFive’s Approach to Embedding Intelligence Everywhere – SiFive

https://www.sifive.com/blog/sifives-approach-to-embedding-intelligence-everywhere

[3] SiFive Announces First RISC-V OoO CPU Core: The U8-Series Processor IP – AnandTech

https://www.anandtech.com/show/15036/sifive-announces-first-riscv-ooo-cpu-core-the-u8series-processor-ip/3

[4] SiFive Launches 7 Series, Their Highest Performance RISC-V Cores – WikiChip

https://fuse.wikichip.org/news/1775/sifive-launches-7-series-their-highest-performance-risc-v-cores/

[5] SiFive’s Approach to Embedding Intelligence Everywhere – SemiWiki

https://semiwiki.com/ip/sifive/285092-sifives-approach-to-embedding-intelligence-everywhere/

[6] RISC-V grows globally as an alternative to Arm and its license fees – VentureBeat

https://venturebeat.com/2019/12/11/risc-v-grows-globally-as-an-alternative-to-arm-and-its-license-fees/
