Innovations in RISC-V: Standing on the Shoulders of Giants

RISC-V is also known as “the Linux of CPUs.” For some, this title feels like a legacy and an innovation at the same time, especially if you are a staunch believer in open source. However, I am a true pragmatist, and the excessive marketing of RISC-V has made me lose interest in the term.

It wasn’t until I began to study RISC-V in detail that I realized that becoming the Linux of some microprocessors might be one of RISC-V’s least publicized advantages.

In the following article, I will delve deeper into the innovations of RISC-V. Standardizing open source is something many frameworks are already doing, so it’s not new; it’s the characteristics and openness based on its architecture design that highlight RISC-V’s innovative power.

A Scalable Instruction Set Architecture

Every CPU has an instruction set, which is a list of all the machine instructions that can be executed, such as adding two numbers, loading data from memory, storing it back in memory, or jumping to another location in the program.

Today, most instruction set architectures follow an incremental evolution path, including the most popular X86, MIPS, and ARM. This means that the creators of these architectures continuously add instructions over time. But they never remove anything, leading to a mountain of redundant and useless instructions accumulating. And that’s why X86 has over 1500 different instructions, and ARM-64 has over 1000.

In contrast, RISC-V was created to prevent redundant instructions from taking up resources, allowing precious silicon resources used to make chips to be utilized effectively.

How is this achieved? RISC-V has a small base instruction set consisting of only 47 instructions. Other instructions are offered as extensions. Each extension is a collection of related instructions.

For example, SIMD instructions are one extension. Floating-point related instructions are another. Even integer multiplication and division are in separate extensions.

From the very beginning, RISC-V established a system to manage these extensions, named with the 26 letters from A to Z. There are special bits in the CPU that programs can check to see which extensions have been implemented. If a program skips this check and tries to run an instruction that a particular RISC-V CPU does not support, the situation can still be handled. An unsupported instruction causes a trap, similar to an interrupt: the current position is saved for a later return, and execution jumps to a kernel subroutine. This allows a RISC-V system to emulate in software every extension the processor does not support. Therefore, even when certain extensions are no longer implemented in hardware, old code can still run on new CPUs.
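
As a rough illustration of the extension bits, the sketch below decodes a bitmask in which bit 0 stands for extension A and bit 25 for extension Z, which is how the misa register in the RISC-V privileged specification lays them out. The register value and the helper names are made up for the example, and on a real system ordinary programs usually ask the operating system rather than read this register directly.

```python
import string

def decode_extensions(misa: int) -> list[str]:
    """Return the single-letter extensions flagged in the low 26 bits."""
    return [letter
            for bit, letter in enumerate(string.ascii_uppercase)
            if (misa >> bit) & 1]

def supports(misa: int, extension: str) -> bool:
    """Check one extension letter, e.g. supports(misa, "M")."""
    bit = ord(extension.upper()) - ord("A")
    return bool((misa >> bit) & 1)

# Hypothetical register value for a core with I, M, A, F, D and C.
example_misa = sum(1 << (ord(c) - ord("A")) for c in "IMAFDC")

print(decode_extensions(example_misa))  # ['A', 'C', 'D', 'F', 'I', 'M']
print(supports(example_misa, "V"))      # False
```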

Why Can’t X86 and ARM Do Something Similar?

Theoretically, Intel and ARM could argue that many redundant instructions have been deprecated and need to be simulated in software. They could also say that new instructions are part of some optional extensions.

But the problem is: an ISA is like a contract. The RISC-V Foundation does not manufacture any chips or products; it only creates specifications. This specification acts as a contract among the members of the RISC-V ecosystem, which includes software tool makers, software developers, and hardware chip manufacturers. These parties agree to adhere to the RISC-V specifications. No one is holding a gun to their heads, nor threatening to take them to court if they do not comply with this “contract.” So how do you ensure participants adhere to the contract?

Because of self-interest. If software developers and hardware developers know that they are correctly following the specifications, they know their products will work together. Hardware manufacturers do not want to produce chips that cannot run RISC-V code. Software developers also do not want to release code that cannot run on RISC-V processors.

Unlike ARM and X86, RISC-V has established a mechanism that keeps software and hardware compatible across different extensions. The hardware provides a way to query which extensions exist, and an illegal instruction triggers a trap that software can handle.

How Software Supports Multiple Extensions

Software developers make sure that code to emulate unsupported instructions is added where needed. Alternatively, compiler writers can generate different code for a program depending on which extensions are supported. Specifically, this means generating several subroutines that perform the same operation using different extensions. The compiler creates all of these subroutines and makes the code check for supported extensions before jumping to the best one.

For example, suppose you have two arrays for vector addition, each containing three elements.

sum = [3, 4, 1] + [2, 1, 2]

The compiler can create two versions of this code. One uses the base instruction set with loops for repetitive addition. The other version can be based on RISC-V vector instructions, which do not require any loops. You just specify that each element is an integer and that each vector has three elements. Next, addition is performed between the vectors.

In both cases, you will get the same result, but using vector instruction extensions can significantly speed up the process.
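
The idea of generating two versions and choosing one at run time can be sketched as follows. This is only a model: Python has no vector instructions, so the "vector" version merely stands in for code a compiler would emit for the vector extension, and the function names and the feature set are invented for the example.

```python
def add_with_loop(a, b):
    """Base-ISA style: one scalar addition per loop iteration."""
    result = []
    for i in range(len(a)):
        result.append(a[i] + b[i])
    return result

def add_with_vectors(a, b):
    """Stand-in for the vector-extension version (conceptually one vector add)."""
    return [x + y for x, y in zip(a, b)]

def pick_vector_add(supported_extensions):
    """Check the supported extensions once and return the best subroutine."""
    if "V" in supported_extensions:
        return add_with_vectors
    return add_with_loop

# Hypothetical CPU that implements the vector extension.
vector_add = pick_vector_add({"I", "M", "V"})
print(vector_add([3, 4, 1], [2, 1, 2]))  # [5, 5, 3]
```

Either subroutine returns the same sum; only the speed would differ on real hardware.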

Issues with X86 and ARM Extensions

The world is already filled with ARM and X86 code that does not check for extensions. Therefore, Intel cannot simply stop supporting certain instructions in future microchips, because existing software would not handle this gracefully. It would simply crash.

What RISC-V can do would have to be done in a clumsy manner on X86 and ARM. Additionally, every platform must mindlessly support over 1000 instructions. This is why RISC-V has a promising future. Innovations in microarchitecture may render existing instructions obsolete and necessitate a new set of instructions, which will be easy for RISC-V. They have left enough room for new instructions. In contrast, Intel and ARM cannot easily do this.

Using the Minimalism of the Instruction Set as a Feature

The minimalism of the instruction set is often severely underestimated. People tend to nitpick RISC-V.

There is a comment that states:

In fact, all the features in RISC-V have existed since the 1980s; they are very old ideas. The launch of the RISC-V specification simply adopted old ideas: load/store architecture, a large number of registers, simple instruction encoding, separating target fields in instructions, and discarding some dross.

However, adhering to old ideas is not due to a lack of imagination or innovation failure among RISC-V designers. On the contrary, they deliberately want to use proven instructions and designs from existing architectures.

There are many reasons for this. For instance, many novel and overly “clever” choices of the past were later made obsolete by microarchitecture innovations.

In contrast, RISC-V’s innovations perfectly embody the idea of taking the essence and discarding the dross to ensure the minimalism of the instruction set. Lessons learned from the past have made them aware that future support for both compressed instructions and 64-bit instructions is needed.

There are few variations in RISC-V instruction encoding, which makes them very easy to decode.

Thus, the encoding of RISC-V instructions is very refined. The above image shows how they are encoded. Below, we try to explain its meaning and significance.

A RISC-V instruction is 32 bits long, which is common across all RISC-V architectures. MIPS, PowerPC, and ARM are the same. In contrast, X86 has variable instruction lengths from 8 to 120 bits.

A bit is a single digit of a binary number. The lowest seven bits of a RISC-V instruction (bits 0 to 6, the yellow area in the image) hold the opcode, which specifies the operation to be performed, such as addition, subtraction, multiplication, shifting, or jumping to another location in the program. The red area holds the target register (bits 7 to 11, five bits in total). Five bits are enough to encode the numbers 0 to 31, so the target register can be any one of 32 different registers.

I will skip the remaining details. The key point is regularity: across the encoding variants, the same fields sit in the same bit positions, which makes it easy to build hardware that decodes RISC-V instructions.
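
To make the fixed field positions concrete, here is a small sketch that pulls the fields out of a 32-bit register-to-register (R-type) instruction with shifts and masks. The helper name is made up; the sample word 0x002081B3 is the encoding of add x3, x1, x2.

```python
def decode_fields(instruction: int) -> dict:
    """Split a 32-bit R-type instruction into its fixed-position fields."""
    return {
        "opcode": instruction & 0x7F,          # bits 0-6
        "rd":     (instruction >> 7) & 0x1F,   # bits 7-11, target register
        "funct3": (instruction >> 12) & 0x07,  # bits 12-14
        "rs1":    (instruction >> 15) & 0x1F,  # bits 15-19, first source
        "rs2":    (instruction >> 20) & 0x1F,  # bits 20-24, second source
        "funct7": (instruction >> 25) & 0x7F,  # bits 25-31
    }

fields = decode_fields(0x002081B3)  # add x3, x1, x2
print(fields["opcode"], fields["rd"], fields["rs1"], fields["rs2"])  # 51 3 1 2
```

Because the opcode and target register never move, a hardware decoder can read them without first working out which encoding variant it is looking at.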

Minimalism means fewer transistors and a smaller chip. The RISC-V manual gives an excellent example:

As a concrete example of the impact of simplicity, we compare the “demo-level” RISC-V open-source Rocket-chip processor with the ARM-32 Cortex-A5 processor, using the same cache size (16 KiB) and the same technology (TSMC40GPLUS). The RISC-V chip is 0.27 mm², while the ARM-32 chip is 0.53 mm². Since die cost grows roughly with the square of the area, the ARM-32 Cortex-A5 die costs about four times (2²) as much as the RISC-V Rocket-chip die. Even a die that is only 10% smaller reduces the cost by a factor of 1.2 (1.1²).
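
The arithmetic behind that comparison can be checked in a few lines, assuming as a rough model that die cost scales with the square of die area. The variable and function names are only for illustration.

```python
rocket_area_mm2 = 0.27     # RISC-V Rocket-chip, from the quote
cortex_a5_area_mm2 = 0.53  # ARM-32 Cortex-A5, from the quote

def relative_cost(area_ratio: float) -> float:
    """Assumed cost model: cost grows with the square of the area."""
    return area_ratio ** 2

area_ratio = cortex_a5_area_mm2 / rocket_area_mm2
print(f"area ratio: {area_ratio:.2f}")                    # area ratio: 1.96
print(f"cost ratio: {relative_cost(area_ratio):.2f}")     # cost ratio: 3.85
print(f"10% size difference: {relative_cost(1.1):.2f}")   # 10% size difference: 1.21
```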

Secondly, minimalism benefits performance, because smaller and simpler chips are easier to run at higher clock frequencies. A small company named Micro Magic has built a RISC-V chip that can run on 0.07 W of power.

In contrast, the Apple M1 chip runs at 10 W. The Micro Magic chip can reach clock frequencies of up to 5 GHz.

The RISC-V-based BOOM project is even more notable. We can compare it with the ARM-32 Cortex-A9.

CoreMark is a benchmark used to measure CPU performance in embedded systems. When both processors run 100,000 iterations (repeats) on this benchmark, the ARM CPU completes it in 18.5 seconds, while the BOOM processor only takes 14.26 seconds.

This shows that a simpler implementation allows for higher clock frequencies. Ironically, even though the RISC-V instruction set is simpler, the ARM CPU has no advantage in instruction count: the RISC-V version of the CoreMark test suite needs about 10% fewer instructions than the ARM version.
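
For a sense of scale, the two CoreMark timings can be turned into iterations per second and a speedup factor. This is plain arithmetic on the figures quoted above, and the variable names are only for illustration.

```python
iterations = 100_000
arm_seconds = 18.5    # ARM-32 Cortex-A9
boom_seconds = 14.26  # RISC-V BOOM

arm_rate = iterations / arm_seconds    # iterations per second
boom_rate = iterations / boom_seconds
speedup = arm_seconds / boom_seconds

print(f"ARM:  {arm_rate:.0f} iterations/s")   # ARM:  5405 iterations/s
print(f"BOOM: {boom_rate:.0f} iterations/s")  # BOOM: 7013 iterations/s
print(f"speedup: {speedup:.2f}x")             # speedup: 1.30x
```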

Using smaller chips also allows for the creation of many chips to handle highly parallel tasks. For example, Esperanto Technologies produced a system-on-chip with over 1000 RISC-V coprocessors for machine learning tasks.

Concise Design

At first glance, “concise” may not seem the right word for RISC-V. In reality, however, the compressed instruction set and the 64-bit instructions were planned for from the beginning. The base instruction set was designed with a future 64-bit version in mind. Earlier designs did not account for this, so their 32-bit instructions had to be duplicated for 64 bits. In contrast, on a 64-bit RISC-V CPU most existing instructions simply operate on the full 64-bit registers. The 64-bit extension on RISC-V therefore only adds a few special instructions that work on the lower 32 bits of the 64-bit registers.

For example, the ADDW and SUBW instructions compute a 32-bit result and store it, sign-extended, in the target register. The normal ADD and SUB instructions add and subtract 64-bit numbers on a 64-bit CPU and 32-bit numbers on a 32-bit CPU.

This means that the 64-bit code on RISC-V looks almost the same as the 32-bit code.
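
The difference between ADD and ADDW can be modeled with ordinary integer arithmetic. The sketch below is not an emulator; the function names are invented, and it only captures the register width (32 or 64 bits) and the sign extension that ADDW applies to its 32-bit result on a 64-bit CPU.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def add(a: int, b: int, xlen: int) -> int:
    """ADD: the result wraps at the register width (32 or 64 bits)."""
    return (a + b) & ((1 << xlen) - 1)

def addw(a: int, b: int) -> int:
    """ADDW on a 64-bit CPU: 32-bit add, result sign-extended to 64 bits."""
    return sign_extend(a + b, 32) & ((1 << 64) - 1)

print(hex(add(0x7FFF_FFFF, 1, 32)))  # 0x80000000
print(hex(add(0x7FFF_FFFF, 1, 64)))  # 0x80000000
print(hex(addw(0x7FFF_FFFF, 1)))     # 0xffffffff80000000
```

The same ADD works for both register widths, which is why only the small family of W instructions had to be added for 64-bit CPUs.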

The compressed instruction set is a similar extension: it allows two instructions to fit within one 32-bit word, something other architectures can only do in a clumsy way. For instance, on ARM the Thumb2 compressed instruction format is essentially a different ISA rather than an extension, as it is on RISC-V. This means the CPU must switch modes internally and use different decoders, which adds complexity. In contrast, decoding compressed RISC-V instructions is very simple. Converting them to 32-bit instructions requires only 400 logic gates (AND, OR, NOR, NAND gates). That is very little.

Even a minimal RISC-V CPU implementing only the base instruction set uses 8000 logic gates.
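
To show why expansion is cheap, here is a sketch that handles a single compressed instruction, C.ADDI, and rewrites it as the equivalent 32-bit ADDI by moving bit fields around. The function names are made up, only this one instruction is covered, and the length check ignores the encodings reserved for instructions longer than 32 bits.

```python
def instruction_length_bits(first_halfword: int) -> int:
    """Low two bits 0b11 mark a 32-bit instruction, anything else 16-bit."""
    return 32 if (first_halfword & 0b11) == 0b11 else 16

def expand_c_addi(halfword: int) -> int:
    """Expand C.ADDI rd, imm into the 32-bit ADDI rd, rd, imm."""
    if (halfword & 0b11) != 0b01 or (halfword >> 13) != 0b000:
        raise ValueError("not a C.ADDI encoding")
    rd = (halfword >> 7) & 0x1F
    # 6-bit immediate: bit 12 holds imm[5], bits 2-6 hold imm[4:0].
    imm = ((halfword >> 2) & 0x1F) | (((halfword >> 12) & 1) << 5)
    if imm & 0x20:
        imm |= 0xFC0  # sign-extend from 6 to 12 bits
    # ADDI layout: imm[11:0] | rs1 | funct3=000 | rd | opcode=0x13
    return (imm << 20) | (rd << 15) | (rd << 7) | 0x13

print(instruction_length_bits(0x0505))  # 16
print(hex(expand_c_addi(0x0505)))       # 0x150513, i.e. addi x10, x10, 1
print(hex(expand_c_addi(0x157D)))       # 0xfff50513, i.e. addi x10, x10, -1
```

The expansion is nothing more than rewiring bits, which fits the small gate count quoted above.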

Vector Instruction Set

Although vector instructions are not unique to RISC-V, the RISC-V vector extension shows the advantage of arriving late and building on proven ideas. Other CPUs keep adding one batch of SIMD instructions after another: each time they decide to increase the length of the SIMD registers, a whole new set of instructions is needed.

In contrast, with RISC-V, the CPU can inform the code at runtime what it supports and allows programmers to specify the length of the vectors themselves. This greatly simplifies vector code.
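
The pattern can be modeled as a loop that asks the hardware how many elements it may process in each pass. In the sketch below, set_vector_length is a simplified stand-in for the vsetvli instruction (the real rules allow a few more cases), and vlmax is a hypothetical hardware limit; the point is that the same code gives the same result whatever the limit is.

```python
def set_vector_length(remaining: int, vlmax: int) -> int:
    """Simplified model: the hardware grants at most vlmax elements."""
    return min(remaining, vlmax)

def vector_add(a, b, vlmax):
    """Add two equal-length lists in chunks sized by the 'hardware'."""
    if vlmax < 1:
        raise ValueError("vlmax must be at least 1")
    result = []
    i = 0
    while i < len(a):
        vl = set_vector_length(len(a) - i, vlmax)
        # One "vector instruction" handles vl elements at once.
        result.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return result

a = [3, 4, 1, 7, 2]
b = [2, 1, 2, 0, 5]
print(vector_add(a, b, vlmax=2))   # [5, 5, 3, 7, 7]
print(vector_add(a, b, vlmax=16))  # [5, 5, 3, 7, 7]
```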

Vector processing is actually an old and well-understood technique, and compilers know how to optimize for it. For example, Esperanto Technologies uses the vector instruction set in its RISC-V-based dedicated coprocessors to accelerate machine learning tasks. They claim performance 30 to 50 times higher than competitors.

Conclusion

Let’s see if I can condense some of the content I just introduced regarding RISC-V innovations:

Non-incremental ISA. Previously added instructions will not inflate the ISA forever. Software developers, tool developers, and hardware manufacturers must all be able to handle optional extensions being present or absent.

Actively remove everything that is not strictly necessary, to keep complexity to a minimum. This makes RISC-V chips easy to implement with few transistors, which makes them cheaper, easier to run at higher clock frequencies, and so on.

Original article:

What Is Innovative About RISC-V?

Source: Cool Silicon Microelectronics

Author: Erik Engheim

Translation: Cool Silicon PR Team

Proofreading: Cool Silicon Chip Engineering Department
