This article is part of the “RISC-V CPU Design” column series.
Note: This article is excerpted from the first book in China to systematically introduce CPU and RISC-V design, “Hands-on Guide to CPU Design: RISC-V Processor Edition” by “Silicon Farmer Alexander” (expected release: March-April 2018).
The saying “swift as a startled hare when moving, still as a maiden when at rest” captures why low-power mechanisms matter so much for processors: they must deliver performance when active and consume as little as possible when idle. This chapter gives an overview of low-power techniques for processors.
Although processor clock frequency and performance attract most of the attention, processors in fact spend the vast majority of their time in standby or sleep. The smartphones we use every day, for example, are asleep most of the time, and even when active they mostly run workloads with modest performance demands. This is the motivation behind architectures such as ARM big.LITTLE, which runs light workloads on energy-efficient small cores and activates the power-hungry big cores only when high performance is really needed.
Overview of Low-Power Technologies for Processors
Processor low-power techniques can be examined at many levels, from high-level software and the system down to low-level hardware and process technology.
1. Low Power at the Software Level
Software is the soul of a processor. Because software is highly flexible, the power savings found at the software level are often larger than those achieved by low-power hardware itself. Put simply, the power saved by painstakingly optimizing the underlying hardware is often far less than the power saved by letting software put the processor to sleep more often. To minimize power consumption, a well-designed program should make sensible use of the processor's hardware resources, for example:
- Invoke high-power hardware only in critical scenarios, and use low-power hardware in ordinary scenarios.
- Put the processor into a low-power sleep mode as much as possible during idle periods.
Since this article focuses on hardware design, software-level mechanisms are not discussed further.
2. Low Power at the System Level
Low-power techniques at the system level apply to both board-level hardware and SoCs, following similar principles. Taking the SoC as an example, common techniques include:
- The SoC is partitioned into multiple power domains, so that most of its hardware can be powered down.
- The SoC is partitioned into multiple clock domains, so that some circuits can run at a low clock frequency and therefore at low power.
- Combining the states of the different power and clock domains defines a range of low-power modes. A PMU (Power Management Unit) in the SoC controls entry into and exit from these modes.
- Software uses the PMU to enter and exit the appropriate low-power mode for each scenario.
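One way to picture how power and clock domains combine into low-power modes is as a table that the PMU applies when software requests a mode. The Python sketch below is purely illustrative. The domain names (`aon`, `core`, `periph`), the four modes, and the `Pmu` class are all hypothetical. A real PMU is a hardware block that software programs through SoC-specific registers.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Power(Enum):
    ON = "on"
    OFF = "off"


class Clock(Enum):
    FULL = "full"    # nominal frequency
    SLOW = "slow"    # divided-down clock for low-speed, low-power operation
    GATED = "gated"  # clock stopped


@dataclass(frozen=True)
class Domain:
    power: Power
    clock: Clock


ON_FULL = Domain(Power.ON, Clock.FULL)
ON_SLOW = Domain(Power.ON, Clock.SLOW)
ON_GATED = Domain(Power.ON, Clock.GATED)
POWERED_OFF = Domain(Power.OFF, Clock.GATED)

# Hypothetical mode table: each mode is one combination of per-domain power
# and clock states. "aon" (always-on) hosts the PMU itself, so it stays powered.
MODES = {
    "run":        {"aon": ON_FULL, "core": ON_FULL,     "periph": ON_FULL},
    "low_speed":  {"aon": ON_SLOW, "core": ON_SLOW,     "periph": ON_SLOW},
    "sleep":      {"aon": ON_SLOW, "core": ON_GATED,    "periph": ON_GATED},
    "deep_sleep": {"aon": ON_SLOW, "core": POWERED_OFF, "periph": POWERED_OFF},
}


class Pmu:
    """Toy PMU: software requests a mode and the PMU applies it to all domains."""

    def __init__(self) -> None:
        self.mode = "run"

    def enter(self, mode: str) -> None:
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def wake(self) -> None:
        """Return to full-speed operation, e.g. on a wake-up event."""
        self.enter("run")

    def state(self, domain: str) -> Domain:
        return MODES[self.mode][domain]

    def powered_off(self) -> list[str]:
        return sorted(d for d, s in MODES[self.mode].items() if s.power is Power.OFF)


pmu = Pmu()
pmu.enter("deep_sleep")
print(pmu.powered_off())  # ['core', 'periph']
pmu.wake()
print(pmu.powered_off())  # []
```

In this model a mode is only a name for one row of the table. This mirrors the idea in the text that the modes are defined by combining domain states rather than built as separate hardware.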
3. Low Power at the Processor Level
Common low-power technologies at the processor level include:
- The instruction set defines a sleep instruction that puts the processor core to sleep when executed.
For example, RISC-V defines the WFI (Wait For Interrupt) instruction specifically for sleep. When the processor executes WFI, it stops executing the current instruction stream and enters an idle state, called the “sleep” state. It stays there until an interrupt becomes pending whose local enable bit is set in the mie register, and that interrupt wakes the processor. After waking, if interrupts are globally enabled (the MIE field of the mstatus register), the processor enters the interrupt service routine. If they are globally disabled, it resumes the halted instruction stream at the instruction following WFI.
- The sleep state can be divided into shallow sleep and deep sleep.
Shallow sleep usually turns off the entire clock of the processor core while keeping it powered. This saves dynamic power, but static leakage power is still consumed.
Deep sleep usually turns off not only the core's clock but also its power supply, saving both dynamic and static power.
- After the processor core is powered down in deep sleep, its internal context can be saved and restored with one of two strategies.
Strategy one uses retention registers or SRAM inside the processor core to hold the processor state. Retention cells and SRAM can preserve the state with very low leakage.
Strategy two uses a software save-and-restore mechanism. Before power-down, software saves the processor context into the SoC's always-on power domain. After wake-up and power restoration, software reads the context back from the always-on domain to restore it.
Strategy one sleeps and wakes very quickly but considerably increases ASIC design complexity. Strategy two is very simple to implement but sleeps and wakes relatively slowly.
- Heterogeneous processor architectures can also save power.
The best-known example is ARM big.LITTLE, which runs low-performance workloads on energy-efficient small cores and activates the power-hungry big cores only when needed, thus saving dynamic power.
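The WFI wake-up rules described above can be summarized as a small decision function. This is a behavioral Python sketch, not RTL. It assumes only the three standard machine-level interrupt bits in mip/mie (MSIP at bit 3, MTIP at bit 7, MEIP at bit 11) and the global enable mstatus.MIE (bit 3). The function name `wfi_step` is hypothetical. The RISC-V privileged specification also allows an implementation to wake early or even to treat WFI as a no-op. The sketch models only the basic behavior the text describes.

```python
MSIP = 1 << 3         # machine software interrupt (mip/mie bit 3)
MTIP = 1 << 7         # machine timer interrupt (mip/mie bit 7)
MEIP = 1 << 11        # machine external interrupt (mip/mie bit 11)
MSTATUS_MIE = 1 << 3  # global machine interrupt enable in mstatus


def wfi_step(mip: int, mie: int, mstatus: int) -> str:
    """What a hart sleeping in WFI does for the given CSR values.

    "sleep"  - no pending interrupt is locally enabled: stay asleep
    "trap"   - woken, interrupts globally enabled: enter the interrupt handler
    "resume" - woken, interrupts globally disabled: continue after WFI
    """
    if (mip & mie) == 0:
        return "sleep"
    if mstatus & MSTATUS_MIE:
        return "trap"
    return "resume"


print(wfi_step(mip=MTIP, mie=0, mstatus=MSTATUS_MIE))     # sleep (timer not locally enabled)
print(wfi_step(mip=MTIP, mie=MTIP, mstatus=MSTATUS_MIE))  # trap
print(wfi_step(mip=MTIP, mie=MTIP, mstatus=0))            # resume
```

The order of the checks reflects the text. The local enables in mie decide whether the hart wakes at all. The global mstatus.MIE only decides what happens after waking.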
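Strategy two (software save-and-restore) can be modeled in a few lines. In this hypothetical Python sketch, `Core` stands in for the processor's architectural registers, and the `aon_storage` dictionary stands in for memory in the SoC's always-on power domain. Powering the core down wipes its registers. None of these names come from a real SoC or API.

```python
from __future__ import annotations


class Core:
    """Toy processor core: 32 general-purpose registers plus a program counter."""

    RESET_PC = 0x0  # hypothetical reset vector

    def __init__(self) -> None:
        self.regs = [0] * 32
        self.pc = self.RESET_PC

    def power_down(self) -> None:
        # Without retention cells, all state is lost when power is removed.
        self.regs = [None] * 32  # None marks undefined contents
        self.pc = None

    def power_up(self) -> None:
        # After power returns, registers are undefined and the PC is at reset.
        self.regs = [None] * 32
        self.pc = self.RESET_PC


def save_context(core: Core, always_on: dict) -> None:
    """Software copies the context into the always-on domain before power-down."""
    always_on["regs"] = list(core.regs)
    always_on["pc"] = core.pc


def restore_context(core: Core, always_on: dict) -> None:
    """After wake-up, software reads the context back and restores it."""
    core.regs = list(always_on["regs"])
    core.pc = always_on["pc"]


aon_storage: dict = {}  # stands in for memory in the always-on domain
core = Core()
core.regs[5] = 0x1234
core.pc = 0x8000_0100

save_context(core, aon_storage)
core.power_down()
core.power_up()
restore_context(core, aon_storage)
print(hex(core.regs[5]), hex(core.pc))  # 0x1234 0x80000100
```

Every register passes through a software copy in each direction. That is why this strategy is simple in hardware but slower to sleep and wake than keeping the state in retention cells.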
4. Low Power at the Unit Level
Low-power techniques at the module and unit level belong to IC microarchitecture design. The common techniques are essentially the same as those at the SoC level, only on a smaller scale:
- Each self-contained functional unit usually gets its own independent clock gate, which turns off the unit's clock when the unit is idle to save dynamic power.
- Relatively independent, larger modules can even be given their own power domains so that they can be powered down, saving static power as well.
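A unit-level clock gate is usually built from an ICG cell. This Python sketch models the common latch-based structure at a fixed time step: a latch that is transparent while the clock is low, followed by an AND gate. The goal is to show why the gated clock stays glitch-free when the enable changes in mid-cycle. It is a behavioral illustration, not a cell-library model, and both function names are hypothetical.

```python
from __future__ import annotations


def latch_based_icg(clk: list[int], en: list[int]) -> list[int]:
    """Behavioral model of a latch-based clock gate, one list entry per time step.

    The enable latch is transparent only while clk is low, so the captured
    enable cannot change during the high phase; the AND gate therefore passes
    whole clock pulses or nothing.
    """
    latched_en = 0
    gclk = []
    for c, e in zip(clk, en):
        if c == 0:  # latch open during the low phase
            latched_en = e
        gclk.append(c & latched_en)
    return gclk


def naive_and_gate(clk: list[int], en: list[int]) -> list[int]:
    """Plain AND of clock and enable, shown for comparison (can glitch)."""
    return [c & e for c, e in zip(clk, en)]


# Three clock cycles, four time steps each; the enable drops in the middle
# of the second high phase.
clk = [0, 0, 1, 1] * 3
en = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(latch_based_icg(clk, en))  # [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
print(naive_and_gate(clk, en))   # [0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
```

The naive AND gate cuts the second pulse short. A real ICG cell is designed to avoid exactly this kind of glitch, so designs use a proper clock-gating cell rather than a bare AND gate.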
5. Low Power at the Register Level
Low-power techniques at the register level belong to the realm of IC coding style. Power consumption can be reduced in the following ways:
- Clock Gating:
Mainstream logic synthesis tools can infer ICGs (Integrated Clock Gating cells) directly from the coding style. When a suitable coding style is followed (typically, a group of registers that loads new data only when an enable condition is true), synthesis automatically inserts a clock gate for that group, saving dynamic power.
After synthesis, the tools can report the clock gating rate of the whole circuit, and developers can use this figure to judge whether enough ICGs have been inferred. A good circuit generally has a gating rate above 90%. A lower rate may mean that the circuit has few data paths (consisting mainly of narrow registers) or that the coding style has problems.
- Reduce Data Path Toggling:
To avoid unnecessary dynamic power, register toggling should be minimized.
Example one: In a processor pipeline, each stage usually needs a control bit (the Valid bit) to indicate whether the stage holds a valid instruction. The Valid bit is set when an instruction enters the stage and cleared when it leaves. The data-path portion of the stage (the Payload, usually tens of bits wide), however, only needs to be loaded when an instruction enters the stage, and its registers do not need to be cleared when the instruction leaves. This greatly reduces the toggle rate of the data-path registers.
Example two: A FIFO built from registers can in theory be implemented by simply shifting the data entries. It is better, however, to maintain read and write pointers and leave the entries in place. Shifting causes many register toggles, whereas with read and write pointers the values in the entry registers stay static, which significantly reduces dynamic power. The pointer-based approach should therefore be preferred.
- Data Path Without Reset:
Similarly, data-path registers can even be implemented without a reset. Registers without reset have smaller area, better timing, and lower power. Registers in buffers, FIFOs, and register files, for example, are often non-reset.
However, non-reset registers must be used with care. Ensure that they never drive any control signal, or undefined (X) states may propagate. A robust mechanism for catching undefined states must be in place during RTL simulation to find such issues. Otherwise, they may cause serious bugs in the chip.
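The gating-rate check described under clock gating is simple arithmetic. As an assumption for this sketch, the rate is taken as the fraction of register bits whose clock passes through an ICG. Exact definitions and report formats differ between synthesis tools. The register groups below are made-up numbers for illustration, not data from a real design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RegGroup:
    name: str
    width: int   # number of flip-flop bits in the group
    gated: bool  # True if synthesis inferred an ICG for this group


def gating_rate(groups: list[RegGroup]) -> float:
    """Fraction of register bits driven by a gated clock."""
    total = sum(g.width for g in groups)
    gated = sum(g.width for g in groups if g.gated)
    return gated / total if total else 0.0


# Hypothetical pipeline stage: the wide payload registers are enable-driven
# (so an ICG is inferred); the narrow control bits are left ungated here.
stage = [
    RegGroup("payload_pc", 32, True),
    RegGroup("payload_instr", 32, True),
    RegGroup("payload_imm", 32, True),
    RegGroup("valid", 1, False),
    RegGroup("state", 3, False),
]
print(f"{gating_rate(stage):.0%}")  # 96%
```

A design dominated by narrow control registers would score much lower on this measure, which is the situation the text warns about.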
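Both examples of reducing data-path register flips can be checked by counting bit toggles. This Python sketch is a toy model built on stated assumptions. Registers are plain integers, each write costs the Hamming distance between the old and new values, and the FIFO receives one push per step while kept in steady state. It compares (a) clearing a pipeline payload when the instruction leaves versus leaving it alone, and (b) a shift-register FIFO versus a pointer-based FIFO. For the pointer FIFO, write-pointer toggles are counted too. Read-pointer and occupancy-counter logic is left out of both FIFO models. All function names are hypothetical.

```python
from __future__ import annotations


def toggles(old: int, new: int) -> int:
    """Number of bits that flip when a register changes from old to new."""
    return bin(old ^ new).count("1")


def payload_toggles(instrs: list[int | None], clear_on_leave: bool) -> int:
    """Toggle count of one pipeline stage's payload register.

    instrs[i] is the instruction word in the stage at cycle i, or None for a bubble.
    """
    payload, count = 0, 0
    for word in instrs:
        if word is not None:
            new = word       # load payload only when an instruction enters
        elif clear_on_leave:
            new = 0          # wasteful: clear the payload on a bubble
        else:
            new = payload    # keep stale payload; the Valid bit marks it invalid
        count += toggles(payload, new)
        payload = new
    return count


def shift_fifo_toggles(values: list[int], depth: int) -> int:
    """Shift-register FIFO: each push moves every entry one slot along."""
    entries = [0] * depth
    count = 0
    for v in values:
        new = [v] + entries[:-1]
        count += sum(toggles(a, b) for a, b in zip(entries, new))
        entries = new
    return count


def pointer_fifo_toggles(values: list[int], depth: int) -> int:
    """Pointer FIFO: each push writes one entry and advances the write pointer."""
    entries = [0] * depth
    wptr, count = 0, 0
    for v in values:
        count += toggles(entries[wptr], v)
        entries[wptr] = v
        new_w = (wptr + 1) % depth
        count += toggles(wptr, new_w)  # pointer register toggles are counted too
        wptr = new_w
    return count


trace = [0xFF, None, 0xFF, None]
print(payload_toggles(trace, clear_on_leave=True), payload_toggles(trace, clear_on_leave=False))  # 32 8

data = [0xF, 0x0, 0xF, 0x0]
print(shift_fifo_toggles(data, depth=4), pointer_fifo_toggles(data, depth=4))  # 40 14
```

Real toggle counts depend on the data, but the trend holds in this model. Clearing the payload and shifting FIFO entries both move many bits that carry no new information.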
6. Low Power at the Latch Level
Latches are smaller and consume less power than flip-flops, so they can reduce chip power in certain specific situations. However, latches significantly complicate the digital ASIC flow, so they should be used with caution.
7. Low Power at the SRAM Level
SRAM is used extensively in chip design, and its power consumption can be reduced in the following ways:
- Selecting Appropriate SRAM:
Conventional SRAMs usually come in three types: Single-Port SRAM, Two-Port Register File, and Dual-Port SRAM. Other types require special customization.
In power and area, Single-Port SRAM is the smallest, Two-Port Register File comes next, and Dual-Port SRAM is the largest. Types with lower power and area should be preferred, and high-power SRAM types avoided.
The data width also affects SRAM area. For example, for an SRAM with a total capacity of 16 KB, a data width of 32 bits gives a depth of 4096, while a data width of 64 bits gives a depth of 2048. Different width-to-depth ratios can lead to significantly different SRAM areas, so this should also be weighed.
- Minimize SRAM Reads and Writes:
SRAM reads and writes consume considerable dynamic power, so they should be minimized.
For example, processors usually fetch instructions sequentially. Reading several instructions from SRAM in one access is therefore better than repeatedly reading a few instructions at a time, and it saves dynamic power.
- Turn Off SRAM When Idle:
As with unit-level clock gating, the SRAM clock should be turned off when the SRAM is idle to save dynamic power.
SRAM leakage power is also considerable, so in power-saving modes the SRAM power supply can be shut off to save leakage.
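The width/depth arithmetic in the 16 KB example is easy to reproduce. This small Python helper, with the hypothetical name `sram_depth`, computes the depth for a given capacity and data width. It says nothing about the resulting area, which depends on the specific SRAM compiler and must be checked against its datasheet.

```python
def sram_depth(capacity_bytes: int, width_bits: int) -> int:
    """Number of words in an SRAM of the given total capacity and word width."""
    total_bits = capacity_bytes * 8
    if total_bits % width_bits:
        raise ValueError("capacity is not a whole number of words")
    return total_bits // width_bits


for width in (32, 64, 128):
    print(width, sram_depth(16 * 1024, width))
# 32 4096
# 64 2048
# 128 1024
```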
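The instruction-fetch example can be quantified by counting SRAM read accesses. This sketch makes several assumptions: fixed 32-bit instructions, a purely sequential fetch stream starting at an aligned address, and an SRAM word width that is a multiple of the instruction width. It ignores branches, which would discard part of a wide fetch. The function name is hypothetical.

```python
def sram_reads(num_instrs: int, sram_width_bits: int, instr_bits: int = 32) -> int:
    """SRAM read accesses needed to fetch num_instrs sequential instructions,
    reading one full SRAM word per access and using every instruction in it."""
    if sram_width_bits % instr_bits:
        raise ValueError("SRAM width must be a multiple of the instruction width")
    per_read = sram_width_bits // instr_bits
    return -(-num_instrs // per_read)  # ceiling division


print(sram_reads(1000, 32))   # 1000
print(sram_reads(1000, 64))   # 500
print(sram_reads(1000, 128))  # 250
```

Each wider access typically costs more energy than a narrow one, so the real saving is smaller than the access-count ratio suggests. It should be evaluated with the SRAM's actual per-access energy figures.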
8. Low Power at the Combinational Logic Level
Combinational logic is the basic logic of a chip, and its power consumption can be reduced in the following ways:
- Reduce Area:
Minimizing the area of combinational logic reduces static power. This is basic digital design knowledge and needs no further elaboration. Accordingly, in both design concept and coding style, large data paths (or arithmetic units) should be shared and reused to reduce area. In addition, operations such as multiplication and division, which require large arithmetic units, should be avoided and converted into additions and subtractions wherever possible.
- Reduce Toggle Rate:
With logic gating, an extra level of AND gates is added on the data path so that unused combinational logic does not toggle while idle, reducing dynamic power.
However, the extra AND gate adds delay, which may be unacceptable on paths with very tight timing, so use this technique with caution.
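One common way to turn a multiplication into additions applies when one operand is a known constant: the constant multiplication becomes a sum of shifted copies of the other operand, and shifts cost only wiring in hardware. The Python sketch below decomposes the constant into its set bits. This generic strength-reduction technique is chosen here for illustration and is not taken from the original text. Both function names are hypothetical.

```python
from __future__ import annotations


def shift_add_terms(constant: int) -> list[int]:
    """Shift amounts s such that the sum of (x << s) equals x * constant."""
    if constant < 0:
        raise ValueError("non-negative constants only in this sketch")
    return [s for s in range(constant.bit_length()) if (constant >> s) & 1]


def mul_by_constant(x: int, constant: int) -> int:
    """Multiply using only shifts and additions."""
    return sum(x << s for s in shift_add_terms(constant))


print(shift_add_terms(10))     # [1, 3], i.e. x * 10 = (x << 3) + (x << 1)
print(mul_by_constant(7, 10))  # 70
```

Subtraction can reduce the adder count further. For example, x * 15 = (x << 4) - x needs one subtractor instead of the three adders implied by the four set bits of 15.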
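The logic-gating idea (often called operand isolation) can be illustrated by counting the input toggles an arithmetic unit sees. In this hypothetical Python sketch, an adder's operand buses change every cycle, but the result is needed only when `en` is 1. ANDing each operand with `en` holds the adder inputs at zero while the adder is idle. Input toggle count is used here only as a rough proxy for the adder's dynamic power.

```python
from __future__ import annotations


def toggles(old: int, new: int) -> int:
    """Number of bits that flip when a signal changes from old to new."""
    return bin(old ^ new).count("1")


def adder_input_toggles(a_bus: list[int], b_bus: list[int], en: list[int],
                        isolate: bool, width: int = 32) -> int:
    """Count bit toggles at an adder's operand inputs over a trace."""
    mask = (1 << width) - 1
    prev_a = prev_b = 0
    count = 0
    for a, b, e in zip(a_bus, b_bus, en):
        if isolate:
            gate = mask if e else 0  # AND every operand bit with the enable
            a, b = a & gate, b & gate
        count += toggles(prev_a, a) + toggles(prev_b, b)
        prev_a, prev_b = a, b
    return count


a = [1, 2, 3, 4]
b = [0, 0, 0, 0]
en = [1, 0, 0, 0]  # the adder result is only needed in the first cycle
print(adder_input_toggles(a, b, en, isolate=False))  # 7
print(adder_input_toggles(a, b, en, isolate=True))   # 2
```

The gating logic itself also switches and sits on the data path, which is the timing cost the text cautions about.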
9. Low Power at the Process Level
Low power at the process level generally involves using special process cell libraries, which will not be discussed further in this article.
Conclusion
In summary, low-power mechanisms are crucial for processors. IC designers can apply low-power mechanisms at the software, system, processor, unit, register, latch, SRAM, combinational logic, and process levels to reduce processor power consumption.
What low-power mechanisms have you implemented in your design?
More Information
Interested readers can follow the WeChat public account “Silicon Farmer Alexander” for more design tips and experience on Verilog, IC design, CPUs, RISC-V, and artificial intelligence (AI). Note: There is a lot of valuable content, so you may want to prepare some tea.