Nine Key Techniques for Low-Power Processor Design

Low-power mechanisms are crucial for processors. This article gives an overview of the low-power techniques available at each level of a processor design.

We pay great attention to a processor's clock frequency and performance, yet processors undeniably spend the vast majority of their time in standby or sleep mode. The smartphones we use every day, for example, are asleep most of the time. Even when they are active, most of that time is spent on workloads with low performance requirements.

Taking the well-known ARM big.LITTLE architecture as an example, it uses energy-efficient small cores to operate in scenarios with low performance demands, activating the power-hungry big cores only at critical moments.

Low-power techniques for processors can be discussed at multiple levels, from high-level software and system design down to the low-level manufacturing process.

1. Low Power at the Software Level

The software running on a processor is what gives it its purpose. Software is highly flexible, and power savings found at the software level are usually more significant than those achieved in hardware alone. Put simply, optimizing the underlying hardware to save power is far less effective than having the software put the processor into sleep mode more often.

To ensure that the processor consumes as little power as possible, a good software program should reasonably utilize the hardware resources of the processor, such as:

  • Only invoke power-hungry hardware in the most critical scenarios, and use low-power hardware in general scenarios as much as possible;
  • During idle times, the processor should enter low-power sleep mode whenever possible to save power.
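
A rough average-power estimate illustrates why sleeping more often pays off. This is only a sketch: the power figures (50 mW active, 0.05 mW asleep) and the duty cycles are made-up numbers for illustration, not measurements from any real chip, and `average_power_mw` is a hypothetical helper.

```python
def average_power_mw(active_mw: float, sleep_mw: float, active_fraction: float) -> float:
    """Time-weighted average of active power and sleep power."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_mw + (1.0 - active_fraction) * sleep_mw


# Assumed, illustrative numbers only.
ACTIVE_MW = 50.0
SLEEP_MW = 0.05

# Baseline: the processor is awake 10% of the time.
baseline = average_power_mw(ACTIVE_MW, SLEEP_MW, active_fraction=0.10)
# Hypothetical hardware tuning that trims active power by 10%.
hw_tuned = average_power_mw(ACTIVE_MW * 0.9, SLEEP_MW, active_fraction=0.10)
# Hypothetical software change that halves the time spent awake.
sw_tuned = average_power_mw(ACTIVE_MW, SLEEP_MW, active_fraction=0.05)

print(f"baseline={baseline:.3f} mW, hw_tuned={hw_tuned:.3f} mW, sw_tuned={sw_tuned:.3f} mW")
```

Under these assumed numbers, halving the awake time saves far more than a 10% reduction in active power, which is the point the section makes.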

Since this article focuses on hardware design, we will not elaborate on the mechanisms at the software level.

2. Low Power at the System Level

Low-power technology at the system level can involve board-level hardware systems and SoC (System on Chip) systems, with the principles being fundamentally similar. Taking the SoC system as an example, common low-power technologies include:

  • An SoC is often partitioned into separate power domains, so that most of its hardware can be powered down;

  • An SoC is often partitioned into separate clock domains, so that a small portion of the circuitry can keep running at low speed and low power;

  • By combining different power and clock domains, various low-power modes can be defined, with the SoC equipped with a PMU (Power Management Unit) to control entering or exiting different low-power modes;

  • Software can use PMU functions to enter and exit different low-power modes in various scenarios.
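
The way low-power modes combine power and clock domains can be sketched as a small table. Everything here is hypothetical: the three domains, the three mode names, and the `Pmu` class are invented for illustration and do not describe any particular SoC.

```python
from dataclasses import dataclass
from enum import Enum


class Clock(Enum):
    FULL = "full speed"
    SLOW = "low speed"
    OFF = "gated"


@dataclass(frozen=True)
class DomainState:
    powered: bool
    clock: Clock

    def __post_init__(self) -> None:
        # A domain without power cannot have a running clock.
        if not self.powered and self.clock is not Clock.OFF:
            raise ValueError("a powered-down domain must have its clock off")


# Hypothetical mode table: each mode is one combination of domain states.
LOW_POWER_MODES = {
    "RUN": {
        "cpu": DomainState(True, Clock.FULL),
        "peripherals": DomainState(True, Clock.FULL),
        "always_on": DomainState(True, Clock.SLOW),
    },
    "SLEEP": {
        "cpu": DomainState(True, Clock.OFF),
        "peripherals": DomainState(True, Clock.SLOW),
        "always_on": DomainState(True, Clock.SLOW),
    },
    "DEEP_SLEEP": {
        "cpu": DomainState(False, Clock.OFF),
        "peripherals": DomainState(False, Clock.OFF),
        "always_on": DomainState(True, Clock.SLOW),
    },
}


class Pmu:
    """Toy PMU model that only tracks which mode is currently selected."""

    def __init__(self) -> None:
        self.mode = "RUN"

    def enter(self, mode: str) -> None:
        if mode not in LOW_POWER_MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def powered_domains(self) -> list:
        states = LOW_POWER_MODES[self.mode]
        return sorted(name for name, state in states.items() if state.powered)


pmu = Pmu()
pmu.enter("DEEP_SLEEP")
print(pmu.powered_domains())
```

In a real SoC the PMU would also sequence isolation, resets, and wake-up sources, which this sketch leaves out.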

3. Low Power at the Processor Level

Common low-power technologies at the processor level include:

  • A sleep instruction defined in the processor instruction set, which puts the processor core into a sleep state after execution.

For example, RISC-V defines the WFI (Wait For Interrupt) instruction specifically for sleeping. When the processor executes WFI, it stops executing the current instruction stream and enters an idle state, referred to as a “sleep” state. It stays there until an interrupt arrives that is locally enabled in the mie register, at which point the processor wakes up. After waking, if interrupts are globally enabled (the MIE field of the mstatus register), the processor enters the interrupt service routine; if interrupts are globally disabled, it resumes the halted instruction stream at the instruction following WFI.

  • The sleep state can be divided into shallow sleep and deep sleep.

    • Shallow sleep often shuts down the entire clock of the processor core while keeping the power supply on, thus saving dynamic power, but static leakage power still consumes energy;

    • Deep sleep often shuts down not only the clock of the processor core but also the power supply, thereby saving both dynamic and static power.

  • After deep sleep power-off, the internal context state of the processor core can be saved and restored using two strategies.

    • Strategy one uses retention-capable registers or SRAM inside the processor core to save the processor’s state. Retention Cells or SRAM can maintain the processor’s state with very low leakage consumption;

    • Strategy two employs a software Save-and-Restore mechanism, which saves the processor’s context state in the SoC’s always-on power domain before power-off, and after waking up and restoring power, the software reads back the state from the external always-on domain for restoration.

    • Strategy one has the advantage of extremely fast sleep and wake-up speeds, but the complexity of ASIC design is high. Strategy two is very simple to implement, but the sleep and wake-up speeds are relatively slower.

  • In processor architecture, a heterogeneous approach can be adopted to save power.

A well-known example is the ARM big.LITTLE architecture, which uses energy-efficient small cores to operate in scenarios with low performance requirements, activating the power-hungry big cores only at critical moments to save dynamic power.
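
Returning to the WFI instruction described earlier in this section, its wake-up rule can be modeled in a few lines. This is a simplified sketch: `wfi_outcome` is a hypothetical helper, only machine mode is considered, and real implementations are allowed to wake for other reasons or even treat WFI as a no-op. The bit positions used (bit 3 of mstatus for MIE, bit 7 of mie/mip for the machine timer interrupt) follow the RISC-V privileged specification.

```python
MSTATUS_MIE = 1 << 3  # mstatus.MIE: global machine-mode interrupt enable
MTI = 1 << 7          # machine timer interrupt bit, same position in mie and mip


def wfi_outcome(mie: int, mip: int, mstatus: int) -> str:
    """Model what a core parked in WFI does next.

    Returns "sleeping" if no locally enabled interrupt is pending,
    "trap" if it wakes and takes the interrupt, and
    "resume" if it wakes and continues after the WFI instruction.
    """
    if (mip & mie) == 0:
        # Nothing pending that is also locally enabled: stay asleep.
        return "sleeping"
    if mstatus & MSTATUS_MIE:
        # Globally enabled: enter the interrupt service routine.
        return "trap"
    # Globally disabled: wake up and carry on with the next instruction.
    return "resume"


print(wfi_outcome(mie=MTI, mip=0, mstatus=MSTATUS_MIE))
print(wfi_outcome(mie=MTI, mip=MTI, mstatus=MSTATUS_MIE))
print(wfi_outcome(mie=MTI, mip=MTI, mstatus=0))
```

The last case is useful in practice: software can disable interrupts globally, execute WFI, and then handle the wake-up cause in ordinary code after the instruction.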

4. Low Power at the Unit Level

Low-power techniques at the module and unit level belong to the microarchitecture stage of IC design. They are essentially the same as those used at the SoC system level, just on a smaller scale:

  • A fully functional unit often requires independent Clock Gates. When the module or unit is idle, the clock can be turned off using the Clock Gate to save dynamic power;
  • Some relatively independent and larger modules can even be divided into independent power domains to support power-off, further saving static power.
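
The behavior of a clock gate can be sketched with a small model. The source does not describe the gate's internals; the latch-plus-AND structure below is a common arrangement assumed here for illustration, and `gated_clock` and `naive_gated_clock` are hypothetical names. Each sample is a `(clk, enable)` pair taken at sub-cycle resolution so that an enable change in the middle of a clock phase is visible.

```python
def gated_clock(samples):
    """Latch-based clock gate: the enable is captured only while clk is low."""
    latched_enable = 0
    out = []
    for clk, enable in samples:
        if clk == 0:
            # The latch is transparent during the low phase of the clock.
            latched_enable = enable
        out.append(clk & latched_enable)
    return out


def naive_gated_clock(samples):
    """Plain AND of clock and enable, shown only for comparison."""
    return [clk & enable for clk, enable in samples]


# Enable drops in the middle of a high phase (third sample).
falling = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (0, 1), (1, 1)]
# Enable rises in the middle of a high phase (third sample).
rising = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1)]

print(gated_clock(falling), naive_gated_clock(falling))
print(gated_clock(rising), naive_gated_clock(rising))
```

In this model the latch keeps the gated clock from being cut short or from producing a partial pulse when the enable changes while the clock is high, whereas the plain AND passes those changes straight through.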

5. Low Power at the Register Level

Low-power techniques at the register level are a matter of IC design coding style. Register-level power consumption can be reduced in the following ways:

  • Clock Gating

  • Mainstream logic synthesis tools can now infer ICG (Integrated Clock Gating) cells directly from the coding style. By following certain coding styles, the tool can automatically insert an ICG for a group of registers to save dynamic power.

  • After logic synthesis is complete, the tools can report the Clock Gating Rate for the entire circuit. Developers can use this figure to judge whether enough ICGs were inferred automatically in their design. A good circuit generally has a Gating Rate exceeding 90%; a lower figure may indicate either that the circuit contains few data paths (it consists mainly of narrow control registers) or that there are issues with the coding style.

  • Reduce Data Path Flips

  • To reduce unnecessary dynamic power, the flipping of registers should be minimized.

    • Example one: In the processor pipeline, each pipeline stage usually requires a control bit (Valid bit) to indicate whether there is a valid instruction in that stage. When an instruction is loaded into the stage, the Valid bit is set high, and when it leaves, the Valid bit is cleared. The Payload part of the stage, however, which carries the instruction information (usually several tens of bits), only needs to be loaded when an instruction enters the stage; there is no need to clear the Payload registers when the instruction leaves. This greatly reduces the flip rate of the data path registers.

    • Example two: A register-based FIFO can in theory be implemented by simply shifting data items from one entry to the next, but it is better to maintain read and write pointers and leave the data item registers where they are. Shifting causes a large amount of register flipping, whereas the pointer method keeps the values in the item registers static and so greatly reduces dynamic power consumption. The pointer method should therefore be preferred.

  • Data Path Without Reset

  • Similarly, for the registers in the data path, registers without reset signals can even be used. Registers without reset signals occupy less area, have better timing, and consume less power. For example, in certain Buffer, FIFO, and Regfile register parts, registers without reset signals are often used.

  • However, registers without reset signals must be used with caution: their outputs must never be used as control signals, otherwise undefined states can propagate. A robust mechanism for catching undefined (X) states must be in place during pre-layout simulation to find such problems; otherwise they may lead to serious bugs in the chip.
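
The FIFO comparison in example two above can be checked with a small bit-flip counter. This is a simplified model with assumptions of its own: only pushes are modeled, every push of the shifting FIFO moves all entries by one slot, and the write pointer's own flips are charged to the pointer-based FIFO. The class and function names are hypothetical.

```python
def bit_flips(old: int, new: int) -> int:
    """Number of bit positions that differ between two register values."""
    return bin(old ^ new).count("1")


class ShiftFifo:
    """New data enters slot 0 and every existing entry moves up one slot."""

    def __init__(self, depth: int) -> None:
        self.slots = [0] * depth
        self.flips = 0

    def push(self, value: int) -> None:
        new_slots = [value] + self.slots[:-1]
        self.flips += sum(bit_flips(o, n) for o, n in zip(self.slots, new_slots))
        self.slots = new_slots


class PointerFifo:
    """Entries stay in place; only the slot under the write pointer changes."""

    def __init__(self, depth: int) -> None:
        self.slots = [0] * depth
        self.wr_ptr = 0
        self.flips = 0

    def push(self, value: int) -> None:
        self.flips += bit_flips(self.slots[self.wr_ptr], value)
        self.slots[self.wr_ptr] = value
        next_ptr = (self.wr_ptr + 1) % len(self.slots)
        self.flips += bit_flips(self.wr_ptr, next_ptr)  # pointer register flips
        self.wr_ptr = next_ptr


shift_fifo = ShiftFifo(depth=4)
pointer_fifo = PointerFifo(depth=4)
for word in [0xFF, 0x00, 0xFF, 0x00]:  # alternating pattern as a stress case
    shift_fifo.push(word)
    pointer_fifo.push(word)

print("shift flips:", shift_fifo.flips, "pointer flips:", pointer_fifo.flips)
```

The alternating data pattern is chosen to make the difference obvious; with other data the gap varies, but in this model the shifting design rewrites every entry on each push while the pointer design rewrites only one.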

6. Low Power at the Latch Level

Latches are smaller and consume less power than registers, so using them in specific scenarios can reduce chip power consumption. However, latches significantly complicate the digital ASIC design flow, so they should be used cautiously.

7. Low Power at the SRAM Level

SRAM is frequently used in chip design and can reduce power consumption in the following ways:

  • Select Appropriate SRAM

  • Conventional SRAM is usually divided into “Single Port SRAM,” “Two-Port Regfile,” and “Dual-Port SRAM.” Other types of SRAM require special customization.

  • From the perspective of power consumption and area, Single Port SRAM is the smallest, followed by Two-Port Regfile, and Dual-Port SRAM is the largest. One should prioritize selecting the smallest power and area options and avoid using high-power SRAM types.

  • The data width of SRAM also affects its area. For example, an SRAM with a total capacity of 16KB has a depth of 4096 if its data width is 32 bits, but a depth of 2048 if its data width is 64 bits. Different width-to-depth ratios may result in significantly different SRAM areas, so the choice requires comprehensive consideration.

  • Minimize SRAM Read/Write

  • Dynamic power consumption during SRAM read/write is considerable, so it should be minimized.

  • For example, when the processor fetches instructions, since most processors fetch sequentially, it is better to read back multiple instructions from SRAM at once rather than repeatedly reading SRAM (reading one instruction at a time), thus saving dynamic power consumption.

  • Turn Off SRAM When Idle

  • Similar to unit clock gating, the SRAM clock should be turned off when idle to save dynamic power.

  • SRAM’s leakage power consumption is considerable; therefore, in power-saving mode, the power supply to SRAM can be turned off to save leakage.
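
The width and depth arithmetic and the instruction-fetch example above can be reproduced with two small helpers. Both functions are hypothetical, and the fetch model rests on assumptions not stated in the source: fixed 32-bit instructions, a fetch that starts on an aligned boundary, and purely sequential execution with no branches.

```python
def sram_depth(capacity_bytes: int, width_bits: int) -> int:
    """Number of words in an SRAM of the given capacity and data width."""
    total_bits = capacity_bytes * 8
    if total_bits % width_bits != 0:
        raise ValueError("capacity is not a whole number of words")
    return total_bits // width_bits


def fetch_reads(num_instructions: int, instr_bits: int, width_bits: int) -> int:
    """SRAM reads needed to fetch a run of sequential instructions."""
    per_read = width_bits // instr_bits
    if per_read < 1:
        raise ValueError("SRAM word is narrower than one instruction")
    return -(-num_instructions // per_read)  # ceiling division


print(sram_depth(16 * 1024, 32), sram_depth(16 * 1024, 64))
print(fetch_reads(16, 32, 32), fetch_reads(16, 32, 64))
```

Fewer reads do not automatically mean proportionally less energy, since a wider read costs more per access; the sketch only counts accesses.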

8. Low Power at the Combinational Logic Level

Combinational logic is fundamental in chips and can reduce power consumption in the following ways:

  • Reduce Area

  • Using as little combinational logic area as possible reduces static power consumption. This is basic digital logic design knowledge and needs no further elaboration. In both design approach and coding style, larger data paths (or computational units) should therefore be reused to minimize area. In addition, large-area computational units such as dividers and multipliers should be avoided where possible, with the operations converted into additions and subtractions instead.

  • Reduce Flip Rate

  • By using logic gating methods, an additional level of “AND” gate can be added to the data path to prevent unused combinational logic from flipping when idle, thereby reducing dynamic power consumption.

  • However, since adding an extra level of AND gate may not be acceptable in very tight timing scenarios, caution is advised.
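
The effect of the AND-gate logic gating described above can be estimated by counting bit flips at a unit's inputs. This is a sketch under assumptions: `input_flips` is a hypothetical helper, the gated bus is forced to zero whenever the enable is low, and flips at the unit's inputs are used as a stand-in for the flipping inside its combinational logic.

```python
def bit_flips(old: int, new: int) -> int:
    """Number of bit positions that differ between two values."""
    return bin(old ^ new).count("1")


def input_flips(operands, enables, width: int = 16, isolate: bool = True) -> int:
    """Count bit flips seen at the input of a combinational unit.

    With isolate=True every operand bit is ANDed with the enable, so the
    unit sees zeros while it is idle. With isolate=False the unit sees the
    raw bus on every cycle, whether or not the result is used.
    """
    mask = (1 << width) - 1
    seen = 0
    flips = 0
    for value, enable in zip(operands, enables):
        value &= mask
        gated = (value if enable else 0) if isolate else value
        flips += bit_flips(seen, gated)
        seen = gated
    return flips


bus = [0xAAAA, 0x5555, 0xAAAA, 0x5555]  # busy bus with alternating bits
enables = [1, 0, 0, 0]                  # the unit is only needed in cycle 0

print("without gating:", input_flips(bus, enables, isolate=False))
print("with gating:   ", input_flips(bus, enables, isolate=True))
```

Note that forcing the inputs to zero when the unit goes idle costs some flips of its own, so the gating pays off when idle periods are long relative to how often the unit switches between busy and idle.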

9. Low Power at the Process Level

Low power at the process level generally involves using special process cell libraries, which will not be discussed further in this article.

In summary, low-power mechanisms are crucial for processors. IC designers can adopt various low-power mechanisms at the software, system, processor, unit, register, latch, SRAM, combinational logic, and process levels to reduce processor power consumption.

What low-power mechanisms have you adopted in your design?

Source: Silicon Farmer Alexander