Summary
Following last week’s fix for the power management performance regression affecting some Intel Chromebooks, Linux 6.18-rc4 brings a critical patch to the cpuidle menu governor, which manages CPU idle state selection, to address another performance regression that has persisted since 6.17. The regression caused a drop of approximately 11% in multi-threaded throughput tests on desktop platforms such as the Intel Core i5-10600K. The patch by Rafael Wysocki adds an “exit latency” check to prevent the governor from selecting inappropriate idle states. 6.18-rc4 also fixes the GFP mask restriction logic and ordering issues in the system sleep (suspend/hibernate) path, improving system stability and maintainability.
Review: Where the Performance Regression Came From
An optimization introduced during the 6.17 development cycle, <span>cpuidle: governors: menu: Avoid selecting states with too much latency</span>, was intended to prevent the CPU from entering high-latency idle states, thereby improving responsiveness.
However, it had a side effect: in certain medium-load scenarios, the menu governor incorrectly abandoned the polling state (active polling) in favor of deeper idle states. As a result, the CPU spent more time in frequent wake-sleep transitions, which reduced overall throughput.
Intel engineer Doug Smythies discovered that this logic introduced in 6.17 caused approximately an 11% drop in throughput during desktop testing. Rafael Wysocki (Linux power management maintainer) subsequently confirmed the issue and submitted a fix in 6.18-rc4.
Core Fix: Making the Governor Smarter in State Selection
Patch Location:
<span>drivers/cpuidle/governors/menu.c</span>
Key Changes:
Added a comparison condition between exit latency (<span>exit_latency_ns</span>) and predicted idle time (<span>predicted_ns</span>). When the exit latency of an idle state exceeds the predicted idle time, that state will no longer be selected, and polling will be used instead.
if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
    s->target_residency_ns <= data->next_timer_ns &&
    s->exit_latency_ns <= predicted_ns) {
    predicted_ns = s->target_residency_ns;
    idx = i;
    break;
}
Technical Key Points:
- target_residency_ns: The desired minimum residency time for this idle state.
- exit_latency_ns: The delay required to wake from this state.
- predicted_ns: The predicted idle duration, that is, how long the CPU is expected to stay idle before its next wake-up (based on timer predictions).
With the addition of the exit latency condition, the governor can better assess whether entering a deeper idle state is worth it. In simple terms: if waking up takes longer than the CPU is expected to stay idle, it is better not to sleep.
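The three-part condition can be tried out in isolation with a small user-space sketch. This is not the kernel code: `struct idle_state`, `pick_state()`, and the latency numbers in `example_states` are hypothetical names and values chosen for illustration, and everything else the real governor does is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's per-state data. */
struct idle_state {
    bool     polling;              /* true for the busy-polling state */
    uint64_t target_residency_ns;  /* minimum stay that makes the state worthwhile */
    uint64_t exit_latency_ns;      /* time needed to wake up from the state */
};

/* Made-up numbers: state 0 polls, state 1 is a physical idle state. */
static const struct idle_state example_states[] = {
    { .polling = true,  .target_residency_ns = 0,     .exit_latency_ns = 0 },
    { .polling = false, .target_residency_ns = 20000, .exit_latency_ns = 10000 },
};

/*
 * Decide whether to leave the polling state (poll_idx) for a physical
 * idle state (cand_idx), using the same three checks as the quoted
 * condition. Returns the index of the chosen state.
 */
size_t pick_state(const struct idle_state *states,
                  size_t poll_idx, size_t cand_idx,
                  uint64_t next_timer_ns, uint64_t predicted_ns)
{
    assert(states != NULL);
    const struct idle_state *s = &states[cand_idx];

    if (states[poll_idx].polling &&
        s->target_residency_ns <= next_timer_ns &&
        s->exit_latency_ns <= predicted_ns)
        return cand_idx;   /* waking up is cheap enough: enter the state */

    return poll_idx;       /* otherwise keep polling */
}
```

With these example values, `pick_state(example_states, 0, 1, 50000, 5000)` keeps polling (index 0) because the 10000 ns exit latency exceeds the 5000 ns predicted idle time, while raising the prediction to 15000 ns selects the physical state (index 1).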
Improvements in System Sleep Path: GFP Mask and Wake State Corrections
In addition to the cpuidle fix, this round of power management updates includes three system-level fixes:
1. Support for stacked (nested) <span>pm_restrict_gfp_mask()</span> calls
Involved Files:
<span>kernel/power/main.c</span>
Key Changes:
- Introduced a <span>saved_gfp_count</span> counter to allow multiple calls to <span>pm_restrict_gfp_mask()</span>.
- Avoided repeated restoration or incorrect overwriting of <span>gfp_allowed_mask</span>, enhancing the robustness of the suspend/hibernate call chain.
Background: The previous logic assumed the call would only occur once, which could lead to confusion if the path was nested (e.g., suspend → hibernate → resume).
if (saved_gfp_count++) {
    WARN_ON((saved_gfp_mask & ~(__GFP_IO | __GFP_FS)) != gfp_allowed_mask);
    return;
}
saved_gfp_mask = gfp_allowed_mask;
gfp_allowed_mask &= ~(__GFP_IO | __GFP_FS);
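The counting scheme can be exercised outside the kernel with a minimal sketch. The flag values and function names below are hypothetical stand-ins, `assert()` takes the place of `WARN_ON()`, and the restore side is an assumption about how a matching counterpart would behave (only the outermost restore puts the mask back); it is not quoted from the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's GFP flags; values are illustrative. */
#define GFP_IO_BIT 0x1u
#define GFP_FS_BIT 0x2u
#define GFP_ALL    0xffu

static unsigned int gfp_allowed_mask = GFP_ALL;
static unsigned int saved_gfp_mask;
static unsigned int saved_gfp_count;

/* Restrict I/O and FS allocations; nested calls only bump the counter. */
void restrict_gfp_mask(void)
{
    if (saved_gfp_count++) {
        /* Already restricted: the mask must still match the first restriction. */
        assert((saved_gfp_mask & ~(GFP_IO_BIT | GFP_FS_BIT)) == gfp_allowed_mask);
        return;
    }
    saved_gfp_mask = gfp_allowed_mask;
    gfp_allowed_mask &= ~(GFP_IO_BIT | GFP_FS_BIT);
}

/* Assumed counterpart: only the outermost call restores the saved mask. */
void restore_gfp_mask(void)
{
    assert(saved_gfp_count > 0);
    if (--saved_gfp_count)
        return;
    gfp_allowed_mask = saved_gfp_mask;
    saved_gfp_mask = 0;
}

/* Accessor so callers can observe the current mask. */
unsigned int current_gfp_mask(void)
{
    return gfp_allowed_mask;
}
```

Calling `restrict_gfp_mask()` twice and `restore_gfp_mask()` once leaves the mask restricted; only the second restore brings back the original value, which is the property a nested suspend/hibernate path relies on.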
2. Revert “PM: sleep: Make pm_wakeup_clear() call more clear”
Involved Files:
<span>kernel/power/suspend.c</span>, <span>kernel/power/process.c</span>
Key Changes:
- Adjusted the order of <span>pm_wakeup_clear(0)</span> calls, moving it to an earlier freeze stage.
- Avoided repeated calls before <span>suspend_prepare()</span>, correcting the race condition caused by previous cleanup logic.
This keeps wakeup state from being cleared at the wrong moment during the process-freezing stage, so wake interrupts are not lost.
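Why the position of the clear matters can be shown with a toy timeline. This is a deliberately simplified model, not the kernel's wakeup-source machinery: the phase names and `wakeup_survives()` are hypothetical, and the model assumes a single pending flag that is checked once at the end of the sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical phases of one suspend attempt, in the order they run. */
enum phase { PHASE_FREEZE_START, PHASE_FREEZE_END, PHASE_PREPARE, PHASE_COUNT };

/*
 * Replay one suspend attempt. The pending flag is cleared when the
 * timeline reaches clear_phase, and a wakeup event is recorded when it
 * reaches event_phase (within one phase, the clear runs first).
 * Returns true if the event is still pending at the final check.
 */
bool wakeup_survives(enum phase clear_phase, enum phase event_phase)
{
    assert(clear_phase < PHASE_COUNT && event_phase < PHASE_COUNT);
    bool pending = false;

    for (int p = 0; p < PHASE_COUNT; p++) {
        if (p == (int)clear_phase)
            pending = false;   /* stale state from an earlier cycle is dropped */
        if (p == (int)event_phase)
            pending = true;    /* a wakeup source fires */
    }
    return pending;
}
```

In this model, an event that fires at `PHASE_FREEZE_END` survives when the clear happens at `PHASE_FREEZE_START`, but is wiped out when the clear is deferred to `PHASE_PREPARE`.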
3. Clean up redundant GFP restoration calls in Hibernate path
Involved Files:
<span>kernel/power/hibernate.c</span>
Removed redundant <span>pm_restore_gfp_mask()</span> calls, in conjunction with the aforementioned stacking mechanism, to make GFP mask restoration logic more consistent.
Performance Validation and Results
According to community testing and submission notes:
- On the Intel Core i5-10600K platform, 6.18-rc4 restores throughput to baseline levels, recovering the loss introduced by the 6.17 regression.
- Interrupt counts in the BIO write path have decreased by approximately 8%, with a significant reduction in CPU idle→wake transitions.
- Power management logs show a decrease of about 15% in the triggering rate of exit latency determination logic, indicating a more robust governor.
These adjustments are small, but they are particularly critical for devices that enter sleep states frequently, such as notebooks and Chromebooks: they avoid performance regressions while improving power consistency.
In-depth Analysis and Insights
- The “Precision” Issue of cpuidle Policy: This fix reflects a long-standing topic: how the idle governor quantifies the balance between performance and power consumption. Being too conservative wastes performance, while being too aggressive delays responsiveness.
- The Complexity of Power Management Code Lies in Multi-Subsystem Interactions: This update involves cpuidle, suspend, and hibernate paths. Minor logical regressions can have cascading effects across different CPU architectures and system states.
- The GFP Mask Changes Are a Model of Engineering Reliability Improvement: While not directly impacting performance, they make the complex suspend/hibernate process safer and more debuggable. Rafael’s team is clearly pursuing a “maintainability-first” approach.
- The Chromebook Incident Prompted a Rapid Fix Cycle: User feedback is a significant driver for the evolution of the kernel’s energy management system, indicating that the Linux PM subsystem’s responsiveness has matured.
Conclusion
The power management fixes in Linux 6.18-rc4, while not extensive (approximately 23 lines inserted, 12 lines deleted), address a real performance pain point. The improvement in the cpuidle menu governor has restored the balance between performance and power consumption. From a broader perspective, this reflects the Linux kernel’s increasingly mature capability for fine-grained energy scheduling, laying the groundwork for implementing smarter energy models on heterogeneous CPUs (such as Intel hybrid, ARM big.LITTLE) in the future.
📚 References
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a5dbbb39e11d50a8c426b8d88f5b12031fee49f3