When writing a “dead loop” in the Linux kernel (<span><span>while(1);</span></span> or <span><span>for(;;);</span></span>), what happens depends on three key factors:
-
On which CPU the code runs and at what priority (interrupt context or process context);
-
Whether kernel preemption is enabled;
-
Whether the CPU is responsible for maintaining system heartbeat (tick, RCU, scheduling clock, IPI, etc.).
Let’s break down the scenarios step by step.
1. Process Context (Most Common Driver Implementation)
-
Code form: a loop written directly in a system-call handler such as
<span><span>xxx_read()/ioctl()/mmap()</span></span>:
while (1) cpu_relax();
At this point the thread is either a kernel thread or a user process executing in kernel mode. It runs in a schedulable context, but whether it can actually be preempted depends on the kernel's preemption configuration.
-
Preemption Disabled (CONFIG_PREEMPT_NONE / Older Kernel Versions)
-
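That CPU is immediately 100% occupied by this thread until the machine is rebooted.
<span><span>kill -9</span></span> does not help, because a signal is only acted on when the task returns to user space (or if the loop itself checks <span><span>fatal_signal_pending()</span></span>).
-
The timer interrupt still fires and marks the task as needing a reschedule, but without kernel preemption the scheduler only switches tasks at an explicit scheduling point or on return to user space, and the loop reaches neither. Other threads on that CPU (including
<span><span>kworker</span></span>, <span><span>ksoftirqd</span></span>, <span><span>sshd</span></span>) are completely starved.
-
If the system has only 1 CPU, the entire machine hangs; on a multi-core system only that core is lost, and the others continue to work as long as they do not have to wait for it.
-
If mouse, keyboard, or network handling depends on threads bound to the stuck core (threaded interrupt handlers, <span><span>ksoftirqd</span></span>, per-CPU <span><span>kworker</span></span> threads), it will also appear as if the entire machine has crashed.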
Preemption Enabled (CONFIG_PREEMPT / RT Kernel; CONFIG_PREEMPT_VOLUNTARY only at explicit scheduling points)
-
If there are no explicit scheduling points inside the loop (<span><span>cond_resched()</span></span>, <span><span>schedule()</span></span>, <span><span>might_sleep()</span></span>), the loop never gives up the CPU voluntarily. Under CONFIG_PREEMPT_VOLUNTARY it therefore behaves like the preemption-disabled case above.
-
On a fully preemptible kernel, however, the preemption count
<span><span>thread_info->preempt_count</span></span> is 0 inside the loop. The clock interrupt (HZ times per second) runs <span><span>scheduler_tick()</span></span>; once the current task has used up its time slice, it is marked for rescheduling and preempted when the interrupt returns.
-
Result: a single-core system will not die completely, but
<span><span>sshd/bash</span></span> and other interactive processes respond more slowly, because the loop task is always runnable and absorbs all the CPU time it is given. The load average in <span><span>/proc/loadavg</span></span> climbs toward 1.0 for each looping thread, and the affected core shows 100% utilization.
-
If
<span><span>cond_resched()</span></span> is added inside the loop, the task yields whenever a reschedule is pending, even on a kernel without full preemption, and the impact on the rest of the system is almost imperceptible. The loop itself, however, still never ends.
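The preemption rules in this section can be condensed into a small user-space model. This is only a simplified sketch for reasoning about the cases: the enum, struct, and function names are hypothetical and are not kernel APIs, and a real kernel has more cases than shown here.

```c
#include <stdbool.h>

/* Hypothetical model of the kernel preemption configuration. */
enum preempt_model {
    MODEL_PREEMPT_NONE,      /* no involuntary kernel preemption */
    MODEL_PREEMPT_VOLUNTARY, /* only explicit scheduling points yield */
    MODEL_PREEMPT_FULL       /* kernel code itself is preemptible */
};

/* Hypothetical snapshot of the task that the timer interrupt hit. */
struct task_state {
    int  preempt_count; /* non-zero under a spinlock or in IRQ/softirq */
    bool need_resched;  /* set by the tick once the time slice is used up */
    bool in_user_mode;  /* true if the interrupt hit user-space code */
};

/*
 * Returns true if, in this simplified model, the scheduler may switch
 * tasks when the timer interrupt returns.
 */
bool may_preempt_on_irq_return(enum preempt_model model,
                               const struct task_state *t)
{
    if (!t->need_resched)
        return false;             /* no reschedule has been requested */
    if (t->in_user_mode)
        return true;              /* user code can always be preempted */
    if (model != MODEL_PREEMPT_FULL)
        return false;             /* a kernel-mode loop keeps the CPU */
    return t->preempt_count == 0; /* blocked while preemption is disabled */
}
```

For a kernel-mode loop with `preempt_count == 0` and `need_resched` set, the model returns true only for `MODEL_PREEMPT_FULL`, which matches the difference between the two configurations described above.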
How to “self-rescue”
-
Multi-core machines: from another core, run
<span><span>echo l > /proc/sysrq-trigger</span></span> to print the backtrace of all active CPUs, which immediately shows the PC where the dead loop is spinning.
-
Single-core machines: you can only rely on the lockup watchdog (see Section 4) to report or reboot automatically, or trigger
<span><span>sysrq</span></span> manually over the serial console.
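The same SysRq trigger can be issued from a small C helper, for instance from a monitoring tool running on a healthy core. This is a minimal sketch: the function name `sysrq_write` is hypothetical, and writing to the real trigger file requires root.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write a single SysRq command character to the given trigger file.
 * Returns 0 on success or a negative errno value on failure.
 */
int sysrq_write(const char *trigger_path, char cmd)
{
    int fd = open(trigger_path, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t n = write(fd, &cmd, 1);
    int err = 0;
    if (n < 0)
        err = -errno;   /* the write itself failed */
    else if (n != 1)
        err = -EIO;     /* nothing was written */

    close(fd);
    return err;
}
```

Calling `sysrq_write("/proc/sysrq-trigger", 'l')` as root corresponds to the `echo l` command above; the backtraces appear in the kernel log.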
2. Interrupt Context (hardirq / softirq / tasklet)
-
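Writing
<span><span>while(1);</span></span> in <span><span>irqreturn_t irq_handler()</span></span>
-
A hard interrupt handler runs with local interrupts disabled, so the CPU that took the interrupt receives no further maskable interrupts, including its own timer tick, until the handler returns. The interrupt line itself also stays unserviced for that time (masked, for example via <span><span>mask_irq()</span></span>, or marked as in progress).
-
If the stuck CPU is the one that drives timekeeping (for example, the legacy clock interrupt on IRQ0 of a single-core x86 machine), the system loses its tick; the scheduler, RCU, and jiffies all freeze, and the entire machine dies instantly.
-
If it is a network card interrupt, the network card appears to be disconnected and the CPU that took the interrupt is lost as well; services on the other CPUs continue as long as they do not depend on the stuck one.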
Dead loop in <span><span>tasklet</span></span> / <span><span>timer</span></span> callback
-
Tasklets and timer callbacks run in softirq context, where preemption is disabled, so the effect is the same: the CPU is permanently trapped.
-
Softirq context cannot sleep, so
<span><span>schedule()</span></span> cannot be called either; the only recovery path is the lockup watchdog (see Section 4).
3. Dead Loop Inside Spin Lock Critical Section
spin_lock(&lock);
while (1)
    ;
-
While the lock is held, preemption is disabled (<span><span>preempt_count</span></span> is incremented by 1), so the loop cannot be preempted even on a fully preemptible kernel.
-
Any later attempt to acquire the same lock (including from interrupt paths) busy-waits forever, so the lockup cascades from CPU to CPU and the system typically freezes within moments.
-
This is one of the cheapest ways for a driver to bring down the whole system.
4. NMI Watchdog — The Last Lifeline of the Kernel
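Most distribution kernels enable CONFIG_LOCKUP_DETECTOR, which provides two detectors: the soft lockup detector and the hard lockup detector (on x86 the latter is commonly called the NMI watchdog and is controlled by <span><span>nmi_watchdog</span></span>). The principle is:
-
Soft lockup: each CPU keeps a per-CPU timestamp
<span><span>watchdog_touch_ts</span></span>, which is refreshed only when a high-priority per-CPU watchdog task gets to run. A periodic per-CPU timer interrupt checks the timestamp.
-
If the timestamp has not been refreshed for more than 20s (the default, twice <span><span>watchdog_thresh</span></span>), the kernel prints a warning with the stack trace of the stuck task. It calls
<span><span>panic()</span></span> only if <span><span>softlockup_panic</span></span> is set.
-
Hard lockup: if the CPU stops taking timer interrupts altogether (for example, a loop with interrupts disabled), an NMI (non-maskable interrupt) still gets through after about 10s (default), dumps the registers and stack, and panics if <span><span>hardlockup_panic</span></span> is set.
-
Thus, with panic-on-lockup and a panic reboot timeout configured, even a single-core machine can reboot automatically, leaving a message such as
<span><span>BUG: soft lockup - CPU</span><span>#0</span><span> stuck for 22s! [foo/1234]</span></span>. If kdump is set up, the resulting
<span><span>vmcore</span></span> can be analyzed post-mortem with the <span><span>crash</span></span> tool.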
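The soft lockup check amounts to a timestamp comparison, which can be modelled in user space as follows. This is a simplified sketch, not the kernel's source: the function name and the use of whole seconds are assumptions for illustration, while the 10-second default mirrors the `kernel.watchdog_thresh` sysctl.

```c
#include <stdint.h>

/* Default threshold in seconds (the kernel.watchdog_thresh default). */
#define WATCHDOG_THRESH_SEC 10u

/*
 * Hypothetical model of the soft lockup check.
 * touch_ts: time in seconds at which the per-CPU watchdog last ran
 * now:      current time in seconds
 * Returns how long the CPU has been stuck, or 0 if it is within the limit.
 */
uint64_t soft_lockup_duration(uint64_t touch_ts, uint64_t now)
{
    const uint64_t limit = 2u * WATCHDOG_THRESH_SEC; /* 20 s by default */

    if (now > touch_ts + limit)
        return now - touch_ts;
    return 0;
}
```

With `touch_ts = 100` and `now = 122` the model reports 22 seconds, which corresponds to the "stuck for 22s" wording in the log message; the report lands a little after the 20-second limit because the check only runs periodically.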
5. Practical: How to “Safely” Yield the CPU
If the driver indeed needs to poll a hardware bit, please follow the template below:
/* 1. Polling with scheduling points (sleepable context) */
/* timeout is an absolute deadline in jiffies, e.g. jiffies + HZ */
while (readl(reg) & BUSY) {
    cpu_relax();        /* CPU hint only, no impact on correctness */
    cond_resched();     /* yield if a reschedule is pending */
    if (time_after(jiffies, timeout))
        return -ETIMEDOUT;
}

/* 2. Polling in interrupt context (cannot sleep) */
u64 end = get_cycles() + timeout_cycles;
while (readl(reg) & BUSY) {
    if (get_cycles() > end)
        return -ETIMEDOUT;
    cpu_relax();        /* cannot call schedule() or cond_resched() */
}
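To experiment with the bounded-polling pattern outside the kernel, the first template can be mirrored in user space. This is a sketch under stated assumptions: `poll_until_idle` and `now_ms` are hypothetical names, `sched_yield()` stands in for `cond_resched()`, and a monotonic clock stands in for jiffies. It also re-checks the register once after the deadline, so that a task which was descheduled for a long time does not report a false timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in milliseconds. */
static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll *reg until all bits in busy_mask are clear or timeout_ms elapses.
 * Returns 0 on success, -ETIMEDOUT on timeout.
 */
int poll_until_idle(const volatile uint32_t *reg, uint32_t busy_mask,
                    int64_t timeout_ms)
{
    const int64_t deadline = now_ms() + timeout_ms;

    while (*reg & busy_mask) {
        if (now_ms() > deadline) {
            /* Re-check once: we may have been descheduled past the deadline. */
            return (*reg & busy_mask) ? -ETIMEDOUT : 0;
        }
        sched_yield(); /* stand-in for cond_resched() */
    }
    return 0;
}
```

In kernel code the same pattern is already packaged by the `readl_poll_timeout()` and `readl_poll_timeout_atomic()` helpers in `<linux/iopoll.h>`, which are usually preferable to an open-coded loop.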
6. Summary in One Sentence
-
Process context + preemption disabled → that CPU is permanently starved; a multi-core system can survive.
-
Interrupt context → that CPU can never escape, and if the tick is lost, the entire machine dies.
-
Preemption enabled → the scheduler can preempt the loop from the clock interrupt, but load still rises.
-
The lockup watchdog reports the stuck CPU after about 20 seconds and, if configured to panic, reboots the machine and leaves a crash report.
Therefore, writing a dead loop in the kernel is not simply “occupying the CPU”; it is a high-risk operation that directly violates the fundamental assumptions of the scheduler. Any polling in driver code must have a timeout, plus <span><span>cond_resched()</span></span> wherever sleeping is allowed; otherwise it is a ticking time bomb.