What Happens When You Write a Dead Loop in Linux Driver Code?

When you write a “dead loop” (an infinite loop such as <span><span>while(1);</span></span> or <span><span>for(;;);</span></span>) in the Linux kernel, what happens depends on three key factors:

  1. On which CPU the code runs and at what priority (interrupt context or process context);

  2. Whether kernel preemption is enabled;

  3. Whether the CPU is responsible for maintaining system heartbeat (tick, RCU, scheduling clock, IPI, etc.).

Let’s break down the scenarios step by step.

1. Process Context (Most Common Driver Implementation)

  1. Code form: a loop written directly in a system-call handler such as <span><span>xxx_read()/ioctl()/mmap()</span></span>:

    while (1) cpu_relax();

    At this point the code runs as a kernel thread or as a user process executing in kernel mode, so in principle it is schedulable and preemptible.

  2. Preemption Disabled (CONFIG_PREEMPT_NONE / Older Kernel Versions)

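  • This thread immediately occupies 100% of the CPU until the machine is rebooted. <span><span>kill -9</span></span> does not help, because a pending signal is only acted on when the task returns to user space or explicitly checks for it, and the loop does neither.

  • The timer interrupt still fires, but without kernel preemption the scheduler only switches tasks at explicit scheduling points or on return to user space, so other threads (including <span><span>kworker</span></span>, <span><span>ksoftirqd</span></span>, <span><span>sshd</span></span>) are completely starved on that CPU.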

  • If the system has only 1 CPU, the entire machine hangs; if multi-core, only that core dies while the others continue to work.

  • If the mouse, keyboard, or network packets rely on the deadlocked core to handle soft interrupts, it will also “appear” as if the entire machine has crashed.

  • Preemption Enabled (CONFIG_PREEMPT / RT Kernel)

    • With <span><span>CONFIG_PREEMPT_VOLUNTARY</span></span> the kernel is only preempted at explicit points such as <span><span>might_sleep()</span></span> and <span><span>cond_resched()</span></span>, so a loop without them behaves as in the non-preemptible case above.

    • If there are no explicit scheduling points inside the loop (<span><span>cond_resched()</span></span>, <span><span>schedule()</span></span>, <span><span>might_sleep()</span></span>), the task never gives up the CPU voluntarily; any switch away from it is an involuntary preemption.

    • However, the preemption count (<span><span>preempt_count</span></span>) is 0, so the periodic clock interrupt (HZ times per second) runs <span><span>scheduler_tick()</span></span>. Once the current task’s time slice has expired, the tick marks the task for rescheduling and the task is preempted on return from the interrupt.

    • Result: a single-core system will not completely die, but <span><span>sshd/bash</span></span> and other interactive processes will respond slowly, because the dead-loop task consumes every time slice it is given. The load average in <span><span>/proc/loadavg</span></span> climbs toward 1.0 for each looping task, and each affected core shows as fully utilized.

    • If <span><span>cond_resched()</span></span> is added inside the loop, the task yields whenever a reschedule is pending, even on a non-preemptible kernel, and the stall is almost imperceptible to other tasks. The loop itself, however, never ends.

  • How to “self-rescue”

    • Multi-core machines: from another core, run <span><span>echo l > /proc/sysrq-trigger</span></span> to print the backtrace of all active CPUs, which immediately locates the dead loop’s PC.

    • Single-core machines: the only options are the lockup watchdog (see Section 4), which can reboot the machine automatically, or a manual <span><span>sysrq</span></span> sent over the serial console.
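
The difference between the non-preemptible and preemptible cases can be sketched with a small user-space model. This is only an illustration, not kernel code: the struct `cpu_model`, the functions `model_tick()`, `model_cond_resched()` and `run_dead_loop()`, and the assumption that the time slice expires on every tick are all hypothetical simplifications.

```c
#include <stdbool.h>

/* Hypothetical user-space model: one CPU running one looping task. */
struct cpu_model {
    bool preemptible;   /* true models CONFIG_PREEMPT, false models PREEMPT_NONE */
    bool need_resched;  /* models the "reschedule pending" flag */
    int preempt_count;  /* non-zero would mean preemption is disabled */
    int switches;       /* how often another task got the CPU */
};

/* Models the periodic tick; the time slice is assumed to expire every tick. */
static void model_tick(struct cpu_model *cpu)
{
    cpu->need_resched = true;
    /* On return from the interrupt, a preemptible kernel switches tasks. */
    if (cpu->preemptible && cpu->preempt_count == 0) {
        cpu->need_resched = false;
        cpu->switches++;
    }
}

/* Models cond_resched(): an explicit scheduling point inside the loop. */
static void model_cond_resched(struct cpu_model *cpu)
{
    if (cpu->need_resched) {
        cpu->need_resched = false;
        cpu->switches++;
    }
}

/* Run `iters` loop iterations with a tick every `tick_every` iterations. */
int run_dead_loop(bool preemptible, bool use_cond_resched,
                  int iters, int tick_every)
{
    struct cpu_model cpu = { .preemptible = preemptible };

    for (int i = 1; i <= iters; i++) {
        if (i % tick_every == 0)
            model_tick(&cpu);
        if (use_cond_resched)
            model_cond_resched(&cpu);
    }
    return cpu.switches;
}
```

In this model, a loop without scheduling points on a non-preemptible CPU never lets another task run, while either enabling preemption or adding the `cond_resched()` stand-in yields once per tick.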

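2. Interrupt Context (hardirq / softirq / tasklet)

    1. Writing <span><span>while(1);</span></span> in <span><span>irqreturn_t irq_handler()</span></span>

    • The handler runs with interrupts disabled on the local CPU, and this interrupt line is not delivered again on any CPU (it stays masked, <span><span>mask_irq()</span></span>, or unacknowledged) until the handler returns. The local CPU therefore receives no further maskable interrupts, including its own timer tick.

    • If this interrupt is the system’s clock interrupt (for example IRQ0 on legacy x86), the system loses its tick; the scheduler, RCU, and jiffies all freeze, and the entire machine dies instantly.

    • If it is a network card interrupt, the network card “appears” to be disconnected and the CPU running the handler is lost, while services on the other CPUs continue.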

  • Dead loop in <span><span>tasklet</span></span> / <span><span>timer</span></span> callback

    • Tasklets and timer callbacks run in softirq context, where preemption is disabled, so the effect on that CPU is the same: it is permanently trapped.

    • Softirq context cannot sleep, so <span><span>schedule()</span></span> must not be called; the only recovery path is the lockup watchdog (see Section 4).

3. Dead Loop Inside Spin Lock Critical Section

    spin_lock(&lock);
    while (1);

    • While the lock is held, preemption is disabled (on non-RT kernels <span><span>spin_lock()</span></span> increments <span><span>preempt_count</span></span> by 1).

    • Any later attempt to acquire the same lock (including from interrupt paths) busy-waits forever, so other CPUs pile up behind the lock one after another and the system quickly freezes.

    • This is one of the “most cost-effective” ways to commit suicide in a driver.
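
Why every later acquirer gets stuck can be shown with a minimal user-space test-and-set lock. This is a sketch under stated assumptions, not the kernel's `spinlock_t`: `struct model_spinlock` and the `model_spin_*` functions are hypothetical names, and the bounded variant exists only so the demonstration terminates.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal test-and-set lock; not the kernel's spinlock_t. */
struct model_spinlock {
    atomic_flag locked;
};

/* Busy-wait until the flag is free, like a contended spin_lock(). */
void model_spin_lock(struct model_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ; /* spin */
}

void model_spin_unlock(struct model_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Bounded acquire used only to make the demo terminate: try at most
 * max_spins times, then give up. A real spin_lock() has no such limit,
 * which is why a holder that never unlocks traps every later caller.
 */
bool model_spin_lock_bounded(struct model_spinlock *l, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
            return true;
    }
    return false;
}
```

If the holder never calls `model_spin_unlock()`, `model_spin_lock_bounded()` fails no matter how large `max_spins` is; the unbounded `model_spin_lock()` would simply never return.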

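4. Lockup Watchdog: The Last Lifeline of the Kernel

    Most modern distribution kernels enable <span><span>CONFIG_LOCKUP_DETECTOR</span></span>, which contains a soft-lockup detector and a hard-lockup detector (<span><span>CONFIG_HARDLOCKUP_DETECTOR</span></span>, exposed on x86 as <span><span>nmi_watchdog</span></span>). The principle is:

    • Each CPU keeps a per-CPU timestamp <span><span>watchdog_touch_ts</span></span>, which is refreshed only when a high-priority per-CPU watchdog task gets to run on that CPU.

    • Soft lockup: a periodic timer interrupt on the same CPU checks the timestamp. If it has not been refreshed for more than 20s (the default, 2 × <span><span>watchdog_thresh</span></span>), the kernel prints a warning with the stack trace, and calls <span><span>panic()</span></span> only if <span><span>softlockup_panic</span></span> is set.

    • Hard lockup: if interrupts are disabled so that even the timer interrupt cannot run, an NMI (non-maskable interrupt) notices the stall after about <span><span>watchdog_thresh</span></span> seconds (10s by default), captures the registers of the stuck CPU, prints the stack trace, and calls <span><span>panic()</span></span> if <span><span>hardlockup_panic</span></span> is set.

    • Thus, even a single-core machine can report the lockup, and can reboot automatically if it is configured to panic and to reboot on panic, leaving a message such as

      <span><span>BUG: soft lockup - CPU#0 stuck for 22s! [foo/1234]</span></span>

      If kdump is configured, the resulting <span><span>vmcore</span></span> can be analyzed post-mortem with the <span><span>crash</span></span> tool.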
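
The soft-lockup check can be modeled in a few lines of user-space C. This is a simplified sketch, not the kernel implementation: the names `cpu_watchdog`, `watchdog_touch()`, `watchdog_check()` and `simulate_hog()` are hypothetical, time is counted in whole seconds, and the check is assumed to run once per second.

```c
#define WATCHDOG_THRESH_S  10                       /* models kernel.watchdog_thresh */
#define SOFTLOCKUP_LIMIT_S (2 * WATCHDOG_THRESH_S)  /* soft-lockup threshold */

struct cpu_watchdog {
    long touch_ts;  /* last time the watchdog task ran, in seconds */
};

/* Called whenever the per-CPU watchdog task gets to run. */
void watchdog_touch(struct cpu_watchdog *wd, long now)
{
    wd->touch_ts = now;
}

/* Called from the periodic timer: returns the stall in seconds, or 0 if fine. */
long watchdog_check(const struct cpu_watchdog *wd, long now)
{
    long stalled = now - wd->touch_ts;

    return stalled > SOFTLOCKUP_LIMIT_S ? stalled : 0;
}

/*
 * Simulate `seconds` of time. From hog_start on, a dead loop owns the CPU,
 * so the watchdog task no longer runs while the timer check still does.
 * Returns the first reported stall in seconds, or 0 if none is reported.
 */
long simulate_hog(long seconds, long hog_start)
{
    struct cpu_watchdog wd = { .touch_ts = 0 };

    for (long now = 1; now <= seconds; now++) {
        if (now < hog_start)
            watchdog_touch(&wd, now);
        long stalled = watchdog_check(&wd, now);
        if (stalled)
            return stalled;
    }
    return 0;
}
```

The key design point is that the timestamp is refreshed by something a dead loop can starve, and checked by something it cannot (the timer interrupt), so a stale timestamp proves the CPU is not scheduling.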

5. Practical: How to “Safely” Yield the CPU

    If the driver really does need to poll a hardware bit, follow the template below:

    /* 1. Polling with scheduling points (sleepable process context) */
    /* timeout is an absolute deadline in jiffies, e.g. jiffies + msecs_to_jiffies(100) */
    while (readl(reg) & BUSY) {
        if (time_after(jiffies, timeout))
            return -ETIMEDOUT;
        cpu_relax();    /* CPU/power hint only, no impact on correctness */
        cond_resched(); /* yield here if a reschedule is pending */
    }

    /* 2. Polling in interrupt or atomic context (cannot sleep) */
    u64 end = get_cycles() + timeout_cycles;

    while (readl(reg) & BUSY) {
        if (get_cycles() > end)
            return -ETIMEDOUT;
        cpu_relax();    /* must not call schedule() or cond_resched() here */
    }
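
The kernel also ships ready-made helpers for this pattern in `<linux/iopoll.h>`: `readl_poll_timeout()` for sleepable context and `readl_poll_timeout_atomic()` for atomic context. The shape of a bounded poll can be tried out in user space with a fake register. The sketch below is an assumption-laden model, not driver code: `struct fake_dev`, `fake_readl()` and `poll_not_busy()` are hypothetical, and the timeout is a simple read budget rather than a time value.

```c
#include <errno.h>
#include <stdint.h>

#define BUSY 0x1u

/* Hypothetical fake device: BUSY clears after `busy_reads` register reads. */
struct fake_dev {
    int busy_reads;
};

/* Stand-in for readl(): returns BUSY until the device "finishes". */
uint32_t fake_readl(struct fake_dev *dev)
{
    if (dev->busy_reads > 0) {
        dev->busy_reads--;
        return BUSY;
    }
    return 0;
}

/* Poll until BUSY clears or the budget of max_polls reads is used up. */
int poll_not_busy(struct fake_dev *dev, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!(fake_readl(dev) & BUSY))
            return 0;
        /* A driver would call cpu_relax() here, plus cond_resched() if it may sleep. */
    }
    return -ETIMEDOUT;
}
```

Whatever the hardware does, the loop ends after at most `max_polls` reads, which is the property that separates a bounded poll from a dead loop.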
    
    

6. Summary in One Sentence

    • Process context + preemption disabled → that CPU is permanently monopolized and every other task on it starves; a multi-core system survives on its remaining cores.

    • Interrupt context → that CPU can never escape, and if the tick is lost, the entire machine dies.

    • Preemption enabled → the clock interrupt can preempt the loop, but the load will spike.

    • The lockup watchdog reports the stall after about 20 seconds and, if configured to panic, reboots the machine and leaves a crash report.

    Therefore, writing a dead loop in the kernel is not simply “occupying the CPU”, but a high-risk operation that directly violates the fundamental assumptions of the scheduler. Any polling in driver code must have a timeout, plus <span><span>cond_resched()</span></span> wherever sleeping is allowed; otherwise it is a ticking time bomb.
