stop_cpus_hard when multiple CPUs are panicking from an NMI

Thu Nov 15 22:58:26 UTC 2012

At work we have some custom watchdog hardware that sends an NMI upon
expiry.  We've modified the kernel to panic when it receives the watchdog
NMI.  I've been trying the "stop_scheduler_on_panic" mode, and I've
discovered that when my watchdog expires, the system gets completely
wedged.  After some digging, I've discovered that I have multiple CPUs
getting the watchdog NMI and trying to panic concurrently.  One of the CPUs
wins, and the rest spin forever in this code:

    /*
     * We don't want multiple CPU's to panic at the same time, so we
     * use panic_cpu as a simple spinlock.  We have to keep checking
     * panic_cpu if we are spinning in case the panic on the first
     * CPU is canceled.
     */
    if (panic_cpu != PCPU_GET(cpuid))
        while (atomic_cmpset_int(&panic_cpu, NOCPU,
            PCPU_GET(cpuid)) == 0)
            while (panic_cpu != NOCPU)
                ; /* nothing */

The system wedges when stop_cpus_hard() is called, which sends NMIs to all
of the other CPUs and waits for them to acknowledge that they are stopped
before returning.  However, the hardware will not deliver an NMI to a CPU that is
already handling an NMI, so the other CPUs that got a watchdog NMI and are
spinning will never go into the NMI handler and acknowledge that they are
stopped.
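
To make the failure mode concrete, here is a small userland model of the
wait described above.  It is only a sketch: struct model_cpu and
model_unacked_cpus() are names I made up for illustration, not kernel
code, and it assumes the only thing that matters is whether a CPU was
already inside an NMI handler when the stop NMI was sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAXCPU 32

/* Hypothetical per-CPU model state; not a kernel structure. */
struct model_cpu {
	bool in_nmi;	/* already in an NMI handler, further NMIs are blocked */
};

/*
 * Model of the wait in stop_cpus_hard(): the panicking CPU sends an NMI
 * to every other CPU and waits for each one to acknowledge.  In this
 * model a CPU acknowledges only if it can still take the NMI.  Returns
 * the mask of CPUs that will never acknowledge; a non-zero result means
 * the panicking CPU waits forever.
 */
uint32_t
model_unacked_cpus(const struct model_cpu *cpus, int ncpus, int panic_cpu)
{
	uint32_t unacked = 0;

	assert(ncpus > 0 && ncpus <= MODEL_MAXCPU);
	for (int i = 0; i < ncpus; i++) {
		if (i == panic_cpu)
			continue;	/* the winner does not stop itself */
		if (cpus[i].in_nmi)
			unacked |= (uint32_t)1 << i;
	}
	return (unacked);
}
```

With four CPUs where 0, 1 and 3 all took the watchdog NMI and CPU 0 won
the race, the model reports CPUs 1 and 3 as never acknowledging, which
matches the wedge I am seeing.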

I've been able to work around this with the following hideous hack:

--- kern_shutdown.c     2012-08-17 10:25:02.000000000 -0400
+++ kern_shutdown.c     2012-11-15 17:04:10.000000000 -0500
@@ -658,11 +658,15 @@
         * panic_cpu if we are spinning in case the panic on the first
         * CPU is canceled.
         */
-       if (panic_cpu != PCPU_GET(cpuid))
+       if (panic_cpu != PCPU_GET(cpuid)) {
                while (atomic_cmpset_int(&panic_cpu, NOCPU,
-                   PCPU_GET(cpuid)) == 0)
+                   PCPU_GET(cpuid)) == 0) {
+                       atomic_set_int(&stopped_cpus, PCPU_GET(cpumask));
                        while (panic_cpu != NOCPU)
                                ; /* nothing */
+               }
+               atomic_clear_int(&stopped_cpus, PCPU_GET(cpumask));
+       }

        if (stop_scheduler_on_panic) {
                if (panicstr == NULL && !kdb_active)
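
For anyone who wants to play with the idea outside the kernel, here is a
rough userland model of what the hack does, using C11 atomics in place of
atomic_cmpset_int() and friends.  All of the model_* names are invented
for this sketch, and it assumes fewer than 32 CPUs and that every losing
CPU is one that is spinning in panic(); CPUs that never got the watchdog
NMI would still have to acknowledge the stop NMI in the normal way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NOCPU (-1)

/* Hypothetical stand-ins for panic_cpu and stopped_cpus. */
static _Atomic int model_panic_cpu = MODEL_NOCPU;
static _Atomic uint32_t model_stopped_cpus = 0;

/*
 * One attempt to claim the panic.  The winner clears its own stopped
 * bit; a loser marks itself stopped so that whoever is waiting for it
 * can make progress.  Returns true if this CPU owns the panic.
 */
bool
model_try_panic(int cpuid)
{
	int expected = MODEL_NOCPU;

	assert(cpuid >= 0 && cpuid < 32);
	if (atomic_load(&model_panic_cpu) == cpuid)
		return (true);
	if (atomic_compare_exchange_strong(&model_panic_cpu, &expected, cpuid)) {
		atomic_fetch_and(&model_stopped_cpus, ~((uint32_t)1 << cpuid));
		return (true);
	}
	/* Lost the race: report as stopped instead of waiting for an NMI. */
	atomic_fetch_or(&model_stopped_cpus, (uint32_t)1 << cpuid);
	return (false);
}

/* True once every CPU other than the winner has its stopped bit set. */
bool
model_all_stopped(int ncpus, int winner)
{
	uint32_t want;

	assert(ncpus > 0 && ncpus < 32);
	want = (((uint32_t)1 << ncpus) - 1) & ~((uint32_t)1 << winner);
	return ((atomic_load(&model_stopped_cpus) & want) == want);
}
```

In this model the winner's wait completes as soon as the losing CPUs have
each made one failed attempt, without any of them having to take another
NMI.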


But I'm hoping that somebody has some ideas on a better way to fix this
kind of problem.