stop_cpus_hard when multiple CPUs are panicking from an NMI

Attilio Rao attilio at freebsd.org
Sun Nov 25 12:29:51 UTC 2012


On Fri, Nov 16, 2012 at 2:47 PM, Andriy Gapon <avg at freebsd.org> wrote:
> on 16/11/2012 16:41 Attilio Rao said the following:
>> On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon <avg at freebsd.org> wrote:
>>> on 16/11/2012 14:30 Attilio Rao said the following:
>>>> On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon <avg at freebsd.org> wrote:
>>>>> on 16/11/2012 00:58 Ryan Stone said the following:
>>>>>> At work we have some custom watchdog hardware that sends an NMI upon
>>>>>> expiry.  We've modified the kernel to panic when it receives the watchdog
>>>>>> NMI.  I've been trying the "stop_scheduler_on_panic" mode, and I've
>>>>>> discovered that when my watchdog expires, the system gets completely
>>>>>> wedged.  After some digging, I've discovered is that I have multiple CPUs
>>>>>> getting the watchdog NMI and trying to panic concurrently.  One of the CPUs
>>>>>> wins, and the rest spin forever in this code:
>>>>>>
>>>>>> /*
>>>>>>      * We don't want multiple CPU's to panic at the same time, so we
>>>>>>      * use panic_cpu as a simple spinlock.  We have to keep checking
>>>>>>      * panic_cpu if we are spinning in case the panic on the first
>>>>>>      * CPU is canceled.
>>>>>>      */
>>>>>>     if (panic_cpu != PCPU_GET(cpuid))
>>>>>>         while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>>>>>             PCPU_GET(cpuid)) == 0)
>>>>>>             while (panic_cpu != NOCPU)
>>>>>>                 ; /* nothing */
>>>>>>
>>>>>> The system wedges when stop_cpus_hard() is called, which sends NMIs to all
>>>>>> of the other CPUs and waits for them to acknowledge that they are stopped
>>>>>> before returning.  However the CPU will not deliver an NMI to a CPU that is
>>>>>> already handling an NMI, so the other CPUs that got a watchdog NMI and are
>>>>>> spinning will never go into the NMI handler and acknowledge that they are
>>>>>> stopped.
>>>>>
>>>>> I thought about this issue and fixed (in my tree) in a different way:
>>>>> http://people.freebsd.org/~avg/cpu_stop-race.diff
>>>>>
>>>>> The need for spinlock_enter in the patch in not entirely clear.
>>>>> The main idea is that a CPU which calls cpu_stop and loses a race should
>>>>> voluntary enter cpustop_handler.
>>>>> I am also not sure about MI-cleanness of this patch.
>>>>
>>>> It is similar to what I propose but with some differences:
>>>> - It is not clean from MI perspective
>>>
>>> OK.
>>>
>>>> - I don't think we need to treact it specially, I would just
>>>> unconditionally stop all the CPUs entering in the "spinlock zone",
>>>> making the patch simpler.
>>>
>>> I couldn't understand this part.
>>
>> I'm sorry, I think I misread your patch.
>>
>> I was basically suggesting to fix Ryan case by calling
>> cpustop_handler() (or the new MI interface) into the panic() function,
>> in the case the CPU don't win the race for panic_cpu.
>> Basically doing:
>> Index: sys/kern/kern_shutdown.c
>> ===================================================================
>> --- sys/kern/kern_shutdown.c    (revision 243154)
>> +++ sys/kern/kern_shutdown.c    (working copy)
>> @@ -568,15 +568,11 @@ panic(const char *fmt, ...)
>>  #ifdef SMP
>>         /*
>>          * We don't want multiple CPU's to panic at the same time, so we
>> -        * use panic_cpu as a simple spinlock.  We have to keep checking
>> -        * panic_cpu if we are spinning in case the panic on the first
>> -        * CPU is canceled.
>> +        * use panic_cpu as a simple lock.
>>          */
>>         if (panic_cpu != PCPU_GET(cpuid))
>> -               while (atomic_cmpset_int(&panic_cpu, NOCPU,
>> -                   PCPU_GET(cpuid)) == 0)
>> -                       while (panic_cpu != NOCPU)
>> -                               ; /* nothing */
>> +               if (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0)
>> +                       cpustop_handler();
>>
>>         if (stop_scheduler_on_panic) {
>>                 if (panicstr == NULL && !kdb_active) {
>>
>>
>> Infact it seems to me that the comment is outdated and no longer
>> represent truth.
>
> Ah, I see.  Thank you.
>
> My older plan was to get rid of stop_scheduler_on_panic, that is to make the
> behavior unconditional.  And then to use stop_cpus_hard instead of the hand-rolled
> 'panic_cpu' spinlock.  This way the whichever CPU wins stop_cpus_hard would be the
> only CPU to enter panic.

So, assuming we are not yet defaulting to stop cpus behaviour for
panic, I think we have 2 things to take care:
1) Handling the simultaneous panics
2) Handling deadlocks/starvation among simultaneous cpu_stops

In particular, case 2 is more tricky than it seems, it is not only
related to NMI but also to the case where you send an IPI_STOP via a
normal IPI and interrupts are disabled on one of the channels. Infact,
I think the patch you propose makes such effects even worse, because
it disables interrupts in generic_stop_cpus().
What I suggest to do, is the following:
- The CPU which wins the race for generic_stop_cpus also signals the
CPUs it is willing to stop on a global mask
- Another CPU entering generic_stop_cpus() and loosing the race,
checks the mask of cpus which might be stopped and stops itself if
necessary (ie. not yet done). We must be careful with races here, but
I'm confindent this can be done easily enough.

About the case 1 I think that my patch should be good enough, if you
agree. We can worry about making the stop_cpu default case later on (I
don't recall if we had any real objection to it or we just wanted to
be protective).

I've also recently thought about the suspended_cpus mask. I know I'm
the one who pushed for it and you are the one who didn't like it, but
I'm caming to think that you may have been right with this. I want to
review some code and eventually we can merge suspended_cpus with
stopped_cpus eventually.

What do you think?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


More information about the freebsd-hackers mailing list