stop_cpus_hard when multiple CPUs are panicking from an NMI
Attilio Rao
attilio at freebsd.org
Sun Nov 25 12:31:44 UTC 2012
On Sun, Nov 25, 2012 at 12:29 PM, Attilio Rao <attilio at freebsd.org> wrote:
> On Fri, Nov 16, 2012 at 2:47 PM, Andriy Gapon <avg at freebsd.org> wrote:
>> on 16/11/2012 16:41 Attilio Rao said the following:
>>> On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon <avg at freebsd.org> wrote:
>>>> on 16/11/2012 14:30 Attilio Rao said the following:
>>>>> On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon <avg at freebsd.org> wrote:
>>>>>> on 16/11/2012 00:58 Ryan Stone said the following:
>>>>>>> At work we have some custom watchdog hardware that sends an NMI upon
>>>>>>> expiry. We've modified the kernel to panic when it receives the watchdog
>>>>>>> NMI. I've been trying the "stop_scheduler_on_panic" mode, and I've
>>>>>>> discovered that when my watchdog expires, the system gets completely
>>>>>>> wedged. After some digging, I've discovered that I have multiple CPUs
>>>>>>> getting the watchdog NMI and trying to panic concurrently. One of the CPUs
>>>>>>> wins, and the rest spin forever in this code:
>>>>>>>
>>>>>>>         /*
>>>>>>>          * We don't want multiple CPU's to panic at the same time, so we
>>>>>>>          * use panic_cpu as a simple spinlock.  We have to keep checking
>>>>>>>          * panic_cpu if we are spinning in case the panic on the first
>>>>>>>          * CPU is canceled.
>>>>>>>          */
>>>>>>>         if (panic_cpu != PCPU_GET(cpuid))
>>>>>>>                 while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>>>>>>                     PCPU_GET(cpuid)) == 0)
>>>>>>>                         while (panic_cpu != NOCPU)
>>>>>>>                                 ; /* nothing */
>>>>>>>
>>>>>>> The system wedges when stop_cpus_hard() is called, which sends NMIs to all
>>>>>>> of the other CPUs and waits for them to acknowledge that they are stopped
>>>>>>> before returning. However the CPU will not deliver an NMI to a CPU that is
>>>>>>> already handling an NMI, so the other CPUs that got a watchdog NMI and are
>>>>>>> spinning will never go into the NMI handler and acknowledge that they are
>>>>>>> stopped.
>>>>>>
>>>>>> I thought about this issue and fixed it (in my tree) in a different way:
>>>>>> http://people.freebsd.org/~avg/cpu_stop-race.diff
>>>>>>
>>>>>> The need for spinlock_enter in the patch is not entirely clear.
>>>>>> The main idea is that a CPU which calls cpu_stop and loses a race should
>>>>>> voluntarily enter cpustop_handler.
>>>>>> I am also not sure about MI-cleanness of this patch.
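A rough illustration of that idea (not the actual patch at the URL above,
just a sketch): stopping_cpu below is the arbiter variable
generic_stop_cpus() already uses, and the race loser simply behaves as if
it had received the stop IPI, so the winner sees its acknowledgment in
stopped_cpus instead of waiting forever:

/* Kernel context; same headers and globals as the sketch earlier. */
static volatile u_int stopping_cpu = NOCPU;

int
generic_stop_cpus_sketch(cpuset_t map, u_int type)
{
        u_int cpu = PCPU_GET(cpuid);

        if (!atomic_cmpset_int(&stopping_cpu, NOCPU, cpu)) {
                /* Race lost: stop ourselves instead of spinning. */
                cpustop_handler();
                return (0);
        }
        ipi_selected(map, type);        /* IPI_STOP or IPI_STOP_HARD */
        while (!CPU_SUBSET(&stopped_cpus, &map))
                cpu_spinwait();
        atomic_store_rel_int(&stopping_cpu, NOCPU);
        return (1);
}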
>>>>>
>>>>> It is similar to what I propose but with some differences:
>>>>> - It is not clean from an MI perspective
>>>>
>>>> OK.
>>>>
>>>>> - I don't think we need to treat it specially; I would just
>>>>> unconditionally stop all the CPUs entering the "spinlock zone",
>>>>> making the patch simpler.
>>>>
>>>> I couldn't understand this part.
>>>
>>> I'm sorry, I think I misread your patch.
>>>
>>> I was basically suggesting to fix Ryan's case by calling
>>> cpustop_handler() (or the new MI interface) in the panic() function,
>>> in the case where the CPU doesn't win the race for panic_cpu.
>>> Basically doing:
>>> Index: sys/kern/kern_shutdown.c
>>> ===================================================================
>>> --- sys/kern/kern_shutdown.c   (revision 243154)
>>> +++ sys/kern/kern_shutdown.c   (working copy)
>>> @@ -568,15 +568,11 @@ panic(const char *fmt, ...)
>>>  #ifdef SMP
>>>         /*
>>>          * We don't want multiple CPU's to panic at the same time, so we
>>> -        * use panic_cpu as a simple spinlock.  We have to keep checking
>>> -        * panic_cpu if we are spinning in case the panic on the first
>>> -        * CPU is canceled.
>>> +        * use panic_cpu as a simple lock.
>>>          */
>>>         if (panic_cpu != PCPU_GET(cpuid))
>>> -               while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>> -                   PCPU_GET(cpuid)) == 0)
>>> -                       while (panic_cpu != NOCPU)
>>> -                               ; /* nothing */
>>> +               if (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0)
>>> +                       cpustop_handler();
>>> 
>>>         if (stop_scheduler_on_panic) {
>>>                 if (panicstr == NULL && !kdb_active) {
>>>
>>> In fact, it seems to me that the comment is outdated and no longer
>>> represents the truth.
>>
>> Ah, I see. Thank you.
>>
>> My older plan was to get rid of stop_scheduler_on_panic, that is, to make the
>> behavior unconditional, and then to use stop_cpus_hard instead of the hand-rolled
>> 'panic_cpu' spinlock. This way, whichever CPU wins stop_cpus_hard would be the
>> only CPU to enter panic.
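A hedged sketch of what that plan might look like in panic() (hypothetical,
not a committed change; it relies on stop_cpus_hard() handling the
losing-CPU case internally, as in the earlier sketch, and uses the usual
idiom for building an "all but self" cpuset):

void
panic(const char *fmt, ...)
{
#ifdef SMP
        cpuset_t other_cpus;

        /*
         * Sketch: let stop_cpus_hard() itself arbitrate.  Whichever CPU
         * wins stops all the others; a concurrent panicking CPU loses
         * the race inside stop_cpus_hard() and parks in
         * cpustop_handler(), so exactly one CPU proceeds with panic.
         */
        other_cpus = all_cpus;
        CPU_CLR(PCPU_GET(cpuid), &other_cpus);
        stop_cpus_hard(other_cpus);
#endif
        /* ... the rest of panic() is unchanged ... */
}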
>
> So, assuming we are not yet defaulting to the stop-CPUs behaviour for
> panic, I think we have 2 things to take care of:
> 1) Handling simultaneous panics
> 2) Handling deadlocks/starvation among simultaneous cpu_stops
>
> In particular, case 2 is trickier than it seems: it is not only
> related to NMIs but also to the case where you send an IPI_STOP via a
> normal IPI and interrupts are disabled on one of the target CPUs. In
> fact, I think the patch you propose makes such effects even worse,
> because it disables interrupts in generic_stop_cpus().
> What I suggest to do is the following (see the sketch below):
> - The CPU which wins the race for generic_stop_cpus() also signals the
> CPUs it wants to stop on a global mask
> - Another CPU entering generic_stop_cpus() and losing the race
> checks the mask of CPUs which should be stopped and stops itself if
> necessary (i.e., if it has not done so yet). We must be careful with
> races here, but I'm confident this can be done easily enough.
BTW, I'm aware such a case is not fatal because of the safety belt, but
I think it would be too easy to get wrong code if we let simultaneous
generic_stop_cpus() calls race in the "wrong way" (race loser with
interrupts/NMIs disabled).
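A minimal sketch of the scheme described above; to_stop_cpus is a
hypothetical name invented here for the global mask, and the
publish/observe ordering around it is exactly the kind of race that would
need auditing:

/* Kernel context; stopped_cpus and cpustop_handler() as sketched earlier. */
static volatile u_int stopping_cpu = NOCPU;
static volatile cpuset_t to_stop_cpus;  /* hypothetical: winner's target set */

int
generic_stop_cpus_sketch(cpuset_t map, u_int type)
{
        u_int cpu = PCPU_GET(cpuid);

        while (!atomic_cmpset_int(&stopping_cpu, NOCPU, cpu)) {
                /*
                 * Race lost.  If the winner wants us stopped, stop
                 * ourselves; otherwise it could wait forever while we
                 * spin here with interrupts (or an in-progress NMI)
                 * blocking its IPI_STOP.
                 */
                while (stopping_cpu != NOCPU) {
                        if (CPU_ISSET(cpu, &to_stop_cpus))
                                cpustop_handler();
                        cpu_spinwait();
                }
        }

        /* Publish the target set before sending the IPIs. */
        to_stop_cpus = map;
        ipi_selected(map, type);
        while (!CPU_SUBSET(&stopped_cpus, &map))
                cpu_spinwait();

        /* Clear the mask before releasing the arbiter; ordering matters. */
        CPU_ZERO(__DEVOLATILE(cpuset_t *, &to_stop_cpus));
        atomic_store_rel_int(&stopping_cpu, NOCPU);
        return (1);
}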
Thanks,
Attilio
--
Peace can only be achieved by understanding - A. Einstein