stop_cpus_hard when multiple CPUs are panicking from an NMI

Sun Nov 25 14:01:19 UTC 2012

On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon <avg at freebsd.org> wrote:
> on 25/11/2012 14:29 Attilio Rao said the following:
>> I think the patch you propose makes such effects even worse, because
>> it disables interrupts in generic_stop_cpus().
>> What I suggest is the following:
>> - The CPU which wins the race for generic_stop_cpus() also signals
>> the CPUs it is willing to stop on a global mask
>> - Another CPU entering generic_stop_cpus() and losing the race
>> checks the mask of CPUs which might be stopped and stops itself if
>> necessary (i.e. not yet done). We must be careful with races here, but
>> I'm confident this can be done easily enough.
>
> I think that you either misunderstood my patch or I misunderstand your
> suggestion, because my patch does exactly what you wrote above.

The patch is somewhat incomplete:
- I don't think that we need specific checks in cpustop_handler() (and
if you have added them to prevent races, I don't think they are
enough, see below)
- the "stopping_cpus" map must be set atomically with, or before, the
stopper_cpu cpuid, otherwise some CPUs may end up using a NULL mask in
the check
- Did you consider the race where a stop and a restart request happen
just after the CPU_ISSET() check? I think CPUs can deadlock there.
- I'm very dubious about the spinlock_enter() stuff, I think it can
just make the problem worse atm.
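
To make the ordering in the second point concrete, here is a rough
userspace sketch, not the patch itself. It assumes C11 atomics instead
of the kernel's own atomic primitives and a plain 64-bit word instead of
cpuset_t. stop_claimed, try_stop_cpus(), finish_stop(), NOCPU and the
enum are hypothetical names made up for the illustration; only
stopping_cpus and stopper_cpu come from the discussion. It does not
attempt to solve the stop/restart race from the third point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NOCPU (-1)	/* hypothetical "nobody is stopping" sentinel */

/* Hypothetical globals; only the last two names come from the thread. */
static atomic_flag stop_claimed = ATOMIC_FLAG_INIT;	/* decides the race */
static _Atomic uint64_t stopping_cpus;	/* CPUs the winner wants stopped */
static _Atomic int stopper_cpu = NOCPU;	/* id of the CPU that won */

enum stop_result { STOP_WON, STOP_SELF, STOP_RETRY };

/* One attempt by CPU 'self' to stop the CPUs in 'targets'. */
static enum stop_result
try_stop_cpus(int self, uint64_t targets)
{
	uint64_t mask;

	assert(self >= 0 && self < 64);

	if (!atomic_flag_test_and_set_explicit(&stop_claimed,
	    memory_order_acquire)) {
		/*
		 * Winner: write the mask first, then publish the id with
		 * release semantics, so that anyone who observes the id
		 * also observes a valid mask.
		 */
		atomic_store_explicit(&stopping_cpus, targets,
		    memory_order_relaxed);
		atomic_store_explicit(&stopper_cpu, self,
		    memory_order_release);
		return (STOP_WON);
	}

	/* Loser: the mask is trustworthy only once the id is visible. */
	if (atomic_load_explicit(&stopper_cpu, memory_order_acquire) == NOCPU)
		return (STOP_RETRY);
	mask = atomic_load_explicit(&stopping_cpus, memory_order_relaxed);
	return ((mask & (UINT64_C(1) << self)) != 0 ? STOP_SELF : STOP_RETRY);
}

/* Winner only: withdraw the request and let another CPU claim the slot. */
static void
finish_stop(void)
{
	atomic_store_explicit(&stopper_cpu, NOCPU, memory_order_relaxed);
	atomic_store_explicit(&stopping_cpus, 0, memory_order_relaxed);
	atomic_flag_clear_explicit(&stop_claimed, memory_order_release);
}
```

A caller that gets STOP_SELF would park itself the way cpustop_handler()
does, and one that gets STOP_RETRY would spin and try again. The separate
claim flag is just one way to get the "mask before id" ordering; packing
both values into a single atomically updated word would be another.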

However, you are right: the concept of your patch is the same as what I
wanted to get; we may just need to polish it a bit.

In the meantime, I also double-checked suspended_cpus and I don't think
there are real showstoppers to having it in the stopped_cpus map.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein