Stop scheduler on panic
Attilio Rao
attilio at freebsd.org
Tue Dec 6 22:11:03 UTC 2011
2011/12/6 Andriy Gapon <avg at freebsd.org>:
> on 06/12/2011 20:34 Attilio Rao said the following:
> [snip]
>> - I'm not entirely sure, why we want to disable interrupts at this
>> moment (before to stop other CPUs)?:
>
> Because I believe that stop_cpus_hard() should run in a context with interrupts
> and preemption disabled. Also, I believe that the whole panic handling code
> should run in the same context. So it was only natural for me to enter that
> context at this point.
I'm not against that, I don't see anything wrong with having
interrupts disabled at that point.
>> @@ -547,13 +555,18 @@ panic(const char *fmt, ...)
>> {
>> #ifdef SMP
>> static volatile u_int panic_cpu = NOCPU;
>> + cpuset_t other_cpus;
>> #endif
>> struct thread *td = curthread;
>> int bootopt, newpanic;
>> va_list ap;
>> static char buf[256];
>>
>> - critical_enter();
>> + if (stop_scheduler_on_panic)
>> + spinlock_enter();
>> + else
>> + critical_enter();
>> +
>>
>> - In this chunk I don't entirely understand the kdb_active check:
>> @@ -566,11 +579,18 @@ panic(const char *fmt, ...)
>> PCPU_GET(cpuid)) == 0)
>> while (panic_cpu != NOCPU)
>> ; /* nothing */
>> + if (stop_scheduler_on_panic) {
>> + if (panicstr == NULL && !kdb_active) {
>> + other_cpus = all_cpus;
>> + CPU_CLR(PCPU_GET(cpuid), &other_cpus);
>> + stop_cpus_hard(other_cpus);
>> + }
>> + }
>> #endif
>>
>> bootopt = RB_AUTOBOOT;
>> newpanic = 0;
>> - if (panicstr)
>> + if (panicstr != NULL)
>> bootopt |= RB_NOSYNC;
>> else {
>> bootopt |= RB_DUMP;
>>
>> Is it for avoiding to pass an empty mask to stop_cpus() in kdb_trap()
>> (I saw you changed the policy there)?
>
> Yes. You know my position about elimination of the cpuset parameter to
> stop_cpus_* and my intention to do so. This is related to that. Right now that
> check is not strictly necessary, but it doesn't do any harm either. We know
> that all other CPUs are already stopped when kdb_active (ditto for panicstr !=
> NULL).
I see there could be races with disabiling interrupts and having 2
different stopping mechanisms that want to stop cpus, even using
stop_cpus_hard(), on architectures that don't use a privileged channel
(NMI) as mostly of our !x86.
They are mostly very rare races (requiring kdb_trap() and panic() to
happen in parallel on different CPUs).
>> Maybe we can find a better integration among the two.
>
> What kind of integration?
> Right now I have simple model - both stop all other CPUs.
Well, there is no synchronization atm between panic stopping cpus and
the kdb_trap(). When kdb_trap() stop cpus it has interrupts disabled
and if panic already started they will both deadlock because IPI_STOP
won't be properly delivered.
However, I see all this as a problem with other arches not having/not
implementing a real dedicated channel for cpu_stop_hard(), so we
should not think about it now.
I think we may need some sort of control as panic already does with
panic_cpu before to disable interrupts, but don't worry about it now.
>> I'd also move the setting of stop_scheduler variable in the "if", it
>> seems a bug to me to have it set otherwise.
>
> Can you please explain what bug do you suspect here?
> I do not see any.
I just see more natural to move it within the above if (panicstr ==
NULL ...) condition.
> [snip]
>> - I'm not sure I like to change the policies on cpu stopping for KDB
>> with this patchset.
>
> I am not sure what exactly you mean by change in policies. I do not see any
> such change, entering kdb always stops all other CPUs, with or without the patch.
Yes, I was confused by older code did just stopped CPUs before
kdb_trap() manually, I think what kdb_trap() does now is ok (and you
just retain it).
I'd just change this check on panicstr:
@@ -606,9 +603,13 @@ kdb_trap(int type, int code, struct trapframe *tf)
intr = intr_disable();
#ifdef SMP
- other_cpus = all_cpus;
- CPU_CLR(PCPU_GET(cpuid), &other_cpus);
- stop_cpus_hard(other_cpus);
+ if (panicstr == NULL) {
+ other_cpus = all_cpus;
+ CPU_CLR(PCPU_GET(cpuid), &other_cpus);
+ stop_cpus_hard(other_cpus);
+ did_stop_cpus = 1;
+ } else
+ did_stop_cpus = 0;
to be SCHEDULER_STOPPED().
If you agree I can fix the kern_mutex, kern_sx and kern_rwlock parts
and it should be done.
Attilio
--
Peace can only be achieved by understanding - A. Einstein
More information about the freebsd-current
mailing list