cvs commit: src/sys/i386/i386 vm_machdep.c
Bruce Evans
bde at zeta.org.au
Thu Dec 16 23:06:38 PST 2004
On Thu, 16 Dec 2004, John Baldwin wrote:
> On Wednesday 15 December 2004 10:51 pm, Bruce Evans wrote:
> > On Wed, 15 Dec 2004, Kris Kennaway wrote:
> > ...
> > > I often
> > > get overlapping panics from the other CPUs on this machine, and it
> > > often locks up when trying to enter DDB, or while printing the panic
> > > string (the other day it only got as far as 'p' before hanging).
> >
> > panic() needs much the same locking as ddb to prevent concurrent entry.
> > It must be fairly likely for all CPUs to panic on the same asertion.
> > This is like all CPUs entering ddb on the same breakpoint.
>
> The thing is, panic does have locking, but it appears to be ineffective:
>
> #ifdef SMP
> /*
> * We don't want multiple CPU's to panic at the same time, so we
> * use panic_cpu as a simple spinlock. We have to keep checking
> * panic_cpu if we are spinning in case the panic on the first
> * CPU is canceled.
> */
> if (panic_cpu != PCPU_GET(cpuid))
> while (atomic_cmpset_int(&panic_cpu, NOCPU,
> PCPU_GET(cpuid)) == 0)
> while (panic_cpu != NOCPU)
> ; /* nothing */
> #endif
Oops, I forgot bout that (and never really noticed that it was the very
first thing in panic() as it needs to be).
> In the smpng branch in p4, I have the lock changed to be based on the thread
> rather than the CPU to account for problems coming from migration due to
> preemption while in a panic, but I haven't observed any noticeable
> improvement from the change:
I can't see any serious problems with the above. Perhaps it should try
to stop the other CPUs like ddb, since panic() is going to do drastic
things. I think it needs to use atomic_load_acq_int() in the inner loop,
but that only affects the RESTARTABLE_PANICS case.
> --- //depot/vendor/freebsd/src/sys/kern/kern_shutdown.c 2004/11/05 19:00:32
> +++ //depot/projects/smpng/sys/kern/kern_shutdown.c 2004/11/05 19:22:55
> @@ -473,7 +473,7 @@
> }
>
> #ifdef SMP
> -static u_int panic_cpu = NOCPU;
> +static struct thread *panic_thread = NULL;
> #endif
>
> /*
> @@ -494,15 +494,14 @@
> #ifdef SMP
> /*
> * We don't want multiple CPU's to panic at the same time, so we
> - * use panic_cpu as a simple spinlock. We have to keep checking
> - * panic_cpu if we are spinning in case the panic on the first
> + * use panic_thread as a simple spinlock. We have to keep checking
> + * panic_thread if we are spinning in case the panic on the first
> * CPU is canceled.
> */
> - if (panic_cpu != PCPU_GET(cpuid))
> - while (atomic_cmpset_int(&panic_cpu, NOCPU,
> - PCPU_GET(cpuid)) == 0)
> - while (panic_cpu != NOCPU)
> - ; /* nothing */
> + if (panic_thread != curthread)
> + while (atomic_cmpset_ptr(&panic_thread, NULL, curthread) == 0)
> + while (panic_thread != NULL)
> + cpu_spinwait();
> #endif
>
> bootopt = RB_AUTOBOOT | RB_DUMP;
> @@ -538,7 +537,7 @@
> /* See if the user aborted the panic, in which case we continue. */
> if (panicstr == NULL) {
> #ifdef SMP
> - atomic_store_rel_int(&panic_cpu, NOCPU);
> + atomic_store_rel_ptr(&panic_thread, NULL);
> #endif
> return;
> }
Hmm, this has a long history. It was committed more than 2 years ago
in rev.1.132, then immediately backed out since it wasn't ready. I
slightly prefer to use the cpuid, since panic() can occur in non-thread
context and during context switching. PCPU_GET(cpuid) is also better
than curthread->td_oncpu, since changing the latter is not atomic with
changing curthread.
Bruce
More information about the cvs-all
mailing list