cvs commit: src/sys/i386/i386 vm_machdep.c
John Baldwin
jhb at FreeBSD.org
Thu Dec 16 12:31:00 PST 2004
On Wednesday 15 December 2004 10:51 pm, Bruce Evans wrote:
> On Wed, 15 Dec 2004, Kris Kennaway wrote:
> > On Tue, Dec 14, 2004 at 09:48:48PM -0500, John Baldwin wrote:
> > > On Tuesday 14 December 2004 07:10 pm, Kris Kennaway wrote:
> > > > NB: DDB often isn't usable on SMP machines thesedays, and will hang
> > > > when a panic tries to enter it.
> > >
> > > Try debug.kdb.stop_cpus=0 (sysctl and tunable) to prevent KDB from
> > > trying to stop the other CPUs. Another possible fix that ups@ has
> > > talked about is changing IPI_STOP to use an NMI rather than a vector
> > > (you can send NMI IPIs via the local APIC) so that IPI_STOP is more
> > > reliable.
> >
> > This is already set, and it doesn't always fix the problem.
>
> debug.kdb.stop_cpus=0 should be expected to increase problems. Given time,
> the other CPU are quite likely to enter ddb for whatever reason the first
> one did. Then they stomp on ddb's global state (starting with ddb_regs).
>
> The NMI would need locking to prevent the CPUs stopping each other.
>
> > I often
> > get overlapping panics from the other CPUs on this machine, and it
> > often locks up when trying to enter DDB, or while printing the panic
> > string (the other day it only got as far as 'p' before hanging).
>
> panic() needs much the same locking as ddb to prevent concurrent entry.
> It must be fairly likely for all CPUs to panic on the same asertion.
> This is like all CPUs entering ddb on the same breakpoint.
The thing is, panic does have locking, but it appears to be ineffective:
#ifdef SMP
/*
* We don't want multiple CPU's to panic at the same time, so we
* use panic_cpu as a simple spinlock. We have to keep checking
* panic_cpu if we are spinning in case the panic on the first
* CPU is canceled.
*/
if (panic_cpu != PCPU_GET(cpuid))
while (atomic_cmpset_int(&panic_cpu, NOCPU,
PCPU_GET(cpuid)) == 0)
while (panic_cpu != NOCPU)
; /* nothing */
#endif
In the smpng branch in p4, I have the lock changed to be based on the thread
rather than the CPU to account for problems coming from migration due to
preemption while in a panic, but I haven't observed any noticeable
improvement from the change:
--- //depot/vendor/freebsd/src/sys/kern/kern_shutdown.c 2004/11/05 19:00:32
+++ //depot/projects/smpng/sys/kern/kern_shutdown.c 2004/11/05 19:22:55
@@ -473,7 +473,7 @@
}
#ifdef SMP
-static u_int panic_cpu = NOCPU;
+static struct thread *panic_thread = NULL;
#endif
/*
@@ -494,15 +494,14 @@
#ifdef SMP
/*
* We don't want multiple CPU's to panic at the same time, so we
- * use panic_cpu as a simple spinlock. We have to keep checking
- * panic_cpu if we are spinning in case the panic on the first
+ * use panic_thread as a simple spinlock. We have to keep checking
+ * panic_thread if we are spinning in case the panic on the first
* CPU is canceled.
*/
- if (panic_cpu != PCPU_GET(cpuid))
- while (atomic_cmpset_int(&panic_cpu, NOCPU,
- PCPU_GET(cpuid)) == 0)
- while (panic_cpu != NOCPU)
- ; /* nothing */
+ if (panic_thread != curthread)
+ while (atomic_cmpset_ptr(&panic_thread, NULL, curthread) == 0)
+ while (panic_thread != NULL)
+ cpu_spinwait();
#endif
bootopt = RB_AUTOBOOT | RB_DUMP;
@@ -538,7 +537,7 @@
/* See if the user aborted the panic, in which case we continue. */
if (panicstr == NULL) {
#ifdef SMP
- atomic_store_rel_int(&panic_cpu, NOCPU);
+ atomic_store_rel_ptr(&panic_thread, NULL);
#endif
return;
}
--
John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
More information about the cvs-all
mailing list