smp_tlb_shootdown loop (Re: spin lock smp rendezvous held by
0xffffff01250a7980 for > 5 seconds)
Kris Kennaway
kris at obsecurity.org
Sun Nov 27 03:20:32 GMT 2005
On Sat, Nov 26, 2005 at 06:22:45PM -0500, Kris Kennaway wrote:
> On Thu, Nov 24, 2005 at 06:26:16PM -0500, Kris Kennaway wrote:
> > I got this on a quad amd64 machine running 6.0-STABLE. At the time it
> > was running 21 simultaneous tar extractions onto a sync-mounted md.
> >
> > panic() at panic+0x1e6
> > _mtx_lock_spin() at _mtx_lock_spin+0xad
> > pmap_invalidate_range() at pmap_invalidate_range+0xb3
> > pmap_qremove() at pmap_qremove+0x53
> > vfs_vmio_release() at vfs_vmio_release+0x1e0
> > getnewbuf() at getnewbuf+0x368
> > getblk() at getblk+0x3d9
> > ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
> > ffs_write() at ffs_write+0x31b
> > VOP_WRITE_APV() at VOP_WRITE_APV+0xed
> > vn_write() at vn_write+0x228
> > dofilewrite() at dofilewrite+0x90
> > kern_writev() at kern_writev+0x54
> > write() at write+0x4b
Another CPU is here:
smp_tlb_shootdown() at smp_tlb_shootdown+0x40
smp_invlpg_range() at smp_invlpg_range+0x1e
pmap_invalidate_range() at pmap_invalidate_range+0xf9
pmap_qenter() at pmap_qenter+0x64
allocbuf() at allocbuf+0x9a0
getblk() at getblk+0x52d
ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
ffs_write() at ffs_write+0x31b
VOP_WRITE_APV() at VOP_WRITE_APV+0xed
vn_write() at vn_write+0x228
dofilewrite() at dofilewrite+0x90
kern_writev() at kern_writev+0x54
write() at write+0x4b
syscall() at syscall+0x404
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (4, FreeBSD ELF64, write), rip = 0x80070ea6c, rsp = 0x7fffffffe6a8, rbp = 0x52a800 ---
It is looping:
smp_tlb_shootdown+0x40: repe nop
smp_tlb_shootdown+0x42: movl 0x21c4f8,%eax
smp_tlb_shootdown+0x48: cmpl %ebx,%eax
smp_tlb_shootdown+0x4a: jb smp_tlb_shootdown+0x40
smp_tlb_shootdown(u_int vector, vm_offset_t addr1, vm_offset_t addr2)
{
	u_int ncpu;

	ncpu = mp_ncpus - 1;	/* does not shootdown self */
	if (ncpu < 1)
		return;		/* no other cpus */
	mtx_assert(&smp_ipi_mtx, MA_OWNED);
	smp_tlb_addr1 = addr1;
	smp_tlb_addr2 = addr2;
	atomic_store_rel_int(&smp_tlb_wait, 0);
	ipi_all_but_self(vector);
	while (smp_tlb_wait < ncpu)
		ia32_pause();
}
which seems to be the while loop at the end.
db> x/x smp_tlb_wait
smp_tlb_wait: 1
db> x mp_ncpus
mp_ncpus: 4
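The loop's exit test can be checked against those two ddb values. The following is only a user-space sketch, not kernel code: pending_acks() and shootdown_done() are hypothetical helper names, and it assumes each remote CPU increments smp_tlb_wait exactly once when it handles the IPI.

```c
#include <assert.h>

/*
 * Hypothetical helper: how many remote CPUs have not yet acknowledged
 * the shootdown, given the CPU count and the current counter value.
 */
static unsigned
pending_acks(unsigned mp_ncpus, unsigned smp_tlb_wait)
{
	assert(mp_ncpus >= 1);

	unsigned ncpu = mp_ncpus - 1;	/* initiator does not shoot down itself */

	return (smp_tlb_wait < ncpu ? ncpu - smp_tlb_wait : 0);
}

/* Mirrors the loop test: the initiator spins while smp_tlb_wait < ncpu. */
static int
shootdown_done(unsigned mp_ncpus, unsigned smp_tlb_wait)
{
	return (pending_acks(mp_ncpus, smp_tlb_wait) == 0);
}
```

With mp_ncpus = 4 and smp_tlb_wait = 1, pending_acks() gives 2, so under that assumption the initiator is still waiting on two of the three remote CPUs and the loop cannot exit.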
So it looks like it's stuck waiting for the other processors to
acknowledge the TLB shootdown. However, the 3 other CPUs are all in the
same place:
> _mtx_lock_spin() at _mtx_lock_spin+0x6b
> getit() at getit+0x6f
> DELAY() at DELAY+0x44
> _mtx_lock_spin() at _mtx_lock_spin+0x6b
> pmap_invalidate_range() at pmap_invalidate_range+0xb3
> pmap_qremove() at pmap_qremove+0x53
> vfs_vmio_release() at vfs_vmio_release+0x1e0
> getnewbuf() at getnewbuf+0x368
> getblk() at getblk+0x3d9
> ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
> ffs_write() at ffs_write+0x31b
> VOP_WRITE_APV() at VOP_WRITE_APV+0xed
> vn_write() at vn_write+0x228
> dofilewrite() at dofilewrite+0x90
> kern_writev() at kern_writev+0x54
> write() at write+0x4b
> syscall() at syscall+0x404
> Xfast_syscall() at Xfast_syscall+0xa8
> --- syscall (4, FreeBSD ELF64, write), rip = 0x80070ea6c, rsp = 0x7fffffffe6a8, rbp = 0x52ae00 ---
>
> i.e. the first _mtx_lock_spin() tried to acquire the ipi lock and
> spun, which called DELAY and getit, which tried to acquire the clock
> lock:
>
> mtx_lock_spin(&clock_lock);
>
> which *also* spun, and called DELAY...and at that point things went to
> hell and it recursed until it blew out the stack.
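The recursion described in that quoted text can be illustrated with a toy user-space model. This is a sketch under stated assumptions, not the kernel's code: every toy_* name and TOY_STACK_LIMIT are hypothetical, both locks are assumed to stay contested forever, and the depth guard exists only so the example terminates.

```c
#include <assert.h>

#define TOY_STACK_LIMIT 64	/* hypothetical stand-in for the stack size */

static int toy_max_depth;	/* deepest nesting observed so far */

static void toy_delay(int depth);

/* Slow path of a contested spin lock: back off by calling the delay routine. */
static void
toy_lock_spin_contested(int depth)
{
	assert(depth >= 0);
	if (depth > toy_max_depth)
		toy_max_depth = depth;
	if (depth >= TOY_STACK_LIMIT)
		return;		/* artificial guard, only so the model stops */
	toy_delay(depth + 1);
}

/* The timer read takes its own spin lock, which is contested as well. */
static void
toy_getit(int depth)
{
	toy_lock_spin_contested(depth + 1);
}

/* The delay routine reads the timer. */
static void
toy_delay(int depth)
{
	toy_getit(depth + 1);
}

/* Returns the nesting depth reached before the artificial limit stops it. */
static int
toy_run(void)
{
	toy_max_depth = 0;
	toy_lock_spin_contested(0);
	return (toy_max_depth);
}
```

In this model each round of lock, delay and timer read adds three frames, and nothing unwinds until the guard is hit, which is the shape of the repeated _mtx_lock_spin/DELAY/getit frames in the trace.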
So why aren't they processing the IPI? Was the IPI lost somehow?
Kris