NFS deadlock on 9.2-Beta1
J David
j.david.lists at gmail.com
Thu Aug 22 16:16:10 UTC 2013
One deadlocked process cropped up overnight, but I managed to panic
the box before getting too much debugging info. :(
The process was in state T instead of D, which I guess must be a side
effect of some of the debugging code compiled in.
Here are the details I was able to capture:
db> show proc 7692
Process 7692 (httpd) at 0xfffffe0158793000:
state: NORMAL
uid: 25000 gids: 25000
parent: pid 1 at 0xfffffe00039c3950
ABI: FreeBSD ELF64
arguments: /nfsn/apps/tapache22/bin/httpd
threads: 3
100674 D newnfs 0xfffffe021cdd9848 httpd
100597 D pgrbwt 0xfffffe02fda788b8 httpd
100910 s httpd
db> show thread 100674
Thread 100674 at 0xfffffe0108c79480:
proc (pid 7692): 0xfffffe0158793000
name: httpd
stack: 0xffffff834c80f000-0xffffff834c812fff
flags: 0x2a804 pflags: 0
state: INHIBITED: {SLEEPING}
wmesg: newnfs wchan: 0xfffffe021cdd9848
priority: 96
container lock: sleepq chain (0xffffffff813c03c8)
db> tr 100674
Tracing pid 7692 tid 100674 td 0xfffffe0108c79480
sched_switch() at sched_switch+0x234/frame 0xffffff834c812360
mi_switch() at mi_switch+0x15c/frame 0xffffff834c8123a0
sleepq_switch() at sleepq_switch+0x17d/frame 0xffffff834c8123e0
sleepq_wait() at sleepq_wait+0x43/frame 0xffffff834c812410
sleeplk() at sleeplk+0x11a/frame 0xffffff834c812460
__lockmgr_args() at __lockmgr_args+0x9a9/frame 0xffffff834c812580
nfs_lock1() at nfs_lock1+0x87/frame 0xffffff834c8125b0
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xffffff834c8125e0
_vn_lock() at _vn_lock+0x63/frame 0xffffff834c812640
ncl_upgrade_vnlock() at ncl_upgrade_vnlock+0x5e/frame 0xffffff834c812670
ncl_bioread() at ncl_bioread+0x195/frame 0xffffff834c8127e0
VOP_READ_APV() at VOP_READ_APV+0xd1/frame 0xffffff834c812810
vn_rdwr() at vn_rdwr+0x2bc/frame 0xffffff834c8128d0
kern_sendfile() at kern_sendfile+0xa90/frame 0xffffff834c812ac0
do_sendfile() at do_sendfile+0x92/frame 0xffffff834c812b20
amd64_syscall() at amd64_syscall+0x259/frame 0xffffff834c812c30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff834c812c30
--- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801b26f4c, rsp
= 0x7ffffe9f43c8, rbp = 0x7ffffe9f4700 ---
db> show lockchain 100674
thread 100674 (pid 7692, httpd) inhibited
db> show thread 100597
Thread 100597 at 0xfffffe021c976000:
proc (pid 7692): 0xfffffe0158793000
name: httpd
stack: 0xffffff834c80a000-0xffffff834c80dfff
flags: 0x28804 pflags: 0
state: INHIBITED: {SLEEPING}
wmesg: pgrbwt wchan: 0xfffffe02fda788b8
priority: 84
container lock: sleepq chain (0xffffffff813c0148)
db> tr 100597
Tracing pid 7692 tid 100597 td 0xfffffe021c976000
sched_switch() at sched_switch+0x234/frame 0xffffff834c80d750
mi_switch() at mi_switch+0x15c/frame 0xffffff834c80d790
sleepq_switch() at sleepq_switch+0x17d/frame 0xffffff834c80d7d0
sleepq_wait() at sleepq_wait+0x43/frame 0xffffff834c80d800
_sleep() at _sleep+0x30f/frame 0xffffff834c80d890
vm_page_grab() at vm_page_grab+0x120/frame 0xffffff834c80d8d0
kern_sendfile() at kern_sendfile+0x992/frame 0xffffff834c80dac0
do_sendfile() at do_sendfile+0x92/frame 0xffffff834c80db20
amd64_syscall() at amd64_syscall+0x259/frame 0xffffff834c80dc30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff834c80dc30
--- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801b26f4c, rsp
= 0x7ffffebf53c8, rbp = 0x7ffffebf5700 ---
db> show lockchain 100597
thread 100597 (pid 7692, httpd) inhibited
The "inhibited" is not something I'm familiar with and didn't match
the example output; I thought that maybe the T state was overpowering
the locks, and that maybe I should continue the system and then -CONT
the process. However, a few seconds after I issued "c" at the DDB
prompt, the system panicked in the console driver ("mtx_lock_spin:
recursed on non-recursive mutex cnputs_mtx @
/usr/src/sys/kern/kern_cons.c:500"), so I guess that's not a thing to
do. :(
Sorry my stupidity and ignorance is dragging this out. :( This is all
well outside my comfort zone, but next time I'll get it for sure.
Thanks!
More information about the freebsd-stable
mailing list