network lock manager (lockd) deadlocked in 'rpcrecv'
John Hein
jhein at timing.com
Thu Jul 9 04:27:54 UTC 2009
John Hein wrote at 15:31 -0600 on Jul 8, 2009:
> I have a home directory on FreeBSD 7.2-stable (20090705), amd64.
> It is serving up the directory over nfs (v3, tcp), and now
> I'm seeing lots of 'lockd not responding' on Fedora 10 & 11 systems.
>
> USER PID PPID SID NI %CPU %MEM VSZ RSS TT WCHAN STAT STARTED TIME COMMAND
> root 791 1 791 0 0.0 0.0 6748 1500 ?? rpcrec Ds 2:45PM 0:05.80 /usr/sbin/rpc.lockd
>
> Once lockd gets in this state, doing a test lock on a file
> from a FreeBSD box locks with 'lockd not responding', too
> (and ctrl-c and kill -9 do nothing).
>
> USER PID PPID SID NI %CPU %MEM VSZ RSS TT WCHAN STAT STARTED TIME COMMAND
> jhein 6297 3491 3491 0 0.0 0.0 1412 604 p5 nlmrcv T+ 3:18PM 0:00.00 /h/jhein/nfslocktest /nfs/locktest
>
>
> I see this on an i386 6.4-stable, too.
Also in dmesg:
NLM: failed to contact remote rpcbind, stat = 5, port = 28416
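For anyone decoding that message: a minimal sketch, assuming stat is an ONC RPC enum clnt_stat value and that the port was printed without being converted from network byte order. The second assumption is only a guess, based on 28416 being 0x6F00, which byte-swaps to 111, the well-known rpcbind port. The function names here are hypothetical and not taken from the kernel source.

```python
# First few values of enum clnt_stat from <rpc/clnt_stat.h>.
CLNT_STAT = {
    0: "RPC_SUCCESS",
    1: "RPC_CANTENCODEARGS",
    2: "RPC_CANTDECODERES",
    3: "RPC_CANTSEND",
    4: "RPC_CANTRECV",
    5: "RPC_TIMEDOUT",
}


def swap16(value):
    """Swap the two bytes of a 16-bit value (explicit, host-independent)."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def decode_nlm_message(stat, port):
    """Return (status name, byte-swapped port) for the dmesg values.

    Assumption: the port in the message is still in network byte order.
    """
    return CLNT_STAT.get(stat, "unknown"), swap16(port)


if __name__ == "__main__":
    name, port = decode_nlm_message(5, 28416)
    print(name, port)
```

Read that way, the message would mean that the call to the client's rpcbind on port 111 timed out, which would fit the thread sleeping in clnt_dg_call.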
And from ddb...
Tracing command rpc.lockd pid 791 tid 100176 td 0xffffff00069dd720
sched_switch() at 0xffffffff8037df95 = sched_switch+0x1d5
mi_switch() at 0xffffffff803656fb = mi_switch+0x18b
sleepq_timedwait() at 0xffffffff80390aeb = sleepq_timedwait+0x3b
_sleep() at 0xffffffff80365cd4 = _sleep+0x324
clnt_dg_call() at 0xffffffff80504a0b = clnt_dg_call+0x4fb
nlm_get_rpc() at 0xffffffff804f3ef7 = nlm_get_rpc+0x147
nlm_host_get_rpc() at 0xffffffff804f430e = nlm_host_get_rpc+0x10e
nlm_do_lock() at 0xffffffff804f58be = nlm_do_lock+0x1ce
nlm4_lock_4_svc() at 0xffffffff804f6c91 = nlm4_lock_4_svc+0x11
nlm_prog_4() at 0xffffffff804f8098 = nlm_prog_4+0x308
svc_run() at 0xffffffff8050c1f3 = svc_run+0x293
nlm_syscall() at 0xffffffff804f675c = nlm_syscall+0x79c
syscall() at 0xffffffff805818f4 = syscall+0x1b4
Xfast_syscall() at 0xffffffff8056d35b = Xfast_syscall+0xab
--- syscall (154, FreeBSD ELF64, nlm_syscall), rip = 0x8008a91ec, rsp = 0x7fffffffed08, rbp = 0x7fffffffee20 ---