NFS Locking Issue
Robert Watson
rwatson at FreeBSD.org
Wed Jul 5 22:49:14 UTC 2006
On Wed, 5 Jul 2006, Francisco Reyes wrote:
>> can you trigger it using work on just one client against a server, without
>> client<->client interactions? This makes tracking and reproduction a lot
>> easier
>
> Personally I am experiencing two problems.
> 1- NFS clients freeze/hang if the server goes away.
> We have clients with several mounts so if one of the servers dies then the
> entire operation of the client is put in jeopardy.
>
> This I can reproduce every single time with a 6.X client.. with both a 5.X
> and a 6.X server.
>
> "umount -f" hangs too.
The problems you are experiencing are almost certainly not related to
rpc.lockd; rather, they are bugs in the NFS client.
Let's just look at the normal-use hang for now, and revisit umount -f after
that.
>> as multi-client test cases are really tricky!
>
> The second case only happens under heavy load and restarting nfsd makes it
> go away. Basically 'b' column in vmstat goes high and the performance of
> the machine falls to the floor.
>
> Going to try
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> And reading up on how to debug with DDB. Have another user who volunteered
> to give me some pointers.. so will try that.. so I am able to actually
> produce more helpful info.
If you can get into DDB when the hang has occurred, output via serial console
for the following commands would be very helpful:
show pcpu
show allpcpu
ps
trace
traceall
show locks
show alllocks
show uma
show malloc
show lockedvnods
Note that show locks and show alllocks will only work if you compile WITNESS
in. WITNESS significantly changes kernel timing, so you may find it closes
whatever race you're running into. If you can reproduce the problem with
WITNESS and
INVARIANTS, that would be very useful. The above output will hopefully tell
us the basic state of the system with respect to processes, threads, locking,
and so on, and may help us track things down. For the above, you definitely
want a serial console as it will be quite a bit of output.
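For anyone who has not built a debugging kernel before, here is a rough
sketch of a kernel configuration that compiles in DDB, WITNESS and INVARIANTS.
The config name DEBUGNFS is made up for illustration, and I am assuming a 6.X
source tree where GENERIC is a sensible base and BREAK_TO_DEBUGGER is what
lets a serial BREAK drop you into DDB; check NOTES for your release before
relying on any of it.

```shell
# Write a hypothetical kernel config named DEBUGNFS into the current
# directory. The name and the choice of GENERIC as a base are assumptions.
cat > DEBUGNFS <<'EOF'
include GENERIC
ident   DEBUGNFS

# Kernel debugger framework and the DDB backend
options KDB
options DDB
# Assumed: enter the debugger on a serial console BREAK
options BREAK_TO_DEBUGGER

# Extra consistency checks and lock order verification
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
EOF

# Then, as root, copy DEBUGNFS next to GENERIC in your architecture's
# conf directory under /usr/src/sys and build as usual:
#   cd /usr/src
#   make buildkernel KERNCONF=DEBUGNFS
#   make installkernel KERNCONF=DEBUGNFS
```

Keep the old kernel around so you can boot back into it if the extra checks
hide the hang.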
Also, can you send the output of the 'mount' command from the un-hung state?
I notice a lot of threads stuck in 'ufs'.
Finally, while you are running the above tests, please disable background
file system checking by placing the following in /etc/rc.conf:
background_fsck="NO"
Then boot to single user mode and run a full fsck -p before going multi-user,
in order to make sure the file system is in a good state before beginning.
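If it helps, the rc.conf change can be scripted along these lines. This is
only a sketch: the function name set_background_fsck_no is made up, and it
assumes a plain rc.conf with one setting per line. It drops any existing
background_fsck line and appends the new one, so running it twice is harmless.

```shell
# Hypothetical helper: force background_fsck="NO" in an rc.conf-style file.
# Defaults to /etc/rc.conf; pass another path to try it on a copy first.
set_background_fsck_no() {
    rcfile="${1:-/etc/rc.conf}"
    tmp="${rcfile}.tmp.$$"
    # Keep every line except an existing background_fsck setting.
    grep -v '^background_fsck=' "$rcfile" > "$tmp" 2>/dev/null
    echo 'background_fsck="NO"' >> "$tmp"
    # Write back in place so the file's owner and mode are unchanged.
    cat "$tmp" > "$rcfile"
    rm -f "$tmp"
}

# After rebooting into single user mode, at the shell prompt:
#   fsck -p
#   exit        # continue to multi-user once fsck reports a clean state
```

Run it as root with no argument to edit /etc/rc.conf itself.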
Robert N M Watson
Computer Laboratory
University of Cambridge