NFS locking problem with RELENG_6 client on RELENG_5 server
Oliver Brandmueller
ob at e-Gitt.NET
Wed Dec 14 03:03:22 PST 2005
Hi,
I have a setup with a 5.4-STABLE (July 10th, 2005) NFS server and about
10 FreeBSD clients. Most of the clients are still running on RELENG_5,
but I recently started updating to RELENG_6. Shortly after updating the
first client I ran into a problem with a spinning rpc.lockd on the NFS
server. While rpc.lockd in normal circumstances runs at about 0.1% to
0.7% CPU, it then starts using more and more CPU (about 1% more CPU per
minute in my setup, when it's using about 20 to 25 percent I get
problems with locking). If I restart rpc.lockd on the server it starts
spinning again immediately. If I restart rpc.lockd on the RELENG_6
client everything is fine again for some time. I cannot reproduce the
behaviour by certain actions; it seems to be related to load. We have two
weekdays where workload is high and filesystem load on the NFS server
is also high due to long-running backup processes. I only saw the lockd
problem on these days ("load" means about 60 MBit/s of traffic from the
NFS clients to the server, about 30 MBit/s for the backup [which is
writing with dump to an NFS-mounted partition]).
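The CPU growth described above could be tracked with a small watcher on the server. The following is only a minimal sketch, not something taken from the setup in this report: the helper names `lockd_cpu`, `is_spinning` and `sample` are hypothetical, the 20% threshold is simply the point at which locking problems were observed here, and the `ps -axo pcpu,comm` invocation is an assumption about how the process list is read.

```python
import subprocess


def lockd_cpu(ps_output, name="rpc.lockd"):
    """Sum the %CPU column for every process whose command is `name`.

    Expects two columns per line: %CPU, then the command.
    """
    total = 0.0
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        pcpu, command = fields
        try:
            value = float(pcpu)
        except ValueError:
            continue  # header line or unparsable number
        # Compare only the basename of the first word of the command.
        if command.split()[0].rsplit("/", 1)[-1] == name:
            total += value
    return total


def is_spinning(cpu_percent, threshold=20.0):
    """True once usage reaches the level where locking problems showed up.

    The default threshold is an assumption based on the 20 to 25 percent
    figure mentioned in this report.
    """
    return cpu_percent >= threshold


def sample():
    """Take one sample from the live process list (assumed ps invocation)."""
    out = subprocess.run(
        ["ps", "-axo", "pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return lockd_cpu(out)
```

Calling `sample()` once a minute and logging the value together with a timestamp would show whether the roughly 1% per minute growth has started, and `is_spinning()` could be used to trigger an alert before locking actually fails.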
I looked through the sources and updated my RELENG_6 clients with
downgraded versions of:
src/sys/nfsclient/nfs_lock.c (1.40 now instead of 1.40.2.1)
src/sys/nfsclient/nlminfo.h (1.2 now instead of 1.2.14.1)
src/sys/sys/lockf.h (1.18 now instead of 1.18.2.1)
since these seem to be the changes from RELENG_5 on the NFS clients that
make a difference for the locking.
We had the problem about once or twice a week. Everything has now been
fine for about one week (the second "high load" day is today). I'm not a
programmer, and in particular I can only do very limited debugging on the
prod systems (and I did not manage to reproduce the load in NFS and
locking on our test systems). This means: I cannot be 100% sure that
this commit is the root of the problem, but I have enough evidence to
believe so.
If someone is willing and interested in debugging, I have (from the NFS
server) a few minutes of debugging output after a restart of rpc.lockd.
Since it is long and I don't know exactly what to look for, it's not
attached, but I can grep it (or even make it available) if it's of any
help. I don't have debugging output from the NFS client's rpc.lockd,
though, because I cannot let it run with debugging on all the time and
restarting the client fixed the problem :-/
Thanx,
Oliver
--
| Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ |
| I am the Internet. So help me God. |
| Commercial use of any of the addresses contained here is not permitted! |