NFS-Locking problem with 6.4/7.1-RELEASE
Matthias Schuendehuette
msch at snafu.de
Wed Jan 21 11:49:27 PST 2009
Hi,
one of our FreeBSD servers acts as the NFS server for $HOME for
approx. 50 HP-UX workstations, since the workstations themselves and
their disks have become quite old in the meantime.
That works quite well with FreeBSD 6.3-RELEASE-pxx but no longer works
with 6.4/7.1.
I looked at the problem with 'wireshark' and it seems to be a locking
problem, probably related to PR 'kern/130628', but I'm not sure.
Here is what I know so far:
Server-OS: FreeBSD 6.4-RELEASE/7.1-RELEASE (same problems)
Workstation-OS: HP-UX 11iv1 (11.11)
NFS-Version: V3/tcp or V3/udp (NFS-V2 works!)
I found no records of the problem on the client side (HP-UX), whereas
on FreeBSD 'rpc.lockd -d 3' produces the following entries in
/var/log/messages:
Jan 21 12:07:33 bsd1dw kernel: NLM: new host hp13 (sysid 5)
Jan 21 12:07:33 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:07:53 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:13 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:32 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:33 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:43 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:53 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:03 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:13 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:13 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:23 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:33 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
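For anyone who wants to compare such logs between a 6.3 and a 6.4/7.1
server, here is a rough Python sketch that tallies the NLM debug lines
per client host and function. The regular expression is only derived
from the excerpt above, and the function name count_nlm_calls is made
up for illustration; other rpc.lockd debug levels may log in a
different format.

```python
import re
from collections import Counter

# Pattern derived from the log excerpt above, e.g.
#   "... kernel: nlm_do_lock(): caller_name = hp13"
# It is an assumption that other NLM debug lines look the same.
NLM_LINE = re.compile(r"kernel: (nlm_do_\w+)\(\): caller_name = (\S+)")


def count_nlm_calls(lines):
    """Count (host, function) pairs in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = NLM_LINE.search(line)
        if match:
            func, host = match.groups()
            counts[(host, func)] += 1
    return counts


if __name__ == "__main__":
    # Log path as named in the mail; adjust if syslog writes elsewhere.
    with open("/var/log/messages", errors="replace") as log:
        for (host, func), n in sorted(count_nlm_calls(log).items()):
            print(f"{host} {func} {n}")
```

Lines that do not match (such as the "new host" line or a wrapped
"(sysid = 5)" continuation) are simply skipped.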
What happens is as follows:
When logging in to an account whose home directory is on the NFS
server, the shell reads '.profile' and then tries to get a lock on
'.sh_history'. From a FreeBSD 6.3 server the shell gets the lock,
whereas a 6.4/7.1 server replies with "V4 LOCK_RES Call NLM_FAILED".
Of course the HP-UX shell assumes the file is already locked, waits
some time and tries again. This game ends with the account being
completely locked up... :-( This does not happen if command-line
history is disabled, but it is nonetheless an error.
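To reproduce the lock attempt without logging in, something like the
following Python sketch could be run on a client against a file on the
NFS mount. It assumes the shell takes an fcntl-style advisory lock
(the kind that is forwarded to the lock daemon over NFS); I have not
verified which call the HP-UX shell really uses. The name try_lock and
the default path are made up for illustration.

```python
import fcntl
import os
import sys
import time


def try_lock(path):
    """Try a non-blocking exclusive fcntl lock on path.

    Returns True if the lock was granted (it is released again at
    once), False if the lock request failed for any reason.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical default; pass the real file on the NFS mount instead.
    path = sys.argv[1] if len(sys.argv) > 1 else ".sh_history"
    # Retry a few times, roughly like the shell does.
    for attempt in range(1, 4):
        if try_lock(path):
            print(f"attempt {attempt}: lock granted")
            break
        print(f"attempt {attempt}: lock refused, retrying")
        time.sleep(1)
```

If the behaviour matches what I see with the shell, the lock should be
granted against a 6.3 server and refused against 6.4/7.1, but that is
an expectation, not something this sketch has been used to confirm.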
I have recorded the network traffic for an NFSv2 session, an NFSv3/tcp
session with a 6.3 server and an NFSv3/tcp session with a 7-STABLE
server. If the wireshark dumps are of interest beyond what I
described here, they are available on request.
I hope this information helps those who are able to fix it...
Matthias
--
Ciao/BSD - Matthias
Matthias Schuendehuette <msch [at] snafu.de>, Berlin (Germany)