Trouble with NFSd under 6.1-Stable, any ideas?
Konstantin Belousov
kostikbel at gmail.com
Tue May 23 01:10:49 PDT 2006
On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:
> On 5/14/06, Kris Kennaway <kris at obsecurity.org> wrote:
> >On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> >>
> >> Hello All,
> >>
> >> I have been running FBSD a long while, and actually running since the
> >> 5.x releases on the server I am having troubles with. I basically
> >> have a small network and just use NIS/NFS to link my various FBSD and
> >> Solaris machines together.
> >>
> >> This has all been running fine up till a few days ago, when all of a
> >> sudden NFS came to a crawl, and CPU usage so high the box appears to
> >> freeze almost. When I had 6.1-RC running all seemed well, then came
> >> the announcement for the official 6.1 release, so I did the cvs
> >> updates, made world, kernel, and ran mergemaster to get everything up
> >> to the 6.1 stable version.
> >>
> >> Now after doing this, something is wrong with NFS. It works, it will
> >> return information and open files, just it's very very slow, and
> >> while performing a request the CPU spike is astounding. A simple du
> >> of my home directory can take minutes, and machine all but locks up
> >> if the request is done over NFS. Here is top snip:
> >>
> >>   PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
> >>   497 root       1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >>
> >>
> >> This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM
> >> on a disk array, and locally it screams, heck NFS used to scream till
> >> I updated. I am not really sure what info would be useful in
> >> debugging, so won't post tons of misc junk in this eMail, but if
> >> anyone has any ideas as to how best to figure out and resolve this
> >> issue it would sure be appreciated...
> >
> >Use tcpdump and related tools to find out what traffic is being sent.
> >
> >Also verify that you did not change your system configuration in any
> >way: there have been no changes to NFS since the release, so it is
> >unclear why an update would cause the problem to suddenly occur.
> >
> >Kris
>
> Hi Kris and Howard,
>
> As I posted a few days ago, I have problems similar to Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After binary searching
> the source tree, I found that
>
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
>
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> the same problem occurs.
>
> Let me recap the problems I'm seeing:
>
> 1. a client (either Linux 2.6.16 or FreeBSD 6.1) runs du on
> an nfs directory
> 2. on the server side, nfsd starts to eat lots of CPU
> 3. the du finishes
> 4. on the server side, nfsd still eats lots of CPU, but there is no
> nfs traffic. After waiting 5 minutes, you can still see that nfsd is
> "running" and eating lots of CPU.
>
> On the FreeBSD 6.1R client, it uses a UDP mount and the fstab options are
> "rw,-L,nosuid,bg,nodev". On the Linux client, it uses a UDP mount and the
> fstab options are "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
> The server's kernel conf is at
>
> http://www.rafan.org/FreeBSD/nfs/KERNEL
>
> Some related configuration files:
>
> /etc/exports
> /export/dir1 host1 host2...
> /export/dir2 host1 host2...
>
> /etc/rc.conf
> nfs_server_enable="YES"
> nfs_server_flags="-u -t -n 16"
> mountd_enable="YES"
> mountd_flags="-r -l -n"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> rpcbind_enable="YES"
>
> /etc/fstab:
> /dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
> /dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
>
> The NFS server is also using amd to mount some backup directories
> from another NFS server. The amd.conf is
>
> [global]
> browsable_dirs = yes
> map_type = file
> mount_type = nfs
> auto_dir = /nfs
> fully_qualified_hosts = no
> log_file = syslog
> nfs_proto = udp
> nfs_allow_insecure_port = no
> nfs_vers = 3
> # plock = yes
> selectors_on_default = yes
> restart_mounts = yes
>
> [/backup]
> map_options = type:=direct
> map_name = /etc/amd.direct
>
> /etc/amd.direct:
> /defaults
> opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
> backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}
>
>
> If there is anything I can provide to help track this down, please
> let me know. By the way, I tried truss/kdump to see what happens
> when nfsd eats lots of CPU, but in vain. They do not return anything.
>
I tried your recipe on 7-CURRENT with a locally exported fs, remounted
over nfs. I did not get the behaviour you described.
Could you please provide a backtrace (from ddb) for the nfsd that
eats the CPU? I think it would be helpful to get several
backtraces (i.e., bt <nfsd pid>, cont, bt <nfsd pid> ...) to
see where it is running.
Also, just in case, does the exported filesystem that shows the problem
have quotas enabled? One line of your fstab has userquota, the other does not.
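
For the quota question, a quick way to check is to look at the options
column of each fstab entry. Below is a minimal sketch in Python, not
something taken from the report itself: the function name quota_mounts
and the SAMPLE text are hypothetical, the device names stay elided as
in the quoted fstab, and I am assuming the only options of interest are
userquota and groupquota (optionally written as userquota=/some/file).

```python
def quota_mounts(fstab_text):
    """Return {mount_point: [quota options]} for fstab entries with quotas.

    Hypothetical helper: it only inspects the options field (4th column)
    and does not check whether quotas are actually turned on at runtime.
    """
    quota_opts = {"userquota", "groupquota"}
    result = {}
    for line in fstab_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point = fields[1]
        options = fields[3].split(",")
        # Accept both "userquota" and "userquota=/path/to/file".
        found = [o for o in options if o.split("=", 1)[0] in quota_opts]
        if found:
            result[mount_point] = found
    return result


# Sample mirrors the two fstab lines quoted above (devices left elided).
SAMPLE = """\
/dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
"""

if __name__ == "__main__":
    for mount_point, opts in quota_mounts(SAMPLE).items():
        print(mount_point, ",".join(opts))
```

On the sample, only /export/dir2 is reported, so comparing du over NFS
on dir1 and dir2 separately would show whether quotas matter here.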