amd64/161493: NFS v3 directory structure update slow
Rick Macklem
rmacklem at uoguelph.ca
Thu Oct 13 00:54:13 UTC 2011
John Baldwin wrote:
> On Tuesday, October 11, 2011 11:07:13 am George Breahna wrote:
> >
> > >Number: 161493
> > >Category: amd64
> > >Synopsis: NFS v3 directory structure update slow
> > >Confidential: no
> > >Severity: critical
> > >Priority: high
> > >Responsible: freebsd-amd64
> > >State: open
> > >Quarter:
> > >Keywords:
> > >Date-Required:
> > >Class: sw-bug
> > >Submitter-Id: current-users
> > >Arrival-Date: Tue Oct 11 15:10:07 UTC 2011
> > >Closed-Date:
> > >Last-Modified:
> > >Originator: George Breahna
> > >Release: 9.0 Beta 2
> > >Organization:
> > >Environment:
> > FreeBSD store2 9.0-BETA2 FreeBSD 9.0-BETA2 #0: Sun Sep 18 22:02:45 EDT 2011
> > pulsar at store2.emailarray.com:/usr/obj/usr/src/sys/PULSAR amd64
> > >Description:
> > We used to run an NFS server on FreeBSD 6.2, but we recently built a new
> > box and installed 9.0 Beta 2 on it. The data was moved over, as it serves
> > as the back-end for a mail system. It runs NFS v3 over TCP only, and all
> > the NFS-related processes (rpcbind, mountd, lockd, etc.) run with the -h
> > switch and bind to the local IP address.
> >
> > The NFS server exports the data to 7 NFS clients ranging from FreeBSD 6.1
> > to 8.2, the majority being 8.2. The mount on the NFS clients is done
> > simply with -o tcp,rsize=32768,wsize=32768
> >
> > Usual file operations, such as accessing files, creating directories,
> > removing files, chmod, chown, etc. work perfectly, but we noticed there
> > were issues in removing directories that contained data. We had a strange
> > error:
> >
> > rm -rf nick/
> > rm: fts_read: Input/output error
> >
> > Using 'truss' on rm revealed this:
> >
> > open("..",O_RDONLY,00) ERR#5 'Input/output error'
> >
> > After much testing and debugging we realized the problem is in the NFS
> > protocol (either server or client, but we assume server since this used
> > to work very well with FreeBSD 6.2). The problem appears to be that NFS
> > does not show the '..' after modifying a directory structure. Take the
> > following example executed on a FreeBSD 8.2 client accessing the NFS
> > share from the 9.0B2 server:
> >
> > imap5# mkdir test1
> > imap5# cd test1
> > imap5# touch file1
> > imap5# touch file2
> > imap5# ls -la
> > ls: ..: Input/output error
> > total 4
> > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:55 .
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:55 file1
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:55 file2
> >
> > Notice the '..' is missing from the display. If we now try to remove the
> > directory 'test1' it will throw the "rm: fts_read: Input/output error"
> > error.
> >
> > If we wait between 1 and 5 minutes, '..' will eventually appear by
> > itself. During this whole time, '..' effectively exists on the NFS server
> > but it's not displayed by any of the NFS clients.
> >
> > I can force the NFS client to show it faster by doing an ls -la from the
> > parent level. For example:
> >
> > imap5# mkdir test1
> > imap5# touch test1/file1
> > imap5# touch test1/file2
> > imap5# touch test1/file3
> > imap5# ls -la test1
> > total 8
> > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:59 .
> > drwx------ 10 vpopmail vchkpw 1024 Oct 11 10:59 ..
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file1
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file2
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file3
> > imap5# cd test1
> > imap5# ls -la
> > total 8
> > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:59 .
> > drwx------ 10 vpopmail vchkpw 1024 Oct 11 10:59 ..
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file1
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file2
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file3
> >
> > but if we wait 5 seconds after that display and try again:
> >
> > ls -la
> > ls: ..: Input/output error
> > total 4
> > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:59 .
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file1
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file2
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file3
> >
> > Again, if we wait longer (1-5 minutes), the '..' will properly appear in
> > there.
> >
> > There are no error messages on the console or in other log files. This is
> > reproducible 100% of the time with any FreeBSD client. We have tried
> > unmounting/remounting several times without any effect. We also tried
> > different rsize/wsize, with no effect. I think there is some delay in
> > updating the directory structure and it's causing this bug.
> >
> > Here's also some output from nfsstat on the server:
> >
> >
> > Server Info:
> >   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
> > 114731225  20496896 254966151       133  11697392  19963641         0   9228861
> >    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
> >   4313471   1157651        39      1955  16511932  15479669         0 116927742
> >     Mknod    Fsstat    Fsinfo  PathConf    Commit
> >         0   4748487        48         0  14921747
> > Server Ret-Failed
> >         0
> > Server Faults
> >         0
> > Server Cache Stats:
> >    Inprog      Idem  Non-idem    Misses
> >         0         0         0 613368147
> > Server Write Gathering:
> >  WriteOps  WriteRPC   Opsaved
> >  19963641  19963641         0
> >
> > >How-To-Repeat:
> > imap5# mkdir test1
> > imap5# cd test1
> > imap5# touch file1
> > imap5# touch file2
> > imap5# ls -la
> > ls: ..: Input/output error
> > total 4
> > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:55 .
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:55 file1
> > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:55 file2
> > >Fix:
>
> Can you try using the "old" NFS server as a test?
>
Please make sure you have the patch in r225356 in your server's
kernel sources (it went into head on Sep. 3, but I don't know whether
your Sep. 18 build has it). It fixed a problem that would
cause lookup of ".." to fail intermittently, because a field in
struct nameidata added on Aug. 13 wasn't initialized.
You can find the one-line patch here:
http://svnweb.freebsd.org/base/head/sys/fs/nfsserver/nfs_nfsdport.c?r1=224911&r2=225356
Please let us know whether you have this patch and, if not, apply it
and see if the problem goes away.
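
If it helps when re-testing on a client, the steps from the report could
be scripted roughly as below. This is only a sketch: the function name
check_dotdot and the scratch directory name are made up for illustration,
and it assumes a plain sh on the client. It creates a directory, adds two
files, and then looks up ".." from inside it, which is the operation that
failed with "Input/output error" in the report.

```shell
# Hypothetical helper (not part of FreeBSD): returns 0 if ".." can be
# looked up from inside a freshly created directory, 1 if the lookup
# fails, 2 if the scratch directory could not be created.
check_dotdot() {
    cd_base=${1:-.}                      # directory on the NFS mount to test in
    cd_dir="$cd_base/dotdot_test.$$"     # scratch directory, name is arbitrary
    mkdir "$cd_dir" || return 2
    touch "$cd_dir/file1" "$cd_dir/file2"
    # Run in a subshell so the caller's working directory is unchanged.
    if (cd "$cd_dir" && ls -ld .. >/dev/null 2>&1); then
        cd_rc=0
    else
        cd_rc=1
    fi
    # Clean up; on an affected mount this rm may itself fail, as in the report.
    rm -rf "$cd_dir"
    return $cd_rc
}
```

Calling it as "check_dotdot /path/to/nfs/mount" and checking $? should be
enough for a quick check. Since the report says the failure can reappear
a few seconds after a successful listing, running it several times with
a short sleep in between is probably more telling than a single run.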
Thanks, rick