Deadlock in the NFS client

Fri Mar 15 01:45:37 UTC 2013

John Baldwin wrote:
> On Thursday, March 14, 2013 1:22:39 pm Konstantin Belousov wrote:
> > On Thu, Mar 14, 2013 at 10:57:13AM -0400, John Baldwin wrote:
> > > On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov wrote:
> > > > On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem wrote:
> > > > > John Baldwin wrote:
> > > > > > I ran into a machine that had a deadlock among certain files
> > > > > > on a
> > > > > > given NFS
> > > > > > mount today. I'm not sure how best to resolve it, though it
> > > > > > seems like
> > > > > > perhaps there is a bug with how the pool of nfsiod threads
> > > > > > is managed.
> > > > > > Anyway, more details on the actual hang below. This was on
> > > > > > 8.x with
> > > > > > the
> > > > > > old NFS client, but I don't see anything in HEAD that would
> > > > > > fix this.
> > > > > >
> > > > > > First note that the system was idle so it had dropped down
> > > > > > to only one
> > > > > > nfsiod thread.
> > > > > >
> > > > > Hmm, I see the problem and I'm a bit surprised it doesn't bite
> > > > > more often.
> > > > > It seems to me that this snippet of code from nfs_asyncio()
> > > > > makes too
> > > > > weak an assumption:
> > > > > 	/*
> > > > > 	 * If none are free, we may already have an iod working on
> > > > > 	 this mount
> > > > > 	 * point. If so, it will process our request.
> > > > > 	 */
> > > > > 	if (!gotiod) {
> > > > > 		if (nmp->nm_bufqiods > 0) {
> > > > > 			NFS_DPF(ASYNCIO,
> > > > > 		("nfs_asyncio: %d iods are already processing mount %p\n",
> > > > > 				 nmp->nm_bufqiods, nmp));
> > > > > 			gotiod = TRUE;
> > > > > 		}
> > > > > 	}
> > > > > It assumes that, since an nfsiod thread is processing some
> > > > > buffer for the
> > > > > mount, it will become available to do this one, which isn't
> > > > > true for your
> > > > > deadlock.
> > > > >
> > > > > I think the simple fix would be to recode nfs_asyncio() so
> > > > > that
> > > > > it only returns 0 if it finds an AVAILABLE nfsiod thread that
> > > > > it
> > > > > has assigned to do the I/O, getting rid of the above. The
> > > > > problem
> > > > > with doing this is that it may result in a lot more
> > > > > synchronous I/O
> > > > > (nfs_asyncio() returns EIO, so the caller does the I/O). Maybe
> > > > > more
> > > > > synchronous I/O could be avoided by allowing nfs_asyncio() to
> > > > > create a
> > > > > new thread even if the total is above nfs_iodmax. (I think
> > > > > this would
> > > > > require the fixed array to be replaced with a linked list and
> > > > > might
> > > > > result in a large number of nfsiod threads.) Maybe just having
> > > > > a large
> > > > > nfs_iodmax would be an adequate compromise?
> > > > >
> > > > > Does having a large # of nfsiod threads cause any serious
> > > > > problem for
> > > > > most systems these days?
> > > > >
> > > > > I'd be tempted to recode nfs_asyncio() as above and then,
> > > > > instead
> > > > > of nfs_iodmin and nfs_iodmax, I'd simply have: - a fixed
> > > > > number of
> > > > > nfsiod threads (this could be a tunable, with the
> > > > > understanding that
> > > > > it should be large for good performance)
> > > > >
> > > >
> > > > I do not see how this would solve the deadlock itself. The
> > > > proposal would
> > > > only allow system to survive slightly longer after the deadlock
> > > > appeared.
> > > > And, I think that allowing the unbound amount of nfsiod threads
> > > > is also
> > > > fatal.
> > > >
> > > > The issue there is the LOR between buffer lock and vnode lock.
> > > > Buffer lock
> > > > always must come after the vnode lock. The problematic nfsiod
> > > > thread, which
> > > > locks the vnode, volatile this rule, because despite the
> > > > LK_KERNPROC
> > > > ownership of the buffer lock, it is the thread which de fact
> > > > owns the
> > > > buffer (only the thread can unlock it).
> > > >
> > > > A possible solution would be to pass LK_NOWAIT to nfs_nget()
> > > > from the
> > > > nfs_readdirplusrpc(). From my reading of the code, nfs_nget()
> > > > should
> > > > be capable of correctly handling the lock failure. And EBUSY
> > > > would
> > > > result in doit = 0, which should be fine too.
> > > >
> > > > It is possible that EBUSY should be reset to 0, though.
> > >
> > > Yes, thinking about this more, I do think the right answer is for
> > > readdirplus to do this. The only question I have is if it should
> > > do
> > > this always, or if it should do this only from the nfsiod thread.
> > > I
> > > believe you can't get this in the non-nfsiod case.
> >
> > I agree that it looks as of the workaround only needed for nfsiod
> > thread.
> > On the other hand, it is not immediately obvious how to detect that
> > the current thread is nfsio daemon. Probably a thread flag should be
> > set.
> 
> OTOH, updating the attributes from readdir+ is only an optimization
> anyway, so
> just having it always do LK_NOWAIT is probably ok (and simple).
> Currently I'm
> trying to develop a test case to provoke this so I can test the fix,
> but no
> luck on that yet.
> 
> --
> John Baldwin
When I commented out the readahead stuff for VDIR (patch attached), I
got much better performance for a:
# time ls -lR > /dev/null
run at the root of the nfs mount point (-o nfsv3,rdirplus).

However, I got about the same performance for the previous patch.
(The difference is that this one doesn't play with the buffer cache
 for the read-ahead attempt.)

My test environment is crappy (a laptop mounting itself), so it may
just be a side effect of this.

Maybe you could try this?

rick

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nfsiod2.patch
Type: text/x-patch
Size: 658 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20130314/63750213/attachment.bin>