System deadlock when using mksnap_ffs

Wed Nov 12 21:16:54 PST 2008

Kostik Belousov writes:
| On Wed, Nov 12, 2008 at 07:49:28PM +0000, Tim Bishop wrote:
| > On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
| > > I run the mksnap_ffs command to take the snapshot and some time later
| > > the system completely freezes up:
| > > 
| > > paladin# cd /u2/.snap/
| > > paladin# mksnap_ffs /u2 test.1
| > 
| > Someone (not named because they chose not to reply to the list) gave me
| > the following patch:
| > 
| > --- sys/ufs/ffs/ffs_snapshot.c.orig	Wed Mar 22 09:42:31 2006
| > +++ sys/ufs/ffs/ffs_snapshot.c	Mon Nov 20 14:59:13 2006
| > @@ -282,6 +282,8 @@ restart:
| >  		if (error)
| >  			goto out;
| >  		bawrite(nbp);
| > +		if (cg % 10 == 0)
| > +			ffs_syncvnode(vp, MNT_WAIT);
| >  	}
| >  	/*
| >  	 * Copy all the cylinder group maps. Although the
| > @@ -303,6 +305,8 @@ restart:
| >  			goto out;
| >  		error = cgaccount(cg, vp, nbp, 1);
| >  		bawrite(nbp);
| > +		if (cg % 10 == 0)
| > +			ffs_syncvnode(vp, MNT_WAIT);
| >  		if (error)
| >  			goto out;
| >  	}
| > 
| > With the description:
| > 
| > "What can happen is on a big file system it will fill up the buffer
| > cache with I/O and then run out.  When the buffer cache fills up then no
| > more disk I/O can happen :-(  When you do a sync, it flushes that out to
| > disk so things don't hang."
| > 
| > It seems to work too. But it seems more like a workaround than a fix?
| 
| It looks hackish, but in fact it is not that wrong, and I would even say
| that it provides a reasonable workaround.
| 
| The usual way to prevent a wdrain deadlock is to issue a bwillwrite() call
| before any vnode lock is taken. This is sufficient for most VFS syscalls,
| which typically put a dozen or fewer dirty buffers into the delayed write
| queue.
| 
| Snapshot creation does not call bwillwrite() at all, and then does a lot
| of async writes, completely saturating the buffer cache with dirty buffers.
| bwillwrite() cannot be called after the vnode is locked, so just forcing
| a sync for the embryonic snapshot vnode is good enough.
| 
| The 10 counter is debatable, but the debate should be postponed until the
| patch goes into the tree. I ask the anonymous submitter to commit it. Thanks!

I plan to commit it tomorrow since I sent it to Tim to test.  The 10 can
be tuned but it has kept a bunch of machines at work up.  Glad people
don't think it is too wrong :-)  It probably could be made a little more
dynamic, but I wonder if it would show any real performance difference,
and it might risk more bugs.

Doug A.
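
The effect of the periodic sync in the patch above can be illustrated with a
small user-space model. This is only a sketch under loud assumptions: struct
buf_pool, pool_async_write(), pool_sync() and copy_cylinder_groups() are
hypothetical names invented here, not kernel interfaces, and the model ignores
the background draining that a real buffer cache also does. It only shows why
a flush every N cylinder groups keeps the number of outstanding dirty buffers
bounded by roughly N.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a finite buffer cache (not a kernel structure). */
struct buf_pool {
    size_t capacity; /* total buffers available */
    size_t dirty;    /* buffers queued for writing and not yet flushed */
};

/*
 * Rough analogue of bawrite(): queue one asynchronous write.
 * Returns false when no buffer is free, which is the point where the
 * real system would block and, per the thread, hang.
 */
static bool pool_async_write(struct buf_pool *p)
{
    if (p->dirty == p->capacity)
        return false;
    p->dirty++;
    return true;
}

/* Rough analogue of ffs_syncvnode(vp, MNT_WAIT): wait for queued writes. */
static void pool_sync(struct buf_pool *p)
{
    p->dirty = 0;
}

/*
 * Queue one write per cylinder group, flushing whenever
 * cg % sync_interval == 0 (sync_interval == 0 means never flush).
 * Returns the number of groups queued before the pool ran dry;
 * a return value equal to ncg means the whole pass completed.
 */
static size_t copy_cylinder_groups(struct buf_pool *p, size_t ncg,
                                   size_t sync_interval)
{
    for (size_t cg = 0; cg < ncg; cg++) {
        if (!pool_async_write(p))
            return cg; /* stalled: cache saturated with dirty buffers */
        if (sync_interval != 0 && cg % sync_interval == 0)
            pool_sync(p);
        assert(p->dirty <= p->capacity);
    }
    return ncg;
}
```

In this model a pass with no periodic sync stalls as soon as the pool is full,
while a pass that syncs every 10 groups completes as long as the pool holds at
least 10 buffers. A pool smaller than the interval still stalls, which is one
way to read the remark that the 10 can be tuned.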