System deadlock when using mksnap_ffs
Doug Ambrisko
ambrisko at ambrisko.com
Wed Nov 12 21:16:54 PST 2008
Kostik Belousov writes:
| On Wed, Nov 12, 2008 at 07:49:28PM +0000, Tim Bishop wrote:
| > On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
| > > I run the mksnap_ffs command to take the snapshot and some time later
| > > the system completely freezes up:
| > >
| > > paladin# cd /u2/.snap/
| > > paladin# mksnap_ffs /u2 test.1
| >
| > Someone (not named because they choose not to reply to the list) gave me
| > the following patch:
| >
| > --- sys/ufs/ffs/ffs_snapshot.c.orig Wed Mar 22 09:42:31 2006
| > +++ sys/ufs/ffs/ffs_snapshot.c Mon Nov 20 14:59:13 2006
| > @@ -282,6 +282,8 @@ restart:
| > if (error)
| > goto out;
| > bawrite(nbp);
| > + if (cg % 10 == 0)
| > + ffs_syncvnode(vp, MNT_WAIT);
| > }
| > /*
| > * Copy all the cylinder group maps. Although the
| > @@ -303,6 +305,8 @@ restart:
| > goto out;
| > error = cgaccount(cg, vp, nbp, 1);
| > bawrite(nbp);
| > + if (cg % 10 == 0)
| > + ffs_syncvnode(vp, MNT_WAIT);
| > if (error)
| > goto out;
| > }
| >
| > With the description:
| >
| > "What can happen is on a big file system it will fill up the buffer
| > cache with I/O and then run out. When the buffer cache fills up then no
| > more disk I/O can happen :-( When you do a sync, it flushes that out to
| > disk so things don't hang."
| >
| > It seems to work too. But it seems more like a workaround than a fix?
|
| It looks hackish, but in fact it is not that wrong, and I even say that
| it provides reasonable workaround.
|
| The usual way to prevent wdrain deadlock is to issue bwillwrite() call
| before any vnode lock is taken. This is sufficient for most VFS syscalls
| that typically put dozen or less dirty buffers into delayed write
| queue.
|
| Snapshot creation does not call bwillwrite() at all, and then does a lot
| of async writes, completely saturating buffer cache with dirty buffers.
| bwillwrite cannot be called after the vnode is locked, and just forcing
| a sync for the embrionic snapshot vnode is good enough.
|
| The 10 counter is debatable, but debate shall be postponed until the patch
| goes into tree. I ask an anonymous submitter to commit it. Thanks !
I plan to commit it tomorrow since I sent it to Tim to test. The 10 can
be tuned but it has kept a bunch of machines at work up. Glad people
don't think it is that it is to wrong :-) It probably could be made
a little more dynamic but I wonder if it would show any real performance
difference and might risk more bugs.
Doug A.
More information about the freebsd-stable
mailing list