Recurring problem: processes block accessing UFS file system

Sat Nov 26 00:04:12 GMT 2005

> Thanks Kris, these are exactly the clues I needed.  Since the deadlock 
> during a snapshot is fairly easy to reproduce, I did so and collected this 
> information below.  "alltrace" didn't work as I expected (didn't produce a 
> trace), so I traced each pid associated with a locked vnode separately.

The vnode syncing loop in ffs_sync() has some problems:

  1. Softupdate processing performed after the loop has started might
     trigger the need for retrying the loop.  Processing of dirrem work
     items can cause IN_CHANGE to be set on some inodes, causing a
     deadlock in ufs_inactive() later on (while the file system is
     suspended).

  2. nvp might no longer be associated with the same mount point after
     MNT_IUNLOCK(mp) has been called in the loop.  This can cause the
     vnode list traversal to be incomplete, with stale information in
     the snapshot.  Further damage can occur when background fsck uses
     that stale information.
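
To illustrate problem 2, here is a small user-space model, not the kernel
code.  All names (sim_mount, sim_vnode, sim_sync, sim_demo) are made up for
this sketch, and the "lock" exists only in the comments.  It shows how
following a saved next pointer after the vnode has moved to another mount
skips the rest of the list, and one possible guard: recheck the next vnode's
mount after the lock is retaken and restart the traversal if it changed.
This is not necessarily the fix that belongs in ffs_sync().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mount / struct vnode. */
struct sim_vnode;
struct sim_mount { struct sim_vnode *head; };
struct sim_vnode {
    struct sim_vnode *next;   /* next vnode on the owning mount's list */
    struct sim_mount *mount;  /* mount this vnode currently belongs to */
    bool dirty;               /* still needs to be synced */
};

/* Put vp at the front of mp's vnode list. */
static void sim_insert(struct sim_mount *mp, struct sim_vnode *vp)
{
    vp->mount = mp;
    vp->next = mp->head;
    mp->head = vp;
}

/* Unlink vp from its current mount and relink it on 'to' (models a recycle). */
static void sim_move(struct sim_vnode *vp, struct sim_mount *to)
{
    struct sim_vnode **pp = &vp->mount->head;

    while (*pp != NULL && *pp != vp)
        pp = &(*pp)->next;
    assert(*pp == vp);        /* vp must be on its mount's list */
    *pp = vp->next;
    sim_insert(to, vp);
}

/*
 * Sync the dirty vnodes reachable from mp's list.  The first time the
 * modelled lock is dropped, 'victim' is moved to 'other'.  With
 * 'revalidate' set, a next pointer that left mp forces a restart.
 */
static void sim_sync(struct sim_mount *mp, bool revalidate,
                     struct sim_vnode *victim, struct sim_mount *other)
{
    struct sim_vnode *vp, *nvp;

restart:
    for (vp = mp->head; vp != NULL; vp = nvp) {
        nvp = vp->next;               /* sampled while "locked" */
        if (!vp->dirty)
            continue;
        /* "lock dropped" here: other activity may relink vnodes */
        vp->dirty = false;
        if (victim != NULL) {
            sim_move(victim, other);
            victim = NULL;
        }
        /* "lock retaken": nvp may now sit on another mount's list */
        if (revalidate && nvp != NULL && nvp->mount != mp)
            goto restart;
    }
}

/* Count the vnodes on mp that are still dirty. */
static int sim_dirty_left(const struct sim_mount *mp)
{
    int n = 0;

    for (const struct sim_vnode *vp = mp->head; vp != NULL; vp = vp->next)
        n += vp->dirty ? 1 : 0;
    return n;
}

/* List a -> b -> c, all dirty; b leaves the mount while a is being synced. */
int sim_demo(bool revalidate)
{
    struct sim_mount mp = { NULL }, other = { NULL };
    struct sim_vnode a = { NULL, NULL, true };
    struct sim_vnode b = { NULL, NULL, true };
    struct sim_vnode c = { NULL, NULL, true };

    sim_insert(&mp, &c);
    sim_insert(&mp, &b);
    sim_insert(&mp, &a);
    sim_sync(&mp, revalidate, &b, &other);
    return sim_dirty_left(&mp);
}
```

In this model sim_demo(false) leaves one dirty vnode behind on the mount
(c is never visited), while sim_demo(true) leaves none.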

Just a few lines down from that loop is a new problem:

  3. softdep_flushworklist() might not have processed all dirrem work
     items associated with the file system even if both error and count
     are zero.  This can cause both background fsck and softupdate
     processing (after the file system has been resumed) to decrement
     the link count of an inode, causing file system corruption or a panic.
     Processing of these work items while the file system is suspended
     causes a panic.
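
Problem 3 can be pictured with another user-space model.  Everything in it
is an assumption made for illustration: the split into "queued" and
"deferred" items, sim_flush() and sim_prepare_suspend() are invented names
and do not describe how softdep_flushworklist() is implemented.  The point
is only that a flush reporting error == 0 and count == 0 says nothing about
items it could not see, so a caller that wants no outstanding dirrem items
at suspend time needs an independent check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of pending dirrem work for one file system. */
struct sim_softdep {
    int queued;    /* items on the worklist, visible to a flush */
    int deferred;  /* items not yet on the worklist */
};

/*
 * Model of one flush pass: processes the queued items and reports how
 * many it handled.  Deferred items only become queued afterwards.
 * Always returns 0 (no error) in this model.
 */
static int sim_flush(struct sim_softdep *sd, int *countp)
{
    assert(sd != NULL && countp != NULL);
    *countp = sd->queued;
    sd->queued = sd->deferred;
    sd->deferred = 0;
    return 0;
}

/*
 * Flush before suspending and return the number of items still
 * outstanding at the moment the caller would suspend.
 * check_outstanding == false: trust error == 0 && count == 0.
 * check_outstanding == true:  keep flushing until nothing is pending.
 */
int sim_prepare_suspend(struct sim_softdep *sd, bool check_outstanding)
{
    int error, count;

    for (;;) {
        error = sim_flush(sd, &count);
        if (error != 0)
            break;
        if (check_outstanding) {
            if (sd->queued + sd->deferred == 0)
                break;
        } else if (count == 0) {
            break;
        }
    }
    return sd->queued + sd->deferred;
}
```

Starting with no queued items and two deferred ones, the variant that
trusts count suspends with two items outstanding; the variant that checks
the outstanding total flushes once more and suspends with none.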

- Tor Egge