Recurring problem: processes block accessing UFS file system
Tor Egge
Tor.Egge at cvsup.no.freebsd.org
Sat Nov 26 00:04:12 GMT 2005
> Thanks Kris, these are exactly the clues I needed. Since the deadlock
> during a snapshot is fairly easy to reproduce, I did so and collected this
> information below. "alltrace" didn't work as I expected (didn't produce a
> trace), so I traced each pid associated with a locked vnode separately.
The vnode syncing loop in ffs_sync() has some problems:
1. Softupdate processing performed after the loop has started might
trigger the need for retrying the loop. Processing of dirrem work
items can cause IN_CHANGE to be set on some inodes, causing a
deadlock in ufs_inactive() later on (while the file system is
suspended).
2. nvp might no longer be associated with the same mount point after
MNT_IUNLOCK(mp) has been called in the loop. This can cause the
vnode list traversal to be incomplete, with stale information in
the snapshot. Further damage can occur when background fsck uses
that stale information.
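To make problems 1 and 2 concrete, here is a small userspace sketch of a
traversal that restarts when the saved next pointer has gone stale, and
that repeats until a full pass finds no work. It is not the ffs_sync()
code: struct vnode_sim, struct mount_sim, sync_all(), demo_hook() and
demo_stale_next() are hypothetical names made up for illustration, and
the hook only stands in for whatever runs while the mount interlock is
dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; NOT the FreeBSD struct mount/vnode. */
struct mount_sim;

struct vnode_sim {
	struct mount_sim *v_mount;	/* owning mount; may change while unlocked */
	struct vnode_sim *v_next;	/* next vnode on the owner's list */
	bool dirty;			/* needs to be written out */
};

struct mount_sim {
	struct vnode_sim *head;		/* singly linked vnode list */
};

/* Called at the point where a real loop would run with the list unlocked. */
typedef void (*unlocked_hook)(struct mount_sim *, struct vnode_sim *, void *);

/*
 * Flush every dirty vnode on mp and return the number of passes made.
 * The walk restarts from the head when the saved next pointer no longer
 * belongs to mp, and repeats until one full pass finds nothing to do.
 * Assumes the hook eventually stops creating new dirty vnodes.
 */
int
sync_all(struct mount_sim *mp, unlocked_hook hook, void *arg)
{
	struct vnode_sim *vp, *nvp;
	int passes = 0;
	bool again;

	assert(mp != NULL);
	do {
		again = false;
		passes++;
		for (vp = mp->head; vp != NULL; vp = nvp) {
			nvp = vp->v_next;	/* saved before "unlocking" */
			if (!vp->dirty)
				continue;
			vp->dirty = false;	/* stands in for the write */
			again = true;		/* earlier vnodes may be dirty again */
			if (hook != NULL)
				hook(mp, vp, arg);	/* list is "unlocked" here */
			if (nvp != NULL && nvp->v_mount != mp)
				break;		/* nvp is stale: restart from head */
		}
	} while (again);
	return (passes);
}

/* Scenario state for the demo hook below. */
struct scenario {
	struct vnode_sim a, b, c;
	struct mount_sim mp, other;
	bool fired;
};

/* While b is flushed, c moves to another mount and a is dirtied again. */
void
demo_hook(struct mount_sim *mp, struct vnode_sim *vp, void *arg)
{
	struct scenario *s = arg;

	(void)mp;
	if (vp != &s->b || s->fired)
		return;
	s->fired = true;
	s->b.v_next = NULL;		/* unlink c from mp's list */
	s->c.v_mount = &s->other;
	s->c.v_next = NULL;
	s->other.head = &s->c;
	s->a.dirty = true;		/* side effect on an already visited vnode */
}

/* Builds a -> b -> c on one mount, runs the sync, returns passes or -1. */
int
demo_stale_next(void)
{
	struct scenario s;
	int passes;

	memset(&s, 0, sizeof(s));
	s.mp.head = &s.a;
	s.a = (struct vnode_sim){ &s.mp, &s.b, true };
	s.b = (struct vnode_sim){ &s.mp, &s.c, true };
	s.c = (struct vnode_sim){ &s.mp, NULL, true };
	passes = sync_all(&s.mp, demo_hook, &s);
	/* a and b must be clean; c now belongs to the other mount, untouched. */
	if (s.a.dirty || s.b.dirty || !s.c.dirty || s.c.v_mount != &s.other)
		return (-1);
	return (passes);
}
```

In this scenario demo_stale_next() needs three passes: the first is cut
short by the stale next pointer, the second cleans the vnode that was
dirtied behind the walk, and the third confirms that nothing is left.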
Just a few lines down from that loop is a new problem:
3. softdep_flushworklist() might not have processed all dirrem work
items associated with the file system even if both error and count
are zero. This can cause both background fsck and softupdate
processing (after the file system has been resumed) to decrement the
link count of an inode, causing file system corruption or a panic.
Processing of these work items while the file system is suspended
causes a panic.
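Problem 3 can be illustrated with a toy model in which the flush only
sees work items that are already queued, while others are still waiting
elsewhere. This is only a sketch under that assumption, not how
softdep_flushworklist() is implemented: struct softdep_sim and all the
functions below are hypothetical. The point is that "error == 0 and
count == 0" is not the same as "no dirrem items outstanding", so the
caller checks an outstanding-item count before suspending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-mount bookkeeping; NOT the softupdates data structures. */
struct softdep_sim {
	int queued_dirrem;	/* on the worklist, visible to the flush */
	int deferred_dirrem;	/* not yet queued, invisible to the flush */
};

/* Models a flush that processes only the queued items. Returns an error code. */
int
flush_worklist_sim(struct softdep_sim *sd, int *countp)
{
	assert(sd != NULL && countp != NULL);
	*countp = sd->queued_dirrem;	/* number of items processed */
	sd->queued_dirrem = 0;
	return (0);			/* no error */
}

/* Models deferred items becoming eligible and landing on the worklist. */
void
io_complete_sim(struct softdep_sim *sd)
{
	sd->queued_dirrem += sd->deferred_dirrem;
	sd->deferred_dirrem = 0;
}

/* Total dirrem items still associated with the file system. */
int
outstanding_dirrem(const struct softdep_sim *sd)
{
	return (sd->queued_dirrem + sd->deferred_dirrem);
}

/*
 * Flush until no dirrem items remain at all. Returns the number of
 * flush rounds used, or -1 on error or when max_rounds is exceeded.
 */
int
drain_before_suspend(struct softdep_sim *sd, int max_rounds)
{
	int count, round;

	for (round = 1; round <= max_rounds; round++) {
		if (flush_worklist_sim(sd, &count) != 0)
			return (-1);
		if (outstanding_dirrem(sd) == 0)
			return (round);	/* only now is suspending safe */
		io_complete_sim(sd);	/* wait for the deferred items */
	}
	return (-1);
}

/* Two deferred items: the naive check passes, the drain needs two rounds. */
int
demo_hidden_dirrem(void)
{
	struct softdep_sim sd = { .queued_dirrem = 0, .deferred_dirrem = 2 };
	int count = -1;

	/* error and count are both zero although two items remain. */
	if (flush_worklist_sim(&sd, &count) != 0 || count != 0)
		return (-1);
	if (outstanding_dirrem(&sd) != 2)
		return (-1);
	return (drain_before_suspend(&sd, 4));
}
```

Here demo_hidden_dirrem() first shows the misleading result (no error,
zero items processed, two items still outstanding) and then drains them
in two rounds.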
- Tor Egge
More information about the freebsd-stable
mailing list