Processes get stuck in "ufs" state

Sun Mar 25 23:09:03 UTC 2007

Quoting Oleg Derevenetz <oleg at vsi.ru>:

> On Wed, Mar 07, 2007 at 05:22:38AM +0300, Oleg Derevenetz wrote:
> 
> >> Sometimes (approximately once a week) I have a problem with the same
> >> symptoms as described here, on SMP FreeBSD 6.2-STABLE with dual AMD
> >> Opteron(tm) Processor 850:
> >>
> >> http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
> >>
> >> Sometimes (apparently when CPU load suddenly goes up) all processes
> >> that interact with the disk get stuck in the "ufs" state, but in my
> >> case SIGSTOP/SIGCONT does not seem to help.
> >
> > See the developer handbook, Deadlock Debugging chapter, for instructions
> > on what information should be gathered to debug the problem.
> 
> OK, I built a kernel with debug options and will wait for the next hang.
> By the way, with the debug options turned on, I see this message on every
> boot while nullfs is being mounted:
> 
> acquiring duplicate lock of same type: "vnode interlock"
>  1st vnode interlock @ /usr/src/sys/kern/vfs_vnops.c:806
>  2nd vnode interlock @ /usr/src/sys/kern/vfs_subr.c:2040
> KDB: stack backtrace:
> kdb_backtrace(3,cfc60300,c05926d0,c05926d0,c05542c4,...) at kdb_backtrace+0x29
> witness_checkorder(cfd5c4dc,9,c051cf1e,7f8) at witness_checkorder+0x578
> _mtx_lock_flags(cfd5c4dc,0,c051cf1e,7f8,cfb28b90,...) at _mtx_lock_flags+0x78
> vrefcnt(cfd5c414) at vrefcnt+0x20
> null_checkvp(cff5eae0,c050c4a6,215) at null_checkvp+0x56
> null_lock(f02f1a68) at null_lock+0x66
> VOP_LOCK_APV(c054d540,f02f1a68) at VOP_LOCK_APV+0x87
> vn_lock(cff5eae0,1002,cfc60300,cff5eae0,cff5ed04,...) at vn_lock+0xac
> nullfs_root(cff76b90,2,f02f1ae0,cfc60300,0,8,0,c05cfca0,0,c051c79c,407) at nullfs_root+0x26
> vfs_domount(cfc60300,cfe3d340,cfe3d130,d,cfe3d3f0,c05817e0,0,c051c79c,2bf) at vfs_domount+0x975
> vfs_donmount(cfc60300,d,cfe73080,cfe73080,0,...) at vfs_donmount+0x3f9
> nmount(cfc60300,f02f1d04) at nmount+0x8b
> syscall(3b,3b,3b,bf7fe5f5,bf7feea0,...) at syscall+0x25b
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (378, FreeBSD ELF32, nmount), eip = 0x280bc0e7, esp = 0xbf7fe5bc, ebp = 0xbf7fee38 ---
> 
> This host has nullfs filesystems. Could this be related to the deadlock?

FYI: after replacing the nullfs filesystems with unionfs (using the new
unionfs implementation):

http://people.freebsd.org/~daichi/unionfs/

all deadlocks are gone. It seems to be a problem in the current nullfs
implementation, but I can't debug it properly: the deadlocks are relatively
rare, and the machine that uses nullfs is heavily loaded, so the WITNESS and
DEBUG options incur an unacceptable performance penalty.
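
For anyone trying to reproduce this, here is a rough sketch of the kernel
configuration fragment behind the "debug options" mentioned above. It assumes
the option set recommended in the developer handbook's deadlock debugging
chapter; the exact options used on this host are not stated in this thread.

```
# Assumed option set (from the developer handbook), not a copy of this
# host's actual kernel config.
makeoptions	DEBUG=-g		# build the kernel with debug symbols
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger
options 	INVARIANTS		# run-time consistency checks
options 	INVARIANT_SUPPORT	# support code required by INVARIANTS
options 	WITNESS			# lock order verification
options 	WITNESS_SKIPSPIN	# skip spin mutexes to reduce overhead
options 	DEBUG_LOCKS		# record where locks were acquired
options 	DEBUG_VFS_LOCKS		# extra vnode locking assertions
options 	DIAGNOSTIC		# additional diagnostic checks
```

With such a kernel, the handbook suggests breaking into DDB once the hang
occurs and collecting the output of "ps", "show pcpu", "show alllocks",
"show lockedvnods" and "alltrace". WITNESS and INVARIANTS account for most of
the overhead, which matches the performance penalty described above.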