[Bug 258208] [zfs] locks up when using rollback or destroy on both 13.0-RELEASE & sysutils/openzfs port
Date: Thu, 28 Oct 2021 09:43:19 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258208 --- Comment #19 from Andriy Gapon <avg@FreeBSD.org> --- (In reply to Mark Johnston from comment #18) Now that you found this I also belatedly recalled that I ran into the same kind of a problem... I think that getting the teardown lock before the vnode locks would help with the problem like that. Unfortunately, that's not possible (?) to arrange via VOP_LOCK because of the interlock which is not sleepable. I was leaning towards the idea that ZFS should somehow hook the teardown lock into vn_start_write() or something like that. But for ZFS we would also need vn_start_read() as well. And it's not easy to sprinkle such calls in all places where they are needed... Just in case, here is the 3-way deadlock that I saw: (kgdb) tid 102522 (kgdb) bt #0 sched_switch (td=0xfffff802abd53580, newtd=0xfffff80008d16580, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2005 #1 0xffffffff806a5091 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:438 #2 0xffffffff806e252c in sleepq_switch (wchan=0xfffffe000ec025c8, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:611 #3 0xffffffff806e23d2 in sleepq_wait (wchan=0xfffffe000ec025c8, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:690 #4 0xffffffff8063ffa3 in _cv_wait (cvp=0xfffffe000ec025c8, lock=<optimized out>) at /usr/src/sys/kern/kern_condvar.c:144 #5 0xffffffff8035bc6b in rrw_enter_read_impl (rrl=0xfffffe000ec025a8, prio=0, tag=0xfffff8065044cb10, try=0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c:201 #6 0xffffffff8035bbb3 in rrw_enter_read (rrl=<unavailable>, tag=<unavailable>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c:223 #7 0xffffffff8035c2ff in rrm_enter_read (rrl=<optimized out>, tag=<unavailable>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c:429 #8 0xffffffff8035c2b1 in rrm_enter (rrl=<unavailable>, rw=<optimized out>, tag=<unavailable>) at 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c:405 #9 0xffffffff803b81db in zfs_freebsd_lock (ap=0xfffffe090e81d4f8) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6010 #10 0xffffffff8095608d in VOP_LOCK1_APV (vop=<optimized out>, a=0xfffffe090e81d4f8) at vnode_if.c:2087 #11 0xffffffff8075f8ab in VOP_LOCK1 (vp=<optimized out>, flags=<optimized out>, file=<unavailable>, line=<unavailable>) at ./vnode_if.h:859 #12 0xffffffff8075e755 in _vn_lock (vp=<optimized out>, flags=524544, file=0xffffffff80acfa01 "/usr/src/sys/kern/vfs_subr.c", line=2768) at /usr/src/sys/kern/vfs_vnops.c:1555 #13 0xffffffff8074ef5f in vputx (vp=0xfffff8065044cb10, func=1) at /usr/src/sys/kern/vfs_subr.c:2768 #14 0xffffffff8074edbe in vrele (vp=<unavailable>) at /usr/src/sys/kern/vfs_subr.c:2804 #15 0xffffffff8073a901 in vn_vptocnp (vp=0xfffffe090e81d628, cred=0xfffff800234fe000, buf=<optimized out>, buflen=<optimized out>) at /usr/src/sys/kern/vfs_cache.c:2232 #16 0xffffffff8073a384 in vn_fullpath1 (td=0xfffff802abd53580, vp=0xfffff806cd2063b0, rdir=0xfffff8001bd8bb10, buf=0xfffff801dec4f800 "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336"..., retbuf=0xfffffe090e81d770, buflen=1007) at 
/usr/src/sys/kern/vfs_cache.c:2335 #17 0xffffffff8073a62f in vn_fullpath (td=0xfffff802abd53580, vn=0xfffff8060b24cce8, retbuf=0xfffffe090e81d770, freebuf=0xfffffe090e81d778) at /usr/src/sys/kern/vfs_cache.c:2164 #18 0xffffffff80760af4 in vn_fill_kinfo_vnode (vp=0xfffff8060b24cce8, kif=0xfffff80232b49018) at /usr/src/sys/kern/vfs_vnops.c:2353 #19 0xffffffff8064f27c in export_vnode_to_kinfo (vp=0xfffff8060b24cce8, fd=-1, fflags=1, kif=0xfffff80232b49018, flags=1) at /usr/src/sys/kern/kern_descrip.c:3408 #20 0xffffffff8064e9b4 in export_vnode_to_sb (vp=0xfffff8060b24cce8, fd=-1, fflags=1, efbuf=0xfffff80232b49000) at /usr/src/sys/kern/kern_descrip.c:3474 #21 0xffffffff8064e84c in kern_proc_filedesc_out (p=<optimized out>, sb=0xfffffe090e81d888, maxlen=-1, flags=1) at /usr/src/sys/kern/kern_descrip.c:3537 #22 0xffffffff8064f87d in sysctl_kern_proc_filedesc (oidp=<optimized out>, arg1=<optimized out>, arg2=<optimized out>, req=<optimized out>) at /usr/src/sys/kern/kern_descrip.c:3597 #23 0xffffffff806a7d6f in sysctl_root_handler_locked (oid=0xffffffff80d93760 <sysctl___kern_proc_filedesc>, arg1=0xfffffe090e81da9c, arg2=1, req=0xfffffe090e81d9d0, tracker=0xfffffe090e81d940) at /usr/src/sys/kern/kern_sysctl.c:165 #24 0xffffffff806a7545 in sysctl_root (oidp=<optimized out>, arg1=<unavailable>, arg2=1, req=<optimized out>) at /usr/src/sys/kern/kern_sysctl.c:2027 #25 0xffffffff806a7a38 in userland_sysctl (td=<optimized out>, name=0xfffffe090e81da90, namelen=4, old=0x0, oldlenp=<optimized out>, inkernel=<optimized out>, new=<optimized out>, newlen=<optimized out>, retval=<unavailable>, flags=0) at /usr/src/sys/kern/kern_sysctl.c:2122 #26 0xffffffff806a78bf in sys___sysctl (td=0xfffff802abd53580, uap=0xfffff802abd53930) at /usr/src/sys/kern/kern_sysctl.c:2057 #27 0xffffffff808fdecb in syscallenter (td=0xfffff802abd53580) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:132 #28 0xffffffff808fda87 in amd64_syscall (td=0xfffff802abd53580, traced=0) at 
/usr/src/sys/amd64/amd64/trap.c:915 (kgdb) p rrl->rr_writer->td_tid $2 = 100351 (kgdb) tid 100351 (kgdb) bt #0 sched_switch (td=0xfffff80025288000, newtd=0xfffff806ff236580, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2005 #1 0xffffffff806a5091 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:438 #2 0xffffffff806e252c in sleepq_switch (wchan=0xfffffe000ec032d8, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:611 #3 0xffffffff806e23d2 in sleepq_wait (wchan=0xfffffe000ec032d8, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:690 #4 0xffffffff8063ffa3 in _cv_wait (cvp=0xfffffe000ec032d8, lock=<optimized out>) at /usr/src/sys/kern/kern_condvar.c:144 #5 0xffffffff8035bde5 in rrw_enter_write (rrl=0xfffffe000ec032b8) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c:253 #6 0xffffffff8035c329 in rrm_enter_write (rrl=0xfffffe000ec020e8) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c:438 #7 0xffffffff8035c2b8 in rrm_enter (rrl=<unavailable>, rw=<unavailable>, tag=<unavailable>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c:407 #8 0xffffffff803b364b in zfs_mount (vfsp=0xfffff80043895000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1715 #9 0xffffffff80747c99 in vfs_domount_update (td=<optimized out>, vp=0xfffff800438b3588, fsflags=268505368, optlist=0xfffffe090e404a80) at /usr/src/sys/kern/vfs_mount.c:976 #10 0xffffffff807454fb in vfs_domount (td=0xfffff80025288000, fstype=<optimized out>, fspath=0xfffff804fe9e41a0 "/usr/obj", fsflags=<optimized out>, optlist=0xfffffe090e404a80) at /usr/src/sys/kern/vfs_mount.c:1132 #11 0xffffffff80744c52 in vfs_donmount (td=0xfffff80025288000, fsflags=<optimized out>, fsoptions=0xfffff8059de54900) at /usr/src/sys/kern/vfs_mount.c:687 #12 0xffffffff80744608 in sys_nmount (td=0xfffff80025288000, uap=<optimized out>) at /usr/src/sys/kern/vfs_mount.c:421 #13 0xffffffff808fdecb in syscallenter (td=0xfffff80025288000) at 
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:132 #14 0xffffffff808fda87 in amd64_syscall (td=0xfffff80025288000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:915 (kgdb) tid 102044 (kgdb) bt #0 sched_switch (td=0xfffff806f0cb0580, newtd=0xfffff80008d17000, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2005 #1 0xffffffff806a5091 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:438 #2 0xffffffff806e252c in sleepq_switch (wchan=0xfffff8065044cb78, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:611 #3 0xffffffff806e23d2 in sleepq_wait (wchan=0xfffff8065044cb78, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:690 #4 0xffffffff8067311f in sleeplk (lk=0xfffff8065044cb78, flags=2121728, ilk=<optimized out>, wmesg=0xffffffff80aae927 "zfs", pri=96, timo=51, queue=<optimized out>) at /usr/src/sys/kern/kern_lock.c:288 #5 0xffffffff80671c05 in __lockmgr_args (lk=0xfffff8065044cb78, flags=2121728, ilk=0xfffff8065044cba8, wmesg=<optimized out>, pri=<optimized out>, timo=<optimized out>, file=<optimized out>, line=<optimized out>) at /usr/src/sys/kern/kern_lock.c:873 #6 0xffffffff806711c5 in lockmgr_lock_fast_path (lk=0xfffff8065044cb78, flags=2121728, ilk=0xfffff8065044cba8, file=0xffffffff80acfa01 "/usr/src/sys/kern/vfs_subr.c", line=2606) at /usr/src/sys/kern/kern_lock.c:605 #7 0xffffffff8073d939 in vop_stdlock (ap=<optimized out>) at /usr/src/sys/kern/vfs_default.c:517 #8 0xffffffff803b81e8 in zfs_freebsd_lock (ap=0xfffffe090e8a9588) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:5987 #9 0xffffffff8095608d in VOP_LOCK1_APV (vop=<optimized out>, a=0xfffffe090e8a9588) at vnode_if.c:2087 #10 0xffffffff8075f8ab in VOP_LOCK1 (vp=<optimized out>, flags=<optimized out>, file=<unavailable>, line=<unavailable>) at ./vnode_if.h:859 #11 0xffffffff8075e755 in _vn_lock (vp=<optimized out>, flags=2121728, file=0xffffffff80acfa01 "/usr/src/sys/kern/vfs_subr.c", line=2606) at /usr/src/sys/kern/vfs_vnops.c:1555 #12 0xffffffff8074e5af 
in vget (vp=0xfffff8065044cb10, flags=<optimized out>, td=0xfffff806f0cb0580) at /usr/src/sys/kern/vfs_subr.c:2606 #13 0xffffffff8073719c in cache_lookup (dvp=0xfffff806cd2063b0, vpp=0xfffffe090e8a9a18, cnp=0xfffffe090e8a9a40, tsp=0x0, ticksp=<optimized out>) at /usr/src/sys/kern/vfs_cache.c:1321 #14 0xffffffff8073a0a4 in vfs_cache_lookup (ap=<optimized out>) at /usr/src/sys/kern/vfs_cache.c:2067 #15 0xffffffff803b8391 in zfs_cache_lookup (ap=<unavailable>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4956 #16 0xffffffff8095275a in VOP_LOOKUP_APV (vop=<optimized out>, a=0xfffffe090e8a9720) at vnode_if.c:127 #17 0xffffffff80743779 in VOP_LOOKUP (dvp=<optimized out>, vpp=<optimized out>, cnp=<unavailable>) at ./vnode_if.h:54 #18 0xffffffff80742bce in lookup (ndp=0xfffffe090e8a99b8) at /usr/src/sys/kern/vfs_lookup.c:891 #19 0xffffffff807424a7 in namei (ndp=0xfffffe090e8a99b8) at /usr/src/sys/kern/vfs_lookup.c:453 #20 0xffffffff8075e046 in vn_open_cred (ndp=0xfffffe090e8a99b8, flagp=0xfffffe090e8a9ac4, cmode=0, vn_open_flags=0, cred=0xfffff800234fe000, fp=0xfffff8072ca41b90) at /usr/src/sys/kern/vfs_vnops.c:277 #21 0xffffffff8075de4f in vn_open (ndp=<unavailable>, flagp=<unavailable>, cmode=<unavailable>, fp=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:180 #22 0xffffffff80757b2f in kern_openat (td=0xfffff806f0cb0580, fd=-100, path=0x800de1a80 <error: Cannot access memory at address 0x800de1a80>, pathseg=UIO_USERSPACE, flags=1, mode=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1082 #23 0xffffffff80757d2b in sys_openat (td=0xfffff806f0cb0580, uap=0xfffff806f0cb0930) at /usr/src/sys/kern/vfs_syscalls.c:1030 #24 0xffffffff808fdecb in syscallenter (td=0xfffff806f0cb0580) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:132 #25 0xffffffff808fda87 in amd64_syscall (td=0xfffff806f0cb0580, traced=0) at /usr/src/sys/amd64/amd64/trap.c:915 Summary: 1. 
Thread 102044 is doing a lookup, dvp must be already locked and thus the filesystem is read-locked The thread is blocked on the child vp lock. 2. Thread 100351 is doing mount -u and wants to write-lock the filesystem, blocked by thread 102044. 3. Thread 102522 does vrele on the child vp and needs to lock it. The thread has already got the vnode lock is blocked by thread 100351. -- You are receiving this mail because: You are the assignee for the bug.
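The Summary describes a cycle in the wait-for graph: each of the three threads is blocked by the next one. As a hedged illustration only, the small C model below encodes that graph using the thread ids from the kgdb session and checks it for a cycle. Everything in it (`struct waiter`, `in_cycle()`, `any_deadlock()`, the `observed` and `teardown_first` tables) is hypothetical and is not FreeBSD or ZFS code; it treats the teardown lock as a single read/write lock and ignores the internal structure of the rrm lock. The second table assumes the ordering suggested above, where the teardown lock is taken before any vnode lock, so thread 102522 would wait for the teardown lock without holding the child vnode lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the wait-for graph from the Summary.  Each
 * blocked thread waits on exactly one other thread; -1 means runnable.
 */
enum { LOOKUP, MOUNT_U, VRELE, NTHREADS };

struct waiter {
	int tid;           /* thread id as printed by kgdb */
	const char *state; /* short description of what it holds/wants */
	int waits_on;      /* index of the blocking thread, or -1 */
};

/* The state captured in the kgdb session. */
static const struct waiter observed[NTHREADS] = {
	[LOOKUP]  = { 102044, "holds teardown (read) via dvp, wants child vp lock", VRELE },
	[MOUNT_U] = { 100351, "mount -u, wants teardown (write)", LOOKUP },
	[VRELE]   = { 102522, "holds child vp lock, wants teardown (read)", MOUNT_U },
};

/*
 * Assumed state if the teardown lock were taken before the vnode lock:
 * the vrele thread blocks on the teardown lock without holding the
 * child vp lock, so the lookup thread can proceed.
 */
static const struct waiter teardown_first[NTHREADS] = {
	[LOOKUP]  = { 102044, "holds teardown (read), gets child vp lock", -1 },
	[MOUNT_U] = { 100351, "mount -u, wants teardown (write)", LOOKUP },
	[VRELE]   = { 102522, "wants teardown (read), holds no vp lock", MOUNT_U },
};

/* True if following wait-for edges from 'start' leads back to 'start'. */
static bool in_cycle(const struct waiter *w, size_t n, size_t start)
{
	size_t cur = start;

	/* A cycle through 'start' has at most n edges. */
	for (size_t steps = 0; steps < n; steps++) {
		int next = w[cur].waits_on;

		if (next < 0)
			return false; /* chain ends at a runnable thread */
		assert((size_t)next < n);
		if ((size_t)next == start)
			return true;
		cur = (size_t)next;
	}
	return false; /* 'start' leads into a cycle it is not part of */
}

/* True if any thread in the table is part of a wait-for cycle. */
static bool any_deadlock(const struct waiter *w, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (in_cycle(w, n, i))
			return true;
	return false;
}
```

With the `observed` table every thread lies on the cycle 102044 -> 102522 -> 100351 -> 102044, so `any_deadlock(observed, NTHREADS)` is true. With the `teardown_first` table the chain 102522 -> 100351 -> 102044 ends at a runnable thread, so no deadlock is reported; this only restates the lock-ordering argument and says nothing about how such an ordering could actually be arranged around the VOP_LOCK interlock.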