Re: ZFS deadlock in 14

From: Cy Schubert <Cy.Schubert_at_cschubert.com>
Date: Sat, 12 Aug 2023 17:08:53 UTC
On August 12, 2023 7:11:10 AM PDT, "Dag-Erling Smørgrav" <des@FreeBSD.org> wrote:
>Dag-Erling Smørgrav <des@FreeBSD.org> writes:
>> At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
>> deadlock was introduced in ZFS.
>
>Trying to narrow this range down, I did not get a deadlock with
>4e8d558c9d1c (10 June) but I did with b7198dcfc039 (16 June), albeit
>after building ~1800 packages.  This is surprising since we have a
>report of this or a very similar deadlock occurring with a kernel from 8
>June (https://bugs.freebsd.org/271945).  Perhaps I should try
>4e8d558c9d1c again.
>
>Here's the complete kgdb session showing, once again, a zfs rollback
>stuck waiting for the txg to sync:
>
>    Reading symbols from /boot/GENERIC/kernel...
>    Reading symbols from /usr/lib/debug//boot/GENERIC/kernel.debug...
>    
>    Unread portion of the kernel message buffer:
>    panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffe03567a01e0 (sh), blocked for 180242 ticks
>    
>    cpuid = 17
>    time = 1691802362
>    KDB: stack backtrace:
>    db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe021507ce00
>    vpanic() at vpanic+0x150/frame 0xfffffe021507ce50
>    panic() at panic+0x43/frame 0xfffffe021507ceb0
>    deadlkres() at deadlkres+0x350/frame 0xfffffe021507cef0
>    fork_exit() at fork_exit+0x80/frame 0xfffffe021507cf30
>    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe021507cf30
>    --- trap 0xdeadc0de, rip = 0xdeadc0dedeadc0de, rsp = 0xdeadc0dedeadc0de, rbp = 0xdeadc0dedeadc0de ---
>    KDB: enter: panic
>    
>    __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
>    59		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
>    (kgdb) tid 0xfffffe03567a01e0
>    (kgdb) bt
>    #0  sched_switch (td=td@entry=0xfffffe03567a01e0, flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2299
>    #1  0xffffffff80b5fbd4 in mi_switch (flags=flags@entry=259) at /usr/src/sys/kern/kern_synch.c:550
>    #2  0xffffffff80bb1257 in sleepq_switch (wchan=0xfffff80b139e4770, wchan@entry=0xffffffff8113878f, pri=pri@entry=64) at /usr/src/sys/kern/subr_sleepqueue.c:609
>    #3  0xffffffff80bb112e in sleepq_wait (wchan=<unavailable>, pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660
>    #4  0xffffffff80b21d6f in sleeplk (lk=lk@entry=0xfffff80b139e4770, flags=flags@entry=2122752, ilk=ilk@entry=0x0, wmesg=wmesg@entry=0xffffffff8113878f "tmpfs", pri=<optimized out>, pri@entry=64, timo=timo@entry=6, queue=1) at /usr/src/sys/kern/kern_lock.c:310
>    #5  0xffffffff80b1fd9f in lockmgr_slock_hard (lk=0xfffff80b139e4770, flags=2122752, ilk=<optimized out>, file=0xffffffff81296919 "/usr/src/sys/kern/vfs_lookup.c", line=1012, lwa=0x0) at /usr/src/sys/kern/kern_lock.c:705
>    #6  0xffffffff80c5e444 in VOP_LOCK1 (vp=0xfffff80b139e4700, flags=2106368, file=0xffffffff81296919 "/usr/src/sys/kern/vfs_lookup.c", line=1012) at ./vnode_if.h:1120
>    #7  _vn_lock (vp=0xfffff80b139e4700, flags=2106368, file=<unavailable>, line=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:1808
>    #8  0xffffffff80c36eae in vfs_lookup (ndp=ndp@entry=0xfffffe03c63a6bd8) at /usr/src/sys/kern/vfs_lookup.c:1010
>    #9  0xffffffff80c36291 in namei (ndp=ndp@entry=0xfffffe03c63a6bd8) at /usr/src/sys/kern/vfs_lookup.c:689
>    #10 0xffffffff80c5681f in kern_statat (td=0xfffffe03567a01e0, td@entry=<unavailable>, flag=<optimized out>, fd=-100, path=0x1032a8685a15 <error: Cannot access memory at address 0x1032a8685a15>, pathseg=pathseg@entry=UIO_USERSPACE, sbp=sbp@entry=0xfffffe03c63a6d18)
>        at /usr/src/sys/kern/vfs_syscalls.c:2441
>    #11 0xffffffff80c56f17 in sys_fstatat (td=<unavailable>, td@entry=<error reading variable: value is not available>, uap=0xfffffe03567a05e0, uap@entry=<error reading variable: value is not available>) at /usr/src/sys/kern/vfs_syscalls.c:2419
>    #12 0xffffffff8104d8e0 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
>    #13 amd64_syscall (td=0xfffffe03567a01e0, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1199
>    #14 <signal handler called>
>    #15 0x0000103acaf3b03a in ?? ()
>    Backtrace stopped: Cannot access memory at address 0x103ac929af28
>    (kgdb) f 5
>    #5  0xffffffff80b1fd9f in lockmgr_slock_hard (lk=0xfffff80b139e4770, flags=2122752, ilk=<optimized out>, file=0xffffffff81296919 "/usr/src/sys/kern/vfs_lookup.c", line=1012, lwa=0x0) at /usr/src/sys/kern/kern_lock.c:705
>    705			error = sleeplk(lk, flags, ilk, iwmesg, ipri, itimo,
>    (kgdb) p (struct thread *)(lk->lk_lock & ~0x1f)
>    $1 = (struct thread *) 0xfffffe02ddae0e40
>    (kgdb) tid 0xfffffe02ddae0e40
>    (kgdb) bt
>    #0  sched_switch (td=td@entry=0xfffffe02ddae0e40, flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2299
>    #1  0xffffffff80b5fbd4 in mi_switch (flags=flags@entry=259) at /usr/src/sys/kern/kern_synch.c:550
>    #2  0xffffffff80bb1257 in sleepq_switch (wchan=wchan@entry=0xfffff81ab3c81154, pri=87, pri@entry=-1278734048) at /usr/src/sys/kern/subr_sleepqueue.c:609
>    #3  0xffffffff80bb112e in sleepq_wait (wchan=<unavailable>, pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660
>    #4  0xffffffff80b5f11d in _sleep (ident=0xfffff81ab3c81154, lock=0xfffff81ab3c81120, priority=87, wmesg=0xffffffff82239fba "zfs teardown inactive", sbt=0, pr=0, flags=256) at /usr/src/sys/kern/kern_synch.c:225
>    #5  0xffffffff80b4b640 in rms_rlock_fallback (rms=rms@entry=0xfffff81ab3c81120) at /usr/src/sys/kern/kern_rmlock.c:1015
>    #6  0xffffffff80b4b51c in rms_rlock (rms=<unavailable>, rms@entry=0xfffff81ab3c81120) at /usr/src/sys/kern/kern_rmlock.c:1036
>    #7  0xffffffff81faff5c in zfs_freebsd_reclaim (ap=<optimized out>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:5164
>    #8  0xffffffff811215e4 in VOP_RECLAIM_APV (vop=0xffffffff822e61a0 <zfs_vnodeops>, a=a@entry=0xfffffe02fb2118a0) at vnode_if.c:2180
>    #9  0xffffffff80c47d54 in VOP_RECLAIM (vp=0xfffff80912340000) at ./vnode_if.h:1084
>    #10 vgonel (vp=vp@entry=0xfffff80912340000) at /usr/src/sys/kern/vfs_subr.c:4143
>    #11 0xffffffff80c436f2 in vtryrecycle (vp=0xfffff80912340000) at /usr/src/sys/kern/vfs_subr.c:1693
>    #12 vnlru_free_impl (count=count@entry=1, mnt_op=mnt_op@entry=0x0, mvp=0xfffff8010945da00) at /usr/src/sys/kern/vfs_subr.c:1344
>    #13 0xffffffff80c4dd83 in vnlru_free_locked (count=1) at /usr/src/sys/kern/vfs_subr.c:1357
>    #14 vn_alloc_hard (mp=mp@entry=0xfffffe0314140000) at /usr/src/sys/kern/vfs_subr.c:1744
>    #15 0xffffffff80c43db1 in vn_alloc (mp=0xfffffe0314140000) at /usr/src/sys/amd64/include/atomic.h:375
>    #16 getnewvnode (tag=0xffffffff8113878f "tmpfs", mp=0xfffffe0314140000, vops=0xffffffff816b7a70 <tmpfs_vnodeop_entries>, vpp=vpp@entry=0xfffffe02fb211a30) at /usr/src/sys/kern/vfs_subr.c:1810
>    #17 0xffffffff80a7b27c in tmpfs_alloc_vp (mp=0xfffffe0314140000, node=node@entry=0xfffff81924deabc8, lkflag=lkflag@entry=524288, vpp=0xfffffe02fb211cf0) at /usr/src/sys/fs/tmpfs/tmpfs_subr.c:1027
>    #18 0xffffffff80a7b985 in tmpfs_alloc_file (dvp=dvp@entry=0xfffff80b139e4700, vpp=<unavailable>, vpp@entry=0xfffffe02fb211cf0, vap=<optimized out>, cnp=cnp@entry=0xfffffe02fb211d18, target=target@entry=0x0) at /usr/src/sys/fs/tmpfs/tmpfs_subr.c:1203
>    #19 0xffffffff80a74d28 in tmpfs_create (v=<optimized out>) at /usr/src/sys/fs/tmpfs/tmpfs_vnops.c:271
>    #20 0xffffffff8111eb99 in VOP_CREATE_APV (vop=0xffffffff816b7a70 <tmpfs_vnodeop_entries>, a=a@entry=0xfffffe02fb211be0) at vnode_if.c:244
>    #21 0xffffffff80c5d94c in VOP_CREATE (dvp=<unavailable>, vpp=0xfffffe02fb211cf0, cnp=0xfffffe02fb211d18, vap=0xfffffe02fb211b20) at ./vnode_if.h:133
>    #22 vn_open_cred (ndp=ndp@entry=0xfffffe02fb211c98, flagp=flagp@entry=0xfffffe02fb211da4, cmode=cmode@entry=420, vn_open_flags=vn_open_flags@entry=16, cred=0xfffff8010978bc00, fp=0xfffff804f42cff00) at /usr/src/sys/kern/vfs_vnops.c:287
>    #23 0xffffffff80c53f43 in kern_openat (td=0xfffffe02ddae0e40, td@entry=<unavailable>, fd=fd@entry=-100, path=0x8222f799b <error: Cannot access memory at address 0x8222f799b>, pathseg=pathseg@entry=UIO_USERSPACE, flags=1538, mode=<optimized out>)
>        at /usr/src/sys/kern/vfs_syscalls.c:1167
>    #24 0xffffffff80c53cad in sys_open (td=td@entry=<unavailable>, uap=uap@entry=<unavailable>) at /usr/src/sys/kern/vfs_syscalls.c:1095
>    #25 0xffffffff82b18365 in filemon_wrapper_open (td=<unavailable>, td@entry=<error reading variable: value is not available>, uap=<unavailable>, uap@entry=<error reading variable: value is not available>) at /usr/src/sys/dev/filemon/filemon_wrapper.c:220
>    #26 0xffffffff8104d8e0 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
>    #27 amd64_syscall (td=0xfffffe02ddae0e40, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1199
>    #28 <signal handler called>
>    #29 0x0000000829c8227a in ?? ()
>    Backtrace stopped: Cannot access memory at address 0x8222f6868
>    (kgdb) f 7
>    #7  0xffffffff81faff5c in zfs_freebsd_reclaim (ap=<optimized out>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:5164
>    5164		ZFS_TEARDOWN_INACTIVE_ENTER_READ(zfsvfs);
>    (kgdb) p zp->z_zfsvfs->z_teardown_inactive_lock->owner
>    $2 = (struct thread *) 0xfffffe0314249020
>    (kgdb) tid 0xfffffe0314249020
>    (kgdb) bt
>    #0  sched_switch (td=td@entry=0xfffffe0314249020, flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2299
>    #1  0xffffffff80b5fbd4 in mi_switch (flags=flags@entry=259) at /usr/src/sys/kern/kern_synch.c:550
>    #2  0xffffffff80bb1257 in sleepq_switch (wchan=wchan@entry=0xfffff80108fe1540, pri=0, pri@entry=150869200) at /usr/src/sys/kern/subr_sleepqueue.c:609
>    #3  0xffffffff80bb112e in sleepq_wait (wchan=<unavailable>, wchan@entry=0xfffff80108fe1540, pri=<unavailable>, pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:660
>    #4  0xffffffff80ade224 in _cv_wait (cvp=0xfffff80108fe1540, lock=0xfffff80108fe14d0) at /usr/src/sys/kern/kern_condvar.c:146
>    #5  0xffffffff820b383b in txg_wait_synced_impl (dp=0xfffff80108fe1000, txg=8751529, txg@entry=0, wait_sig=wait_sig@entry=0) at /usr/src/sys/contrib/openzfs/module/zfs/txg.c:726
>    #6  0xffffffff820b31eb in txg_wait_synced (dp=<unavailable>, txg=<unavailable>, txg@entry=0) at /usr/src/sys/contrib/openzfs/module/zfs/txg.c:736
>    #7  0xffffffff81fa5fc5 in zfsvfs_teardown (zfsvfs=0xfffff81ab3c81000, unmounting=unmounting@entry=0) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1661
>    #8  0xffffffff81fa5db9 in zfs_suspend_fs (zfsvfs=<unavailable>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1954
>    #9  0xffffffff821680ff in zfs_ioc_rollback (fsname=0xfffffe0301913000 "zroot-default-ref/03", fsname@entry=<error reading variable: value is not available>, innvl=<unavailable>, innvl@entry=<error reading variable: value is not available>, 
>        outnvl=0xfffff81601748640, outnvl@entry=<error reading variable: value is not available>) at /usr/src/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:4401
>    #10 0xffffffff82163836 in zfsdev_ioctl_common (vecnum=vecnum@entry=25, zc=zc@entry=0xfffffe0301913000, flag=flag@entry=0) at /usr/src/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:7798
>    #11 0xffffffff81f969aa in zfsdev_ioctl (dev=<optimized out>, zcmd=<unavailable>, zcmd@entry=<error reading variable: value is not available>, arg=0xfffffe02fd546d50 "\017", arg@entry=<error reading variable: value is not available>, flag=<optimized out>, td=<optimized out>)
>        at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:168
>    #12 0xffffffff809dc9cc in devfs_ioctl (ap=0xfffffe02fd546c40) at /usr/src/sys/fs/devfs/devfs_vnops.c:935
>    #13 0xffffffff80c5cac0 in vn_ioctl (fp=0xfffff81e9207f0a0, com=<optimized out>, data=0xfffffe02fd546d50, active_cred=0xfffff8026a65a900, td=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:1697
>    #14 0xffffffff809dd07e in devfs_ioctl_f (fp=<unavailable>, fp@entry=<error reading variable: value is not available>, com=<unavailable>, com@entry=<error reading variable: value is not available>, data=<unavailable>, data@entry=<error reading variable: value is not available>, 
>        cred=<unavailable>, cred@entry=<error reading variable: value is not available>, td=<unavailable>, td@entry=<error reading variable: value is not available>) at /usr/src/sys/fs/devfs/devfs_vnops.c:866
>    #15 0xffffffff80bca1ce in fo_ioctl (fp=0xfffff81e9207f0a0, com=3222821401, data=<unavailable>, active_cred=<unavailable>, td=<optimized out>) at /usr/src/sys/sys/file.h:367
>    #16 kern_ioctl (td=td@entry=0xfffffe0314249020, fd=<optimized out>, com=com@entry=3222821401, data=<unavailable>, data@entry=0xfffffe02fd546d50 "\017") at /usr/src/sys/kern/sys_generic.c:807
>    #17 0xffffffff80bc9f64 in sys_ioctl (td=0xfffffe0314249020, td@entry=<error reading variable: value is not available>, uap=0xfffffe0314249420, uap@entry=<error reading variable: value is not available>) at /usr/src/sys/kern/sys_generic.c:715
>    #18 0xffffffff8104d8e0 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
>    #19 amd64_syscall (td=0xfffffe0314249020, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1199
>    #20 <signal handler called>
>    #21 0x000005c8e125953a in ?? ()
>    Backtrace stopped: Cannot access memory at address 0x5c8d89c8018
>
>DES

Yes, this is the same panic my poudriere builder building amd64 packages gets. The poudeiere builder, also running on amd64, building i386 packages gets a different panic. I'm on my phone and don't have a keyboard to look up the PR number.


-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>  Web:  https://FreeBSD.org
NTP:                     <cy@nwtime.org>    Web:  https://nwtime.org
                                                    e^(i*pi)+1=0

Pardon the typos. Small keyboard in use.