Re: ZFS deadlock in 14

From: Kevin Bowling <kevin.bowling_at_kev009.com>
Date: Thu, 10 Aug 2023 23:28:54 UTC
The two MFVs on head have improved/fixed stability with poudriere for
me 48 core bare metal.

On Thu, Aug 10, 2023 at 6:37 AM Cy Schubert <Cy.Schubert@cschubert.com> wrote:
>
> In message <CAK7dMtDJeuf8rjWbsNEZABUfeqpjUyCHzuOL9AAhKk93sy+PKg@mail.gmail.c
> om>
> , Kevin Bowling writes:
> > Possibly https://github.com/openzfs/zfs/commit/2cb992a99ccadb78d97049b40bd4=
> > 42eb4fdc549d
> >
> > On Tue, Aug 8, 2023 at 10:08=E2=80=AFAM Dag-Erling Sm=C3=B8rgrav <des@freeb=
> > sd.org> wrote:
> > >
> > > At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
> > > deadlock was introduced in ZFS.  It is still present as of 9c2823bae9 (4
> > > August) and is 100% reproducable just by starting poudriere bulk in a
> > > 16-core VM and waiting a few hours until deadlkres kicks in.  In the
> > > latest instance, deadlkres complained about a bash process:
> > >
> > >     #0  sched_switch (td=3Dtd@entry=3D0xfffffe02fb1d8000, flags=3Dflags@e=
> > ntry=3D259) at /usr/src/sys/kern/sched_ule.c:2299
> > >     #1  0xffffffff80b5a0a3 in mi_switch (flags=3Dflags@entry=3D259) at /u=
> > sr/src/sys/kern/kern_synch.c:550
> > >     #2  0xffffffff80babcb4 in sleepq_switch (wchan=3D0xfffff818543a9e70, =
> > pri=3D64) at /usr/src/sys/kern/subr_sleepqueue.c:609
> > >     #3  0xffffffff80babb8c in sleepq_wait (wchan=3D<unavailable>, pri=3D<=
> > unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660
> > >     #4  0xffffffff80b1c1b0 in sleeplk (lk=3Dlk@entry=3D0xfffff818543a9e70=
> > , flags=3Dflags@entry=3D2121728, ilk=3Dilk@entry=3D0x0, wmesg=3Dwmesg@entry=
> > =3D0xffffffff8222a054 "zfs", pri=3D<optimized out>, pri@entry=3D64, timo=3D=
> > timo@entry=3D6, queue=3D1) at /usr/src/sys/kern/kern_lock.c:310
> > >     #5  0xffffffff80b1a23f in lockmgr_slock_hard (lk=3D0xfffff818543a9e70=
> > , flags=3D2121728, ilk=3D<optimized out>, file=3D0xffffffff812544fb "/usr/s=
> > rc/sys/kern/vfs_subr.c", line=3D3057, lwa=3D0x0) at /usr/src/sys/kern/kern_=
> > lock.c:705
> > >     #6  0xffffffff80c59ec3 in VOP_LOCK1 (vp=3D0xfffff818543a9e00, flags=
> > =3D2105344, file=3D0xffffffff812544fb "/usr/src/sys/kern/vfs_subr.c", line=
> > =3D3057) at ./vnode_if.h:1120
> > >     #7  _vn_lock (vp=3Dvp@entry=3D0xfffff818543a9e00, flags=3D2105344, fi=
> > le=3D<unavailable>, line=3D<unavailable>, line@entry=3D3057) at /usr/src/sy=
> > s/kern/vfs_vnops.c:1815
> > >     #8  0xffffffff80c4173d in vget_finish (vp=3D0xfffff818543a9e00, flags=
> > =3D<unavailable>, vs=3Dvs@entry=3DVGET_USECOUNT) at /usr/src/sys/kern/vfs_s=
> > ubr.c:3057
> > >     #9  0xffffffff80c1c9b7 in cache_lookup (dvp=3Ddvp@entry=3D0xfffff802c=
> > d02ac40, vpp=3Dvpp@entry=3D0xfffffe046b20ac30, cnp=3Dcnp@entry=3D0xfffffe04=
> > 6b20ac58, tsp=3Dtsp@entry=3D0x0, ticksp=3Dticksp@entry=3D0x0) at /usr/src/s=
> > ys/kern/vfs_cache.c:2086
> > >     #10 0xffffffff80c2150c in vfs_cache_lookup (ap=3D<optimized out>) at =
> > /usr/src/sys/kern/vfs_cache.c:3068
> > >     #11 0xffffffff80c32c37 in VOP_LOOKUP (dvp=3D0xfffff802cd02ac40, vpp=
> > =3D0xfffffe046b20ac30, cnp=3D0xfffffe046b20ac58) at ./vnode_if.h:69
> > >     #12 vfs_lookup (ndp=3Dndp@entry=3D0xfffffe046b20abd8) at /usr/src/sys=
> > /kern/vfs_lookup.c:1266
> > >     #13 0xffffffff80c31ce1 in namei (ndp=3Dndp@entry=3D0xfffffe046b20abd8=
> > ) at /usr/src/sys/kern/vfs_lookup.c:689
> > >     #14 0xffffffff80c52090 in kern_statat (td=3D0xfffffe02fb1d8000, flag=
> > =3D<optimized out>, fd=3D-100, path=3D0xa75b480e070 <error: Cannot access m=
> > emory at address 0xa75b480e070>, pathseg=3Dpathseg@entry=3DUIO_USERSPACE, s=
> > bp=3Dsbp@entry=3D0xfffffe046b20ad18)
> > >         at /usr/src/sys/kern/vfs_syscalls.c:2441
> > >     #15 0xffffffff80c52797 in sys_fstatat (td=3D<unavailable>, uap=3D0xff=
> > fffe02fb1d8400) at /usr/src/sys/kern/vfs_syscalls.c:2419
> > >     #16 0xffffffff81049398 in syscallenter (td=3D<optimized out>) at /usr=
> > /src/sys/amd64/amd64/../../kern/subr_syscall.c:190
> > >     #17 amd64_syscall (td=3D0xfffffe02fb1d8000, traced=3D0) at /usr/src/s=
> > ys/amd64/amd64/trap.c:1199
> > >     #18 <signal handler called>
> > >
> > > The lock it is trying to acquire in frame 5 belongs to another bash
> > > process which is in the process of creating a fifo:
> > >
> > >     #0  sched_switch (td=3Dtd@entry=3D0xfffffe046acd8e40, flags=3Dflags@e=
> > ntry=3D259) at /usr/src/sys/kern/sched_ule.c:2299
> > >     #1  0xffffffff80b5a0a3 in mi_switch (flags=3Dflags@entry=3D259) at /u=
> > sr/src/sys/kern/kern_synch.c:550
> > >     #2  0xffffffff80babcb4 in sleepq_switch (wchan=3D0xfffff8018acbf154, =
> > pri=3D87) at /usr/src/sys/kern/subr_sleepqueue.c:609
> > >     #3  0xffffffff80babb8c in sleepq_wait (wchan=3D<unavailable>, pri=3D<=
> > unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660
> > >     #4  0xffffffff80b59606 in _sleep (ident=3Dident@entry=3D0xfffff8018ac=
> > bf154, lock=3Dlock@entry=3D0xfffff8018acbf120, priority=3Dpriority@entry=3D=
> > 87, wmesg=3D0xffffffff8223af0e "zfs teardown inactive", sbt=3Dsbt@entry=3D0=
> > , pr=3Dpr@entry=3D0, flags=3D256)
> > >         at /usr/src/sys/kern/kern_synch.c:225
> > >     #5  0xffffffff80b45dc0 in rms_rlock_fallback (rms=3D0xfffff8018acbf12=
> > 0) at /usr/src/sys/kern/kern_rmlock.c:1015
> > >     #6  0xffffffff80b45c93 in rms_rlock (rms=3D<unavailable>, rms@entry=
> > =3D0xfffff8018acbf120) at /usr/src/sys/kern/kern_rmlock.c:1036
> > >     #7  0xffffffff81fb147b in zfs_freebsd_reclaim (ap=3D<optimized out>) =
> > at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:5164
> > >     #8  0xffffffff8111d245 in VOP_RECLAIM_APV (vop=3D0xffffffff822e71a0 <=
> > zfs_vnodeops>, a=3Da@entry=3D0xfffffe0410f1c9c8) at vnode_if.c:2180
> > >     #9  0xffffffff80c43569 in VOP_RECLAIM (vp=3D0xfffff802cdbaca80) at ./=
> > vnode_if.h:1084
> > >     #10 vgonel (vp=3Dvp@entry=3D0xfffff802cdbaca80) at /usr/src/sys/kern/=
> > vfs_subr.c:4143
> > >     #11 0xffffffff80c3ef61 in vtryrecycle (vp=3D0xfffff802cdbaca80) at /u=
> > sr/src/sys/kern/vfs_subr.c:1693
> > >     #12 vnlru_free_impl (count=3Dcount@entry=3D1, mnt_op=3Dmnt_op@entry=
> > =3D0x0, mvp=3D0xfffff8010864da00) at /usr/src/sys/kern/vfs_subr.c:1344
> > >     #13 0xffffffff80c49553 in vnlru_free_locked (count=3D1) at /usr/src/s=
> > ys/kern/vfs_subr.c:1357
> > >     #14 vn_alloc_hard (mp=3Dmp@entry=3D0x0) at /usr/src/sys/kern/vfs_subr=
> > .c:1744
> > >     #15 0xffffffff80c3f6f0 in vn_alloc (mp=3D0x0) at /usr/src/sys/amd64/i=
> > nclude/atomic.h:375
> > >     #16 getnewvnode_reserve () at /usr/src/sys/kern/vfs_subr.c:1888
> > >     #17 0xffffffff81faa072 in zfs_create (dzp=3D0xfffff812200261d0, name=
> > =3D0xfffff8011b8ac805 "sh-np.yPbxoo", vap=3D0xfffffe0410f1cc20, excl=3D<opt=
> > imized out>, mode=3D<optimized out>, zpp=3Dzpp@entry=3D0xfffffe0410f1cbc8, =
> > cr=3D0xfffff80140fb1100, flag=3D<optimized out>, vsecp=3D0x0, mnt_ns=3D0x0)
> > >         at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_o=
> > s.c:1146
> > >     #18 0xffffffff81faea57 in zfs_freebsd_create (ap=3D0xfffffe0410f1cda0=
> > ) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:4618
> > >     #19 0xffffffff8111aa9a in VOP_MKNOD_APV (vop=3D0xffffffff822e71a0 <zf=
> > s_vnodeops>, a=3Da@entry=3D0xfffffe0410f1cda0) at vnode_if.c:372
> > >     #20 0xffffffff80c50207 in VOP_MKNOD (dvp=3D<unavailable>, cnp=3D0xfff=
> > ffe0410f1cd50, vap=3D0xfffffe0410f1cc20, vpp=3D<optimized out>) at ./vnode_=
> > if.h:188
> > >     #21 kern_mkfifoat (td=3D0xfffffe046acd8e40, fd=3D-100, path=3D0x12772=
> > f073500 <error: Cannot access memory at address 0x12772f073500>, pathseg=3D=
> > UIO_USERSPACE, mode=3D<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:=
> > 1492
> > >     #22 0xffffffff81049398 in syscallenter (td=3D<optimized out>) at /usr=
> > /src/sys/amd64/amd64/../../kern/subr_syscall.c:190
> > >     #23 amd64_    搬 syscall (td=3D0xfffffe046acd8e40, traced=3D0) at /usr/src/s=
> > ys/amd64/amd64/trap.c:1199
> > >     #24 <signal handler called>
> > >
> > > Frame 7 is trying to acquire the ZFS teardown inactive lock, which is
> > > held by a process which is performing a ZFS rollback and is waiting for
> > > the transaction to sync:
> > >
> > >     #0  sched_switch (td=3Dtd@entry=3D0xfffffe0422ef8560, flags=3Dflags@e=
> > ntry=3D259) at /usr/src/sys/kern/sched_ule.c:2299
> > >     #1  0xffffffff80b5a0a3 in mi_switch (flags=3Dflags@entry=3D259) at /u=
> > sr/src/sys/kern/kern_synch.c:550
> > >     #2  0xffffffff80babcb4 in sleepq_switch (wchan=3D0xfffff8011b83d540, =
> > pri=3D0) at /usr/src/sys/kern/subr_sleepqueue.c:609
> > >     #3  0xffffffff80babb8c in sleepq_wait (wchan=3D<unavailable>, wchan@e=
> > ntry=3D0xfffff8011b83d540, pri=3D<unavailable>, pri@entry=3D0) at /usr/src/=
> > sys/kern/subr_sleepqueue.c:660
> > >     #4  0xffffffff80ad7f75 in _cv_wait (cvp=3Dcvp@entry=3D0xfffff8011b83d=
> > 540, lock=3Dlock@entry=3D0xfffff8011b83d4d0) at /usr/src/sys/kern/kern_cond=
> > var.c:146
> > >     #5  0xffffffff820b42fb in txg_wait_synced_impl (dp=3Ddp@entry=3D0xfff=
> > ff8011b83d000, txg=3D8585097, wait_sig=3Dwait_sig@entry=3D0) at /usr/src/sy=
> > s/contrib/openzfs/module/zfs/txg.c:726
> > >     #6  0xffffffff820b3cab in txg_wait_synced (dp=3D<unavailable>, dp@ent=
> > ry=3D0xfffff8011b83d000, txg=3D<unavailable>) at /usr/src/sys/contrib/openz=
> > fs/module/zfs/txg.c:736
> > >     #7  0xffffffff8206d5b5 in dsl_sync_task_common (pool=3Dpool@entry=3D0=
> > xfffffe0401d15000 "zroot/poudriere/jails/13amd64-default-ref/15", checkfunc=
> > =3D<optimized out>, syncfunc=3D0xffffffff8203fbc0 <dsl_dataset_rollback_syn=
> > c>, sigfunc=3Dsigfunc@entry=3D0x0, arg=3Darg@entry=3D0xfffffe02fb827a90,
> > >         blocks_modified=3Dblocks_modified@entry=3D1, space_check=3DZFS_SP=
> > ACE_CHECK_RESERVED, early=3D0) at /usr/src/sys/contrib/openzfs/module/zfs/d=
> > sl_synctask.c:93
> > >     #8  0xffffffff8206d3c7 in dsl_sync_task (pool=3D<unavailable>, pool@e=
> > ntry=3D0xfffffe0401d15000 "zroot/poudriere/jails/13amd64-default-ref/15", c=
> > heckfunc=3D<unavailable>, syncfunc=3D<unavailable>, arg=3D<unavailable>, ar=
> > g@entry=3D0xfffffe02fb827a90, blocks_modified=3D<unavailable>,
> > >         blocks_modified@entry=3D1, space_check=3D<unavailable>, space_che=
> > ck@entry=3DZFS_SPACE_CHECK_RESERVED) at /usr/src/sys/contrib/openzfs/module=
> > /zfs/dsl_synctask.c:132
> > >     #9  0xffffffff8204075b in dsl_dataset_rollback (fsname=3D<unavailable=
> > >, fsname@entry=3D0xfffffe0401d15000 "zroot/poudriere/jails/13amd64-default=
> > -ref/15", tosnap=3D<optimized out>, owner=3D<optimized out>, result=3Dresul=
> > t@entry=3D0xfffff81c826a9ea0)
> > >         at /usr/src/sys/contrib/openzfs/module/zfs/dsl_dataset.c:3261
> > >     #10 0xffffffff82168dd9 in zfs_ioc_rollback (fsname=3D0xfffffe0401d150=
> > 00 "zroot/poudriere/jails/13amd64-default-ref/15", fsname@entry=3D<error re=
> > ading variable: value is not available>, innvl=3D<unavailable>, innvl@entry=
> > =3D<error reading variable: value is not available>,
> > >         outnvl=3D0xfffff81c826a9ea0, outnvl@entry=3D<error reading variab=
> > le: value is not available>) at /usr/src/sys/contrib/openzfs/module/zfs/zfs=
> > _ioctl.c:4405
> > >     #11 0xffffffff82164522 in zfsdev_ioctl_common (vecnum=3Dvecnum@entry=
> > =3D25, zc=3Dzc@entry=3D0xfffffe0401d15000, flag=3Dflag@entry=3D0) at /usr/s=
> > rc/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:7798
> > >     #12 0xffffffff81f97fca in zfsdev_ioctl (dev=3D<optimized out>, zcmd=
> > =3D<unavailable>, zcmd@entry=3D<error reading variable: value is not availa=
> > ble>, arg=3D0xfffffe02fb827d50 "\017", arg@entry=3D<error reading variable:=
> >  value is not available>, flag=3D<optimized out>, td=3D<optimized out>)
> > >         at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c=
> > :168
> > >     #13 0xffffffff809d6212 in devfs_ioctl (ap=3D0xfffffe02fb827c50) at /u=
> > sr/src/sys/fs/devfs/devfs_vnops.c:935
> > >     #14 0xffffffff80c585f2 in vn_ioctl (fp=3D0xfffff8052cdd80f0, com=3D<o=
> > ptimized out>, data=3D0xfffffe02fb827d50, active_cred=3D0xfffff80122ab1e00,=
> >  td=3D<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:1704
> > >     #15 0xffffffff809d68ee in devfs_ioctl_f (fp=3D<unavailable>, fp@entry=
> > =3D<error reading variable: value is not available>, com=3D<unavailable>, c=
> > om@entry=3D<error reading variable: value is not available>, data=3D<unavai=
> > lable>, data@entry=3D<error reading variable: value is not available>,
> > >         cred=3D<unavailable>, cred@entry=3D<error reading variable: value=
> >  is not available>, td=3D<unavailable>, td@entry=3D<error reading variable:=
> >  value is not available>) at /usr/src/sys/fs/devfs/devfs_vnops.c:866
> > >     #16 0xffffffff80bc57e6 in fo_ioctl (fp=3D0xfffff8052cdd80f0, com=3D32=
> > 22821401, data=3D<unavailable>, active_cred=3D<unavailable>, td=3D0xfffffe0=
> > 422ef8560) at /usr/src/sys/sys/file.h:367
> > >     #17 kern_ioctl (td=3Dtd@entry=3D0xfffffe0422ef8560, fd=3D4, com=3Dcom=
> > @entry=3D3222821401, data=3D<unavailable>, data@entry=3D0xfffffe02fb827d50 =
> > "\017") at /usr/src/sys/kern/sys_generic.c:807
> > >     #18 0xffffffff80bc54f2 in sys_ioctl (td=3D0xfffffe0422ef8560, uap=3D0=
> > xfffffe0422ef8960) at /usr/src/sys/kern/sys_generic.c:715
> > >     #19 0xffffffff81049398 in syscallenter (td=3D<optimized out>) at /usr=
> > /src/sys/amd64/amd64/../../kern/subr_syscall.c:190
> > >     #20 amd64_syscall (td=3D0xfffffe0422ef8560, traced=3D0) at /usr/src/s=
> > ys/amd64/amd64/trap.c:1199
> [...]
>
> The backtrace looks different though it certainly smells like PR/271945.
>
> I've had similar to PR/271945 panics on an amd64 with a mirrored zpool with
> four vdevs running poudriere with AMD64 jails. My other amd64 with a
> mirrored zpool with two vdevs using i386 jails has no such issue. All other
> workloads are unaffected.
>
> On the affected machine running poudriere bulk with -J N:1 circumvents the
> issue. So far. There were two openzfs cherry-picks this morning. I intend
> to try them against a full bulk build later today.
>
>
> --
> Cheers,
> Cy Schubert <Cy.Schubert@cschubert.com>
> FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
> NTP:           <cy@nwtime.org>    Web:  https://nwtime.org
>
>                         e^(i*pi)+1=0
>
>