From: Cy Schubert
Reply-to: Cy Schubert
To: Cy Schubert
cc: Kevin Bowling, Dag-Erling Smørgrav, current@freebsd.org
Subject: Re: ZFS deadlock in 14
Comments: In-reply-to Cy Schubert message dated "Fri, 11 Aug 2023 20:41:29 -0700."
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Date: Fri, 11 Aug 2023 21:55:40 -0700
Message-Id: <20230812045541.118E5217@slippy.cwsent.com>

The poudriere build machine building amd64 packages also panicked.
But with:

Dumping 2577 out of 8122 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:59
59		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu ,
(kgdb) #0  __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:59
#1  doadump (textdump=textdump@entry=1)
    at /opt/src/git-src/sys/kern/kern_shutdown.c:407
#2  0xffffffff806c10e0 in kern_reboot (howto=260)
    at /opt/src/git-src/sys/kern/kern_shutdown.c:528
#3  0xffffffff806c15df in vpanic (
    fmt=0xffffffff80b6c5f5 "%s: possible deadlock detected for %p (%s), blocked for %d ticks\n", ap=ap@entry=0xfffffe008e698e90)
    at /opt/src/git-src/sys/kern/kern_shutdown.c:972
#4  0xffffffff806c1383 in panic (fmt=)
    at /opt/src/git-src/sys/kern/kern_shutdown.c:896
#5  0xffffffff8064a5ea in deadlkres ()
    at /opt/src/git-src/sys/kern/kern_clock.c:201
#6  0xffffffff80677632 in fork_exit (callout=0xffffffff8064a2c0 , arg=0x0,
    frame=0xfffffe008e698f40) at /opt/src/git-src/sys/kern/kern_fork.c:1162
#7
(kgdb)

This is consistent with PR/271945. Reducing -J to 1 or 5:1 circumvents this
panic.

This is certainly a different panic from the one experienced on the
poudriere builder building i386 packages. Both machines run in amd64 mode.


-- 
Cheers,
Cy Schubert
FreeBSD UNIX:  Web:  https://FreeBSD.org
NTP:           Web:  https://nwtime.org

			e^(i*pi)+1=0


Cy Schubert writes:
> This is new. Instead of affecting the machine with poudriere building amd64
> packages, it affected the other machine with poudriere building i386
> packages. This is new since the two recent ZFS patches.
>
> Don't get me wrong, the two new patches have resulted in I believe better
> availability of the poudriere machine building amd64 packages. I doubt the
> two patches caused this but they may have exposed this problem, probably
> fixed by another patch or two.
>
> Sorry, there was no dump produced by this panic.
> I'll need to check the config of this machine, swap is a gmirror, which it
> doesn't like to dump to. Below are serial console messages captured by
> conserver.
>
> panic: vm_page_dequeue_deferred: page 0xfffffe00028fb0d0 has unexpected queue state
> cpuid = 3
> time = 1691807572
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c50bc600
> vpanic() at vpanic+0x132/frame 0xfffffe00c50bc730
> panic() at panic+0x43/frame 0xfffffe00c50bc790
> vm_page_dequeue_deferred() at vm_page_dequeue_deferred+0xb2/frame 0xfffffe00c50bc7a0
> vm_page_free_prep() at vm_page_free_prep+0x11b/frame 0xfffffe00c50bc7c0
> vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfffffe00c50bc7f0
> vm_object_page_remove() at vm_object_page_remove+0xb6/frame 0xfffffe00c50bc850
> vn_pages_remove_valid() at vn_pages_remove_valid+0x48/frame 0xfffffe00c50bc880
> zfs_rezget() at zfs_rezget+0x35/frame 0xfffffe00c50bca60
> zfs_resume_fs() at zfs_resume_fs+0x1c8/frame 0xfffffe00c50bcab0
> zfs_ioc_rollback() at zfs_ioc_rollback+0x157/frame 0xfffffe00c50bcb00
> zfsdev_ioctl_common() at zfsdev_ioctl_common+0x612/frame 0xfffffe00c50bcbc0
> zfsdev_ioctl() at zfsdev_ioctl+0x12a/frame 0xfffffe00c50bcbf0
> devfs_ioctl() at devfs_ioctl+0xd2/frame 0xfffffe00c50bcc40
> vn_ioctl() at vn_ioctl+0xc2/frame 0xfffffe00c50bccb0
> devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00c50bccd0
> kern_ioctl() at kern_ioctl+0x286/frame 0xfffffe00c50bcd30
> sys_ioctl() at sys_ioctl+0x152/frame 0xfffffe00c50bce00
> amd64_syscall() at amd64_syscall+0x138/frame 0xfffffe00c50bcf30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c50bcf30
> --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x20938296107a, rsp = 0x209379aeee18, rbp = 0x209379aeee90 ---
> Uptime: 42m33s
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> cpu_reset: Restarting BSP
> cpu_reset_proxy: Stopped CPU 3
>
>
> -- 
> Cheers,
> Cy Schubert
> FreeBSD UNIX:  Web:  https://FreeBSD.org
> NTP:           Web:  https://nwtime.org
>
>			e^(i*pi)+1=0
>
>
> Cy Schubert writes:
> > I haven't experienced any problems (yet) either.
> >
> >
> > -- 
> > Cheers,
> > Cy Schubert
> > FreeBSD UNIX:  Web:  https://FreeBSD.org
> > NTP:           Web:  https://nwtime.org
> >
> >			e^(i*pi)+1=0
> >
> >
> > Kevin Bowling writes:
> > > The two MFVs on head have improved/fixed stability with poudriere for
> > > me 48 core bare metal.
> > >
> > > On Thu, Aug 10, 2023 at 6:37 AM Cy Schubert wrote:
> > > >
> > > > Kevin Bowling writes:
> > > > > Possibly https://github.com/openzfs/zfs/commit/2cb992a99ccadb78d97049b40bd442eb4fdc549d
> > > > >
> > > > > On Tue, Aug 8, 2023 at 10:08 AM Dag-Erling Smørgrav wrote:
> > > > > >
> > > > > > At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
> > > > > > deadlock was introduced in ZFS. It is still present as of 9c2823bae9 (4
> > > > > > August) and is 100% reproducible just by starting poudriere bulk in a
> > > > > > 16-core VM and waiting a few hours until deadlkres kicks in.
> > > > > > In the latest instance, deadlkres complained about a bash process:
> > > > > >
> > > > > > #0  sched_switch (td=td@entry=0xfffffe02fb1d8000, flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2299
> > > > > > #1  0xffffffff80b5a0a3 in mi_switch (flags=flags@entry=259) at /usr/src/sys/kern/kern_synch.c:550
> > > > > > #2  0xffffffff80babcb4 in sleepq_switch (wchan=0xfffff818543a9e70, pri=64) at /usr/src/sys/kern/subr_sleepqueue.c:609
> > > > > > #3  0xffffffff80babb8c in sleepq_wait (wchan=, pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660
> > > > > > #4  0xffffffff80b1c1b0 in sleeplk (lk=lk@entry=0xfffff818543a9e70, flags=flags@entry=2121728, ilk=ilk@entry=0x0, wmesg=wmesg@entry=0xffffffff8222a054 "zfs", pri=, pri@entry=64, timo=timo@entry=6, queue=1) at /usr/src/sys/kern/kern_lock.c:310
> > > > > > #5  0xffffffff80b1a23f in lockmgr_slock_hard (lk=0xfffff818543a9e70, flags=2121728, ilk=, file=0xffffffff812544fb "/usr/src/sys/kern/vfs_subr.c", line=3057, lwa=0x0) at /usr/src/sys/kern/kern_lock.c:705
> > > > > > #6  0xffffffff80c59ec3 in VOP_LOCK1 (vp=0xfffff818543a9e00, flags=2105344, file=0xffffffff812544fb "/usr/src/sys/kern/vfs_subr.c", line=3057) at ./vnode_if.h:1120
> > > > > > #7  _vn_lock (vp=vp@entry=0xfffff818543a9e00, flags=2105344, file=, line=, line@entry=3057) at /usr/src/sys/kern/vfs_vnops.c:1815
> > > > > > #8  0xffffffff80c4173d in vget_finish (vp=0xfffff818543a9e00, flags=, vs=vs@entry=VGET_USECOUNT) at /usr/src/sys/kern/vfs_subr.c:3057
> > > > > > #9  0xffffffff80c1c9b7 in cache_lookup (dvp=dvp@entry=0xfffff802cd02ac40, vpp=vpp@entry=0xfffffe046b20ac30, cnp=cnp@entry=0xfffffe046b20ac58, tsp=tsp@entry=0x0, ticksp=ticksp@entry=0x0) at /usr/src/sys/kern/vfs_cache.c:2086
> > > > > > #10 0xffffffff80c2150c in vfs_cache_lookup (ap=<optimized out>) at /usr/src/sys/kern/vfs_cache.c:3068
> > > > > > #11 0xffffffff80c32c37 in VOP_LOOKUP (dvp=0xfffff802cd02ac40, vpp=0xfffffe046b20ac30, cnp=0xfffffe046b20ac58) at ./vnode_if.h:69
> > > > > > #12 vfs_lookup (ndp=ndp@entry=0xfffffe046b20abd8) at /usr/src/sys/kern/vfs_lookup.c:1266
> > > > > > #13 0xffffffff80c31ce1 in namei (ndp=ndp@entry=0xfffffe046b20abd8) at /usr/src/sys/kern/vfs_lookup.c:689
> > > > > > #14 0xffffffff80c52090 in kern_statat (td=0xfffffe02fb1d8000, flag=, fd=-100, path=0xa75b480e070 <error: Cannot access memory at address 0xa75b480e070>, pathseg=pathseg@entry=UIO_USERSPACE, sbp=sbp@entry=0xfffffe046b20ad18)
> > > > > >     at /usr/src/sys/kern/vfs_syscalls.c:2441
> > > > > > #15 0xffffffff80c52797 in sys_fstatat (td=, uap=0xfffffe02fb1d8400) at /usr/src/sys/kern/vfs_syscalls.c:2419
> > > > > > #16 0xffffffff