From nobody Fri Sep 01 00:19:59 2023 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4RcJrp0fx9z4rpYj; Fri, 1 Sep 2023 00:32:26 +0000 (UTC) (envelope-from cy.schubert@cschubert.com) Received: from omta002.cacentral1.a.cloudfilter.net (omta002.cacentral1.a.cloudfilter.net [3.97.99.33]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "Client", Issuer "CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4RcJrn43llz3cdY; Fri, 1 Sep 2023 00:32:25 +0000 (UTC) (envelope-from cy.schubert@cschubert.com) Authentication-Results: mx1.freebsd.org; dkim=none; spf=none (mx1.freebsd.org: domain of cy.schubert@cschubert.com has no SPF policy when checking 3.97.99.33) smtp.mailfrom=cy.schubert@cschubert.com; dmarc=none Received: from shw-obgw-4004a.ext.cloudfilter.net ([10.228.9.227]) by cmsmtp with ESMTP id bp28qFOK66NwhbrtKqC1pp; Fri, 01 Sep 2023 00:20:02 +0000 Received: from spqr.komquats.com ([70.66.152.170]) by cmsmtp with ESMTPA id brtIqfhf73fOSbrtJqX7aA; Fri, 01 Sep 2023 00:20:02 +0000 X-Authority-Analysis: v=2.4 cv=J8G5USrS c=1 sm=1 tr=0 ts=64f12e32 a=y8EK/9tc/U6QY+pUhnbtgQ==:117 a=y8EK/9tc/U6QY+pUhnbtgQ==:17 a=kj9zAlcOel0A:10 a=UttIx32zK-AA:10 a=VxmjJ2MpAAAA:8 a=YxBL1-UpAAAA:8 a=6I5d2MoRAAAA:8 a=NEAV23lmAAAA:8 a=EkcXrb_YAAAA:8 a=45_lsCV5zqt6W4nhKLUA:9 a=kbdmn2KRkxbFYjns:21 a=CjuIK1q_8ugA:10 a=o4MvjAAMTzsA:10 a=2tJrspVSXVUA:10 a=gvFz51Dc2IYA:10 a=7gXAzLPJhVmCkEl4_tsf:22 a=Ia-lj3WSrqcvXOmTRaiG:22 a=IjZwj45LgO3ly-622nXo:22 a=LK5xJRSDVpKd5WXXoEvA:22 Received: from slippy.cwsent.com (slippy [10.1.1.91]) by spqr.komquats.com (Postfix) with ESMTP id 312D33D9; Thu, 31 Aug 2023 17:20:00 -0700 (PDT) Received: by slippy.cwsent.com (Postfix, from userid 1000) id E463A1C3; Thu, 31 Aug 2023 17:19:59 -0700 (PDT) X-Mailer: exmh version 2.9.0 11/07/2018 with nmh-1.8+dev Reply-to: Cy Schubert From: Cy Schubert X-os: FreeBSD X-Sender: cy@cwsent.com X-URL: http://www.cschubert.com/ To: Cy Schubert cc: Warner Losh , Alexander Motin , Gleb Smirnoff , Drew Gallatin , Martin Matuska , src-committers , "" , "" Subject: Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad Comments: In-reply-to Cy Schubert message dated "Thu, 31 Aug 2023 17:04:29 -0700." List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 31 Aug 2023 17:19:59 -0700 Message-Id: <20230901001959.E463A1C3@slippy.cwsent.com> X-CMAE-Envelope: MS4xfJllWI4cyg+rrtDisuPaG/JQuAk1YMZBcHNCb4wjTct2DgKHcKTsCvgAfZPNzrywPQLvDETeIFQNJn5SMmDBknV5QOFodmP46xewnh+YVKXmGsVNDRMu fVfBpzqWZrKQCHlkIdLtUq97lhx/QbjpSFcYizRpmqK2/w0y64GFkQEJDcQeBDxAKLE8+cvRbs1/roNszTWHMOuziAG+7kkwJiTtcmv0ELGH4fp5KUGXGV3J uzEDN/m6bUWzrmGLDerGzUzap5nj4LjCB2t2Qo1IYC1iBARWLAf7ZZ1cHkwp4aKKwm/9zlOv2RfRI0ZTh9mYUp3cJOxsAH1dul87O1Rz8DkHms8WD5qqqLiY Ijlk6ydrTw8ivJHe9HiQz/jZjEJ8+kE9DMOltFFsJGJYnINiF0pbVRh5n4iG11w/aQmvIh+X48dEhzGOCPooUfOQ1fwn5Q== X-Spamd-Bar: ++ X-Spamd-Result: default: False [2.87 / 15.00]; R_BAD_CTE_7BIT(3.50)[7bit]; FAKE_REPLY(1.00)[]; AUTH_NA(1.00)[]; NEURAL_HAM_MEDIUM(-0.99)[-0.995]; NEURAL_HAM_LONG(-0.87)[-0.868]; NEURAL_HAM_SHORT(-0.67)[-0.667]; MV_CASE(0.50)[]; RWL_MAILSPIKE_EXCELLENT(-0.40)[3.97.99.33:from]; MIME_GOOD(-0.10)[text/plain]; RCVD_IN_DNSWL_LOW(-0.10)[3.97.99.33:from]; TO_MATCH_ENVRCPT_SOME(0.00)[]; REPLYTO_EQ_FROM(0.00)[]; DMARC_NA(0.00)[cschubert.com]; FROM_HAS_DN(0.00)[]; MLMMJ_DEST(0.00)[dev-commits-src-main@freebsd.org,dev-commits-src-all@freebsd.org]; ARC_NA(0.00)[]; HAS_REPLYTO(0.00)[Cy.Schubert@cschubert.com]; RCVD_TLS_LAST(0.00)[]; TO_DN_ALL(0.00)[]; RCVD_COUNT_THREE(0.00)[4]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_NA(0.00)[no SPF record]; RCPT_COUNT_SEVEN(0.00)[9]; R_DKIM_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; ASN(0.00)[asn:16509, ipnet:3.96.0.0/15, country:US]; MIME_TRACE(0.00)[0:+]; GREYLIST(0.00)[pass,body] X-Rspamd-Queue-Id: 4RcJrn43llz3cdY Cy Schubert writes: > In message <20230831234437.0EB38133@slippy.cwsent.com>, Cy Schubert writes: > > In message <20230831234023.917DFAB@slippy.cwsent.com>, Cy Schubert writes: > > > In message l. > > c > > > om> > > > , Warner Losh writes: > > > > On Thu, Aug 31, 2023, 5:32 PM Cy Schubert w > ro > > te > > > = > > > > : > > > > > > > > > In message <20230831223526.DCB701A1@slippy.cwsent.com>, Cy Schubert > > > > > writes: > > > > > > In message , > > > > > Alexander > > > > > > Motin > > > > > > writes: > > > > > > > On 31.08.2023 13:53, Cy Schubert wrote: > > > > > > > > One thing that circumvents my two problems is reducing poudrier > e > > > > > bulk job > > > > > > s > > > > > > > > from 8 to 5 on my 4 core machines. > > > > > > > > > > > > > > Cy, I have no real evidences to think it is related, other than y > ou > > r > > > > > > > panics look like some memory corruptions, but could you try is pa > tc > > h: > > > > > > > https://github.com/openzfs/zfs/pull/15228 . If it won't do the > > > > > trick, > > > > > > > then I am out of ideas without additional input. > > > > > > > > > > > > So far so good. Poudriere has been running with a decent -J jobs on > b > > ot > > > = > > > > h > > > > > > machines for over an hour. I'll let you know if they survive the ni > gh > > t. > > > > > It > > > > > > can take some time before the panics happen though. > > > > > > > > > > > > The problem is more likely to occur when there are a lot of small > > > > > package > > > > > > builds than large long running jobs, probably because of the parall > el > > > > > ZFS > > > > > > dataset creations, deletions, and rollbacks. > > > > > > > > > > > > > > > > > > > > Gleb, you may try to add this too, just as a choice between impos > si > > bl > > > = > > > > e > > > > > > > and improbable. > > > > > > > > > > > > > > -- > > > > > > > Alexander Motin > > > > > > > > > > > > > > > > > > -- > > > > > > Cheers, > > > > > > Cy Schubert > > > > > > FreeBSD UNIX: Web: https://FreeBSD.org > > > > > > NTP: Web: https://nwtime.org > > > > > > > > > > > > e^(i*pi)+1=3D0 > > > > > > > > > > > > > > > > > > > > > > One of the two machines is hung. > > > > > > > > > > cwfw# ping bob > > > > > PING bob (10.1.1.7): 56 data bytes > > > > > ^C > > > > > --- bob ping statistics --- > > > > > 2 packets transmitted, 0 packets received, 100.0% packet loss > > > > > cwfw# console bob > > > > > [Enter `^Ec?' for help] > > > > > [halt sent] > > > > > KDB: enter: Break to debugger > > > > > [ thread pid 31259 tid 100913 ] > > > > > Stopped at kdb_break+0x48: movq $0,0xa1069d(%rip) > > > > > db> bt > > > > > Tracing pid 31259 tid 100913 td 0xfffffe00c4eca000 > > > > > kdb_break() at kdb_break+0x48/frame 0xfffffe00c53ef2d0 > > > > > uart_intr() at uart_intr+0xf7/frame 0xfffffe00c53ef310 > > > > > intr_event_handle() at intr_event_handle+0x12b/frame 0xfffffe00c53ef3 > 80 > > > > > intr_execute_handlers() at intr_execute_handlers+0x63/frame > > > > > 0xfffffe00c53ef3b0 > > > > > Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe00c53ef3b0 > > > > > --- interrupt, rip =3D 0xffffffff806d5c70, rsp =3D 0xfffffe00c53ef480 > , > > rb > > > = > > > > p =3D > > > > > 0xfffffe00c53ef480 --- > > > > > getbinuptime() at getbinuptime+0x30/frame 0xfffffe00c53ef480 > > > > > arc_access() at arc_access+0x250/frame 0xfffffe00c53ef4d0 > > > > > arc_buf_access() at arc_buf_access+0xd0/frame 0xfffffe00c53ef4f0 > > > > > dbuf_hold_impl() at dbuf_hold_impl+0xf3/frame 0xfffffe00c53ef580 > > > > > dbuf_hold() at dbuf_hold+0x25/frame 0xfffffe00c53ef5b0 > > > > > dnode_hold_impl() at dnode_hold_impl+0x194/frame 0xfffffe00c53ef670 > > > > > dmu_bonus_hold() at dmu_bonus_hold+0x20/frame 0xfffffe00c53ef6a0 > > > > > zfs_zget() at zfs_zget+0x20d/frame 0xfffffe00c53ef750 > > > > > zfs_dirent_lookup() at zfs_dirent_lookup+0x16d/frame 0xfffffe00c53ef7 > a0 > > > > > zfs_dirlook() at zfs_dirlook+0x7f/frame 0xfffffe00c53ef7d0 > > > > > zfs_lookup() at zfs_lookup+0x3c0/frame 0xfffffe00c53ef8a0 > > > > > zfs_freebsd_cachedlookup() at zfs_freebsd_cachedlookup+0x67/frame > > > > > 0xfffffe00c53ef9e0 > > > > > vfs_cache_lookup() at vfs_cache_lookup+0xa6/frame 0xfffffe00c53efa30 > > > > > vfs_lookup() at vfs_lookup+0x457/frame 0xfffffe00c53efac0 > > > > > namei() at namei+0x2e1/frame 0xfffffe00c53efb20 > > > > > vn_open_cred() at vn_open_cred+0x505/frame 0xfffffe00c53efca0 > > > > > kern_openat() at kern_openat+0x287/frame 0xfffffe00c53efdf0 > > > > > ia32_syscall() at ia32_syscall+0x156/frame 0xfffffe00c53eff30 > > > > > int0x80_syscall_common() at int0x80_syscall_common+0x9c/frame 0xffff8 > 9d > > c > > > > > db> > > > > > > > > > > I'll let it continue. Hopefully the watchdog timer will pop and we ge > t > > a > > > > > dump. > > > > > > > > > > > > > > > > > Might also be interesting to see if this moves around or is really hung > > > > getting the time. I suspect it's live lock given this traceback. > > > > > > It's moving around. > > > > > > > > > db> c > > > [halt sent] > > > KDB: enter: Break to debugger > > > [ thread pid 31259 tid 100913 ] > > > Stopped at kdb_break+0x48: movq $0,0xa1069d(%rip) > > > db> bt > > > Tracing pid 31259 tid 100913 td 0xfffffe00c4eca000 > > > kdb_break() at kdb_break+0x48/frame 0xfffffe00c53ef410 > > > uart_intr() at uart_intr+0xf7/frame 0xfffffe00c53ef450 > > > intr_event_handle() at intr_event_handle+0x12b/frame 0xfffffe00c53ef4c0 > > > intr_execute_handlers() at intr_execute_handlers+0x63/frame > > > 0xfffffe00c53ef4f0 > > > Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe00c53ef4f0 > > > --- interrupt, rip = 0xffffffff807117c4, rsp = 0xfffffe00c53ef5c0, rbp = > > > 0xfffffe00c53ef5c0 --- > > > lock_delay() at lock_delay+0x14/frame 0xfffffe00c53ef5c0 > > > _sx_xlock_hard() at _sx_xlock_hard+0x1a9/frame 0xfffffe00c53ef660 > > > _sx_xlock() at _sx_xlock+0xb0/frame 0xfffffe00c53ef6a0 > > > zfs_zget() at zfs_zget+0x1fa/frame 0xfffffe00c53ef750 > > > zfs_dirent_lookup() at zfs_dirent_lookup+0x16d/frame 0xfffffe00c53ef7a0 > > > zfs_dirlook() at zfs_dirlook+0x7f/frame 0xfffffe00c53ef7d0 > > > zfs_lookup() at zfs_lookup+0x3c0/frame 0xfffffe00c53ef8a0 > > > zfs_freebsd_cachedlookup() at zfs_freebsd_cachedlookup+0x67/frame > > > 0xfffffe00c53ef9e0 > > > vfs_cache_lookup() at vfs_cache_lookup+0xa6/frame 0xfffffe00c53efa30 > > > vfs_lookup() at vfs_lookup+0x457/frame 0xfffffe00c53efac0 > > > namei() at namei+0x2e1/frame 0xfffffe00c53efb20 > > > vn_open_cred() at vn_open_cred+0x505/frame 0xfffffe00c53efca0 > > > kern_openat() at kern_openat+0x287/frame 0xfffffe00c53efdf0 > > > ia32_syscall() at ia32_syscall+0x156/frame 0xfffffe00c53eff30 > > > int0x80_syscall_common() at int0x80_syscall_common+0x9c/frame 0xffff89dc > > > db> c > > > > It seems to have settled there for now. > > I manually panicked the machine. It captured a good dump of the above > backtrace. > > This is the amd64 machine that was building i386 packages under poudriere. > I purposely set -J high to stress it enough to panic the machine. > > This kerc¦`bfnel includes amotin's patch from https://github.com/openzfs/zfs/pull > /15228. This manually triggered panic is different from the one it suffered > two days ago. > > The machine building amd64 packages is still running nicely but it's > building some large packages. Because of this there's not as much stress on > zfs. The panicked machine was building a lot of small packages. As such it > was doing a lot of zfs clones, destroys, and rollbacks. An initial look at the dump: (kgdb) bt #0 __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57 #1 doadump (textdump=textdump@entry=1) at /opt/src/git-src/sys/kern/kern_shutdown.c:405 #2 0xffffffff806c1b30 in kern_reboot (howto=260) at /opt/src/git-src/sys/kern/kern_shutdown.c:526 #3 0xffffffff806c202f in vpanic (fmt=0xffffffff80b8653c "from debugger", ap=ap@entry=0xfffffe00c53eef30) at /opt/src/git-src/sys/kern/kern_shutdown.c:970 #4 0xffffffff806c1dd3 in panic (fmt=) at /opt/src/git-src/sys/kern/kern_shutdown.c:894 #5 0xffffffff80413937 in db_panic (addr=, have_addr=, count=, modif=) at /opt/src/git-src/sys/ddb/db_command.c:531 #6 0xffffffff80412ddd in db_command (last_cmdp=, cmd_table=, dopager=true) at /opt/src/git-src/sys/ddb/db_command.c:504 #7 0xffffffff80412a9d in db_command_loop () at /opt/src/git-src/sys/ddb/db_command.c:551 #8 0xffffffff80416176 in db_trap (type=, code=) at /opt/src/git-src/sys/ddb/db_main.c:268 #9 0xffffffff80710993 in kdb_trap (type=type@entry=3, code=code@entry=0, tf=tf@entry=0xfffffe00c53ef340) at /opt/src/git-src/sys/kern/subr_kdb.c: 790 #10 0xffffffff80a72cb9 in trap (frame=0xfffffe00c53ef340) at /opt/src/git-src/sys/amd64/amd64/trap.c:608 #11 #12 kdb_enter (why=, msg=) at /opt/src/git-src/sys/kern/subr_kdb.c:556 #13 kdb_break () at /opt/src/git-src/sys/kern/subr_kdb.c:346 #14 0xffffffff804dc7f7 in uart_intr_break (arg=0xfffff80006b84400) at /opt/src/git-src/sys/dev/uart/uart_core.c:298 #15 uart_intr (arg=0xfffff80006b84400) at /opt/src/git-src/sys/dev/uart/uart_core.c:439 #16 0xffffffff8067ae8b in intr_event_handle (ie=ie@entry=0xfffff80005814c00, frame=frame@entry=0xfffffe00c53ef500) at /opt/src/git-src/sys/kern/kern_intr.c:1410 #17 0xffffffff80a37653 in intr_execute_handlers (isrc=0xfffff8000580c9c8, --Type for more, q to quit, c to continue without paging--c frame=0xfffffe00c53ef500) at /opt/src/git-src/sys/x86/x86/intr_machdep.c:354 #18 #19 lock_delay (la=la@entry=0xfffffe00c53ef600) at /opt/src/git-src/sys/kern/subr_lock.c:124 #20 0xffffffff806ccfb9 in _sx_xlock_hard (sx=sx@entry=0xfffff80123712340, x=, opts=opts@entry=0, file=file@entry=0xffffffff81910597 "/opt/src/git-src/sys/contrib/openzfs /module/os/freebsd/zfs/zfs_znode.c", line=line@entry=949) at /opt/src/git-src/sys/kern/kern_sx.c:693 #21 0xffffffff806ccdb0 in _sx_xlock (sx=sx@entry=0xfffff80123712340, opts=opts@entry=0, file=0xffffffff81910597 "/opt/src/git-src/sys/contrib/openzfs/module/os/ freebsd/zfs/zfs_znode.c", line=line@entry=949) at /opt/src/git-src/sys/kern/kern_sx.c:330 #22 0xffffffff816b4ffa in zfs_zget (zfsvfs=zfsvfs@entry=0xfffff80123712000, obj_num=80130, zpp=zpp@entry=0xfffffe00c53ef760) at /opt/src/git-src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_znode. c:949 #23 0xffffffff816a018d in zfs_dirent_lookup (dzp=dzp@entry=0xfffff80131e6dae 0, name=0xfffffe00c53ef8c0 "xlocale", zpp=zpp@entry=0xfffffe00c53ef7b0, flag=flag@entry=2) at /opt/src/git-src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c: 190 #24 0xffffffff816a028f in zfs_dirlook (dzp=dzp@entry=0xfffff80131e6dae0, name=0xfffffe00c4bbdc80 "", name@entry=0xfffffe00c53ef8c0 "xlocale", zpp=zpp@entry=0xfffffe00c53ef860) at /opt/src/git-src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c: 246 #25 0xffffffff816b0d20 in zfs_lookup (dvp=0xfffff800c2da9380, nm=nm@entry=0xfffffe00c53ef8c0 "xlocale", vpp=0xfffffe00c53efd10, cnp=cnp@entry=0xfffffe00c53efd38, nameiop=0, cr=, flags=0, cached=1) at /opt/src/git-src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_ os.c:923 #26 0xffffffff816ab907 in zfs_freebsd_lookup (ap=0xfffffe00c53ef9f0, cached=1) at /opt/src/git-src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_ os.c:4558 #27 zfs_freebsd_cachedlookup (ap=0xfffffe00c53ef9f0) at /opt/src/git-src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_ os.c:4566 #28 0xffffffff80797d86 in VOP_CACHEDLOOKUP (dvp=0xfffff800c2da9380, vpp=0xfffffe00c53efd10, cnp=0xfffffe00c53efd38) at ./vnode_if.h:101 #29 vfs_cache_lookup (ap=) at /opt/src/git-src/sys/kern/vfs_cache.c:3068 #30 0xffffffff807a9487 in VOP_LOOKUP (dvp=0xfffff800c2da9380, vpp=0xfffffe00c53efd10, cnp=0xfffffe00c53efd38) at ./vnode_if.h:67 #31 vfs_lookup (ndp=ndp@entry=0xfffffe00c53efcb8) at /opt/src/git-src/sys/kern/vfs_lookup.c:1252 #32 0xffffffff807a8531 in namei (ndp=ndp@entry=0xfffffe00c53efcb8) at /opt/src/git-src/sys/kern/vfs_lookup.c:684 #33 0xffffffff807d0095 in vn_open_cred (ndp=ndp@entry=0xfffffe00c53efcb8, flagp=flagp@entry=0xfffffe00c53efdc4, cmode=cmode@entry=0, vn_open_flags=vn_open_flags@entry=16, cred=0xfffff801ea492800, fp=0xfffff801b9a54690) at /opt/src/git-src/sys/kern/vfs_vnops.c:331 #34 0xffffffff807c61f7 in kern_openat (td=0xfffffe00c4eca000, fd=-100, path=0xffff8a24 , pathseg=UIO_USERSPACE, flags=1048577, mode=) at /opt/src/git-src/sys/kern/vfs_syscalls.c:1165 #35 0xffffffff80a993a6 in syscallenter (td=0xfffffe00c4eca000) at /opt/src/git-src/sys/amd64/ia32/../../kern/subr_syscall.c:187 #36 ia32_syscall (frame=0xfffffe00c53eff40) at /opt/src/git-src/sys/amd64/ia32/ia32_syscall.c:222 #37 0xffffffff80a47f8f in int0x80_syscall_common () at /opt/src/git-src/sys/amd64/ia32/ia32_exception.S:75 #38 0x0000000026a12000 in ?? () #39 0x0000000000000000 in ?? () (kgdb) > > I'll take a look at the dump after dinner tonight. -- Cheers, Cy Schubert FreeBSD UNIX: Web: https://FreeBSD.org NTP: Web: https://nwtime.org e^(i*pi)+1=0