From: Warner Losh
Date: Thu, 31 Aug 2023 17:37:45 -0600
Subject: Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad
To: Cy Schubert
Cc: Alexander Motin, Gleb Smirnoff, Drew Gallatin, Martin Matuska,
    src-committers
In-Reply-To: <20230831233228.9935BA8@slippy.cwsent.com>
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main


On Thu, Aug 31, 2023, 5:32 PM Cy Schubert <Cy.Schubert@cschubert.com> wrote:

> In message <20230831223526.DCB701A1@slippy.cwsent.com>, Cy Schubert writes:
> > In message <a5c51f3f-8c7f-8bd5-f718-72bc33fe22ed@FreeBSD.org>, Alexander
> > Motin writes:
> > > On 31.08.2023 13:53, Cy Schubert wrote:
> > > > One thing that circumvents my two problems is reducing poudriere
> > > > bulk jobs from 8 to 5 on my 4 core machines.
> > >
> > > Cy, I have no real evidences to think it is related, other than your
> > > panics look like some memory corruptions, but could you try is patch:
> > > https://github.com/openzfs/zfs/pull/15228 .  If it won't do the trick,
> > > then I am out of ideas without additional input.
> >
> > So far so good. Poudriere has been running with a decent -J jobs on both
> > machines for over an hour. I'll let you know if they survive the night. It
> > can take some time before the panics happen though.
> >
> > The problem is more likely to occur when there are a lot of small package
> > builds than large long running jobs, probably because of the parallel ZFS
> > dataset creations, deletions, and rollbacks.
> >
> > >
> > > Gleb, you may try to add this too, just as a choice between impossible
> > > and improbable.
> > >
> > > --
> > > Alexander Motin
> >
> > --
> > Cheers,
> > Cy Schubert <Cy.Schubert@cschubert.com>
> > FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
> > NTP:           <cy@nwtime.org>    Web:  https://nwtime.org
> >
> >                         e^(i*pi)+1=0
>
> One of the two machines is hung.
>
> cwfw# ping bob
> PING bob (10.1.1.7): 56 data bytes
> ^C
> --- bob ping statistics ---
> 2 packets transmitted, 0 packets received, 100.0% packet loss
> cwfw# console bob
> [Enter `^Ec?' for help]
> [halt sent]
> KDB: enter: Break to debugger
> [ thread pid 31259 tid 100913 ]
> Stopped at      kdb_break+0x48: movq    $0,0xa1069d(%rip)
> db> bt
> Tracing pid 31259 tid 100913 td 0xfffffe00c4eca000
> kdb_break() at kdb_break+0x48/frame 0xfffffe00c53ef2d0
> uart_intr() at uart_intr+0xf7/frame 0xfffffe00c53ef310
> intr_event_handle() at intr_event_handle+0x12b/frame 0xfffffe00c53ef380
> intr_execute_handlers() at intr_execute_handlers+0x63/frame 0xfffffe00c53ef3b0
> Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe00c53ef3b0
> --- interrupt, rip = 0xffffffff806d5c70, rsp = 0xfffffe00c53ef480, rbp = 0xfffffe00c53ef480 ---
> getbinuptime() at getbinuptime+0x30/frame 0xfffffe00c53ef480
> arc_access() at arc_access+0x250/frame 0xfffffe00c53ef4d0
> arc_buf_access() at arc_buf_access+0xd0/frame 0xfffffe00c53ef4f0
> dbuf_hold_impl() at dbuf_hold_impl+0xf3/frame 0xfffffe00c53ef580
> dbuf_hold() at dbuf_hold+0x25/frame 0xfffffe00c53ef5b0
> dnode_hold_impl() at dnode_hold_impl+0x194/frame 0xfffffe00c53ef670
> dmu_bonus_hold() at dmu_bonus_hold+0x20/frame 0xfffffe00c53ef6a0
> zfs_zget() at zfs_zget+0x20d/frame 0xfffffe00c53ef750
> zfs_dirent_lookup() at zfs_dirent_lookup+0x16d/frame 0xfffffe00c53ef7a0
> zfs_dirlook() at zfs_dirlook+0x7f/frame 0xfffffe00c53ef7d0
> zfs_lookup() at zfs_lookup+0x3c0/frame 0xfffffe00c53ef8a0
> zfs_freebsd_cachedlookup() at zfs_freebsd_cachedlookup+0x67/frame 0xfffffe00c53ef9e0
> vfs_cache_lookup() at vfs_cache_lookup+0xa6/frame 0xfffffe00c53efa30
> vfs_lookup() at vfs_lookup+0x457/frame 0xfffffe00c53efac0
> namei() at namei+0x2e1/frame 0xfffffe00c53efb20
> vn_open_cred() at vn_open_cred+0x505/frame 0xfffffe00c53efca0
> kern_openat() at kern_openat+0x287/frame 0xfffffe00c53efdf0
> ia32_syscall() at ia32_syscall+0x156/frame 0xfffffe00c53eff30
> int0x80_syscall_common() at int0x80_syscall_common+0x9c/frame 0xffff89dc
> db>
>
> I'll let it continue. Hopefully the watchdog timer will pop and we get a
> dump.


Might also be interesting to see if this moves around or is really hung
getting the time. I suspect it's live lock given this traceback.
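
One way to check whether it moves around is to break into the debugger more than once (continuing with `c` in between), save each `bt`, and compare the frames below the `--- interrupt` marker. A minimal sketch of that comparison follows. The helper names and the second capture are made up for illustration; only the first capture is taken from the traceback quoted above. Identical frames on every break would point toward a thread stuck at one spot, while a changing innermost part under the same outer frames would fit a loop that keeps running without making progress.

```python
import re

# One ddb frame line looks like "func() at func+0xOFF/frame 0xADDR".
# Leading "> " quote markers from mail are tolerated.
FRAME = re.compile(r"^(?:>\s*)*(\w+)\(\) at \1\+(0x[0-9a-f]+)", re.MULTILINE)


def interrupted_frames(bt_text):
    """Return [(function, offset), ...] for the frames below the
    '--- interrupt' marker, innermost first. Frames above the marker
    are the debugger entry path and are the same on every break."""
    _, marker, rest = bt_text.partition("--- interrupt")
    body = rest if marker else bt_text
    return [(m.group(1), int(m.group(2), 16)) for m in FRAME.finditer(body)]


def compare(first_bt, second_bt):
    """Return (identical, shared), where shared counts the outermost
    frames that match exactly in both captures."""
    a = interrupted_frames(first_bt)
    b = interrupted_frames(second_bt)
    shared = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        shared += 1
    return a == b, shared


# Excerpt of the traceback from this thread.
BT_FIRST = """\
kdb_break() at kdb_break+0x48/frame 0xfffffe00c53ef2d0
uart_intr() at uart_intr+0xf7/frame 0xfffffe00c53ef310
--- interrupt, rip = 0xffffffff806d5c70, rsp = 0xfffffe00c53ef480 ---
getbinuptime() at getbinuptime+0x30/frame 0xfffffe00c53ef480
arc_access() at arc_access+0x250/frame 0xfffffe00c53ef4d0
arc_buf_access() at arc_buf_access+0xd0/frame 0xfffffe00c53ef4f0
"""

# HYPOTHETICAL second capture, invented only to exercise the comparison.
BT_SECOND = """\
kdb_break() at kdb_break+0x48/frame 0xfffffe00c53ef2d0
uart_intr() at uart_intr+0xf7/frame 0xfffffe00c53ef310
--- interrupt, rip = 0xffffffff80000000, rsp = 0xfffffe00c53ef4d0 ---
arc_access() at arc_access+0x1a0/frame 0xfffffe00c53ef4d0
arc_buf_access() at arc_buf_access+0xd0/frame 0xfffffe00c53ef4f0
"""

if __name__ == "__main__":
    identical, shared = compare(BT_FIRST, BT_SECOND)
    print("identical:", identical, "shared outer frames:", shared)
```

Two captures are a weak sample, so several breaks spaced a few seconds apart would say more than one pair.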

Warner
