From nobody Tue Nov 14 15:21:45 2023 X-Original-To: current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4SV94z2fSsz51J0X for ; Tue, 14 Nov 2023 15:21:55 +0000 (UTC) (envelope-from SRS0=Pv4d=G3=klop.ws=ronald-lists@realworks.nl) Received: from smtp-relay-int.realworks.nl (smtp-relay-int.realworks.nl [194.109.157.24]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4SV94y6F80z3Zw7; Tue, 14 Nov 2023 15:21:54 +0000 (UTC) (envelope-from SRS0=Pv4d=G3=klop.ws=ronald-lists@realworks.nl) Authentication-Results: mx1.freebsd.org; none Date: Tue, 14 Nov 2023 16:21:45 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=klop.ws; s=rw2; t=1699975306; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=pSS2kU5o/7lJjo5gS6AKRqy9ppWor89irnz+D+3QkEs=; b=D0j75Sd7hS1DPVEyz3Dw3ZV8zhaMIfKg9k7bi7B4o+QlOgsH+1XQPgRhxlbuKILhrpPnXB dm9f0nXi81ZroWRtA9kfQflPUaBmhj9/i3m9SWrfgkbZ0O3kLn1n3CVUbkL2JK+9Jaevuc ocgYuoA/1zWdXP7nv0def5QcvHjaVX9mZwgx3nRLctMqus1xjn9mp8fGu0l9UpCS5y0WVw yyxrQ4D4xUFnffCaglnmOJo+5CdZTCtpLAlXK5fBFtNA7vOM5uRFm4HybNRQdAnr5WRKj/ Gws+A5dDC93xnumLL7OhZ4woiu69oOJXAzZeBN7BMYUll3rKPjXU4s3DtHbRow== From: Ronald Klop To: Ronald Klop Cc: Alexander Motin , current@freebsd.org, Konstantin Belousov Message-ID: <1099306217.8124.1699975305932@localhost> In-Reply-To: <1900239445.5968.1699966796547@localhost> References: <349700057.3452.1699611152405@localhost> <1900239445.5968.1699966796547@localhost> Subject: Re: crash zfs_clone_range() List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_8123_18781866.1699975305923" X-Mailer: Realworks (679.16) Importance: Normal X-Priority: 3 (Normal) X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:3265, ipnet:194.109.0.0/16, country:NL] X-Rspamd-Queue-Id: 4SV94y6F80z3Zw7 ------=_Part_8123_18781866.1699975305923 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Van: Ronald Klop Datum: dinsdag, 14 november 2023 13:59 Aan: Konstantin Belousov CC: Alexander Motin , current@freebsd.org Onderwerp: Re: crash zfs_clone_range() > > Response below > > Van: Konstantin Belousov > Datum: zondag, 12 november 2023 19:47 > Aan: Alexander Motin > CC: Ronald Klop , current@freebsd.org > Onderwerp: Re: crash zfs_clone_range() >> >> On Sun, Nov 12, 2023 at 11:51:40AM -0500, Alexander Motin wrote: >> > Hi Ronald, >> > >> > As I can see, the clone request to ZFS came through nullfs, and it crashed >> > immediately on enter. I've never been a VFS layer expert, but to me it may >> > be a nullfs problem, not zfs. Is there chance you was (un-)mounting >> > something when this happened? >> It is not nullfs issue, I believe, but the lack of the busy reference on the >> upper mount. I think https://reviews.freebsd.org/D42554 should cover it. >> >> > >> > On 10.11.2023 05:12, Ronald Klop wrote: >> > > Hi, >> > > >> > > Had this crash today on RPI4/15-CURRENT. >> > > >> > > FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19 >> > > main-b0203aaa46-dirty: Sat Nov 4 11:48:33 CET 2023 ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG >> > > arm64 >> > > >> > > $ sysctl -a | grep bclon >> > > vfs.zfs.bclone_enabled: 1 >> > > >> > > I started a jail with poudriere to build a package. The jail uses null >> > > mounts over ZFS. >> > > >> > > [root]# cu -s 115200 -l /dev/cuaU0 >> > > Connected >> > > >> > > db> bt >> > > Tracing pid 95213 tid 100438 td 0xffff0000e1e97900 >> > > db_trace_self() at db_trace_self >> > > db_stack_trace() at db_stack_trace+0x120 >> > > db_command() at db_command+0x2e4 >> > > db_command_loop() at db_command_loop+0x58 >> > > db_trap() at db_trap+0x100 >> > > kdb_trap() at kdb_trap+0x334 >> > > handle_el1h_sync() at handle_el1h_sync+0x18 >> > > --- exception, esr 0xf2000000 >> > > kdb_enter() at kdb_enter+0x48 >> > > vpanic() at vpanic+0x1dc >> > > panic() at panic+0x48 >> > > data_abort() at data_abort+0x2fc >> > > handle_el1h_sync() at handle_el1h_sync+0x18 >> > > --- exception, esr 0x96000004 >> > > rms_rlock() at rms_rlock+0x1c >> > > zfs_clone_range() at zfs_clone_range+0x68 >> > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c >> > > null_bypass() at null_bypass+0x118 >> > > vn_copy_file_range() at vn_copy_file_range+0x18c >> > > kern_copy_file_range() at kern_copy_file_range+0x36c >> > > sys_copy_file_range() at sys_copy_file_range+0x8c >> > > do_el0_sync() at do_el0_sync+0x634 >> > > handle_el0_sync() at handle_el0_sync+0x48 >> > > --- exception, esr 0x56000000 >> > > >> > > >> > > Oh.. While typing this I rebooted the machine and it happened again. I >> > > didn't start anything in particular although the machine runs some >> > > jails. >> > > >> > > x0: 0x00000000000000e0 >> > > x1: 0xffffa00090317a48 >> > > x2: 0xffffa000f79d4f00 >> > > x3: 0xffffa000c61a44a8 >> > > x4: 0xffff0000deefe460 ($d.2 + 0xdd776560) >> > > x5: 0xffffa001250e4c00 >> > > x6: 0xffff0000e54025b5 ($d.5 + 0xc) >> > > x7: 0x000000000000030a >> > > x8: 0xffff0000e1559000 ($d.2 + 0xdfdd1100) >> > > x9: 0x0000000000000001 >> > > x10: 0x0000000000000000 >> > > x11: 0x0000000000000001 >> > > x12: 0x0000000000000002 >> > > x13: 0x0000000000000000 >> > > x14: 0x0000000000000001 >> > > x15: 0x0000000000000000 >> > > x16: 0xffff0000016dce88 (__stop_set_modmetadata_set + 0x1310) >> > > x17: 0xffff0000004e0d44 (rms_rlock + 0x0) >> > > x18: 0xffff0000deefe280 ($d.2 + 0xdd776380) >> > > x19: 0x0000000000000000 >> > > x20: 0xffff0000deefe460 ($d.2 + 0xdd776560) >> > > x21: 0x7fffffffffffffff >> > > x22: 0xffffa00090317a48 >> > > x23: 0xffffa000f79d4f00 >> > > x24: 0xffffa001067ef910 >> > > x25: 0x00000000000000e0 >> > > x26: 0xffffa000158a8000 >> > > x27: 0x0000000000000000 >> > > x28: 0xffffa000158a8000 >> > > x29: 0xffff0000deefe280 ($d.2 + 0xdd776380) >> > > sp: 0xffff0000deefe280 >> > > lr: 0xffff000001623564 (zfs_clone_range + 0x6c) >> > > elr: 0xffff0000004e0d60 (rms_rlock + 0x1c) >> > > spsr: 0x00000000a0000045 >> > > far: 0x0000000000000108 >> > > esr: 0x0000000096000004 >> > > panic: data abort in critical section or under mutex >> > > cpuid = 1 >> > > time = 1699610885 >> > > KDB: stack backtrace: >> > > db_trace_self() at db_trace_self >> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x38 >> > > vpanic() at vpanic+0x1a0 >> > > panic() at panic+0x48 >> > > data_abort() at data_abort+0x2fc >> > > handle_el1h_sync() at handle_el1h_sync+0x18 >> > > --- exception, esr 0x96000004 >> > > rms_rlock() at rms_rlock+0x1c >> > > zfs_clone_range() at zfs_clone_range+0x68 >> > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c >> > > null_bypass() at null_bypass+0x118 >> > > vn_copy_file_range() at vn_copy_file_range+0x18c >> > > kern_copy_file_range() at kern_copy_file_range+0x36c >> > > sys_copy_file_range() at sys_copy_file_range+0x8c >> > > do_el0_sync() at do_el0_sync+0x634 >> > > handle_el0_sync() at handle_el0_sync+0x48 >> > > --- exception, esr 0x56000000 >> > > KDB: enter: panic >> > > [ thread pid 3792 tid 100394 ] >> > > Stopped at kdb_enter+0x48: str xzr, [x19, #768] >> > > db> >> > > >> > > I'll keep the debugger open for a while. Can I type something for >> > > additional info? >> > > >> > > Regards, >> > > Ronald. >> > >> > -- >> > Alexander Motin >> >> >> > > > Hi, > > Build a new kernel today. > FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #20 main-051d69d6f8: Tue Nov 14 12:16:28 CET 2023 ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG arm64 > > x0: 0x00000000000008e0 > x1: 0xffffa0006ce1fb38 > x2: 0xffffa0006837a400 > x3: 0xffffa0012c503a48 > x4: 0xffff0000eb0ef430 (next_index + 0x815d790) > x5: 0xffffa00152636300 > x6: 0xffff0000e2e025b5 ($d.5 + 0xc) > x7: 0x000000000000030a > x8: 0xffff0000eb3212c0 (next_index + 0x838f620) > x9: 0x0000000000000001 > x10: 0x0000000000000000 > x11: 0x0000000000000001 > x12: 0x0000000000000002 > x13: 0x0000000000000000 > x14: 0x0000000000000001 > x15: 0x0000000000000000 > x16: 0xffff0000016e5b58 (__stop_set_modmetadata_set + 0x1328) > x17: 0xffff0000004e0c28 (rms_rlock + 0x0) > x18: 0xffff0000eb0ef250 (next_index + 0x815d5b0) > x19: 0x0000000000000800 > x20: 0xffff0000eb0ef430 (next_index + 0x815d790) > x21: 0x7fffffffffffffff > x22: 0xffffa0006ce1fb38 > x23: 0xffffa0006837a400 > x24: 0xffffa001ee486000 > x25: 0x00000000000008e0 > x26: 0xffffa000135ca000 > x27: 0x0000000000000800 > x28: 0xffffa000135ca000 > x29: 0xffff0000eb0ef250 (next_index + 0x815d5b0) > sp: 0xffff0000eb0ef250 > lr: 0xffff00000162bee8 (zfs_clone_range + 0x6c) > elr: 0xffff0000004e0c44 (rms_rlock + 0x1c) > spsr: 0x00000000a0000045 > far: 0x0000000000000908 > esr: 0x0000000096000004 > panic: data abort in critical section or under mutex > cpuid = 2 > time = 1699966486 > KDB: stack backtrace: > db_trace_self() at db_trace_self > db_trace_self_wrapper() at db_trace_self_wrapper+0x38 > vpanic() at vpanic+0x1a0 > panic() at panic+0x48 > data_abort() at data_abort+0x2fc > handle_el1h_sync() at handle_el1h_sync+0x18 > --- exception, esr 0x96000004 > rms_rlock() at rms_rlock+0x1c > zfs_clone_range() at zfs_clone_range+0x68 > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c > null_bypass() at null_bypass+0x118 > vn_copy_file_range() at vn_copy_file_range+0x1c0 > kern_copy_file_range() at kern_copy_file_range+0x36c > sys_copy_file_range() at sys_copy_file_range+0x8c > do_el0_sync() at do_el0_sync+0x634 > handle_el0_sync() at handle_el0_sync+0x48 > --- exception, esr 0x56000000 > KDB: enter: panic > [ thread pid 3620 tid 100911 ] > Stopped at kdb_enter+0x48: str xzr, [x19, #768] > db> > > This happens as soon as I start poudriere in a jenkins-agent jail. > > AFAIK this includes the two recent vn_copy_file_range changes of Konstantin. > > Next I will install a GENERIC kernel instead of GENERIC-NODEBUG. > > Regards, > Ronald. > This is while running a DEBUG kernel. FreeBSD 15.0-CURRENT #5 main-20e1f207cc: Tue Nov 14 15:44:40 CET 2023 ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC arm64 acquiring duplicate lock of same type: "vnode interlock" 1st vnode interlock @ /home/ronald/dev/freebsd/src/sys/fs/nullfs/null_vnops.c:370 2nd vnode interlock @ /home/ronald/dev/freebsd/src/sys/fs/nullfs/null_vnops.c:370 stack backtrace: #0 0xffff00000050798c at witness_debugger+0x60 #1 0xffff00000046c25c at __mtx_lock_flags+0xac #2 0xffff0000e3014f14 at null_add_writecount+0x40 #3 0xffff0000e3014f44 at null_add_writecount+0x70 #4 0xffff0000005a08b4 at vn_open_vnode+0x278 #5 0xffff0000005a019c at vn_open_cred+0x484 #6 0xffff000000596cac at kern_openat+0x24c #7 0xffff0000008443f4 at do_el0_sync+0x59c #8 0xffff00000081e91c at handle_el0_sync+0x48 x0: 0x0000000000000000 x1: 0xffff000000a2dfeb (console_pausestr + 0xab0) x2: 0xffff0000009715fb (cam_status_table + 0x14a13) x3: 0xffffa000ba780d18 x4: 0xffff0000deef8430 (ucom_cons_softc + 0xdd66faa0) x5: 0xffffa001200d4900 x6: 0x0000808080808080 x7: 0xffffa001200d4900 x8: 0x0000000000000000 x9: 0x0000000000000000 x10: 0x0000000000000001 x11: 0x0000000000000001 x12: 0x0000000000000002 x13: 0x0000000000000000 x14: 0x0000000000010000 x15: 0x0000000000000001 x16: 0xffff00000172b258 (__stop_set_modmetadata_set + 0x1360) x17: 0xffff00000048ba24 (rms_rlock + 0x0) x18: 0xffff0000deef8240 (ucom_cons_softc + 0xdd66f8b0) x19: 0xffff0000e2ef9fd0 (ucom_cons_softc + 0xe1671640) x20: 0xffff0000deef8430 (ucom_cons_softc + 0xdd66faa0) x21: 0x7fffffffffffffff x22: 0xffffa000b9a169a8 x23: 0xffffa0018c7e4d00 x24: 0xffffa001a377acb0 x25: 0xffff0000e2ef9fd0 (ucom_cons_softc + 0xe1671640) x26: 0xffffa0001480f000 x27: 0xffff0000e2ef9ef0 (ucom_cons_softc + 0xe1671560) x28: 0xffffa0001480f000 x29: 0xffff0000deef8240 (ucom_cons_softc + 0xdd66f8b0) sp: 0xffff0000deef8240 lr: 0xffff00000048ba4c (rms_rlock + 0x28) elr: 0xffff00000048ba90 (rms_rlock + 0x6c) spsr: 0x0000000000000045 far: 0x0000000000000000 esr: 0x0000000096000004 panic: data abort in critical section or under mutex cpuid = 0 time = 1699974087 KDB: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x38 vpanic() at vpanic+0x1a0 panic() at panic+0x48 data_abort() at data_abort+0x31c handle_el1h_sync() at handle_el1h_sync+0x18 --- exception, esr 0x96000004 rms_rlock() at rms_rlock+0x6c zfs_clone_range() at zfs_clone_range+0x68 zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x1bc null_bypass() at null_bypass+0x118 vn_copy_file_range() at vn_copy_file_range+0x1b0 kern_copy_file_range() at kern_copy_file_range+0x37c sys_copy_file_range() at sys_copy_file_range+0x8c do_el0_sync() at do_el0_sync+0x59c handle_el0_sync() at handle_el0_sync+0x48 --- exception, esr 0x56000000 KDB: enter: panic [ thread pid 3703 tid 100223 ] Stopped at kdb_enter+0x48: str xzr, [x19, #896] db> I have coredump/textdump files and can DM if needed. BTW: while typing this the RPI rebooted poudriere was restarted by Jenkins and it now logs this message (not a panic): acquiring duplicate lock of same type: "vnode interlock" 1st vnode interlock @ /home/ronald/dev/freebsd/src/sys/fs/nullfs/null_vnops.c:370 2nd vnode interlock @ /home/ronald/dev/freebsd/src/sys/fs/nullfs/null_vnops.c:370 stack backtrace: #0 0xffff00000050798c at witness_debugger+0x60 #1 0xffff00000046c25c at __mtx_lock_flags+0xac #2 0xffff0000e5214f14 at null_add_writecount+0x40 #3 0xffff0000e5214f44 at null_add_writecount+0x70 #4 0xffff0000005a08b4 at vn_open_vnode+0x278 #5 0xffff0000005a019c at vn_open_cred+0x484 #6 0xffff000000596cac at kern_openat+0x24c #7 0xffff0000008443f4 at do_el0_sync+0x59c #8 0xffff00000081e91c at handle_el0_sync+0x48 The last duplicate lock log is with "vfs.zfs.bclone_enabled: 0". The panics are with "vfs.zfs.bclone_enabled: 1". Regards, Ronald. ------=_Part_8123_18781866.1699975305923 Content-Type: text/html; charset=us-ascii Content-Transfer-Encoding: 7bit

Van: Ronald Klop <ronald-lists@klop.ws>
Datum: dinsdag, 14 november 2023 13:59
Aan: Konstantin Belousov <kostikbel@gmail.com>
CC: Alexander Motin <mav@freebsd.org>, current@freebsd.org
Onderwerp: Re: crash zfs_clone_range()

Response below
 

Van: Konstantin Belousov <kostikbel@gmail.com>
Datum: zondag, 12 november 2023 19:47
Aan: Alexander Motin <mav@freebsd.org>
CC: Ronald Klop <ronald-lists@klop.ws>, current@freebsd.org
Onderwerp: Re: crash zfs_clone_range()

On Sun, Nov 12, 2023 at 11:51:40AM -0500, Alexander Motin wrote:
> Hi Ronald,
>
> As I can see, the clone request to ZFS came through nullfs, and it crashed
> immediately on enter.  I've never been a VFS layer expert, but to me it may
> be a nullfs problem, not zfs.  Is there chance you was (un-)mounting
> something when this happened?
It is not nullfs issue, I believe, but the lack of the busy reference on the
upper mount.  I think https://reviews.freebsd.org/D42554 should cover it.

>
> On 10.11.2023 05:12, Ronald Klop wrote:
> > Hi,
> >
> > Had this crash today on RPI4/15-CURRENT.
> >
> > FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19
> > main-b0203aaa46-dirty: Sat Nov  4 11:48:33 CET 2023     ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG
> > arm64
> >
> > $ sysctl -a | grep bclon
> > vfs.zfs.bclone_enabled: 1
> >
> > I started a jail with poudriere to build a package. The jail uses null
> > mounts over ZFS.
> >
> > [root]# cu -s 115200 -l /dev/cuaU0
> > Connected
> >
> > db> bt
> > Tracing pid 95213 tid 100438 td 0xffff0000e1e97900
> > db_trace_self() at db_trace_self
> > db_stack_trace() at db_stack_trace+0x120
> > db_command() at db_command+0x2e4
> > db_command_loop() at db_command_loop+0x58
> > db_trap() at db_trap+0x100
> > kdb_trap() at kdb_trap+0x334
> > handle_el1h_sync() at handle_el1h_sync+0x18
> > --- exception, esr 0xf2000000
> > kdb_enter() at kdb_enter+0x48
> > vpanic() at vpanic+0x1dc
> > panic() at panic+0x48
> > data_abort() at data_abort+0x2fc
> > handle_el1h_sync() at handle_el1h_sync+0x18
> > --- exception, esr 0x96000004
> > rms_rlock() at rms_rlock+0x1c
> > zfs_clone_range() at zfs_clone_range+0x68
> > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
> > null_bypass() at null_bypass+0x118
> > vn_copy_file_range() at vn_copy_file_range+0x18c
> > kern_copy_file_range() at kern_copy_file_range+0x36c
> > sys_copy_file_range() at sys_copy_file_range+0x8c
> > do_el0_sync() at do_el0_sync+0x634
> > handle_el0_sync() at handle_el0_sync+0x48
> > --- exception, esr 0x56000000
> >
> >
> > Oh.. While typing this I rebooted the machine and it happened again. I
> > didn't start anything in particular although the machine runs some
> > jails.
> >
> > x0: 0x00000000000000e0
> >    x1: 0xffffa00090317a48
> >    x2: 0xffffa000f79d4f00
> >    x3: 0xffffa000c61a44a8
> >    x4: 0xffff0000deefe460 ($d.2 + 0xdd776560)
> >    x5: 0xffffa001250e4c00
> >    x6: 0xffff0000e54025b5 ($d.5 + 0xc)
> >    x7: 0x000000000000030a
> >    x8: 0xffff0000e1559000 ($d.2 + 0xdfdd1100)
> >    x9: 0x0000000000000001
> >   x10: 0x0000000000000000
> >   x11: 0x0000000000000001
> >   x12: 0x0000000000000002
> >   x13: 0x0000000000000000
> >   x14: 0x0000000000000001
> >   x15: 0x0000000000000000
> >   x16: 0xffff0000016dce88 (__stop_set_modmetadata_set + 0x1310)
> >   x17: 0xffff0000004e0d44 (rms_rlock + 0x0)
> >   x18: 0xffff0000deefe280 ($d.2 + 0xdd776380)
> >   x19: 0x0000000000000000
> >   x20: 0xffff0000deefe460 ($d.2 + 0xdd776560)
> >   x21: 0x7fffffffffffffff
> >   x22: 0xffffa00090317a48
> >   x23: 0xffffa000f79d4f00
> >   x24: 0xffffa001067ef910
> >   x25: 0x00000000000000e0
> >   x26: 0xffffa000158a8000
> >   x27: 0x0000000000000000
> >   x28: 0xffffa000158a8000
> >   x29: 0xffff0000deefe280 ($d.2 + 0xdd776380)
> >    sp: 0xffff0000deefe280
> >    lr: 0xffff000001623564 (zfs_clone_range + 0x6c)
> >   elr: 0xffff0000004e0d60 (rms_rlock + 0x1c)
> > spsr: 0x00000000a0000045
> >   far: 0x0000000000000108
> >   esr: 0x0000000096000004
> > panic: data abort in critical section or under mutex
> > cpuid = 1
> > time = 1699610885
> > KDB: stack backtrace:
> > db_trace_self() at db_trace_self
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> > vpanic() at vpanic+0x1a0
> > panic() at panic+0x48
> > data_abort() at data_abort+0x2fc
> > handle_el1h_sync() at handle_el1h_sync+0x18
> > --- exception, esr 0x96000004
> > rms_rlock() at rms_rlock+0x1c
> > zfs_clone_range() at zfs_clone_range+0x68
> > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
> > null_bypass() at null_bypass+0x118
> > vn_copy_file_range() at vn_copy_file_range+0x18c
> > kern_copy_file_range() at kern_copy_file_range+0x36c
> > sys_copy_file_range() at sys_copy_file_range+0x8c
> > do_el0_sync() at do_el0_sync+0x634
> > handle_el0_sync() at handle_el0_sync+0x48
> > --- exception, esr 0x56000000
> > KDB: enter: panic
> > [ thread pid 3792 tid 100394 ]
> > Stopped at      kdb_enter+0x48: str     xzr, [x19, #768]
> > db>
> >
> > I'll keep the debugger open for a while. Can I type something for
> > additional info?
> >
> > Regards,
> > Ronald.
>
> --
> Alexander Motin



Hi,

Build a new kernel today.
FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #20 main-051d69d6f8: Tue Nov 14 12:16:28 CET 2023     ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG arm64

x0: 0x00000000000008e0                                                                                                         
  x1: 0xffffa0006ce1fb38                                                                                                                
  x2: 0xffffa0006837a400                                                                                                                
  x3: 0xffffa0012c503a48                                                                                                                
  x4: 0xffff0000eb0ef430 (next_index + 0x815d790)                                                                                       
  x5: 0xffffa00152636300                                                                                                                
  x6: 0xffff0000e2e025b5 ($d.5 + 0xc)                                                                                                   
  x7: 0x000000000000030a                                                                                                                
  x8: 0xffff0000eb3212c0 (next_index + 0x838f620)                                                                                       
  x9: 0x0000000000000001                                                                                                                
 x10: 0x0000000000000000                                                                                                                
 x11: 0x0000000000000001                                                                                                                
 x12: 0x0000000000000002                                                                                                                
 x13: 0x0000000000000000                                                                                                                
 x14: 0x0000000000000001                                                                                                                
 x15: 0x0000000000000000
 x16: 0xffff0000016e5b58 (__stop_set_modmetadata_set + 0x1328)
 x17: 0xffff0000004e0c28 (rms_rlock + 0x0)
 x18: 0xffff0000eb0ef250 (next_index + 0x815d5b0)
 x19: 0x0000000000000800
 x20: 0xffff0000eb0ef430 (next_index + 0x815d790)
 x21: 0x7fffffffffffffff
 x22: 0xffffa0006ce1fb38
 x23: 0xffffa0006837a400
 x24: 0xffffa001ee486000
 x25: 0x00000000000008e0
 x26: 0xffffa000135ca000
 x27: 0x0000000000000800
 x28: 0xffffa000135ca000
 x29: 0xffff0000eb0ef250 (next_index + 0x815d5b0)
  sp: 0xffff0000eb0ef250
  lr: 0xffff00000162bee8 (zfs_clone_range + 0x6c)
 elr: 0xffff0000004e0c44 (rms_rlock + 0x1c)
spsr: 0x00000000a0000045
 far: 0x0000000000000908
 esr: 0x0000000096000004
panic: data abort in critical section or under mutex
cpuid = 2
time = 1699966486
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a0
panic() at panic+0x48
data_abort() at data_abort+0x2fc
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x96000004
rms_rlock() at rms_rlock+0x1c
zfs_clone_range() at zfs_clone_range+0x68
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
null_bypass() at null_bypass+0x118
vn_copy_file_range() at vn_copy_file_range+0x1c0
kern_copy_file_range() at kern_copy_file_range+0x36c
sys_copy_file_range() at sys_copy_file_range+0x8c
do_el0_sync() at do_el0_sync+0x634
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 3620 tid 100911 ]
Stopped at      kdb_enter+0x48: str     xzr, [x19, #768]
db>

This happens as soon as I start poudriere in a jenkins-agent jail.

AFAIK this includes the two recent vn_copy_file_range changes of Konstantin.

Next I will install a GENERIC kernel instead of GENERIC-NODEBUG.

Regards,
Ronald.
 


This is while running a DEBUG kernel.
FreeBSD 15.0-CURRENT #5 main-20e1f207cc: Tue Nov 14 15:44:40 CET 2023
    ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC arm64

acquiring duplicate lock of same type: "vnode interlock"                                                                         
 1st vnode interlock @ /home/ronald/dev/freebsd/src/sys/fs/nullfs/null_vnops.c:370                                                      
 2nd vnode interlock @ /home/ronald/dev/freebsd/src/sys/fs/nullfs/null_vnops.c:370                                                      
stack backtrace:                                                                                                                        
#0 0xffff00000050798c at witness_debugger+0x60                                                                                          
#1 0xffff00000046c25c at __mtx_lock_flags+0xac                                                                                          
#2 0xffff0000e3014f14 at null_add_writecount+0x40                                                                                       
#3 0xffff0000e3014f44 at null_add_writecount+0x70                                                                                       
#4 0xffff0000005a08b4 at vn_open_vnode+0x278                                                                                            
#5 0xffff0000005a019c at vn_open_cred+0x484                                                                                             
#6 0xffff000000596cac at kern_openat+0x24c                                                                                              
#7 0xffff0000008443f4 at do_el0_sync+0x59c                                                                                              
#8 0xffff00000081e91c at handle_el0_sync+0x48                                                                                           
  x0: 0x0000000000000000                                                                                                                
  x1: 0xffff000000a2dfeb (console_pausestr + 0xab0)                                                                                     
  x2: 0xffff0000009715fb (cam_status_table + 0x14a13)                                                                                   
  x3: 0xffffa000ba780d18                                                                                                                
  x4: 0xffff0000deef8430 (ucom_cons_softc + 0xdd66faa0)                                                                                 
  x5: 0xffffa001200d4900                                                                                                                
  x6: 0x0000808080808080                                                                                                                
  x7: 0xffffa001200d4900                                                                                                                
  x8: 0x0000000000000000                                                                                                                
  x9: 0x0000000000000000                                                                                                                
 x10: 0x0000000000000001                                                                                                                
 x11: 0x0000000000000001                                                                                                                
 x12: 0x0000000000000002                                                                                                                
 x13: 0x0000000000000000                                                                                                                
 x14: 0x0000000000010000                                                                                                                
 x15: 0x0000000000000001
 x16: 0xffff00000172b258 (__stop_set_modmetadata_set + 0x1360)
 x17: 0xffff00000048ba24 (rms_rlock + 0x0)
 x18: 0xffff0000deef8240 (ucom_cons_softc + 0xdd66f8b0)
 x19: 0xffff0000e2ef9fd0 (ucom_cons_softc + 0xe1671640)
 x20: 0xffff0000deef8430 (ucom_cons_softc + 0xdd66faa0)
 x21: 0x7fffffffffffffff
 x22: 0xffffa000b9a169a8
 x23: 0xffffa0018c7e4d00
 x24: 0xffffa001a377acb0
 x25: 0xffff0000e2ef9fd0 (ucom_cons_softc + 0xe1671640)
 x26: 0xffffa0001480f000
 x27: 0xffff0000e2ef9ef0 (ucom_cons_softc + 0xe1671560)
 x28: 0xffffa0001480f000
 x29: 0xffff0000deef8240 (ucom_cons_softc + 0xdd66f8b0)
  sp: 0xffff0000deef8240
  lr: 0xffff00000048ba4c (rms_rlock + 0x28)
 elr: 0xffff00000048ba90 (rms_rlock + 0x6c)
spsr: 0x0000000000000045
 far: 0x0000000000000000
 esr: 0x0000000096000004
panic: data abort in critical section or under mutex
cpuid = 0
time = 1699974087
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a0
panic() at panic+0x48
data_abort() at data_abort+0x31c
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x96000004
rms_rlock() at rms_rlock+0x6c
zfs_clone_range() at zfs_clone_range+0x68
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x1bc
null_bypass() at null_bypass+0x118
vn_copy_file_range() at vn_copy_file_range+0x1b0
kern_copy_file_range() at kern_copy_file_range+0x37c
sys_copy_file_range() at sys_copy_file_range+0x8c
do_el0_sync() at do_el0_sync+0x59c
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 3703 tid 100223 ]
Stopped at      kdb_enter+0x48: str     xzr, [x19, #896]
db>


I have coredump/textdump files and can DM if needed.

BTW: while typing this the RPI rebooted poudriere was restarted by Jenkins and it now logs this message (not a panic):
acquiring duplicate lock of same type: "vnode interlock"
 1st vnode interlock @ /home/ronald/dev/freebsd/src/sys/fs/nullfs/null_vnops.c:370
 2nd vnode interlock @ /home/ronald/dev/freebsd/src/sys/fs/nullfs/null_vnops.c:370
stack backtrace:
#0 0xffff00000050798c at witness_debugger+0x60
#1 0xffff00000046c25c at __mtx_lock_flags+0xac
#2 0xffff0000e5214f14 at null_add_writecount+0x40
#3 0xffff0000e5214f44 at null_add_writecount+0x70
#4 0xffff0000005a08b4 at vn_open_vnode+0x278
#5 0xffff0000005a019c at vn_open_cred+0x484
#6 0xffff000000596cac at kern_openat+0x24c
#7 0xffff0000008443f4 at do_el0_sync+0x59c
#8 0xffff00000081e91c at handle_el0_sync+0x48

The last duplicate lock log is with "vfs.zfs.bclone_enabled: 0".

The panics are with "vfs.zfs.bclone_enabled: 1".

Regards,
Ronald.

  ------=_Part_8123_18781866.1699975305923--