Re: FreeBSD-main-amd64-test - Build #19839 - Failure

Reply: Kristof Provost: "Re: FreeBSD-main-amd64-test - Build #19839 - Failure"
In reply to: jenkins-admin_a_FreeBSD.org: "FreeBSD-main-amd64-test - Build #19839 - Failure"
From: John Baldwin <jhb_at_FreeBSD.org>
Date: Fri, 19 Nov 2021 00:21:16 UTC
On 11/17/21 5:50 PM, jenkins-admin@FreeBSD.org wrote:
> FreeBSD-main-amd64-test - Build #19839 (4082b189d2ce00674439226c9d5a8bdcafd23d01) - Failure
> 
> Build information: https://ci.FreeBSD.org/job/FreeBSD-main-amd64-test/19839/
> Full change log: https://ci.FreeBSD.org/job/FreeBSD-main-amd64-test/19839/changes
> Full build log: https://ci.FreeBSD.org/job/FreeBSD-main-amd64-test/19839/console
> 
> Status explanation:
> "Failure" - the build is suspected being broken by the following changes
> "Still Failing" - the build has not been fixed by the following changes and
>                    this is a notification to note that these changes have
>                    not been fully tested by the CI system
> 
> Change summaries:
> (Those commits are likely but not certainly responsible)
> 
> b928e924f74b0b8f882a9b735611421a93113640 by jhb:
> rtld-elf: Use _get_tp in __tls_get_addr for aarch64 and riscv64.
> 
> a8d885296a9dc517e731723081c83d97d2aa598f by jhb:
> linux_linkat: Don't invert AT_* flags.
> 
> 8b2ce7a3bbd0a754d31ff3943d918b4c84c831a3 by jhb:
> linux_name_to_handle_at: Support AT_EMPTY_PATH.
> 
> 1962164584a91078418afcd7c979afef13df8c4d by jhb:
> imgact_elf: Use bool instead of boolean_t.
> 
> 4082b189d2ce00674439226c9d5a8bdcafd23d01 by jhb:
> elf*_brand_inuse: Change return type to bool.

I don't think the panic is related to these commits.

> The end of the build log:
> 
> [...truncated 4.27 MB...]
> --- trap 0xc, rip = 0x2100c7156d9a, rsp = 0x4423f48, rbp = 0x4423f60 ---

The useful parts of the panic were earlier in the log:

01:50:13 sys/netpfil/common/dummynet:pf_queue_v6  ->  epair5a: Ethernet address: 02:b5:98:ea:1d:0a
01:50:14 epair5b: Ethernet address: 02:b5:98:ea:1d:0b
01:50:14 epair5a: link state changed to UP
01:50:14 epair5b: link state changed to UP
01:50:14 epair5a: link state changed to DOWN
01:50:27 epair5b: link state changed to DOWN
01:50:27 passed  [13.831s]
01:50:27
01:50:27
01:50:27 Fatal trap 12: page fault while in kernel mode
01:50:27 cpuid = 0; apic id = 00
01:50:27 fault virtual address	= 0x10
01:50:27 fault code		= supervisor read data, page not present
01:50:27 instruction pointer	= 0x20:0xffffffff80e3c60f
01:50:27 stack pointer	        = 0x28:0xfffffe00a5d9ec30
01:50:27 frame pointer	        = 0x28:0xfffffe00a5d9ed00
01:50:27 code segment		= base 0x0, limit 0xfffff, type 0x1b
01:50:27 			= DPL 0, pres 1, long 1, def32 0, gran 1
01:50:27 processor eflags	= interrupt enabled, resume, IOPL = 0
01:50:27 current process		= 0 (dummynet)
01:50:27 trap number		= 12
01:50:27 panic: page fault
01:50:27 cpuid = 0
01:50:27 time = 1637200227
01:50:27 KDB: stack backtrace:
01:50:27 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00a5d9e9f0
01:50:27 vpanic() at vpanic+0x17f/frame 0xfffffe00a5d9ea40
01:50:27 panic() at panic+0x43/frame 0xfffffe00a5d9eaa0
01:50:27 trap_fatal() at trap_fatal+0x385/frame 0xfffffe00a5d9eb00
01:50:27 trap_pfault() at trap_pfault+0xab/frame 0xfffffe00a5d9eb60
01:50:27 calltrap() at calltrap+0x8/frame 0xfffffe00a5d9eb60
01:50:27 --- trap 0xc, rip = 0xffffffff80e3c60f, rsp = 0xfffffe00a5d9ec30, rbp = 0xfffffe00a5d9ed00 ---
01:50:27 ip6_input() at ip6_input+0x4f/frame 0xfffffe00a5d9ed00
01:50:27 netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame 0xfffffe00a5d9ed60
01:50:27 dummynet_send() at dummynet_send+0x1dd/frame 0xfffffe00a5d9eda0
01:50:27 dummynet_task() at dummynet_task+0x493/frame 0xfffffe00a5d9ee40
01:50:27 taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe00a5d9eec0
01:50:27 taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe00a5d9eef0
01:50:27 fork_exit() at fork_exit+0x80/frame 0xfffffe00a5d9ef30
01:50:27 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00a5d9ef30

So I suspect this is a race between teardown of the epair interfaces and whatever
traffic the sys/netpfil/common/dummynet:pf_queue_v6 test was still sending when it
stopped.  I've cc'd kp@ in case he has any ideas.
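
As an aside for anyone triaging a similar truncated CI log: the frames that
matter are the ones printed after the "--- trap ..." marker in the KDB
backtrace, since those show what the kernel was running when it faulted.  A
minimal sketch of pulling them out of saved console output follows.  The
helper name frames_after_trap and both regular expressions are my own
assumptions, matched only against the line format shown above; they are not
part of any FreeBSD or Jenkins tooling.

```python
import re

# Assumed line formats, taken from the console log quoted above:
#   "01:50:27 ip6_input() at ip6_input+0x4f/frame 0xfffffe00a5d9ed00"
#   "01:50:27 --- trap 0xc, rip = 0x..., rsp = 0x..., rbp = 0x... ---"
FRAME_RE = re.compile(r"(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame")
TRAP_RE = re.compile(r"--- trap (0x[0-9a-f]+), rip = (0x[0-9a-f]+)")


def frames_after_trap(log_lines):
    """Return (function, offset) pairs for the frames that follow the first
    trap marker, stopping at the first line that is not a frame."""
    frames = []
    in_trap = False
    for line in log_lines:
        if not in_trap:
            # Skip everything (including the panic/trap handler frames)
            # until the trap marker is seen.
            in_trap = TRAP_RE.search(line) is not None
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break
        frames.append((match.group(1), int(match.group(2), 16)))
    return frames


# A few lines copied from the backtrace above, used as sample input.
SAMPLE = """\
01:50:27 calltrap() at calltrap+0x8/frame 0xfffffe00a5d9eb60
01:50:27 --- trap 0xc, rip = 0xffffffff80e3c60f, rsp = 0xfffffe00a5d9ec30, rbp = 0xfffffe00a5d9ed00 ---
01:50:27 ip6_input() at ip6_input+0x4f/frame 0xfffffe00a5d9ed00
01:50:27 netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame 0xfffffe00a5d9ed60
01:50:27 dummynet_send() at dummynet_send+0x1dd/frame 0xfffffe00a5d9eda0
""".splitlines()

if __name__ == "__main__":
    for func, offset in frames_after_trap(SAMPLE):
        print(f"{func}+{offset:#x}")
```

On the sample lines this yields ip6_input first, reached from dummynet_send
via netisr_dispatch_src, which is the call chain the suspicion above is
based on.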

-- 
John Baldwin