Re: FreeBSD-main-amd64-test - Build #19839 - Failure
- In reply to: Kristof Provost: "Re: FreeBSD-main-amd64-test - Build #19839 - Failure"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Sat, 20 Nov 2021 00:36:23 UTC
On 18 Nov 2021, at 18:50, Kristof Provost wrote: > On 18 Nov 2021, at 17:21, John Baldwin wrote: >> On 11/17/21 5:50 PM, jenkins-admin@FreeBSD.org wrote: >>> FreeBSD-main-amd64-test - Build #19839 >>> (4082b189d2ce00674439226c9d5a8bdcafd23d01) - Failure >>> >>> Build information: >>> https://ci.FreeBSD.org/job/FreeBSD-main-amd64-test/19839/ >>> Full change log: >>> https://ci.FreeBSD.org/job/FreeBSD-main-amd64-test/19839/changes >>> Full build log: >>> https://ci.FreeBSD.org/job/FreeBSD-main-amd64-test/19839/console >>> >>> Status explanation: >>> "Failure" - the build is suspected being broken by the following >>> changes >>> "Still Failing" - the build has not been fixed by the following >>> changes and >>> this is a notification to note that these changes >>> have >>> not been fully tested by the CI system >>> >>> Change summaries: >>> (Those commits are likely but not certainly responsible) >>> >>> b928e924f74b0b8f882a9b735611421a93113640 by jhb: >>> rtld-elf: Use _get_tp in __tls_get_addr for aarch64 and riscv64. >>> >>> a8d885296a9dc517e731723081c83d97d2aa598f by jhb: >>> linux_linkat: Don't invert AT_* flags. >>> >>> 8b2ce7a3bbd0a754d31ff3943d918b4c84c831a3 by jhb: >>> linux_name_to_handle_at: Support AT_EMPTY_PATH. >>> >>> 1962164584a91078418afcd7c979afef13df8c4d by jhb: >>> imgact_elf: Use bool instead of boolean_t. >>> >>> 4082b189d2ce00674439226c9d5a8bdcafd23d01 by jhb: >>> elf*_brand_inuse: Change return type to bool. >> >> I don't the panic is related to these commits. >> >>> The end of the build log: >>> >>> [...truncated 4.27 MB...] >>> --- trap 0xc, rip = 0x2100c7156d9a, rsp = 0x4423f48, rbp = 0x4423f60 >>> --- >> >> The useful parts of the panic were earlier in the log: >> >> 01:50:13 sys/netpfil/common/dummynet:pf_queue_v6 -> epair5a: >> Ethernet address: 02:b5:98:ea:1d:0a >> 01:50:14 epair5b: Ethernet address: 02:b5:98:ea:1d:0b >> 01:50:14 epair5a: link state changed to UP >> 01:50:14 epair5b: link state changed to UP >> 01:50:14 epair5a: link state changed to DOWN >> 01:50:27 epair5b: link state changed to DOWN >> 01:50:27 passed [13.831s] >> 01:50:27 >> 01:50:27 >> 01:50:27 Fatal trap 12: page fault while in kernel mode >> 01:50:27 cpuid = 0; apic id = 00 >> 01:50:27 fault virtual address = 0x10 >> 01:50:27 fault code = supervisor read data, page not present >> 01:50:27 instruction pointer = 0x20:0xffffffff80e3c60f >> 01:50:27 stack pointer = 0x28:0xfffffe00a5d9ec30 >> 01:50:27 frame pointer = 0x28:0xfffffe00a5d9ed00 >> 01:50:27 code segment = base 0x0, limit 0xfffff, type 0x1b >> 01:50:27 = DPL 0, pres 1, long 1, def32 0, gran 1 >> 01:50:27 processor eflags = interrupt enabled, resume, IOPL = 0 >> 01:50:27 current process = 0 (dummynet) >> 01:50:27 trap number = 12 >> 01:50:27 panic: page fault >> 01:50:27 cpuid = 0 >> 01:50:27 time = 1637200227 >> 01:50:27 KDB: stack backtrace: >> 01:50:27 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >> 0xfffffe00a5d9e9f0 >> 01:50:27 vpanic() at vpanic+0x17f/frame 0xfffffe00a5d9ea40 >> 01:50:27 panic() at panic+0x43/frame 0xfffffe00a5d9eaa0 >> 01:50:27 trap_fatal() at trap_fatal+0x385/frame 0xfffffe00a5d9eb00 >> 01:50:27 trap_pfault() at trap_pfault+0xab/frame 0xfffffe00a5d9eb60 >> 01:50:27 calltrap() at calltrap+0x8/frame 0xfffffe00a5d9eb60 >> 01:50:27 --- trap 0xc, rip = 0xffffffff80e3c60f, rsp = >> 0xfffffe00a5d9ec30, rbp = 0xfffffe00a5d9ed00 --- >> 01:50:27 ip6_input() at ip6_input+0x4f/frame 0xfffffe00a5d9ed00 >> 01:50:27 netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame >> 0xfffffe00a5d9ed60 >> 01:50:27 dummynet_send() at dummynet_send+0x1dd/frame >> 0xfffffe00a5d9eda0 >> 01:50:27 dummynet_task() at dummynet_task+0x493/frame >> 0xfffffe00a5d9ee40 >> 01:50:27 taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame >> 0xfffffe00a5d9eec0 >> 01:50:27 taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame >> 0xfffffe00a5d9eef0 >> 01:50:27 fork_exit() at fork_exit+0x80/frame 0xfffffe00a5d9ef30 >> 01:50:27 fork_trampoline() at fork_trampoline+0xe/frame >> 0xfffffe00a5d9ef30 >> >> So I suspect this is a race with teardown of the epair interfaces and >> whatever >> traffic the sys/netpfil/common/dummynet:pf_queue_v6 test was sending >> when it >> stopped. I've cc'd kp@ in case he has any ideas? >> > I think I’ve seen that before, while I was doing the pf/dummynet > work. > Dummynet will queue packets, delay them and eventually send them out. > If between the enqueuing and dequeuing the packet the interface goes > away we can end up with this panic. > > I have a patch to teach dummynet to walk its queues when an interface > is removed and to drop any packets destined for that interface. I > wound up not committing it because I was unable to reproduce the > problem at the time. > > Can we reliably trigger this panic? I’d love to be able to add a > test case for it, especially because the patch is non-trivial. > I managed to build a test case to reliably trigger the problem (with v4, but same principle): The fix: https://reviews.freebsd.org/D33064 The test: https://reviews.freebsd.org/D33065 Kristof