crash in dummynet, si->sched
Andriy Gapon
avg at FreeBSD.org
Tue May 14 09:24:54 UTC 2019
Unfortunately, all we have is some information from a ddb text dump. We do not
have a vmcore and we do not have a way to re-create the crash. It happened just
once on a production system.
So, the information follows.
dn_enqueue fs 0 si 0, dropping
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x60
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8077bdff
stack pointer = 0x28:0xfffffe1096343910
frame pointer = 0x28:0xfffffe1096343920
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (dummynet)
db:3:psinfo> bt
Tracing pid 0 tid 100248 td 0xfffff8002829d4d0
stack1 drain_scheduler_cb+0x1f drain_scheduler_sch_cb+0x25
dn_ht_scan_bucket+0x7a dn_drain_scheduler+0x20 dummynet_task+0x219
taskqueue_run_locked+0x71 taskqueue_thread_loop+0x56 fork_exit+0x121
fork_trampoline+0xe
drain_scheduler_cb+0x1f: movq 0x60(%rdx),%rax
Here is disassembly of the function with some notes of mine:
0xffffffff8077bde0 <+0>: push %rbp
0xffffffff8077bde1 <+1>: testb $0x20,0x90(%rdi) // test DN_ACTIVE
0xffffffff8077bde8 <+8>: mov %rsp,%rbp
0xffffffff8077bdeb <+11>: jne 0xffffffff8077bdf4 <drain_scheduler_cb+20>
0xffffffff8077bded <+13>: cmpq $0x0,0x78(%rdi)
0xffffffff8077bdf2 <+18>: je 0xffffffff8077bdf8 <drain_scheduler_cb+24>
0xffffffff8077bdf4 <+20>: leaveq
0xffffffff8077bdf5 <+21>: xor %eax,%eax
0xffffffff8077bdf7 <+23>: retq
0xffffffff8077bdf8 <+24>: mov 0x88(%rdi),%rdx // rdx = si->sched
0xffffffff8077bdff <+31>: mov 0x60(%rdx),%rax // rax = si->sched->fp
0xffffffff8077be03 <+35>: testb $0x1,0x10(%rax)
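So, it seems that dummynet ran into a dn_sch_inst whose sched field was NULL:
the faulting instruction reads 0x60(%rdx) and the fault virtual address is
0x60, so %rdx (si->sched) must have been zero.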
I am not sure how that could be possible.
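To make the reading of the disassembly concrete, here is a minimal,
self-contained C sketch of the control flow it appears to implement. The
struct layouts, the *_sketch names, and the SK_* constants are hypothetical
stand-ins for illustration; they are not the kernel's definitions, and the
meaning of the field at 0x78 and of the bit tested at +35 is an assumption.
The NULL check is shown only to mark where the fault happened, not as a
proposed fix, since it would hide whatever left sched unset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants mirroring the immediates in the disassembly. */
#define SK_ACTIVE    0x20u  /* bit tested at +1 (annotated as DN_ACTIVE) */
#define SK_FLAG_BIT0 0x01u  /* bit tested at +35; meaning assumed */

/* Simplified stand-ins; field offsets do NOT match the kernel structs. */
struct alg_sketch   { uint32_t flags; };
struct sched_sketch { struct alg_sketch *fp; };
struct inst_sketch {
	void                *pending; /* pointer compared with 0 at +13 */
	struct sched_sketch *sched;   /* loaded into %rdx at +24 */
	uint32_t             kflags;  /* byte tested at +1 */
};

enum drain_result { DRAIN_SKIP_BUSY, DRAIN_SKIP_NO_SCHED, DRAIN_CHECKED };

/* Address touched by "mov disp(%reg),..." when %reg holds base. */
static uintptr_t
faulting_address(uintptr_t base, uintptr_t disp)
{
	return (base + disp);
}

/* Sketch of the callback's early part, with the crash point guarded. */
static enum drain_result
drain_cb_sketch(const struct inst_sketch *si, int *flag_set)
{
	assert(si != NULL && flag_set != NULL);

	/* +1 .. +23: active or non-empty instances are left alone. */
	if ((si->kflags & SK_ACTIVE) != 0 || si->pending != NULL)
		return (DRAIN_SKIP_BUSY);

	/* +24 .. +31: the dump shows this pointer was NULL at the fault. */
	if (si->sched == NULL || si->sched->fp == NULL)
		return (DRAIN_SKIP_NO_SCHED);

	/* +35: test one flag bit through sched->fp. */
	*flag_set = (si->sched->fp->flags & SK_FLAG_BIT0) != 0;
	return (DRAIN_CHECKED);
}
```

With base 0 and displacement 0x60, faulting_address() yields 0x60, which
matches the fault virtual address in the dump.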
Also, I am not sure if that "dn_enqueue ..." message is related to the crash.
Does anyone have any ideas?
Thank you.
P.S.
I found a somewhat similar but different and very old report:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166937
It seems that it was not really root-caused and fixed, but marked as fixed
because of a chance that it could have been caused by flaky hardware.
--
Andriy Gapon