Re: 12.2 Splay Tree ipfw potential panic source
- Reply: Stefan Esser : "Re: 12.2 Splay Tree ipfw potential panic source"
- In reply to: Karl Denninger : "Re: 12.2 Splay Tree ipfw potential panic source"
Date: Sat, 10 Jul 2021 02:41:00 UTC
On 7/9/2021 18:06, Karl Denninger wrote:
> On 7/9/2021 16:17, Ryan Stone wrote:
>> On Thu, Jul 8, 2021 at 8:54 PM Karl Denninger <karl@denninger.net> wrote:
>>> I will see if I can get at least a panic backtrace, although the
>>> impacted box is a pcEngines firewall that boots off an SD card.
>>
>> Have you checked whether netdump supports your NICs? You should be
>> able to get a full vmcore off if so.
>
> Yes; the box in question is in heavy production and I will not be able
> to get an isolated period of time to pull a core (assuming the remote
> dump works) until sometime this weekend.
>
> Will advise once I (hopefully) have it.

Ok, so I have good news and bad news.

I have the trap, and it is definitely in libalias; it appears to come about as a result of a NAT translation attempt.

Fatal trap 18: integer divide fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer     = 0x20:0xffffffff8275b7cc
stack pointer           = 0x28:0xfffffe0017b6b310
frame pointer           = 0x28:0xfffffe0017b6b320
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_1)
trap number             = 18
panic: integer divide fault
cpuid = 1
time = 1625883072
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0017b6b020
vpanic() at vpanic+0x17b/frame 0xfffffe0017b6b070
panic() at panic+0x43/frame 0xfffffe0017b6b0d0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe0017b6b130
trap() at trap+0x67/frame 0xfffffe0017b6b240
calltrap() at calltrap+0x8/frame 0xfffffe0017b6b240
--- trap 0x12, rip = 0xffffffff8275b7cc, rsp = 0xfffffe0017b6b310, rbp = 0xfffffe0017b6b320 ---
HouseKeeping() at HouseKeeping+0x1c/frame 0xfffffe0017b6b320
LibAliasInLocked() at LibAliasInLocked+0x2f/frame 0xfffffe0017b6b3e0
LibAliasIn() at LibAliasIn+0x46/frame 0xfffffe0017b6b410
ipfw_nat() at ipfw_nat+0x234/frame 0xfffffe0017b6b460
ipfw_chk() at ipfw_chk+0x1350/frame 0xfffffe0017b6b670
ipfw_check_packet() at ipfw_check_packet+0xf0/frame 0xfffffe0017b6b760
pfil_run_hooks() at pfil_run_hooks+0xb0/frame 0xfffffe0017b6b7f0
ip_input() at ip_input+0x427/frame 0xfffffe0017b6b8a0
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe0017b6b8f0
ether_demux() at ether_demux+0x138/frame 0xfffffe0017b6b920
ether_nh_input() at ether_nh_input+0x33b/frame 0xfffffe0017b6b980
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe0017b6b9d0
ether_input() at ether_input+0x4b/frame 0xfffffe0017b6ba00
iflib_rxeof() at iflib_rxeof+0xad6/frame 0xfffffe0017b6bae0
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe0017b6bb20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe0017b6bb80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe0017b6bbb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0017b6bbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0017b6bbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 7m23s
netdump: overwriting mbuf zone pointers
netdump in progress. searching for server...
netdumping to 192.168.10.100 (ac:1f:6b:ad:d8:cb)
Dumping 190 out of 1882 MB:. . . . . . . . . . . . .

** DUMP FAILED (ERROR 60) **

Now the bad news: as you can see, the attempted remote dump fails, possibly because the network code is hosed at that point. I get a 69632-byte file (exactly, and repeatedly) on the remote machine where the dump is set to go; it looks like the first piece of it is indeed received, but that's it, and then the panic'd unit reboots.

On the server (remote) end I have this in the "info" file:

Dump from IpGw [192.168.10.200]
Dump incomplete: client timed out

So it looks like the server got the first part of the dump and replied, but the crashed box never sent anything else.
-rw-------  1 root  wheel      2 Jul  9 22:11 bounds.IpGw
-rw-------  1 root  wheel     66 Jul  9 22:10 info.IpGw.0
-rw-------  1 root  wheel      0 Jul  9 22:11 info.IpGw.1
-rw-------  1 root  wheel  69632 Jul  9 22:00 vmcore.IpGw.0
-rw-------  1 root  wheel  69632 Jul  9 22:11 vmcore.IpGw.1

Without a complete core I can't give you a good traceback. I may be able to get a local dump device on this unit sometime over the weekend -- not sure as of yet, as it is in production use.

This is an extremely reliable panic -- uptime is only a few minutes before it blows up.

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/