[Bug 283747] kernel panic after telegraf service restart
- In reply to: bugzilla-noreply_a_freebsd.org: "[Bug 283747] [crash] kernel panic after telegraf service restart"
Date: Wed, 19 Mar 2025 19:17:23 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283747

--- Comment #41 from Matthew L. Dailey <matthew.l.dailey@dartmouth.edu> ---
Thanks for the explanation, Gleb. It sounds like this is definitely worth fixing, regardless of whether it's the cause of this specific bug. :-)

FWIW, neither of my test servers (uptime ~8 days with telegraf running constantly) showed any zombies:

# ps aux | grep Z | grep -v grep
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND

I was just now able to panic my hardware system (took just over 8 days) and will send a core the same way I did before. This is obviously from the unpatched kernel. The last frame before the panic is in crunusebatch() so this looks promising.

Unread portion of the kernel message buffer:
[691684] panic: crunusebatch: ref -4294967294 not >= 0 on cred 0xfffff8011b43cb00
[691684] cpuid = 17
[691684] time = 1742410811
[691684] KDB: stack backtrace:
[691684] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d3bfa870
[691684] vpanic() at vpanic+0x131/frame 0xfffffe00d3bfa9a0
[691684] panic() at panic+0x43/frame 0xfffffe00d3bfaa00
[691684] crunusebatch() at crunusebatch+0xfa/frame 0xfffffe00d3bfaa30
[691684] thread_reap_domain() at thread_reap_domain+0x28d/frame 0xfffffe00d3bfaae0
[691684] proc_reap() at proc_reap+0x660/frame 0xfffffe00d3bfab20
[691684] proc_to_reap() at proc_to_reap+0x3c4/frame 0xfffffe00d3bfab70
[691684] kern_wait6() at kern_wait6+0x1a6/frame 0xfffffe00d3bfac10
[691684] sys_wait4() at sys_wait4+0x6b/frame 0xfffffe00d3bfae00
[691684] amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe00d3bfaf30
[691684] fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d3bfaf30
[691684] --- syscall (7, FreeBSD ELF64, wait4), rip = 0x2893da, rsp = 0x821093d18, rbp = 0x821093d80 ---
[691684] KDB: enter: panic

Restarted both test systems (hardware and VM) with patched DEBUG kernels and will report back in 10 days or so.

--
You are receiving this mail because:
You are the assignee for the bug.
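
A note on the zombie check quoted in the comment: `grep Z` matches any line containing a capital Z, which is why the `ps` header (it contains "VSZ") shows up in the output even though no zombies exist. A stricter check can look only at the STAT column. The following is a minimal sketch, not something from the bug report; the helper names `zombie_lines` and `list_zombies` are made up for illustration, and it assumes `ps -axo pid,stat,comm` prints a header row followed by one row per process.

```python
import subprocess


def zombie_lines(ps_output: str) -> list[str]:
    """Return the rows of `ps -axo pid,stat,comm` output whose STAT starts with 'Z'."""
    rows = ps_output.splitlines()[1:]  # skip the header row
    zombies = []
    for row in rows:
        # Split into at most three fields: pid, stat, and the rest (command).
        fields = row.split(None, 2)
        if len(fields) >= 2 and fields[1].startswith("Z"):
            zombies.append(row)
    return zombies


def list_zombies() -> list[str]:
    """Run ps and return only the zombie rows (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return zombie_lines(result.stdout)
```

Calling `list_zombies()` on a system with no zombies should return an empty list, with no header line to misread as a match.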