[Bug 283747] [crash] kernel panic after telegraf service restart

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 30 Dec 2024 16:43:36 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283747

            Bug ID: 283747
           Summary: [crash] kernel panic after telegraf service restart
           Product: Base System
           Version: 14.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: fax@nohik.ee

When a server has been up for about 10 days or more and I restart the telegraf
service, the result is a kernel panic. I don't think this is related to telegraf
itself, but telegraf is a "good" trigger. In my attempts to reproduce this crash,
I have noticed that it happens most of the time on machines with AMD CPUs
(for example EPYC 7443, EPYC 9354P, EPYC 7282).
I'm not completely sure, but this panic seems to happen with both FreeBSD 14.0
and 14.1 (not tested on 14.2).

Here is the crash data:

Dump header from device: /dev/ada9p1
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 25966280704
  Blocksize: 512
  Compression: none
  Dumptime: 2024-12-30 16:54:25 +0200
  Hostname: xx.xx.xx
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 14.1-RELEASE-p5 GENERIC
  Panic String: page fault
  Dump Parity: 1677611570
  Bounds: 0
  Dump Status: good
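
As a sanity check on the header, the dump length and dump time can be compared
with the panic messages further down. The short Python sketch below is an
illustrative helper only (not part of savecore or kgdb). It converts the
25966280704-byte dump length into 512-byte blocks and MiB, and the Dumptime into
a Unix epoch. The results (about 24763 MiB and 1735570465) agree with the
"Dumping 24763 out of 786107 MB" and "time = 1735570465" lines, assuming the
kernel's "MB" means 2^20 bytes.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the dump header above.
DUMP_LENGTH = 25966280704  # "Dump Length", in bytes
BLOCKSIZE = 512            # "Blocksize", in bytes
# "Dumptime: 2024-12-30 16:54:25 +0200"
DUMPTIME = datetime(2024, 12, 30, 16, 54, 25,
                    tzinfo=timezone(timedelta(hours=2)))

# Size in 512-byte blocks and in MiB (2**20 bytes), the unit the kernel
# is assumed to use in its "Dumping N out of M MB" progress line.
blocks, remainder = divmod(DUMP_LENGTH, BLOCKSIZE)
mib = DUMP_LENGTH / 2**20

# Dumptime as seconds since the Unix epoch, comparable with "time = ...".
epoch = int(DUMPTIME.timestamp())

print(f"{blocks} blocks (remainder {remainder}), {mib:.3f} MiB, epoch {epoch}")
# prints: 50715392 blocks (remainder 0), 24763.375 MiB, epoch 1735570465
```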



Unread portion of the kernel message buffer:
[1748990]
[1748990]
[1748990] Fatal trap 12: page fault while in kernel mode
[1748990] cpuid = 0; apic id = 00
[1748990] fault virtual address = 0x458
[1748990] fault code            = supervisor read data, page not present
[1748990] instruction pointer   = 0x20:0xffffffff80b0f0c9
[1748990] stack pointer         = 0x28:0xfffffe089f2f3c60
[1748990] frame pointer         = 0x28:0xfffffe089f2f3ce0
[1748990] code segment          = base 0x0, limit 0xfffff, type 0x1b
[1748990]                       = DPL 0, pres 1, long 1, def32 0, gran 1
[1748990] processor eflags      = interrupt enabled, resume, IOPL = 0
[1748990] current process       = 2 (clock (0))
[1748990] rdi: fffff8047720e518 rsi: 0000000000000004 rdx: 0000000000000000
[1748990] rcx: 0000000000000000  r8: 00000000000000bd  r9: fffffe089f2f4000
[1748990] rax: 0000000000000000 rbx: fffff8010813a740 rbp: fffffe089f2f3ce0
[1748990] r10: 0000000000001388 r11: 00000000e836231a r12: 0000000000000000
[1748990] r13: fffff8010813a740 r14: fffffe089f2f3c88 r15: fffff8047720e518
[1748990] trap number           = 12
[1748990] panic: page fault
[1748990] cpuid = 0
[1748990] time = 1735570465
[1748990] KDB: stack backtrace:
[1748990] #0 0xffffffff80b7fefd at kdb_backtrace+0x5d
[1748990] #1 0xffffffff80b32bd1 at vpanic+0x131
[1748990] #2 0xffffffff80b32a93 at panic+0x43
[1748990] #3 0xffffffff8100091b at trap_fatal+0x40b
[1748990] #4 0xffffffff81000966 at trap_pfault+0x46
[1748990] #5 0xffffffff80fd6d48 at calltrap+0x8
[1748990] #6 0xffffffff80b1f6b9 at crfree+0xa9
[1748990] #7 0xffffffff80ceadb4 at in_pcbfree+0x2a4
[1748990] #8 0xffffffff80bd6049 at sorele_locked+0x89
[1748990] #9 0xffffffff80d12c20 at tcp_close+0x170
[1748990] #10 0xffffffff80d1cb09 at tcp_timer_2msl+0xf9
[1748990] #11 0xffffffff80d1bb7e at tcp_timer_enter+0xfe
[1748990] #12 0xffffffff80b50b0c at softclock_call_cc+0x12c
[1748990] #13 0xffffffff80b52355 at softclock_thread+0xe5
[1748990] #14 0xffffffff80aecf7f at fork_exit+0x7f
[1748990] #15 0xffffffff80fd7dae at fork_trampoline+0xe
[1748990] Uptime: 20d5h49m50s
[1748990] Dumping 24763 out of 786107 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
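
The bracketed [1748990] prefix on every message-buffer line appears to be the
kernel uptime in seconds (FreeBSD adds such a prefix when
kern.msgbuf_show_timestamp is enabled; this is assumed, not confirmed from this
system's settings). The Python sketch below parses the "Uptime: 20d5h49m50s"
line and compares it with that prefix. The helper parse_uptime is hypothetical
and exists only for this illustration.

```python
import re
from datetime import timedelta


def parse_uptime(text: str) -> int:
    """Convert an uptime string such as '20d5h49m50s' to whole seconds.

    Hypothetical helper for this report; each unit (d, h, m, s) is optional.
    """
    m = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if m is None or not text:
        raise ValueError(f"unrecognised uptime string: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    delta = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    return int(delta.total_seconds())


# Values copied from the message buffer above.
uptime_seconds = parse_uptime("20d5h49m50s")   # "Uptime: 20d5h49m50s"
prefix_seconds = int("[1748990]".strip("[]"))  # prefix on every line

print(uptime_seconds, prefix_seconds, uptime_seconds == prefix_seconds)
# prints: 1748990 1748990 True
```

The two values match, which is consistent with the prefix being seconds since
boot, and shows this machine had been up for just over 20 days when it panicked,
in line with the "10 or more days" pattern described above.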

Reading symbols from /boot/kernel/mrsas.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mrsas.ko.debug...
Reading symbols from /boot/kernel/amdtemp.ko...
Reading symbols from /usr/lib/debug//boot/kernel/amdtemp.ko.debug...
Reading symbols from /boot/kernel/amdsmn.ko...
Reading symbols from /usr/lib/debug//boot/kernel/amdsmn.ko.debug...
Reading symbols from /boot/kernel/if_igb.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_em.ko.debug...
Reading symbols from /boot/kernel/accf_dns.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_dns.ko.debug...
Reading symbols from /boot/kernel/zfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...
Reading symbols from /boot/kernel/accf_data.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...
Reading symbols from /boot/kernel/accf_http.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...
Reading symbols from /boot/kernel/pflog.ko...
Reading symbols from /usr/lib/debug//boot/kernel/pflog.ko.debug...
Reading symbols from /boot/kernel/pf.ko...
Reading symbols from /usr/lib/debug//boot/kernel/pf.ko.debug...
Reading symbols from /boot/kernel/ipmi.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ipmi.ko.debug...
Reading symbols from /boot/kernel/smbus.ko...
Reading symbols from /usr/lib/debug//boot/kernel/smbus.ko.debug...
Reading symbols from /boot/kernel/intpm.ko...
Reading symbols from /usr/lib/debug//boot/kernel/intpm.ko.debug...
Reading symbols from /boot/kernel/cpuctl.ko...
Reading symbols from /usr/lib/debug//boot/kernel/cpuctl.ko.debug...
Reading symbols from /boot/kernel/if_lagg.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_lagg.ko.debug...
Reading symbols from /boot/kernel/if_infiniband.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_infiniband.ko.debug...
Reading symbols from /boot/kernel/uhid.ko...
Reading symbols from /usr/lib/debug//boot/kernel/uhid.ko.debug...
Reading symbols from /boot/kernel/ums.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...
Reading symbols from /boot/kernel/usbhid.ko...
Reading symbols from /usr/lib/debug//boot/kernel/usbhid.ko.debug...
Reading symbols from /boot/kernel/hidbus.ko...
Reading symbols from /usr/lib/debug//boot/kernel/hidbus.ko.debug...
Reading symbols from /boot/kernel/if_urndis.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_urndis.ko.debug...
Reading symbols from /boot/kernel/uether.ko...
Reading symbols from /usr/lib/debug//boot/kernel/uether.ko.debug...
Reading symbols from /boot/kernel/nullfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/nullfs.ko.debug...
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb)
