[Bug 263814] panic: GPF in cpu_search_highest if all cores in a domain are disabled

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 06 May 2022 12:13:48 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263814

            Bug ID: 263814
           Summary: panic: GPF in cpu_search_highest if all cores in a
                    domain are disabled
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: dgmorris@earthlink.net
 Attachment #233763 text/plain
         mime type:

Created attachment 233763
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=233763&action=edit
2 domain KVM node definition

I was looking into a hang at boot in intr_next_cpu() on a product based on
FreeBSD (and a bit behind Top Of Branch, I'm afraid). Performance folks wanted
to check NUMA effects on an AWS i4i.32xlarge
(https://aws.amazon.com/ec2/instance-types/i4i/) instance by disabling one
socket, setting hint.lapic.X.disabled for every core of that socket. (Note: if
there are better ways, I'm certainly open to hearing them. I'll freely admit
this may have a "don't do that" aspect to it, but I'm raising the issue so that
it can be considered.)

This reproduces on a local KVM guest configured to present as a NUMA system
with at least two domains. I am attaching the XML for the virtual machine in
case it helps. Please forgive any oddities in it; it is a local testing
environment I hack around on a lot.

When checking how upstream Top-of-Branch handles this (to see whether we could
just cherry-pick fixes back), we get a panic while scheduling is being set up
instead of the hang (this is with a downloaded qcow2.xz for FreeBSD
14.0-CURRENT, plugged in as the boot drive for the same KVM used to replicate):

FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 8 core(s)
root@freebsd:~ # uname -a
FreeBSD freebsd 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n255198-1907e1c07c3: Thu May  5 07:52:56 UTC 2022 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
root@freebsd:~ # sysctl kern.vm_guest
kern.vm_guest: kvm

root@freebsd:~ # bash
[root@freebsd ~]# for i in {8..15}; do echo hint.lapic.$i.disabled=1; done >> /boot/loader.conf
[root@freebsd ~]# cat /boot/loader.conf
hint.lapic.8.disabled=1
hint.lapic.9.disabled=1
hint.lapic.10.disabled=1
hint.lapic.11.disabled=1
hint.lapic.12.disabled=1
hint.lapic.13.disabled=1
hint.lapic.14.disabled=1
hint.lapic.15.disabled=1
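
The loop above relies on bash-style brace expansion, which is why bash is
started first. As a minimal sketch, assuming only a POSIX sh is available, the
same hint lines could be generated as follows. The function name
lapic_disable_hints is hypothetical (it is not part of FreeBSD), and the APIC
ID range 8 to 15 is simply the one used in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): print one
# hint.lapic.<id>.disabled=1 line for each APIC ID in [first, last].
lapic_disable_hints() {
    first=$1
    last=$2
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "hint.lapic.$i.disabled=1"
        i=$((i + 1))
    done
}

# Preview the hints for APIC IDs 8 through 15 on stdout.
lapic_disable_hints 8 15
```

Printing to stdout first allows the lines to be reviewed; they can then be
appended with "lapic_disable_hints 8 15 >> /boot/loader.conf".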

At the reboot:

Loading kernel...
/boot/kernel/kernel text=0x189e30 text=0xe405f8 text=0x6b8114 data=0x140 data=0x1c8240+0x436dc0 0x8+0x198600+0x8+0x1b88b0
Loading configured modules...
/etc/hostid size=0x25
/boot/entropy size=0x1000
staging 0xb4000000 (not copying) tramp 0xbdd00000 PT4 0xbdcf7000
Start @ 0xffffffff8038a000 ...
EFI framebuffer information:
addr, size     0xc4000000, 0x1d5000
dimensions     800 x 600
stride         800
masks          0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
---<<BOOT>>---
Copyright (c) 1992-2022 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.0-CURRENT #0 main-n255198-1907e1c07c3: Thu May  5 07:52:56 UTC 2022
    root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
WARNING: WITNESS option enabled, expect reduced performance.
VT(efifb): resolution 800x600
CPU: Intel(R) Xeon(R) W-2295 CPU @ 3.00GHz (3000.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x50657  Family=0x6  Model=0x55  Stepping=7
  Features=0x1f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT>
  Features2=0xfffab223<SSE3,PCLMULQDQ,VMX,SSSE3,FMA,CX16,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0xd19f47ab<FSGSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,AVX512CD,AVX512BW,AVX512VL>
  Structured Extended Features2=0x804<UMIP,AVX512VNNI>
  Structured Extended Features3=0xac000400<MD_CLEAR,IBPB,STIBP,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0xeb<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO,TSX_CTRL>
  AMD Extended Feature Extensions ID EBX=0x100d000<IBPB,IBRS,STIBP,SSBD>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 16777216000 (16000 MB)
avail memory = 16193560576 (15443 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS  BXPC    >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 8 core(s)
FreeBSD/SMP Online: 1 package(s) x 8 core(s)
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-23
Launching APs: 2 7 1 5 4 6 3


Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x20:0xffffffff80c4426a
stack pointer           = 0x28:0xfffffe001b5aedc0
frame pointer           = 0x28:0xfffffe001b5aee00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 11 (idle: cpu2)
trap number             = 9
panic: general protection fault
cpuid = 2
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001b5aebe0
vpanic() at vpanic+0x17f/frame 0xfffffe001b5aec30
panic() at panic+0x43/frame 0xfffffe001b5aec90
trap_fatal() at trap_fatal+0x385/frame 0xfffffe001b5aecf0
calltrap() at calltrap+0x8/frame 0xfffffe001b5aecf0
--- trap 0x9, rip = 0xffffffff80c4426a, rsp = 0xfffffe001b5aedc0, rbp = 0xfffffe001b5aee00 ---
cpu_search_highest() at cpu_search_highest+0xfa/frame 0xfffffe001b5aee00
cpu_search_highest() at cpu_search_highest+0x80/frame 0xfffffe001b5aee50
sched_idletd() at sched_idletd+0x377/frame 0xfffffe001b5aeef0
fork_exit() at fork_exit+0x80/frame 0xfffffe001b5aef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001b5aef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 11 tid 100005 ]
Stopped at      kdb_enter+0x32: movq    $0,0x127c753(%rip)
db> x/s version
version:        FreeBSD 14.0-CURRENT #0 main-n255198-1907e1c07c3: Thu May  5 07:52:56 UTC 2022\012    root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC\012
db>
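
The boot messages above report two packages detected but only one online. As
a rough sketch, assuming the two "FreeBSD/SMP" message formats shown in this
report, a saved boot log could be checked for that condition as follows. The
function name smp_package_check and its output strings are hypothetical, and
a package is not necessarily the same thing as a NUMA domain.

```shell
#!/bin/sh
# Hypothetical helper: read boot messages on stdin and report whether fewer
# packages are online than were detected. It only understands the two
# "FreeBSD/SMP" line formats that appear in this report.
smp_package_check() {
    awk '
        # e.g. "FreeBSD/SMP: 2 package(s) x 8 core(s)"
        /^FreeBSD\/SMP: [0-9]+ package\(s\)/        { total = $2 }
        # e.g. "FreeBSD/SMP Online: 1 package(s) x 8 core(s)"
        /^FreeBSD\/SMP Online: [0-9]+ package\(s\)/ { online = $3 }
        END {
            if (total == "" || online == "") {
                print "unknown"
            } else if (online + 0 < total + 0) {
                print "offline-packages=" (total - online)
            } else {
                print "all-online"
            }
        }'
}
```

For example, "smp_package_check < boot.log" would be run against a captured
console log, where boot.log is a placeholder file name.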
