[Bug 271578] FreeBSD fails to init SMP and hits panic post install in Azure ARM64

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 23 May 2023 08:21:02 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271578

            Bug ID: 271578
           Summary: FreeBSD fails to init SMP and hits panic post install
                    in Azure ARM64
           Product: Base System
           Version: CURRENT
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: schakrabarti@microsoft.com

When vmbus_doattach() is registered through sysinit at SI_SUB_SMP + 1, so that
the synthetic interrupt can be set up on every CPU of an SMP system, FreeBSD
installs correctly but panics during the first reboot after installation.
The verbose boot log and the panic are below.

OK boot -v
Loading kernel...
/boot/kernel/kernel text=0x2a8 text=0x8fe6a0 text=0x2991ec data=0x1526c8 data=0x0+0x31e000 0x8+0x1549c8+0x8+0x17ce6d
Loading configured modules...
/boot/kernel/cryptodev.ko text=0x17a2 text=0x228c data=0x648+0x10 0x8+0xcc0+0x8+0x958
/boot/entropy size=0x1000
/boot/kernel/zfs.ko text=0xe89fc text=0x227800 data=0x2c770+0xab87c 0x8+0x353a0+0x8+0x2e4b8|
can't find '/etc/hostid'
No valid device tree blob found!
WARNING! Trying to fire up the kernel, but no device tree blob found!
EFI framebuffer information:
addr, size     0xe0000000, 0x800000
dimensions     1024 x 768
stride         1024
masks          0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
---<<BOOT>>---
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
                   Type     Physical      Virtual   #Pages Attr
       BootServicesData 000000000000 000000000000 00000800 UC WC WT WB
       BootServicesData 000000824000 000000000000 00000021 UC WC WT WB
     ConventionalMemory 000000845000 000000000000 000d5fb3 UC WC WT WB
             LoaderData 0000d67f8000 000000000000 00000001 UC WC WT WB
             LoaderCode 0000d67f9000 000000000000 00004000 UC WC WT WB
             LoaderData 0000da7f9000 000000000000 00004000 UC WC WT WB
             LoaderCode 0000de7f9000 000000000000 000000d1 UC WC WT WB
     ConventionalMemory 0000de8ca000 000000000000 0000021f UC WC WT WB
       BootServicesData 0000deae9000 000000000000 00000ce1 UC WC WT WB
     ConventionalMemory 0000df7ca000 000000000000 0000015b UC WC WT WB
       BootServicesCode 0000df925000 000000000000 000003a5 UC WC WT WB
    RuntimeServicesCode 0000dfcca000 0000dfcca000 00000030 UC WC WT WB RUNTIME
    RuntimeServicesData 0000dfcfa000 0000dfcfa000 00000054 UC WC WT WB RUNTIME
     ConventionalMemory 0000dfd4e000 000000000000 00000017 UC WC WT WB
      ACPIReclaimMemory 0000dfd65000 000000000000 0000001f UC WC WT WB
          ACPIMemoryNVS 0000dfd84000 000000000000 00000004 UC WC WT WB
       BootServicesData 0000dfd88000 000000000000 00000021 UC WC WT WB
       BootServicesCode 0000dfda9000 000000000000 0000002a UC WC WT WB
       BootServicesData 0000dfdd3000 000000000000 00000008 UC WC WT WB
       BootServicesCode 0000dfddb000 000000000000 00000021 UC WC WT WB
       BootServicesData 0000dfdfc000 000000000000 00000204 UC WC WT WB
     ConventionalMemory 000100000000 000000000000 00020000 UC WC WT WB
         MemoryMappedIO 0000effed000 0000effed000 00000001 UC RUNTIME
Physical memory chunk(s):
  0x00001000 - 0x007fffff,     7 MB (   2047 pages)
  0x00824000 - 0xdfd83fff,  3573 MB ( 914784 pages)
  0xdfd88000 - 0xdfffffff,     2 MB (    632 pages)
  0x100000000 - 0x11fffffff,   512 MB ( 131072 pages)
Excluded memory regions:
  0xd6800000 - 0xd82c5fff,    26 MB (   6854 pages) NoAlloc
  0xdfcca000 - 0xdfd4dfff,     0 MB (    132 pages) NoAlloc
  0xdfd65000 - 0xdfd83fff,     0 MB (     31 pages) NoAlloc
  0xe0000000 - 0xe07fffff,     8 MB (   2048 pages) NoAlloc
Found 2 CPUs in the ACPI tables
Copyright (c) 1992-2023 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.0-CURRENT #2 main-n263073-634a770a5e16-dirty: Tue May 23 07:33:53 UTC 2023
    schakrabarti@schakrabarti-freebsd:/datadrive/sandbox_19_05/obj/datadrive/sandbox_19_05/src/arm64.aarch64/sys/GENERIC arm64
FreeBSD clang version 15.0.7 (https://github.com/llvm/llvm-project.git llvmorg-15.0.7-0-g8dfdcc7b7bf6)
WARNING: WITNESS option enabled, expect reduced performance.
SRAT: Found CPU UID 1 domain 0: enabled
SRAT: Found CPU UID 2 domain 0: enabled
SRAT: Found memory domain 0 addr 0x0 len 0xe0000000: enabled
SRAT: Found memory domain 0 addr 0x100000000 len 0x20000000: enabled
SRAT: Found memory domain 0 addr 0x120000000 len 0xec0000000: enabled
SRAT: Ignoring memory at addr 0x120000000
SRAT: Found memory domain 0 addr 0x1000000000 len 0xf000000000: enabled
SRAT: Ignoring memory at addr 0x1000000000
SRAT: Found memory domain 0 addr 0x10000000000 len 0x10000000000: enabled
SRAT: Ignoring memory at addr 0x10000000000
SRAT: Found memory domain 0 addr 0x20000000000 len 0x20000000000: enabled
SRAT: Ignoring memory at addr 0x20000000000
SRAT: Found memory domain 0 addr 0x40000000000 len 0x40000000000: enabled
SRAT: Ignoring memory at addr 0x40000000000
SRAT: Found memory domain 0 addr 0x80000000000 len 0x80000000000: enabled
SRAT: Ignoring memory at addr 0x80000000000
SRAT: Found memory domain 0 addr 0x100000000000 len 0x100000000000: enabled
SRAT: Ignoring memory at addr 0x100000000000
SRAT: Found memory domain 0 addr 0x200000000000 len 0x200000000000: enabled
SRAT: Ignoring memory at addr 0x200000000000
SRAT: Found memory domain 0 addr 0x400000000000 len 0x400000000000: enabled
SRAT: Ignoring memory at addr 0x400000000000
SRAT: Found memory domain 0 addr 0x800000000000 len 0x800000000000: enabled
SRAT: Ignoring memory at addr 0x800000000000
VT(efifb): resolution 1024x768
Preloaded elf kernel "/boot/kernel/kernel" at 0xffff00000188e000.
Preloaded elf module "/boot/kernel/cryptodev.ko" at 0xffff000001897378.
Preloaded boot_entropy_cache "/boot/entropy" at 0xffff000001897b58.
Preloaded elf module "/boot/kernel/zfs.ko" at 0xffff000001897bb0.
Preloaded boot_entropy_platform "efi_rng_seed" at 0xffff000001898408.
Preloaded TSLOG data "TSLOG" at 0xffff000001898460.
module scmi already present!
module firmware already present!
real memory  = 4294799360 (4095 MB)
Physical memory chunk(s):
0x00000000001000 - 0x000000007fffff, 8384512 bytes (2047 pages)
0x00000000824000 - 0x000000d006dfff, 3481575424 bytes (849994 pages)
0x000000d82c6000 - 0x000000dfcc9fff, 127942656 bytes (31236 pages)
0x000000dfd4e000 - 0x000000dfd64fff, 94208 bytes (23 pages)
0x000000dfd88000 - 0x000000dfffffff, 2588672 bytes (632 pages)
0x00000100000000 - 0x0000011fffdfff, 536862720 bytes (131070 pages)
avail memory = 4155707392 (3963 MB)
Starting CPU 1 (1)
SRAT: CPU 0 has memory domain 0
SRAT: CPU 1 has memory domain 0
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
Enabling LSE atomics in the kernel
random: read 4096 bytes from preloaded cache
random: read 2048 bytes from platform bootloader
random: unblocking device.
VIMAGE (virtualized network stack) enabled
hostuuid: using 00000000-0000-0000-0000-000000000000
ULE: setup cpu 0
ULE: setup cpu 1
random: entropy device external interface
hyperv: Hypercall created
firmware: 'tegra210_xusb_fw' version 0: 132608 bytes loaded at 0xffff000000a7e1a8
snd_unit_init() u=0x00ff8000 [512] d=0x00007c00 [32] c=0x000003ff [1024]
feeder_register: snd_unit=-1 snd_maxautovchans=16 latency=2 feeder_rate_min=1 feeder_rate_max=2016000 feeder_rate_round=25
MAP dfcca000 mode 2 pages 48
MAP dfcfa000 mode 2 pages 84
MAP effed000 mode 4 pages 1
null: <full device, null device, zero device>
openfirm: <Open Firmware control device>
tcp_log: tcp_log device
kbd0 at kbdmux0
mem: <memory>
crypto: <crypto core>
ACPI: RSDP 0x00000000DFD83018 000024 (v02 VRTUAL)
ACPI: XSDT 0x00000000DFD83F18 00006C (v01 VRTUAL MICROSFT 00000001 MSFT 00000001)
ACPI: FACP 0x00000000DFD83C18 000114 (v06 VRTUAL MICROSFT 00000001 MSFT 00000001)
ACPI: DSDT 0x00000000DFD65018 01DEC0 (v02 MSFTVM DSDT01   00000001 MSFT 05000000)
ACPI: DBG2 0x00000000DFD83B18 000072 (v00 VRTUAL MICROSFT 00000001 MSFT 00000001)
ACPI: GTDT 0x00000000DFD83D98 000060 (v02 VRTUAL MICROSFT 00000001 MSFT 00000001)
ACPI: OEM0 0x00000000DFD83098 000064 (v01 VRTUAL MICROSFT 00000001 MSFT 00000001)
ACPI: SPCR 0x00000000DFD83A98 000050 (v02 VRTUAL MICROSFT 00000001 MSFT 00000001)
ACPI: APIC 0x00000000DFD83818 0000FC (v04 VRTUAL MICROSFT 00000001 MSFT 00000001)
ACPI: SRAT 0x00000000DFD83198 000234 (v03 VRTUAL MICROSFT 00000001 MSFT 00000001)
ACPI: PPTT 0x00000000DFD83418 000108 (v01 VRTUAL MICROSFT 00000000 MSFT 00000000)
ACPI: BGRT 0x00000000DFD83E98 000038 (v01 VRTUAL MICROSFT 00000001 MSFT 00000001)
acpi0: <VRTUAL MICROSFT>
ACPI: 1 ACPI AML tables successfully acquired and loaded
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED
psci0: <ARM Power State Co-ordination Interface Driver> on acpi0
psci0: PSCI version 0.2 compatible
Found SMCCC version 1.0
gic0: <ARM Generic Interrupt Controller v3.0> iomem 0xffff0000-0xffffffff,0xeffee000-0xf000dfff,0xf000e000-0xf002dfff on acpi0
gic0: using spi 64 to 991
gic0: SPIs: 992, IDs: 16383
gic0: Start searching for Re-Distributor
gic0: CPU0 Re-Distributor has been found
gic0: CPU0 Re-Distributor woke up
gic0: CPU0 enabled CPU interface via system registers
generic_timer0: <ARM Generic Timer> irq 4,5,6 on acpi0
generic_timer0: allocated irq for 'sec-phys'
generic_timer0: allocated irq for 'phys'
generic_timer0: allocated irq for 'virt'
generic_timer0: could not allocate irq for optional interrupt 'hyp-phys'
generic_timer0: could not allocate irq for optional interrupt 'hyp-virt'
Timecounter "ARM MPCore Timecounter" frequency 25000000 Hz quality 1000
Event timer "ARM MPCore Eventtimer" frequency 25000000 Hz quality 1000
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
ram0: reserving memory region:   1000-800000
ram0: reserving memory region:   824000-d6800000
ram0: reserving memory region:   d82c6000-dfcca000
ram0: reserving memory region:   dfd4e000-dfd65000
ram0: reserving memory region:   dfd88000-e0000000
ram0: reserving memory region:   100000000-120000000
ram0: reserving excluded region: d6800000-d82c5fff
ram0: reserving excluded region: dfcca000-dfd4dfff
ram0: reserving excluded region: dfd65000-dfd83fff
ram0: reserving excluded region: e0000000-e07fffff
pmu0: <Performance Monitoring Unit> on acpi0
pmu0: MADT: cpu 0 (mpidr 0) irq 0 level-triggered
pmu0: MADT: cpu 1 (mpidr 1) irq 0 level-triggered
cpu0: <ACPI CPU> on acpi0
cpu0: switching to generic Cx mode
cpu1: <ACPI CPU> on acpi0
acpi_syscontainer0: <System Container> on acpi0
vmbus0: <Hyper-V Vmbus> on acpi_syscontainer0
vmgenc0: <VM Generation Counter> on acpi0
acpi_ged0: <Generic Event Device> irq 3 on acpi0
acpi_ged0: Raw IRQ 35
uart0: <PrimeCell UART (PL011)> iomem 0xeffec000-0xeffecfff irq 0 on acpi0
uart0: console (115200,n,8,1)
uart0: fast interrupt
uart0: PPS capture mode: DCD
uart1: <PrimeCell UART (PL011)> iomem 0xeffeb000-0xeffebfff irq 1 on acpi0
uart1: fast interrupt
uart1: PPS capture mode: DCD
vmbus_res0: <Hyper-V Vmbus Resource> irq 2 on acpi0
armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
crypto: assign armv8crypto0 driver id 0, flags 0xe000000
crypto: assign cryptosoft0 driver id 1, flags 0x6000000
AcpiOsExecute: task queue not started
Device configuration finished.
procfs registered
Timecounters tick every 10.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
crypto: <crypto device>
vlan: initialized, using hash tables with chaining
lo0: bpf attached
IPsec: Initialized Security Association Processing.
tcp_init: net.inet.tcp.tcbhashsize auto tuned to 32768
usb_needs_explore_all: no devclass
AcpiOsExecute: enqueue 1 pending tasks
Trying to mount root from zfs:zroot/ROOT/default []...
CPU  0: ARM Neoverse-N1 r3p1 affinity:  0
                   Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
 Instruction Set Attributes 0 = <DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL>
 Instruction Set Attributes 1 = <RCPC-8.3,DCPoP>
 Instruction Set Attributes 2 = <>
         Processor Features 0 = <CSV3,CSV2,GIC,AdvSIMD+HP,FP+HP,EL3,EL2,EL1,EL0 32>
         Processor Features 1 = <>
      Memory Model Features 0 = <TGran4,TGran64,TGran16,16bit ASID,256TB PA>
      Memory Model Features 1 = <PAN+ATS1E1,8bit VMID,HAF+DS>
      Memory Model Features 2 = <32bit CCIDX,48bit VA,UAO>
             Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3 v8.1,Debugv8>
             Debug Features 1 = <>
         Auxiliary Features 0 = <>
         Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <RDM,CRC32,SHA2,SHA1,AES+VMULL,SEVL>
AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP Arith,SIMDHP Arith,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
 L1 cache: 64KB (instruction), 64KB (data)
 L2 cache: 1024KB (unified)
CPU  1: ARM Neoverse-N1 r3p1 affinity:  1
 L1 cache: 64KB (instruction), 64KB (data)
 L2 cache: 1024KB (unified)
Release APs...gic0: Start searching for Re-Distributor
panic: data abort with spinlock held
cpuid = 1
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
KDB: enter: panic
[ thread pid 11 tid 100004 ]
Stopped at      kdb_enter+0x44: str     xzr, [x19, #3328]
db> APs not started
panic: smp_after_idle_runnable: SMP not started yet
cpuid = 0
time = 4
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
smp_after_idle_runnable() at smp_after_idle_runnable+0xb0
mi_startup() at mi_startup+0x1fc
virtdone() at virtdone+0x70
Uptime: 4s

At this point smp_started is still zero, because init_secondary() has not
completed on all of the secondary CPUs.
This blocks the bring-up of FreeBSD on Azure ARM64.
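
For readers less familiar with sysinit ordering, the relation between a
registration at SI_SUB_SMP + 1 and the smp_after_idle_runnable() check seen in
the trace can be modelled in userland. This is only a sketch, not kernel code:
every MODEL_* constant and model_* name is hypothetical, the numeric values
are assumed to mirror sys/kernel.h and should be checked against the tree, and
the only behaviour it relies on is that sysinit entries run sorted by
subsystem first and by order second.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed values, modelled on sys/kernel.h; verify against your tree. */
enum {
	MODEL_SI_SUB_SMP     = 0xf000000,
	MODEL_SI_ORDER_FIRST = 0x0000000,
	MODEL_SI_ORDER_ANY   = 0xfffffff
};

/* Hypothetical stand-in for one sysinit registration. */
struct model_sysinit {
	const char  *name;
	unsigned int subsystem;
	unsigned int order;
};

/* Compare by subsystem first, then by order within the subsystem. */
static int
model_sysinit_cmp(const void *a, const void *b)
{
	const struct model_sysinit *x = a;
	const struct model_sysinit *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/* Put the table into the order in which the entries would be run. */
static void
model_sysinit_sort(struct model_sysinit *tab, size_t n)
{
	qsort(tab, n, sizeof(tab[0]), model_sysinit_cmp);
}

/* Return the position of the named entry, or -1 if it is absent. */
static int
model_sysinit_index(const struct model_sysinit *tab, size_t n,
    const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return ((int)i);
	return (-1);
}
```

If a table is filled with release_aps at (SI_SUB_SMP, SI_ORDER_FIRST),
smp_after_idle_runnable at (SI_SUB_SMP, SI_ORDER_ANY) and vmbus_doattach at
SI_SUB_SMP + 1 (the first two placements are assumptions about the arm64 code,
not something this report states), sorting puts vmbus_doattach last whatever
order it uses. In this model every SI_SUB_SMP entry, including the
smp_started check, runs before vmbus_doattach().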

This problem has some history. At first FreeBSD was unable to set up the PPI on
all the secondary processors during the installation boot, and Kyle fixed that
with https://reviews.freebsd.org/rG172af24449cd8d34339172d125832b7ecd274213.
With that fix the installer boots, but after installation we hit this panic
on the first reboot.

-- 
You are receiving this mail because:
You are the assignee for the bug.