Re: After 13.1 install, "panic: AP #1 (PHY #1) failed!" with SuperMicro X10SRL-F motherboard

From: Anubhav (Re: FreeBSD) <anubhav+freebsd_at_hawaii.edu>
Date: Tue, 13 Dec 2022 20:35:24 UTC
(Please email me too when you reply.)


On Fri, Dec 9, 2022 at 9:35 AM Anubhav/FreeBSD wrote:

> The computer server with ...
>
> SuperMicro X10SRL-F motherboard (LGA 2011-V3, C612 chipset),
> Intel Xeon E5-1620 V3 CPU
>
> ... was working just fine with FreeBSD 12.x & 13.0. 13.0 was
> installed from scratch with ZFS on root.
>
> Two days ago I updated the OS to 13.1-p5 in a new boot environment
> ("freebsd-update -r 13.1-RELEASE upgrade"; "freebsd-update install";
> reboot; "freebsd-update install"). I did so over ssh.
>
> After a day, I could not connect to the computer via ssh. When I checked,
> lots of error messages from sshd were *flying* past on the console (I
> failed to take a photo). I could not do anything on the console. (The
> computer is connected to video & keyboard via software KVM; there is no
> physical serial connection.)
>
> After a reboot of 13.1-p5, a panic happened each of the 3-4 times I tried ...
>
> (transcribed from the photo of the screen after booting in verbose mode)
> SMP: Added CPU 1 (AP)
> MADT: Found CPU APIC ID 3 ACPI ID 3: enabled
> SMP: Added CPU 3 (AP)
> MADT: Found CPU APIC ID 5 ACPI ID 5: enabled
> SMP: Added CPU 5 (AP)
> MADT: Found CPU APIC ID 7 ACPI ID 7: enabled
> SMP: Added CPU 7 (AP)
> Event timer "LAPIC" quality 600
> LAPIC: ipi_wait() us multiplier 64 (r 5400080 tsc 3500095930)
> ACPI APIC Table: <SUPERM SMCI--MB>
> Package ID shift: 4
> L3 cache shift: 4
> L2 cache shift: 1
> L1 cache shift: 1
> Core ID shift: 1
> AP boot address: 0x98000
> panic: AP #1 (PHY #1) failed!
> cpuid = 0
> time = 1
> KDB: stack backtrace
> #0 0xffffffff80c694a5 at kdb_backtrace+0x65
> #1 0xffffffff80c1bb5f at vpanic+0x17f
> #2 0xffffffff80c1b983 at panic+0x43
> #3 0xffffffff81093633 at native_start_all_aps+0x633
> #4 0xffffffff81092ce1 at cpu_mp_start+0x1a1
> #5 0xffffffff80c7c32a at mp_start+0x9a
> #6 0xffffffff80ba970f at mi_startup+0xdf
> #7 0xffffffff80385022 at btext+0x22
> Uptime: 1s
>
>
> ... What is going on here, or what happened during the 13.1 install
> to make the machine panic?
>
> Booting any of the 13.0-p1[13] boot environments makes
> no difference.
>
> ...
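
For reference, the shift values in the verbose output quoted above can
be used to decode the APIC IDs on the MADT lines. The following is only
a rough sketch: it assumes the usual x86 convention that "Package ID
shift" and "Core ID shift" are bit offsets into the APIC ID, and the
names PACKAGE_ID_SHIFT, CORE_ID_SHIFT and decode_apic_id are made up
for illustration (they are not kernel identifiers).

```python
# Sketch only: assumes the "shift" values are bit offsets into the APIC ID.
PACKAGE_ID_SHIFT = 4  # "Package ID shift: 4" from the verbose boot output
CORE_ID_SHIFT = 1     # "Core ID shift: 1" from the verbose boot output


def decode_apic_id(apic_id):
    """Split an APIC ID into (package, core, thread) bit fields."""
    package = apic_id >> PACKAGE_ID_SHIFT
    within_package = apic_id & ((1 << PACKAGE_ID_SHIFT) - 1)
    core = within_package >> CORE_ID_SHIFT
    thread = within_package & ((1 << CORE_ID_SHIFT) - 1)
    return package, core, thread


if __name__ == "__main__":
    # APIC IDs of the APs that appear in the transcribed MADT lines.
    for apic_id in (1, 3, 5, 7):
        package, core, thread = decode_apic_id(apic_id)
        print(f"APIC ID {apic_id}: package {package}, "
              f"core {core}, thread {thread}")
```

Under these assumptions, the odd APIC IDs in the transcript are the
second hardware threads of cores 0 through 3. If "PHY #1" in the panic
message is an APIC ID, it would be the second thread of core 0.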

After removing the machine from the rack (which included disconnecting
the RaidMachine 24-bay disk enclosure from the LSI HBA card
installed in the machine), it booted right up (with the already
installed FreeBSD 13.1-p5 on the internal disk) as if nothing
had happened! There was no panic and no "AP #1 (PHY #1)
failed!"-like message.

How? Why?

If the machine had still panicked (after removal from the rack),
then I could have tried ...
- updating the BIOS;
- booting from 13.[01] image from a USB flash stick;
- installing 13.[01] from scratch.

Now I do not know how far I can trust the machine not to
panic again on a reboot.


- Anubhav