Re: VCPUOP_send_nmi returns -38

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Tue, 11 Jan 2022 10:07:28 UTC

On 11/01/2022 11:50, Andriy Gapon wrote:
> 
> Recently I got a report of crashes related to using procstat -k on one of our 
> systems.  The system runs FreeBSD 12.2 on an AWS Xen-based instance (can get 
> more specifics about it later).

The instance type is t2.large.
Here are all lines from verbose boot that mention Xen:

XEN: Hypervisor version 4.2 detected.
Disabling MSI-X interrupt migration due to Xen hypervisor bug.
XEN: disabling emulated disks
XEN: disabling emulated nics
Hypervisor: Origin = "XenVMMXenVMM"
x2APIC available but disabled due to running under XEN
ACPI APIC Table: <Xen HVM>
Xen interrupts: unable to register PIRQ EOI map
Xen interrupt system initialized
ACPI: RSDP 0x00000000000EA020 000024 (v02 Xen   )
ACPI: XSDT 0x00000000FC00E2A0 000054 (v01 Xen    HVM      00000000 HVML 00000000)
ACPI: FACP 0x00000000FC00DF60 0000F4 (v04 Xen    HVM      00000000 HVML 00000000)
ACPI: DSDT 0x00000000FC0021C0 00BD19 (v02 Xen    HVM      00000000 INTL 20090123)
ACPI: APIC 0x00000000FC00E060 0000D8 (v02 Xen    HVM      00000000 HVML 00000000)
ACPI: HPET 0x00000000FC00E1B0 000038 (v01 Xen    HVM      00000000 HVML 00000000)
ACPI: WAET 0x00000000FC00E1F0 000028 (v01 Xen    HVM      00000000 HVML 00000000)
ACPI: SSDT 0x00000000FC00E220 000031 (v02 Xen    HVM      00000000 INTL 20090123)
ACPI: SSDT 0x00000000FC00E260 000033 (v02 Xen    HVM      00000000 INTL 20090123)
acpi0: <Xen> on motherboard
xenpci0: <Xen Platform Device> port 0xc000-0xc0ff mem 0xf2000000-0xf2ffffff irq 28 at device 3.0 on pci0
xenpv0: <Xen PV bus> on motherboard
granttable0: <Xen Grant-table Device> on xenpv0
xen_et0: <Xen PV Clock> on xenpv0
Event timer "XENTIMER" frequency 1000000000 Hz quality 950
Timecounter "XENTIMER" frequency 1000000000 Hz quality 950
xen_et0: registered as a time-of-day clock, resolution 0.000001s
xenstore0: <XenStore> on xenpv0
xsd_dev0: <Xenstored user-space device> on xenpv0
evtchn0: <Xen event channel user-space device> on xenpv0
privcmd0: <Xen privileged interface user-space device> on xenpv0
gntdev0: <Xen grant-table user-space device> on xenpv0
debug0: <Xen debug handler> on xenpv0
xenballoon0: <Xen Balloon Device> on xenstore0
<Xen Control Device> on xenstore0
xs_dev0: <Xenstore user-space device> on xenstore0
xenbusb_front0: <Xen Frontend Devices> on xenstore0
xn0: <Virtual Network Interface> at device/vif/0 on xenbusb_front0
xenbusb_back0: <Xen Backend Devices> on xenstore0
xbd0: 40960MB <Virtual Block Device> at device/vbd/768 on xenbusb_front0
xbd6: 141312MB <Virtual Block Device> at device/vbd/51808 on xenbusb_front0
xbd1: 204800MB <Virtual Block Device> at device/vbd/51728 on xenbusb_front0
xbd2: 204800MB <Virtual Block Device> at device/vbd/51744 on xenbusb_front0
xen_et0: providing initial system time

> It immediately reminded me of an older issue (Subject: Xen (HVM) and NMI) where 
> the root cause was that NMIs were delivered as regular interrupts.
> But 12.2 has the newer code that delivers NMIs as NMIs.
> 
> After some investigation it became evident that NMIs are not delivered at all.
> I modified send_nmi() in sys/x86/xen/xen_apic.c to capture and report errors 
> from HYPERVISOR_vcpu_op(VCPUOP_send_nmi) calls.
> That revealed that the call returns -38, which appears to mean ENOSYS.
> 
> I am not sure what that could mean.
> Perhaps NMI is somehow disabled in the Xen configuration (for that specific 
> instance type)?
> I am out of better ideas.
> 
> P.S.
> It appears that FreeBSD does not expect that an IPI, including NMI, can fail.
> So, there is no way to propagate the error to callers.
> I think that we could either printf it or, perhaps, even panic on such a failure.
> 
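
For illustration, here is a minimal userland sketch of the kind of check
described above. It is not the actual change to send_nmi() in
sys/x86/xen/xen_apic.c. stub_vcpu_op(), STUB_VCPUOP_SEND_NMI, xen_errname()
and send_nmi_checked() are hypothetical names made up for this example; the
stub stands in for HYPERVISOR_vcpu_op() and always fails the way the affected
instance does. The error values follow Xen's own ABI numbering (public
errno.h), where ENOSYS is 38. That differs from FreeBSD's native errno.h,
where ENOSYS is 78.

```c
#include <stdio.h>

/* Xen ABI error numbers (Xen's public errno.h), not FreeBSD's errno.h. */
#define XEN_EINVAL	22
#define XEN_ENOSYS	38

/* Hypothetical command value, used only by the stub below. */
#define STUB_VCPUOP_SEND_NMI	1

/*
 * Hypothetical stand-in for HYPERVISOR_vcpu_op() so that the sketch builds
 * outside the kernel.  It always fails with -38, as seen on the instance.
 */
static int
stub_vcpu_op(int cmd, unsigned int vcpu, void *arg)
{
	(void)cmd;
	(void)vcpu;
	(void)arg;
	return (-XEN_ENOSYS);
}

/* Map a negative Xen return value to a symbolic name for the report. */
static const char *
xen_errname(int rc)
{
	switch (-rc) {
	case XEN_EINVAL:
		return ("EINVAL");
	case XEN_ENOSYS:
		return ("ENOSYS");
	default:
		return ("unknown");
	}
}

/*
 * Send an NMI to the given vCPU and report a failure instead of silently
 * ignoring it.  Returns 0 on success or the negative Xen error value.
 */
static int
send_nmi_checked(unsigned int vcpu)
{
	int rc;

	rc = stub_vcpu_op(STUB_VCPUOP_SEND_NMI, vcpu, NULL);
	if (rc != 0)
		printf("send_nmi: vcpu %u: hypercall failed: %d (%s)\n",
		    vcpu, rc, xen_errname(rc));
	return (rc);
}
```

With the stub as written, send_nmi_checked(1) returns -38 and prints
"send_nmi: vcpu 1: hypercall failed: -38 (ENOSYS)". In the kernel the printf
could become a panic if a failed NMI is considered fatal, but the callers
would still have no way to see the error.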


-- 
Andriy Gapon