Re: FreeBSD-13.1 and Xen-4.15.0 crash formerly working NetBSD-99.77 domu's
- In reply to: Brian Buhrow : "FreeBSD-13.1 and Xen-4.15.0 crash formerly working NetBSD-99.77 domu's"
Date: Thu, 02 Jun 2022 10:37:47 UTC
On Thu, Jun 02, 2022 at 02:16:35AM -0700, Brian Buhrow wrote:
> hello. I just finished updating one of my FreeBSD Xen servers from 12.2-stable to
> 13.1-release. After a very painful upgrade process, which I'll detail in another e-mail, I
> discovered that my domu's running NetBSD-99.77, which had been running without any trouble
> whatsoever in pv mode, are failing with an unhandled general protection fault. Is this a known
> issue, a configuration error on my part, a bug in the NetBSD kernel, or something completely
> unknown? The details from xen from a sample crash are listed below. I'm happy to get more
> details, if they would be helpful. Just tell me what you would like to see and how I might go
> about getting it. Any light anyone can shed on this issue would be helpful.
>
> -thanks
> -Brian
>
> Xen host details:
>
> %uname -a
> FreeBSD xen-lothlorien.nfbcal.org 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64
>
> Xen dmesg output:
>
> BC85DEB0, 0038 (r1 INTEL DQ67SW 1072009 AMI. 4)
> (XEN) ACPI: ASF! BC85DEE8, 00A0 (r32 INTEL DQ67SW 1 TFSM F4240)
> (XEN) ACPI: DMAR BC85DF88, 00E8 (r1 INTEL DQ67SW 1 INTL 1)
> (XEN) System RAM: 32683MB (33467896kB)
> (XEN) Domain heap initialised
> (XEN) ACPI: 32/64X FACS address mismatch in FADT - bcbdbf80/0000000000000000, using 32
> (XEN) IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
> (XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
> (XEN) PCI: Not using MCFG for segment 0000 bus 00-3f
> (XEN) Switched to APIC driver x2apic_cluster
> (XEN) CPU0: 1600 ... 3100 MHz
> (XEN) xstate: size: 0x340 and states: 0x7
> (XEN) Speculative mitigation facilities:
> (XEN) Hardware features: IBRS/IBPB STIBP L1D_FLUSH SSBD
> (XEN) Compiled-in support: SHADOW_PAGING
> (XEN) Xen settings: BTI-Thunk N/A, SPEC_CTRL: IBRS+ SSBD-, Other: IBPB L1D_FLUSH BRANCH_HARDEN
> (XEN) L1TF: believed vulnerable, maxphysaddr L1D 46, CPUID 36, Safe address 1000000000
> (XEN) Support for HVM VMs: MSR_SPEC_CTRL RSB EAGER_FPU
> (XEN) Support for PV VMs: MSR_SPEC_CTRL RSB EAGER_FPU
> (XEN) XPTI (64-bit PV only): Dom0 enabled, DomU enabled (without PCID)
> (XEN) PV L1TF shadowing: Dom0 disabled, DomU disabled
> (XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
> (XEN) Initializing Credit2 scheduler
> (XEN) Platform timer is 14.318MHz HPET
> (XEN) Detected 3093.000 MHz processor.
> (XEN) Intel VT-d iommu 0 supported page sizes: 4kB
> (XEN) Intel VT-d iommu 1 supported page sizes: 4kB
> (XEN) Intel VT-d Snoop Control not enabled.
> (XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
> (XEN) Intel VT-d Queued Invalidation enabled.
> (XEN) Intel VT-d Interrupt Remapping enabled.
> (XEN) Intel VT-d Posted Interrupt not enabled.
> (XEN) Intel VT-d Shared EPT tables not enabled.
> (XEN) I/O virtualisation enabled
> (XEN) - Dom0 mode: Relaxed
> (XEN) Interrupt remapping enabled
> (XEN) Enabled directed EOI with ioapic_ack_old on!
> (XEN) ENABLING IO-APIC IRQs
> (XEN) -> Using old ACK method
> (XEN) Allocated console ring of 16 KiB.
> (XEN) VMX: Supported advanced features:
> (XEN) - APIC MMIO access virtualisation
> (XEN) - APIC TPR shadow
> (XEN) - Extended Page Tables (EPT)
> (XEN) - Virtual-Processor Identifiers (VPID)
> (XEN) - Virtual NMI
> (XEN) - MSR direct-access bitmap
> (XEN) - Unrestricted Guest
> (XEN) HVM: ASIDs enabled.
> (XEN) VMX: Disabling executable EPT superpages due to CVE-2018-12207
> (XEN) HVM: VMX enabled
> (XEN) HVM: Hardware Assisted Paging (HAP) detected
> (XEN) HVM: HAP page sizes: 4kB, 2MB
> (XEN) Brought up 4 CPUs
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> (XEN) Dom0 has maximum 440 PIRQs
> (XEN) Bogus DMIBAR 0xfed18001 on 0000:00:00.0
> (XEN) WARNING: PVH is an experimental mode with limited functionality
> (XEN) Initial low memory virq threshold set at 0x4000 pages.
> (XEN) Scrubbing Free RAM in background
> (XEN) Std. Loglevel: Errors and warnings
> (XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
> (XEN) Xen is relinquishing VGA console.
> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
> (XEN) Freed 604kB init memory
>
> <Crash details here>
>
> (XEN) d2v0 Unhandled general protection fault fault/trap [#13, ec=0000]
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d040300b58 x86_64/entry.S#create_bounce_frame+0x14f/0x167
> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
> (XEN) ----[ Xen-4.15.0 x86_64 debug=n Not tainted ]----
> (XEN) CPU: 1
> (XEN) RIP: e033:[<ffffffff8021f27a>]
> (XEN) RFLAGS: 0000000000000202 EM: 1 CONTEXT: pv guest (d2v0)
> (XEN) rax: 0000000000070106 rbx: 000000000197d000 rcx: 0000000000000277

This is NetBSD trying to write to the CR_PAT MSR (see rcx being 0x277).
You will need to boot the guest with:

msr_relaxed=1

in the xl.cfg config file. This has been fixed in NetBSD HEAD, but the
fix hasn't made it to any release IIRC:

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/pmap.c

See Revision 1.410.

This is because of a change in Xen 4.15 that made the handling of MSR
accesses stricter, as a hardening effort resulting from XSA-351:

https://xenbits.xen.org/xsa/advisory-351.html

Thanks, Roger.
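
For readers of the archive, here is a rough sketch of how the register dump in this thread can be read. It assumes the faulting instruction is a wrmsr, as the reply states, so rcx holds the MSR index and eax holds the low 32 bits of the value being written (the high half would be in rdx, which is not shown in the quoted dump). The function names and the one-entry lookup table are made up for illustration; only the values 0x277 and 0x70106 come from the crash output.

```python
# Hypothetical helpers for reading the wrmsr operands out of the crash dump.
# Only 0x277 (rcx) and 0x70106 (rax) are taken from the thread.

# MSR index named in the reply; the table is deliberately minimal.
MSR_NAMES = {0x277: "IA32_PAT (MSR_CR_PAT)"}

# x86 PAT memory type encodings (values 2 and 3 are reserved).
PAT_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB", 7: "UC-"}


def msr_name(rcx):
    """Return a name for the MSR selected by ecx (low 32 bits of rcx)."""
    index = rcx & 0xFFFFFFFF
    return MSR_NAMES.get(index, f"unknown MSR {index:#x}")


def decode_pat_low(rax):
    """Decode PAT entries PA0..PA3 from eax, one entry per byte, 3 bits each."""
    low = rax & 0xFFFFFFFF
    entries = []
    for i in range(4):
        value = (low >> (8 * i)) & 0x7
        entries.append(PAT_TYPES.get(value, f"reserved({value})"))
    return entries


if __name__ == "__main__":
    print(msr_name(0x0000000000000277))
    print(decode_pat_low(0x0000000000070106))
```

Entries PA4..PA7 cannot be decoded from the quoted output, since they live in the high half of the value.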