[Bug 271990] IRQ mapping table is full after stress devctl disable/enable

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 14 Jun 2023 11:08:54 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271990

            Bug ID: 271990
           Summary: IRQ mapping table is full after stress devctl
                    disable/enable
           Product: Base System
           Version: CURRENT
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: arm
          Assignee: freebsd-arm@FreeBSD.org
          Reporter: osamaabb@amazon.com

Reproduction steps:
-------------------
1. Create an AWS EC2 instance in us-east-1 from one of the following AMIs:
1.1: ami-0b55af91f40cd29ee - FreeBSD 14.0-CURRENT-arm64-20230525 UEFI
1.2: ami-0fdc715f878897386 - FreeBSD 13.2-STABLE-arm64-20230601 UEFI
1.3: ami-0e1fd0c2493efe1d1 - FreeBSD 12.4-STABLE-arm64-2023-06-01

2. Run the following reset loop script:
#!/bin/sh
while true
do
devctl disable ena0
devctl enable ena0
done
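
The following is a sketch of a bounded variant of the loop above. It may make it easier to see roughly how many cycles pass before the panic, and to compare that against the iteration counts from the Intel runs. The function name reset_loop, its arguments, and the DEVCTL override are hypothetical conveniences that are not part of the original report. Like the original loop, it presumably needs root privileges for devctl to succeed.

```shell
#!/bin/sh
# Hypothetical helper: run a bounded number of disable/enable cycles
# and print progress, so the last count printed before a panic shows
# roughly how many cycles it took.
#   $1 - number of cycles (default 50000)
#   $2 - device name (default ena0)
# DEVCTL (hypothetical) may name another command, e.g. "true", to
# dry-run the loop logic without touching a real device.
reset_loop() {
    iters=${1:-50000}
    dev=${2:-ena0}
    devctl_cmd=${DEVCTL:-devctl}
    i=0
    while [ "$i" -lt "$iters" ]
    do
        # Stop on the first failure instead of spinning on errors.
        $devctl_cmd disable "$dev" || return 1
        $devctl_cmd enable "$dev" || return 1
        i=$((i + 1))
        # Report progress every 1000 cycles.
        if [ $((i % 1000)) -eq 0 ]; then
            echo "cycle $i"
        fi
    done
    echo "done: $i cycles on $dev"
}
```

Example: source the file, then run "reset_loop 50000 ena0". Running "DEVCTL=true reset_loop 10" exercises only the loop logic.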

Result:
-------
The kernel eventually panics on every run; 100% reproducible.

***The same test does not fail on Intel-based instances.***

Stack trace:
------------
        2023-06-14T08:05:02.374Z        panic: IRQ mapping table is full.
        2023-06-14T08:05:02.374Z        cpuid = 18
        2023-06-14T08:05:02.374Z        time = 1686729902
        2023-06-14T08:05:02.374Z        KDB: stack backtrace:
        2023-06-14T08:05:02.374Z        db_trace_self() at db_trace_self
        2023-06-14T08:05:02.374Z        db_trace_self_wrapper() at db_trace_self_wrapper+0x30
        2023-06-14T08:05:02.374Z        vpanic() at vpanic+0x13c
        2023-06-14T08:05:02.374Z        panic() at panic+0x44
        2023-06-14T08:05:02.374Z        intr_map_irq() at intr_map_irq+0xb0
        2023-06-14T08:05:02.374Z        intr_alloc_msix() at intr_alloc_msix+0x1d8
        2023-06-14T08:05:02.374Z        generic_pcie_acpi_alloc_msix() at generic_pcie_acpi_alloc_msix+0x78
        2023-06-14T08:05:02.374Z        pci_alloc_msix_method() at pci_alloc_msix_method+0x168
        2023-06-14T08:05:02.374Z        ena_enable_msix_and_set_admin_interrupts() at ena_enable_msix_and_set_admin_interrupts+0x10c
        2023-06-14T08:05:02.374Z        ena_attach() at ena_attach+0x65c
        2023-06-14T08:05:02.375Z        device_attach() at device_attach+0x3f8
        2023-06-14T08:05:02.375Z        device_probe_and_attach() at device_probe_and_attach+0x7c
        2023-06-14T08:05:02.375Z        devctl2_ioctl() at devctl2_ioctl+0x44c
        2023-06-14T08:05:02.375Z        devfs_ioctl() at devfs_ioctl+0xd4
        2023-06-14T08:05:02.375Z        vn_ioctl() at vn_ioctl+0xc0
        2023-06-14T08:05:02.375Z        devfs_ioctl_f() at devfs_ioctl_f+0x20
        2023-06-14T08:05:02.375Z        kern_ioctl() at kern_ioctl+0x2dc
        2023-06-14T08:05:02.375Z        sys_ioctl() at sys_ioctl+0x118
        2023-06-14T08:05:02.375Z        do_el0_sync() at do_el0_sync+0x520
        2023-06-14T08:05:02.375Z        handle_el0_sync() at handle_el0_sync+0x44
        2023-06-14T08:05:02.375Z        --- exception, esr 0x56000000
        2023-06-14T08:05:02.375Z        Uptime: 4m1s
        2023-06-14T08:05:02.375Z        Dumping 2053 out of 64453 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
        2023-06-14T08:10:36.676Z        Dump complete
        2023-06-14T08:10:37.976Z        UEFI firmware (version built at 09:00:00 on Nov 1 2018)
        2023-06-14T08:10:38.076Z        Consoles: EFI console
        2023-06-14T08:10:38.076Z        Reading loader env vars from /efi/freebsd/loader.env
        2023-06-14T08:10:38.076Z        Setting currdev to disk0p1:
        2023-06-14T08:10:38.076Z        FreeBSD/arm64 EFI loader, Revision 1.1
        2023-06-14T08:10:38.076Z        (Thu May 25 06:36:21 UTC 2023 root@releng1.nyi.freebsd.org)
        2023-06-14T08:10:38.076Z
        2023-06-14T08:10:38.076Z        Command line arguments: loader.efi
        2023-06-14T08:10:38.176Z        Image base: 0x7856f000
        2023-06-14T08:10:38.176Z        EFI version: 2.70
        2023-06-14T08:10:38.176Z        EFI Firmware: EDK II (rev 1.00)
        2023-06-14T08:10:38.176Z        Console: efi (0x1000)
        2023-06-14T08:10:38.176Z        Load Path: \EFI\BOOT\BOOTAA64.EFI
        2023-06-14T08:10:38.176Z        Load Device: PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,B61C1E65-FAFA-11ED-84CB-002590EC5BF2,0x3,0x10418)
        2023-06-14T08:10:38.176Z        BootCurrent: 0001


Initial investigation results:
------------------------------
Tried to reproduce the issue on Intel-based instances: no reproduction even
after 50k up/down iterations.
Looked into the up/down flows of the FreeBSD ena driver [1] and saw that the
driver calls pci_alloc_msix()/pci_release_msi() and
bus_alloc_resource_any()/bus_release_resource() in the correct order.

[1] https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena

Since the PCI/bus APIs should be platform-agnostic (?), I assume this is an
issue in the arm64-specific part of the kernel.
