[Bug 271990] IRQ mapping table is full after stress devctl disable/enable
Date: Wed, 14 Jun 2023 11:08:54 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271990

            Bug ID: 271990
           Summary: IRQ mapping table is full after stress devctl disable/enable
           Product: Base System
           Version: CURRENT
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: arm
          Assignee: freebsd-arm@FreeBSD.org
          Reporter: osamaabb@amazon.com

Reproduction steps:
-------------------
1. Create an AWS EC2 instance in us-east-1 from one of the following AMIs:
   1.1: ami-0b55af91f40cd29ee - FreeBSD 14.0-CURRENT-arm64-20230525 UEFI
   1.2: ami-0fdc715f878897386 - FreeBSD 13.2-STABLE-arm64-20230601 UEFI
   1.3: ami-0e1fd0c2493efe1d1 - FreeBSD 12.4-STABLE-arm64-2023-06-01
2. Run the following disable/enable loop script:

    #!/bin/sh
    while true
    do
        devctl disable ena0
        devctl enable ena0
    done

Result:
-------
The kernel panics every time (100% reproducible).
***The same test does not fail on Intel-based instances.***

Stack trace:
------------
2023-06-14T08:05:02.374Z panic: IRQ mapping table is full.
2023-06-14T08:05:02.374Z cpuid = 18
2023-06-14T08:05:02.374Z time = 1686729902
2023-06-14T08:05:02.374Z KDB: stack backtrace:
2023-06-14T08:05:02.374Z db_trace_self() at db_trace_self
2023-06-14T08:05:02.374Z db_trace_self_wrapper() at db_trace_self_wrapper+0x30
2023-06-14T08:05:02.374Z vpanic() at vpanic+0x13c
2023-06-14T08:05:02.374Z panic() at panic+0x44
2023-06-14T08:05:02.374Z intr_map_irq() at intr_map_irq+0xb0
2023-06-14T08:05:02.374Z intr_alloc_msix() at intr_alloc_msix+0x1d8
2023-06-14T08:05:02.374Z generic_pcie_acpi_alloc_msix() at generic_pcie_acpi_alloc_msix+0x78
2023-06-14T08:05:02.374Z pci_alloc_msix_method() at pci_alloc_msix_method+0x168
2023-06-14T08:05:02.374Z ena_enable_msix_and_set_admin_interrupts() at ena_enable_msix_and_set_admin_interrupts+0x10c
2023-06-14T08:05:02.374Z ena_attach() at ena_attach+0x65c
2023-06-14T08:05:02.375Z device_attach() at device_attach+0x3f8
2023-06-14T08:05:02.375Z device_probe_and_attach() at device_probe_and_attach+0x7c
2023-06-14T08:05:02.375Z devctl2_ioctl() at devctl2_ioctl+0x44c
2023-06-14T08:05:02.375Z devfs_ioctl() at devfs_ioctl+0xd4
2023-06-14T08:05:02.375Z vn_ioctl() at vn_ioctl+0xc0
2023-06-14T08:05:02.375Z devfs_ioctl_f() at devfs_ioctl_f+0x20
2023-06-14T08:05:02.375Z kern_ioctl() at kern_ioctl+0x2dc
2023-06-14T08:05:02.375Z sys_ioctl() at sys_ioctl+0x118
2023-06-14T08:05:02.375Z do_el0_sync() at do_el0_sync+0x520
2023-06-14T08:05:02.375Z handle_el0_sync() at handle_el0_sync+0x44
2023-06-14T08:05:02.375Z --- exception, esr 0x56000000
2023-06-14T08:05:02.375Z Uptime: 4m1s
2023-06-14T08:05:02.375Z Dumping 2053 out of 64453 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
2023-06-14T08:10:36.676Z Dump complete
2023-06-14T08:10:37.976Z UEFI firmware (version built at 09:00:00 on Nov 1 2018)
2023-06-14T08:10:38.076Z Consoles: EFI console
2023-06-14T08:10:38.076Z Reading loader env vars from /efi/freebsd/loader.env
2023-06-14T08:10:38.076Z Setting currdev to disk0p1:
2023-06-14T08:10:38.076Z FreeBSD/arm64 EFI loader, Revision 1.1
2023-06-14T08:10:38.076Z (Thu May 25 06:36:21 UTC 2023 root@releng1.nyi.freebsd.org)
2023-06-14T08:10:38.076Z
2023-06-14T08:10:38.076Z Command line arguments: loader.efi
2023-06-14T08:10:38.176Z Image base: 0x7856f000
2023-06-14T08:10:38.176Z EFI version: 2.70
2023-06-14T08:10:38.176Z EFI Firmware: EDK II (rev 1.00)
2023-06-14T08:10:38.176Z Console: efi (0x1000)
2023-06-14T08:10:38.176Z Load Path: \EFI\BOOT\BOOTAA64.EFI
2023-06-14T08:10:38.176Z Load Device: PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,B61C1E65-FAFA-11ED-84CB-002590EC5BF2,0x3,0x10418)
2023-06-14T08:10:38.176Z BootCurrent: 0001

Initial investigation results:
------------------------------
Tried to reproduce the issue on Intel-based instances: no reproduction, even after 50k up/down iterations.
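The reproduction loop runs until the machine panics. When comparing platforms (for example against the 50k iterations tried on Intel), a bounded variant that reports progress may be easier to work with. The following is a minimal sketch, not the reporter's script: the function name `cycle_device`, the `DEVCTL` override used for dry runs, and the progress interval are hypothetical choices made for illustration. It issues only the same `devctl disable` / `devctl enable` commands as the original loop, and stops at the first failing call instead of spinning forever.

```sh
#!/bin/sh
# Bounded variant of the disable/enable loop (a sketch, not the original
# script). Assumptions: interface name and iteration count are passed as
# arguments; DEVCTL (hypothetical) may name a stub command for dry runs.

cycle_device() {
    iface=$1
    iterations=$2
    report_every=${3:-1000}
    devctl_cmd=${DEVCTL:-devctl}

    i=0
    while [ "$i" -lt "$iterations" ]; do
        if ! $devctl_cmd disable "$iface"; then
            echo "disable of $iface failed at iteration $i" >&2
            return 1
        fi
        if ! $devctl_cmd enable "$iface"; then
            echo "enable of $iface failed at iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
        # Periodic progress line; the last one seen before a panic is only
        # a rough lower bound on the number of completed cycles.
        if [ $((i % report_every)) -eq 0 ]; then
            echo "completed $i iterations"
        fi
    done
    echo "done: $i iterations on $iface"
}

# Start cycling only when arguments are given, e.g.: sh cycle.sh ena0 50000
if [ "$#" -ge 2 ]; then
    cycle_device "$@"
fi
```

For example, `sh cycle.sh ena0 50000` (the file name is arbitrary) would run the same number of cycles as the Intel comparison. Because progress goes to stdout, it is most useful from a session that shows output up to the moment of the panic, such as a serial console.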
Looked into the up/down flows of the FreeBSD ena driver [1] and saw that the driver performs the PCI MSI-X allocation/release and the bus resource allocation/release in the correct order.

[1] https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena

Since the PCI/bus APIs should be platform-agnostic (?), I assume this is an issue on the ARM side of the kernel.