[Bug 272469] Broadcom mpi3mr driver: MSIX allocation fail on DELL PowerEdge R7625 system

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 12 Jul 2023 11:48:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272469

            Bug ID: 272469
           Summary: Broadcom mpi3mr driver: MSIX allocation fail on DELL
                    PowerEdge R7625 system
           Product: Base System
           Version: 13.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: chandrakanth.patil@broadcom.com

Created attachment 243352
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=243352&action=edit
msix_table_dump

mpi3mr avenger driver:

system details: 
1. Dell PowerEdge R7625 with 196 physical cores and 256 logical cores

During the initial load phase, the mpi3mr driver allocates a single MSI-X
vector for its initial handshake using the pci_alloc_msix() API. After
allocating it, the driver sends the IOC_FACTS command to the firmware to fetch
all the controller properties. The issue is that the driver never receives the
interrupt for the IOC_FACTS completion, so the command times out and the driver
load fails. However, if the driver polls the reply queue, it can see that the
firmware has completed the command.

After the driver creates the single MSI-X vector, vmstat -i should show the
interrupt, but it does not, so the interrupt binding appears to be failing.
Ideally, pci_alloc_msix() should return an error during allocation in this
case, but it does not.
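
The symptom described above (reply present in the queue, but no interrupt
delivered) can be told apart from a genuine firmware timeout by checking the
reply queue once the wait expires. The following is a minimal userland sketch
of that classification, not mpi3mr code: struct reply_queue, enum wait_result,
reply_pending() and classify_wait() are hypothetical names introduced only for
illustration, and the producer/consumer index model is an assumption about how
such a queue might be tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a reply queue: the firmware advances producer_idx
 * when it posts a reply, the driver advances consumer_idx when it consumes
 * one. This is not the mpi3mr data structure.
 */
struct reply_queue {
	unsigned int producer_idx;
	unsigned int consumer_idx;
};

enum wait_result {
	WAIT_OK,		/* interrupt arrived for the command */
	WAIT_LOST_INTERRUPT,	/* reply posted, but no interrupt was seen */
	WAIT_FW_TIMEOUT		/* no reply posted at all */
};

/* True when the firmware has posted a reply the driver has not consumed. */
static bool
reply_pending(const struct reply_queue *q)
{
	assert(q != NULL);
	return (q->producer_idx != q->consumer_idx);
}

/*
 * Classify the outcome of waiting for a command. interrupt_seen says
 * whether the interrupt handler ran before the timeout expired.
 */
static enum wait_result
classify_wait(const struct reply_queue *q, bool interrupt_seen)
{
	if (interrupt_seen)
		return (WAIT_OK);
	if (reply_pending(q))
		return (WAIT_LOST_INTERRUPT);
	return (WAIT_FW_TIMEOUT);
}
```

In the case reported here, the wait for IOC_FACTS would be classified as
WAIT_LOST_INTERRUPT: the handler never runs, yet polling finds the reply.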

Note:
     1. This issue happens only on this specific server, where the number of
        CPUs is > 128 (256 CPUs in total).
     2. When we reduce the number of cores to 24 in the BIOS, the driver works
        without any issues.

We have dumped the MSI-X table before and after the allocation of a single
MSI-X vector, and again after the command times out. Please find the dumps in
the attachment.
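
When reading such a dump, each MSI-X table entry is 16 bytes: message address
(low and high dwords), message data, and vector control, where bit 0 of vector
control is the per-vector mask. The sketch below decodes one entry. It assumes
the x86 compatibility message format (address in the 0xFEExxxxx window,
destination ID in address bits 19:12, vector in data bits 7:0); that
interpretation does not hold if an IOMMU is remapping interrupts. The names
struct msix_entry, struct msix_decoded and msix_decode() are hypothetical, and
the values in the usage note are made up, not taken from the attachment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One 16-byte MSI-X table entry, in the order defined by the PCI spec. */
struct msix_entry {
	uint32_t addr_lo;	/* message address, lower 32 bits */
	uint32_t addr_hi;	/* message address, upper 32 bits */
	uint32_t data;		/* message data */
	uint32_t vector_ctrl;	/* bit 0: per-vector mask */
};

/* Fields of interest, assuming the x86 compatibility message format. */
struct msix_decoded {
	bool masked;		/* vector is masked in the table */
	bool x86_msi_window;	/* address lies in 0xFEExxxxx */
	uint8_t dest_id;	/* address bits 19:12 (8-bit destination ID) */
	uint8_t vector;		/* data bits 7:0 */
};

static struct msix_decoded
msix_decode(const struct msix_entry *e)
{
	struct msix_decoded d;

	assert(e != NULL);
	d.masked = (e->vector_ctrl & 0x1u) != 0;
	d.x86_msi_window = e->addr_hi == 0 &&
	    (e->addr_lo & 0xFFF00000u) == 0xFEE00000u;
	d.dest_id = (uint8_t)((e->addr_lo >> 12) & 0xFFu);
	d.vector = (uint8_t)(e->data & 0xFFu);
	return (d);
}
```

For example, a made-up entry with address 0x00000000FEE0A000, data 0x00000031
and vector control 0 decodes as unmasked, destination ID 0x0A, vector 0x31.
An entry that is still all zeroes, or still masked, after allocation would
suggest the vector was never programmed or enabled.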

I would like to understand whether there is any OS limitation with respect to
MSI-X allocation on systems with a large number of cores.

Please find the driver logs and the MSI-X table dump attached.
