[Bug 243225] "mpr0: Out of chain frames" boot hang after clang 9.0.1 import (probably timing, not compiler related)

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 31 Mar 2025 12:05:22 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243225

Lorenzo Perone <lopez.on.the.lists@yellowspace.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lopez.on.the.lists@yellowsp
                   |                            |ace.net

--- Comment #7 from Lorenzo Perone <lopez.on.the.lists@yellowspace.net> ---
Hi there,

I have the exact same problem on a DELL PowerEdge R440 with the DELL PERC
HBA330, which is an LSI SAS 3008. I can report the following:

With FreeBSD 14.2, the same happens both with an unmodified memstick installer
and with a distribution booted over PXE / netboot.

Setting

hw.mpr.max_chains=2048

in /boot/loader.conf (in my netboot environment) lets the system boot, at least
on most attempts. Once the system is up, I can see the controller and the 2
drives attached to it, and I was able to create a ZFS pool on them and copy
data at normal (expected) speeds without problems.
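
For anyone comparing notes: after a successful boot, the counters documented in
mpr(4) show whether the chain frame pool actually ran dry. A minimal sketch,
assuming the controller attached as unit 0 (mpr0, as in the bug title);
check_chain_frames is a hypothetical helper name, not part of FreeBSD:

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): reads `sysctl` output on stdin
# and returns non-zero if any chain frame allocation has failed since boot.
check_chain_frames() {
    awk -F': ' '
        $1 ~ /\.chain_alloc_fail$/ && $2 + 0 > 0 { bad = 1 }
        END { exit bad + 0 }
    '
}

# Usage on the affected machine, assuming unit 0 (mpr0):
#   sysctl dev.mpr.0.chain_free dev.mpr.0.chain_free_lowwater \
#       dev.mpr.0.chain_alloc_fail | check_chain_frames \
#       || echo "chain frame allocations failed; max_chains may be too low"
```

According to mpr(4), the limit can also be set for a single adapter with
dev.mpr.X.max_chains in loader.conf instead of the global hw.mpr.max_chains.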

BUT: As soon as I add

zfs_load="YES"

(or, for that matter, geom_mirror_load="YES"; probably any GEOM module will do?)

it goes back to doing the same thing it was doing before, exactly as initially
reported by Terry Kennedy (stuck in a reinitialization loop).

So it does indeed look like a timing problem of some kind. If the kernel
tries to access the controller/drives during boot (as the zfs and geom
modules do), the mpr driver gets stuck.

Note that booting off the EFI partition, and off the root partition with the
corresponding loader.conf, on those drives also works until I put in
"zfs_load=YES".
This means that all the loader logic works, but as soon as the driver kicks
in, booting fails:
- Reading loader.efi (as EFI/BOOT/freebsd.efi) from da0p1 or da1p1 (the drives
attached to the controller) works
- Even reading /boot/loader.conf from the mirror zroot pool (boot pool) works
- Loading the kernel works
- But as soon as I put zfs_load or geom_mirror_load in /boot/loader.conf, the
loader loads them and the kernel starts initializing, but it then gets stuck
in the "initialization loop".

Note that sometimes even removing the xxx_load lines does not help for a few
reboots, but the machine will eventually boot after a cold start. So what I see
very much matches Terry Kennedy's observations about this problem behaving very
erratically.

I am available for any tests for a limited time (I'll resort to another
controller if I can on this machine...).

Best Regards,

Lorenzo

-- 
You are receiving this mail because:
You are the assignee for the bug.