14-stable on AMD7950X: Good and bad news

From: Chris Torek <chris.torek_at_gmail.com>
Date: Mon, 25 Mar 2024 02:11:01 UTC
I built and booted up the latest 14-stable tree on my
AMD7950X machine:

Good news: the mysterious AHCI adapter problem is gone, presumably
because of the new PCI range allocation code. So now both sets of SATA
ports work (at least, the drive I've left plugged in to the previously-failing
port now shows up).

Bad news: building drm-61-kmod, then loading amdgpu.ko,
causes a crash.

The immediate problem is that vm_phys_fictitious_unreg_range()
does this:

        rw_wlock(&vm_phys_fictitious_reg_lock);
        seg = RB_FIND(fict_tree, &vm_phys_fictitious_tree, &tmp);
        if (seg->start != start || seg->end != end) {

At line 1115, `seg` is NULL, so we die with a kernel segfault. It's probably
a good idea to add a NULL test here since RB_FIND can return NULL.
(Presumably just stick `seg == NULL ||` in front of the start/end tests.)
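
For illustration, here is a minimal userland sketch of that guard. It is
not the kernel code: `struct fict_seg`, `fict_find()` and
`fict_range_matches()` are hypothetical stand-ins (a linear search in
place of RB_FIND, no locking), meant only to show the NULL test
short-circuiting before the start/end comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's fictitious-segment record. */
struct fict_seg {
	uint64_t start;
	uint64_t end;
};

/*
 * Stand-in for RB_FIND(): a linear lookup keyed on the start address.
 * Like RB_FIND(), it returns NULL when nothing matches.
 */
static struct fict_seg *
fict_find(struct fict_seg *segs, size_t n, uint64_t start)
{
	for (size_t i = 0; i < n; i++) {
		if (segs[i].start == start)
			return (&segs[i]);
	}
	return (NULL);
}

/*
 * Return true only if a segment was found and it covers exactly
 * [start, end).  The NULL test comes first, so seg is never
 * dereferenced when the lookup fails.
 */
static bool
fict_range_matches(struct fict_seg *segs, size_t n, uint64_t start,
    uint64_t end)
{
	struct fict_seg *seg;

	seg = fict_find(segs, n, start);
	if (seg == NULL || seg->start != start || seg->end != end)
		return (false);
	return (true);
}
```

With this shape, an unregister request for a range that was never
registered takes the same mismatch path as a range with the wrong
bounds, rather than faulting on the dereference.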

It's not clear why the unregister is failing though, as the drm code
seems correct at first glance.

It *is* clear why it's unregistering, though, as the console printed:

    drmn0: could not load firmware image 'amdgpu/psp_13_0_5_toc.bin'

followed by the expected cleanup messages. (I've now run out of
Stuff I Just Know Off-Hand, so I'll have to dig into this more.)

Chris