[Bug 267028] kernel panics when booting with both (zfs.ko or vboxnetflt.ko or acpi_wmi.ko) and amdgpu.ko

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 29 Dec 2024 02:45:37 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028

--- Comment #336 from Mark Millard <marklmi26-fbsd@yahoo.com> ---
(In reply to George Mitchell from comment #335)


For reference: going backwards through the found_modules
list (also using my extra recorded data) gives the
following. It is pairs: a modlist_newmod_hist entry,
then the prior node's link.tqe_next value, which should
agree with the modAddr just shown.
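
For context, here is a minimal sketch of the recording instrumentation
being described, assuming a hook in modlist_newmod() in
sys/kern/kern_linker.c. struct modlist is the real type from that file;
the record struct, the array size, the helper name, and the exact hook
point are illustrative, not the literal patch:

/*
 * Sketch: record each node appended to found_modules so the list
 * can be cross-checked later from kgdb.  Field names match the
 * kgdb output below.
 */
#define	MODHIST_MAX	512

struct modlist_newmod_rec {
	struct modlist	*modAddr;	/* node appended to found_modules */
	void		*containerAddr;	/* owning linker file */
	const char	*modnameAddr;	/* module name string */
	int		version;	/* module version */
};

static struct modlist_newmod_rec modlist_newmod_hist[MODHIST_MAX];
static int modlist_newmod_hist_pos = -1;

/* Hypothetical hook, called after modlist_newmod()'s TAILQ_INSERT_TAIL(). */
static void
record_newmod(struct modlist *mod, void *container, const char *name,
    int version)
{
	if (modlist_newmod_hist_pos + 1 < MODHIST_MAX) {
		modlist_newmod_hist_pos++;
		modlist_newmod_hist[modlist_newmod_hist_pos] =
		    (struct modlist_newmod_rec){ mod, container, name,
			version };
	}
}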

(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos]
$48 = {modAddr = 0xfffff80004718180, containerAddr = 0xfffff8000362f300,
modnameAddr = 0xffffffff82ea6025 "amdgpu_raven_vcn_bin_fw", version = 1}
(kgdb) print
modlist_newmod_hist[modlist_newmod_hist_pos-1].modAddr->link.tqe_next
$49 = (struct modlist *) 0xfffff80004718180
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-1]
$50 = {modAddr = 0xfffff800047182c0, containerAddr = 0xfffff8000362f480,
modnameAddr = 0xffffffff82e62026 "amdgpu_raven_mec2_bin_fw", version = 1}
(kgdb) print
modlist_newmod_hist[modlist_newmod_hist_pos-2].modAddr->link.tqe_next
$51 = (struct modlist *) 0xfffff800047182c0
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-2]
$52 = {modAddr = 0xfffff80003647d40, containerAddr = 0xfffff80003169180,
modnameAddr = 0xffffffff82e1e010 "amdgpu_raven_mec_bin_fw", version = 1}
(kgdb) print
modlist_newmod_hist[modlist_newmod_hist_pos-3].modAddr->link.tqe_next
$53 = (struct modlist *) 0xfffff80003647d40
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-3]
$54 = {modAddr = 0xfffff80004718240, containerAddr = 0xfffff80003169300,
modnameAddr = 0xffffffff82e12009 "amdgpu_raven_rlc_bin_fw", version = 1}
(kgdb) print
modlist_newmod_hist[modlist_newmod_hist_pos-4].modAddr->link.tqe_next
$55 = (struct modlist *) 0xfffff80004718240
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-4]
$56 = {modAddr = 0xfffff800035bb8c0, containerAddr = 0xfffff80003169600,
modnameAddr = 0xffffffff829f6010 "amdgpu_raven_ce_bin_fw", version = 1}
(kgdb) print
modlist_newmod_hist[modlist_newmod_hist_pos-5].modAddr->link.tqe_next
$57 = (struct modlist *) 0xfffff80000000007
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-5]
$58 = {modAddr = 0xfffff80004b90140, containerAddr = 0xfffff80004c42000,
modnameAddr = 0xffffffff829ef000 "amdgpu_raven_me_bin_fw", version = 1}
(kgdb) print
modlist_newmod_hist[modlist_newmod_hist_pos-6].modAddr->link.tqe_next
$59 = (struct modlist *) 0xfffff80004b90140
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-6]
$60 = {modAddr = 0xfffff80004b90180, containerAddr = 0xfffff80004c42300,
modnameAddr = 0xffffffff829e7025 "amdgpu_raven_pfp_bin_fw", version = 1}
(kgdb) print
modlist_newmod_hist[modlist_newmod_hist_pos-7].modAddr->link.tqe_next
$61 = (struct modlist *) 0xfffff80004b90180

$57's (modlist_newmod_hist_pos-5's) link.tqe_next does not agree with
$56's modlist_newmod_hist[modlist_newmod_hist_pos-4].modAddr: it again
has the value 0xfffff80000000007.
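
Written as code, the consistency walk that the kgdb session above does
by hand looks roughly like the following. The function itself is
hypothetical (and assumes no intervening removals from found_modules),
but the invariant and the panic string match what my instrumented
kernel reports:

/*
 * Sketch: each recorded node's link.tqe_next should point at the
 * node recorded immediately after it, given the insertions were via
 * TAILQ_INSERT_TAIL and nothing was removed in between.
 */
static void
check_newmod_hist(void)
{
	int i;

	for (i = modlist_newmod_hist_pos; i > 0; i--) {
		struct modlist *expect = modlist_newmod_hist[i].modAddr;
		struct modlist *prior = modlist_newmod_hist[i - 1].modAddr;

		if (prior->link.tqe_next != expect)
			panic("modlist_lookup: a prior tqe_next changed!");
	}
}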

Unfortunately, the code did not stop when 0xfffff80000000007 was
stored into that link.tqe_next instance.

There is something unusual just before that in the core.9.txt (or,
as named here, core.txt.9): I think it is the first time I've seen any
"WARNING !drm_modeset_is_locked . . ." messages BEFORE the first part of
the first trap(/panic?) report. In this example, it looks like:

<6>[drm] Initialized amdgpu 3.40.0 20150101 for drmn0 on minor 0
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at
/usr/ports/graphics/drm-510-kmod/work/drm-kmod-drm_v5.10.163_7/drivers/gpu/drm/drm_atomic_helper.c:619
. . .
WARNING !drm_modeset_is_locked(&plane->mutex) failed at
/usr/ports/graphics/drm-510-kmod/work/drm-kmod-drm_v5.10.163_7/drivers/gpu/drm/drm_atomic_helper.c:894
kernel trap 22 with interrupts disabled
                            kernel trap 22 with interrupts disabled
 panic: modlist_lookup: a prior tqe_next changed!
. . .

I wonder if that is some sort of consequence of my attempt to
have the hardware monitor three 8-byte address ranges for
writes.

As it stands, I do not see how the results provide any specific
additional useful evidence that I can identify.

The only thing that I've thought of is to add printf reporting of
the address argument passed to each attempted db_hwatchpoint_cmd
use, to help validate that the code is doing what I intended.
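
Something like the following is what I have in mind.
db_hwatchpoint_cmd() takes the standard DDB command arguments
(declared via ddb/ddb.h); the wrapper itself, the printf format, and
treating count as the watched size in bytes are my assumptions here,
not code from any tree:

/*
 * Hypothetical wrapper: report the address before attempting to arm
 * a hardware watchpoint on an 8-byte range via DDB's
 * db_hwatchpoint_cmd().
 */
static void
report_and_set_hwatch(db_expr_t addr)
{
	static char modif[] = "";	/* no command modifier */

	printf("hwatch request: addr=0x%lx size=8\n", (u_long)addr);
	db_hwatchpoint_cmd(addr, true, 8, modif);
}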
