[Bug 277671] 14-RELEASE/14-STABLE crash with heavy disk IO on AMD Asus x670e motherboard and Intel i225 (igc) breakage NIC non-functioning

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 13 Mar 2024 15:22:36 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277671

            Bug ID: 277671
           Summary: 14-RELEASE/14-STABLE crash with heavy disk IO on AMD
                    Asus x670e motherboard and Intel i225 (igc) breakage
                    NIC non-functioning
           Product: Base System
           Version: 14.0-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: cam@neo-zeon.de

Using a 1TB Samsung 850 Pro (ZFS), I can fairly reliably crash my box by
running 'monerod' from 'net-p2p/monero-cli'.

Interestingly, I haven't been able to reproduce this (yet) with other IO loads,
including bonnie/bonnie++.

This is on an Asus Crosshair x670 Extreme motherboard. The crashes didn't start
until I upgraded to BIOS 1709, and they persist on later versions. So why not
downgrade the BIOS? Because it's impossible to downgrade to anything lower than
1709 after upgrading to it.

But this seems to be more than a simple BIOS bug. I believe it's likely related
to the AGESA version.

This bug seems like it could be tangentially related:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272507

With the upgrade to 1709, keyboard/mouse input stopped working (could've been
an issue with USB not functioning properly after the update) and the network
stopped working (an Intel i225). The IO-related crash also did not occur until
this upgrade.

Since upgrading to newer versions of the BIOS, keyboard/mouse input has started
working again. The onboard i225 still doesn't work (it's recognized by
'ifconfig igc0', network settings are accepted and applied, but there is no
network connectivity). I've worked around the NIC issue by installing an Intel
x540 tx2.

If I downgrade back to 1709, keyboard/mouse input still works, so probably the
BIOS downgrade doesn't downgrade the AGESA version (which is why I suspect this
is AGESA related).

I suspect the BIOS (AGESA) upgrade has altered and/or introduced some
platform-level bugs that are either Asus-specific, AMD x670-specific, or some
intersection of the two.

When the crash occurs, the screen immediately goes black (so no text to share)
and /var/crash isn't populated. If requested, perhaps I could try mounting
/var/crash from an NVMe drive (assuming this is a SATA-specific issue) and/or
run swap from an NVMe drive to see if I can get a crash dump.
