[Bug 264141] nvme(4): Heavy load to SSD wedges 13.1 system: Controller in fatal status, resetting ... Resetting controller due to a timeout and possible hot unplug.
Date: Sun, 22 May 2022 05:50:25 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264141

--- Comment #8 from crb <crb@ChrisBowman.com> ---
Replacing nvme with nda results in similar-looking messages from both nvme0 and nda0 (these didn't show up in a remote ssh session, so I couldn't cut and paste them).

I don't think the cards get too hot. The machine has 3 fans that spin up with CPU temperature and, as I mentioned earlier, the card has a heat sink. When I link while building world with 32 jobs I do hear the fans ramp up ever so slightly, but mostly they're quiet.

I doubt it's cabling, as these SSDs were inserted directly into an M.2 slot and I seated the last one securely a few days ago.

It could be power. This is a bit of a hacked system (I gutted a Sun Ultra 40 and replaced the contents with this, reusing the power supply), but I don't have a way to eliminate power as a possibility right now. Theoretically this system should be able to deliver 1000W, and I only have the motherboard, processor, 64 GB of memory, the SSD, 2 Ethernet cards (one a Mellanox CX3 using fiber) and 6 spinning drives, which are basically quiet. Power seems unlikely, as the system seems otherwise rock solid under load except when hitting the SSD hard.

This (unfortunately) seems to be completely repeatable now, simply by copying a couple of repos over 10G Ethernet from a remote NFS machine to the local SSD while the machine is otherwise completely idle.