[Bug 262969] NVMe - Resetting controller due to a timeout and possible hot unplug
Date: Mon, 26 Jun 2023 17:25:42 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262969

--- Comment #14 from Timothy Guo <firemeteor@users.sourceforge.net> ---
(In reply to crb from comment #11)

I would like to share my follow-up experience with this issue. In short, the problem went away after I wiped the disk and recreated the pool from backup. The same system (hardware and software) has been working without issue for about half a year now. Unfortunately, I couldn't identify a conclusive offender at any point in the process.

One more thing worth noting is the 3.3V rail of the PSU. While I was still suffering from the issue, I also discovered that the 3.3V rail was under-voltage, probably thanks to the hint from @crb's bug. I first read the out-of-range voltage value in the BIOS, and then confirmed it by measuring directly at the PSU pin-out with a voltmeter. So the issue could really be power related.

Unfortunately, I can't tell which component is the offender. Is the NVMe drawing too much power due to a firmware bug? Or is a failing PSU leading to the NVMe failure? I contacted my PSU vendor and was told that the wire connector may have aged, increasing its resistance. Maybe my voltage measurement attempt fixed the wiring connection; maybe the wipe and rebuild worked around a potential firmware bug. The issue went away as suddenly as it came. (Note: I can't recall reassembling any of the hardware around the time it first appeared, though.)

The only part I'm sure of is that the power problem is real and highly related. A stronger PSU might have simply avoided the problem altogether?

--
You are receiving this mail because:
You are the assignee for the bug.