nvme detached
Dan Langille
dan at langille.org
Wed Aug 4 17:16:05 UTC 2021
Yesterday I had an NVMe stick detach. This degraded a zpool, but zpool status indicated the device was still online, even though it was no longer visible in /dev/.
More details are at https://gist.github.com/dlangille/bc8af0f5a098d3a106fa5fbf40a88d42
I first noticed the issue with multiple ssh sessions freezing up.
Then Nagios started alerting. A reboot cleared this up. Scrubs did not find any errors.
The /var/log/messages entries are below.
Thank you.
Aug 3 15:06:02 knew kernel: nvme0: Resetting controller due to a timeout.
Aug 3 15:06:02 knew kernel: nvme0: resetting controller
Aug 3 15:06:32 knew kernel: nvme0: controller ready did not become 0 within 30500 ms
Aug 3 15:06:32 knew kernel: nvme0: failing queued i/o
Aug 3 15:06:32 knew kernel: nvme0: IDENTIFY (06) sqid:0 cid:0 nsid:0 cdw10:00000001 cdw11:00000000
Aug 3 15:06:32 knew kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:0 cdw0:0
Aug 3 15:06:32 knew kernel: nvme0: failing outstanding i/o
Aug 3 15:06:32 knew kernel: nvme0: READ sqid:2 cid:123 nsid:1 lba:250153507 len:5
Aug 3 15:06:32 knew kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:2 cid:123 cdw0:0
Aug 3 15:06:32 knew kernel: nvme0: failing outstanding i/o
Aug 3 15:06:32 knew kernel: nvme0: WRITE sqid:3 cid:118 nsid:1 lba:454009346 len:1
Aug 3 15:06:32 knew kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:3 cid:118 cdw0:0
Aug 3 15:06:32 knew kernel: nvme0: failing outstanding i/o
Aug 3 15:06:32 knew kernel: nvme0: WRITE sqid:4 cid:122 nsid:1 lba:454009345 len:1
Aug 3 15:06:32 knew kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:4 cid:122 cdw0:0
Aug 3 15:06:32 knew kernel: nvd0: detached
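
A minimal sketch, assuming syslog lines in the same format as the entries above, of how one might scan a log for this sequence (controller timeout, aborted I/O, device detach). The function name summarize and the three counters are illustrative choices only; they are not part of the nvme driver or any FreeBSD tool.

```python
import re
from collections import defaultdict

# Matches kernel lines for nvmeN / nvdN devices, for example:
#   Aug 3 15:06:02 knew kernel: nvme0: Resetting controller due to a timeout.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<dev>(?:nvme|nvd)\d+):\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count timeouts and aborted commands, and record detach times, per device.

    `lines` is any iterable of log lines (a list or an open file).
    Lines that do not match the assumed format are skipped.
    """
    summary = defaultdict(lambda: {"timeouts": 0, "aborted": 0, "detached": []})
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        entry = summary[m["dev"]]
        msg = m["msg"]
        if "due to a timeout" in msg:
            entry["timeouts"] += 1
        elif msg.startswith("ABORTED"):
            entry["aborted"] += 1
        elif msg == "detached":
            entry["detached"].append(m["ts"])
    return dict(summary)
```

It could be run against the live log with something like summarize(open("/var/log/messages", errors="replace")). The message strings it looks for are taken only from the entries above, so other driver versions may word them differently.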
--
Dan Langille
dan at langille.org