Re: nvme timeout issues with hardware and bhyve vm's
Date: Thu, 07 Dec 2023 23:09:29 UTC
On Thu, 7 Dec 2023 14:38:37 -0800
Pete Wright <pete@nomadlogic.org> wrote:

> On 10/13/23 7:34 PM, Warner Losh wrote:
> >
> >     The messages I posted at the start of the thread are from the VM itself
> >     (13.2-RELEASE). The zpool on the hypervisor (13.2-RELEASE) showed no
> >     such issues.
> >
> >     Based on your comment about the improvements in 14, I'll focus my efforts
> >     on my workstation. It seemed to happen regularly, so hopefully I can find
> >     a repro case.
> >
> >
> > Let me know if you see similar messages in stable/14. I think I've fixed all the
> > issues with timeouts, though you shouldn't ever see them in a VM setup
> > unless something else weird is going on.
> >
>
> Hi Warner, just resurfacing this thread because I've had a few lockups
> on my workstation running 14.0-STABLE. I was able to capture a photo of
> the hang, and this seems to be the most important line:
>
> nvme0: Resetting controller due to a timeout and possible hot unplug.
>
> When I scan the device after reboot I don't see any errors, but if there
> is a particular thing I should check via nvmecontrol, please let me know.
> Also, since it mentions possible hot unplug, I wonder if this is
> hardware/firmware related to my system?
>
> Anyway, I haven't found a repro case yet, but it has locked up a few times
> in the past two weeks.
>
> -pete
>
>
> --
> Pete Wright
> pete@nomadlogic.org

If I encountered this kind of problem ON BARE METAL HARDWARE, I would
usually suspect one of the following first:

 *Overheating that hangs the NVMe controller, or the PCI bridge on the SSD
 *An unstable physical connection (bad contact)

-- 
Tomoaki AOKI <junchoon@dec.sakura.ne.jp>