Re: nvme timeout issues with hardware and bhyve vm's
- In reply to: Maxim Sobolev : "Re: nvme timeout issues with hardware and bhyve vm's"
Date: Fri, 08 Dec 2023 01:33:02 UTC
Thanks. It may be worth checking the temperature periodically and warning the user if it is too high (70°C+ or so). Even for devices that throttle internally, a user might wish to know whether the device needs a (better) heatsink.

> On Dec 7, 2023, at 5:02 PM, Maxim Sobolev <sobomax@freebsd.org> wrote:
>
> How quickly it heats up depends on lots of factors. Usually those devices burn some 3-7 watts per stick at 100% load, so maybe this would give you some idea. At least some of them support several toggleable performance modes, which use throttling internally to limit power consumption to a certain level (man nvmecontrol). It helped me recently to keep a system stable, which would otherwise hang with timeouts after reaching 70-75C, until I got the chance to take it apart and attach heatsinks to the NVMe drives. Once the temperature dropped to <= 50C the drives became 100% stable.
>
> -Max
>
> On Thu, Dec 7, 2023, 4:07 PM Bakul Shah <bakul@iitbombay.org> wrote:
>> On Dec 7, 2023, at 3:59 PM, Warner Losh <imp@bsdimp.com> wrote:
>> >
>> > *Overheating caused hang of NVMe controller or PCI bridge on SSD, or
>> >
>> > Yes. Most drives' firmware resets when it overheats. There might be something
>> > that the pci code can do when this happens to retrain the link, reprogram the
>> > config registers, etc.
>>
>> How quickly can the device heat up? Can it be queried frequently
>> enough to act before it overheats by throttling io?
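
For what it's worth, the periodic check suggested at the top of this message could look roughly like the sketch below. It is only an illustration, not something anyone in the thread posted. It assumes that `nvmecontrol logpage -p 2 nvme0` prints the composite temperature on a line of the form `Temperature: 343 K, 69.85 C, 157.73 F`; that format, the device name `nvme0`, the 60 second interval, and the helper names are all assumptions. The 70 C threshold is the one mentioned above.

```python
import re
import subprocess
import time

WARN_CELSIUS = 70.0  # threshold suggested in the thread; tune per drive

# Assumed output line: "Temperature:   343 K, 69.85 C, 157.73 F".
# Anchoring on "Temperature:" skips per-sensor lines such as
# "Temperature Sensor 1: ...".
_TEMP_RE = re.compile(r"^Temperature:\s+(\d+)\s*K", re.MULTILINE)


def parse_temp_celsius(logpage_text):
    """Return the composite temperature in Celsius, or None if not found."""
    m = _TEMP_RE.search(logpage_text)
    if m is None:
        return None
    # The SMART / Health log reports the composite temperature in Kelvin.
    return int(m.group(1)) - 273.15


def check(logpage_text, warn_at=WARN_CELSIUS):
    """Classify one log page dump as 'ok', 'warn', or 'unknown'."""
    temp = parse_temp_celsius(logpage_text)
    if temp is None:
        return "unknown"
    return "warn" if temp >= warn_at else "ok"


def read_logpage(dev="nvme0"):
    """Fetch the SMART / Health log page (page 2) as text."""
    result = subprocess.run(
        ["nvmecontrol", "logpage", "-p", "2", dev],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def monitor(dev="nvme0", interval_s=60):
    """Poll forever and print a warning whenever the drive runs hot."""
    while True:
        text = read_logpage(dev)
        if check(text) == "warn":
            print(f"{dev}: {parse_temp_celsius(text):.1f} C, "
                  f"at or above {WARN_CELSIUS:.0f} C; check the heatsink")
        time.sleep(interval_s)
```

Calling `monitor()` as root would poll once a minute. How short the interval needs to be to catch a drive before it resets is exactly the open question in the thread, so treat the default as a placeholder.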