Re: nvme timeout issues with hardware and bhyve vm's

From: Maxim Sobolev <sobomax_at_freebsd.org>
Date: Fri, 08 Dec 2023 01:02:57 UTC
How quickly it heats up depends on lots of factors. Usually those devices
burn some 3-7 watts per stick at 100% load, so maybe this would give you
some idea. At least some of them support several toggleable performance
modes, which use throttling internally to limit power consumption to a
certain level (man nvmecontrol). Switching modes recently helped me keep a
system stable that would otherwise hang with timeouts after reaching 70-75C,
until I got the chance to take it apart and attach heatsinks to the nvmes.
Once the temperature dropped to <= 50C the drives became 100% stable.
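
To make the polling question quoted below concrete, here is a rough sketch
of a throttling loop with hysteresis. Everything in it is an assumption on
my part rather than something I run: read_temp_c and set_power_state are
hypothetical callables (on FreeBSD they could perhaps wrap
"nvmecontrol logpage -p 2" and "nvmecontrol power -p"), the thresholds are
just the numbers from my case above, and it assumes power state 0 is full
performance with higher numbers drawing less power.

```python
# Sketch only: read_temp_c and set_power_state are hypothetical callables
# supplied by the caller; nothing here talks to a real drive.

HOT_C = 65    # assumed: start throttling before the 70-75C where hangs appeared
COOL_C = 50   # assumed: drives were stable at or below this temperature


def next_power_state(temp_c, current, lowest_power_state):
    """Return the power state to use given the latest temperature.

    Assumes state 0 is full performance and higher numbers draw less power.
    """
    if temp_c >= HOT_C and current < lowest_power_state:
        return current + 1          # too hot: step down one level
    if temp_c <= COOL_C and current > 0:
        return current - 1          # cooled off: step back up one level
    return current                  # in between: hold (hysteresis band)


def control_step(read_temp_c, set_power_state, current, lowest_power_state):
    """Run one polling iteration and apply the new state only if it changed."""
    new = next_power_state(read_temp_c(), current, lowest_power_state)
    if new != current:
        set_power_state(new)
    return new
```

Calling control_step every few seconds would be the obvious use. Whether
that is frequent enough depends on how fast a given stick heats up, which I
can't tell you in general.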

-Max

On Thu, Dec 7, 2023, 4:07 PM Bakul Shah <bakul@iitbombay.org> wrote:

> On Dec 7, 2023, at 3:59 PM, Warner Losh <imp@bsdimp.com> wrote:
> >
> >
> >  *Overheating caused hang of NVMe controller or PCI bridge on SSD, or
> >
> > Yes. Most drives' firmware resets when it overheats. There might be
> > something that the pci code can do when this happens to retrain the
> > link, reprogram the config registers, etc.
>
> How quickly can the device heat up? Can it be queried frequently
> enough to act before it overheats by throttling io?
>