Re: nvme device errors & zfs

From: Peter 'PMc' Much <pmc_at_citylink.dinoex.sub.org>
Date: Tue, 05 Nov 2024 13:16:15 UTC
On 2024-11-05, Dave Cottlehuber <dch@FreeBSD.org> wrote:

> I would hope temperature throttling would not be quite so brutal, to
> remove itself from the bus entirely, but its a reasonable explanation.

It might be a reasonable choice to protect the data first.
Also, people will then notice that there is a problem, rather than
just complain about bad performance.

If a more elegant reaction is desired, it could be implemented
by reading the current temperature and dynamically issuing some
"nvmecontrol power -p x -w y ..." as appropriate. (From what I hear,
the behavior of these options is rather device-specific, so some
testing may be required.)

https://gitr.daemon.contact/tools/tree/heatctl.rb#n218
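
To make the idea concrete, here is a rough sketch of the decision part
only. It is not taken from the script linked above. The thresholds, the
power state numbers and the names LADDER, HYSTERESIS_C, pick_state and
power_command are all made up for illustration; which states a given
device offers, and what they actually do, has to be checked per device
(e.g. with "nvmecontrol power -l"). Getting the temperature reading is
left out here.

```python
# Sketch only: thresholds and power state numbers are assumptions,
# not values from any real device.

# (temperature in deg C at or above which the state applies, power state)
# Assumed ladder: a higher state number means lower power.
LADDER = [(0, 0), (65, 2), (75, 4)]
HYSTERESIS_C = 5  # relax only after cooling this far below the threshold


def pick_state(temp_c, current_state):
    """Return the power state to use for the given temperature."""
    target = 0
    for threshold, state in LADDER:
        if temp_c >= threshold:
            target = state
    if target >= current_state:
        return target  # same or hotter: throttle immediately
    # Cooler: stay put until we are clear of the threshold that got us here.
    for threshold, state in LADDER:
        if state == current_state and temp_c > threshold - HYSTERESIS_C:
            return current_state
    return target


def power_command(device, state, workload_hint=0):
    """Build the argv for the nvmecontrol call; running it is up to the caller."""
    return ["nvmecontrol", "power", "-p", str(state),
            "-w", str(workload_hint), device]
```

The hysteresis is there so the device does not flap between two states
when the temperature hovers around a threshold. A caller would run
pick_state() periodically and only invoke the command when the result
differs from the current state.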

I'm not yet doing temperature-driven nvme performance steering, but
practically everything else: fan engage, scrub pausing, cpu consumption
(via rctl) etc.

cheerio,
PMc