Re: nvme controller reset failures on recent -CURRENT
- In reply to: Pete Wright : "Re: nvme controller reset failures on recent -CURRENT"
Date: Tue, 13 Feb 2024 20:45:57 UTC
Hi all,

> On 13.02.2024 at 20:56, Pete Wright <pete@nomadlogic.org> wrote:
> 1. M.2 nvme really does need proper cooling, much more so than traditional SATA/SAS/SCSI drives.

I recently found a tool named "Scrutiny" that presents a nice dashboard of all your disk devices and their SMART data, including crucial points like temperature.

Pros:
- Open source
- Nice web UI
- Uses smartmontools to gather the data, not reinventing the wheel
- Agents that can be called from cron jobs for many OSes including FreeBSD
- Alerting via a variety of communication channels

Cons:
- Central hub best run on Linux plus docker compose
- No authentication whatsoever, so strictly internal use
- No grouping or any organisation of systems, so it does not scale beyond tens of servers

I found a couple of problematic HDDs and SSDs right after deploying it, which regular SMART tests had overlooked.

https://github.com/AnalogJ/scrutiny

Look for the Hub/Spoke deployment if you are willing to use e.g. a Linux VM to run the tool, then point your FreeBSD systems at that. It can probably be deployed strictly on FreeBSD, too, using the manual installation instructions.

HTH, kind regards,
Patrick
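
For anyone who only wants a quick temperature check without the dashboard, here is a minimal sketch of reading the value from smartmontools' JSON output (smartctl 7.0 or later). This is not how Scrutiny does it internally. The helper names and the 70 C limit are assumptions made up for illustration; pick a limit from your drive's datasheet.

```python
import json
import subprocess
from typing import Optional


def parse_temperature(report: dict) -> Optional[int]:
    """Return the current temperature in degrees Celsius, or None if absent."""
    temp = report.get("temperature", {}).get("current")
    if temp is None:
        # NVMe devices also report the value in their SMART/health log section.
        temp = report.get("nvme_smart_health_information_log", {}).get("temperature")
    return temp


def read_report(device: str) -> dict:
    """Run smartctl on the device and return its parsed JSON report."""
    # smartctl's exit status is a bit mask, so a non-zero code can still
    # come with a usable report; therefore check=False.
    proc = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(proc.stdout)


def too_hot(report: dict, limit_c: int = 70) -> bool:
    """True if the drive reports a temperature at or above limit_c (assumed limit)."""
    temp = parse_temperature(report)
    return temp is not None and temp >= limit_c
```

From a cron job one could call too_hot(read_report("/dev/nvme0")) and send a mail when it returns True. The device name is only an example, and smartctl usually needs root.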