From: "Patrick M. Hausen"
To: FreeBSD current
Subject: Re: nvme controller reset failures on recent -CURRENT
Date: Tue, 13 Feb 2024 21:45:57 +0100

Hi all,

> On 13.02.2024 at 20:56, Pete Wright wrote:
> 1. M.2 nvme really does need proper cooling, much more so than
> traditional SATA/SAS/SCSI drives.

I recently found a tool named "Scrutiny" that presents a nice dashboard of
all your disk devices and their SMART data, including crucial points like
temperature.
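For a quick temperature check without any dashboard, smartmontools can emit its report as JSON (`smartctl --json -a <device>`, available since smartmontools 7.0). The sketch below assumes that report carries a top-level `temperature.current` field in degrees Celsius; the 70 °C threshold, the function names, and the sample data are hypothetical and only illustrate the idea.

```python
import json

# Hypothetical alert threshold in degrees Celsius; pick one that suits your drives.
WARN_CELSIUS = 70


def drive_temperature(smartctl_json: str):
    """Return the current temperature in Celsius from smartctl JSON output,
    or None if the report carries no temperature field."""
    report = json.loads(smartctl_json)
    return report.get("temperature", {}).get("current")


def is_too_hot(smartctl_json: str, limit: int = WARN_CELSIUS) -> bool:
    """True if a temperature is reported and it is at or above the limit."""
    temp = drive_temperature(smartctl_json)
    return temp is not None and temp >= limit


# Trimmed, made-up sample shaped like smartctl's JSON output (assumption).
SAMPLE = '{"device": {"name": "/dev/nvme0"}, "temperature": {"current": 74}}'

if __name__ == "__main__":
    print(drive_temperature(SAMPLE), is_too_hot(SAMPLE))
```

On a real system the JSON string would come from running `smartctl --json -a` against the device in question (for example from a cron job) rather than from the sample above.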
Pros:

- Open source
- Nice web UI
- Uses smartmontools to gather the data, not reinventing the wheel
- Agents that can be called from cron jobs for many OSes including FreeBSD
- Alerting via a variety of communication channels

Cons:

- Central hub best run on Linux plus docker compose
- No authentication whatsoever, so strictly internal use
- No grouping or any organisation of systems, so it does not scale beyond
  tens of servers

I found a couple of problematic HDDs and SSDs right after deploying it
which regular SMART tests overlooked.

https://github.com/AnalogJ/scrutiny

Look for the Hub/Spoke deployment if you are willing to use e.g. a Linux
VM to run the tool, then point your FreeBSD systems at that. It can
probably be deployed strictly on FreeBSD, too, using the manual
installation instructions.

HTH,
kind regards,
Patrick