From: Warner Losh <wlosh@bsdimp.com>
Date: Mon, 4 Nov 2024 18:14:43 -0700
Subject: Re: nvme device errors & zfs
To: Dave Cottlehuber <dch@freebsd.org>
Cc: freebsd-fs
In-Reply-To: <3293802b-3785-4715-8a6b-0802afb6f908@app.fastmail.com>
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

On Mon, Nov 4, 2024 at 10:31 AM Dave Cottlehuber <dch@freebsd.org> wrote:
> What's the best way to see error counters or states on an nvme
> device?

Sadly, I think dmesg | grep nvme and/or trolling through
/var/log/messages. NVMe drives don't generally keep good counters of
errors...

> I have a typical mirrored nvme zpool, that reported enough errors
> in a burst last week, that 1 drive dropped off the bus [1].
>
> After a reboot, it resilvered, I cleared the errors, and it seems
> fine according to repeated scrubs and a few days of use.
>
> I was unable to see any errors from the nvme drive itself, but
> as its (just) in warranty for 2 more weeks I'd like to know
> if I should return it.
>
> I installed ports `sysutils/nvme-cli` and didn't see anything
> of note there either:
>
> $ doas nvme smart-log /dev/nvme1
> 0xc0484e41: opc: 0x2 fuse: 0 cid 0 nsid:0xffffffff cmd2: 0 cmd3: 0
>           : cdw10: 0x7f0002 cdw11: 0 cdw12: 0 cdw13: 0
>           : cdw14: 0 cdw15: 0 len: 0x200 is_read: 0
> <--- 0 cid: 0 status 0
> Smart Log for NVME device:nvme1 namespace-id:ffffffff
> critical_warning                    : 0
> temperature                         : 39 C
> available_spare                     : 100%
> available_spare_threshold           : 10%
> percentage_used                     : 3%
> data_units_read                     : 121681067
> data_units_written                  : 86619659
> host_read_commands                  : 695211450
> host_write_commands                 : 2187823697
> controller_busy_time                : 2554
> power_cycles                        : 48
> power_on_hours                      : 6342
> unsafe_shutdowns                    : 38
> media_errors                        : 0
> num_err_log_entries                 : 0
> Warning Temperature Time            : 0
> Critical Composite Temperature Time : 0

This suggests that the only 'badness' is the 38 unsafe shutdowns (likely
power failures): either there were a bunch all at once (maybe when
installing) or you've had power-off events every week or so... There have
been no reported media errors (or the drive hasn't done a good job of
remembering them, though most NVMe drives are better than most other
storage at that).

> Temperature Sensor 1                : 39 C
> Temperature Sensor 2                : 43 C
> Thermal Management T1 Trans Count   : 0
> Thermal Management T2 Trans Count   : 0
> Thermal Management T1 Total Time    : 0
> Thermal Management T2 Total Time    : 0

There's been no time when the drive overheated either. That's good.

> [1]: zpool status
> status: One or more devices are faulted in response to persistent errors.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>         repaired.
>   scan: scrub repaired 0B in 00:17:59 with 0 errors on Thu Oct 31 16:24:36 2024
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         zroot         DEGRADED     0     0     0
>           mirror-0    DEGRADED     0     0     0
>             gpt/zfs0  ONLINE       0     0     0
>             gpt/zfs1  FAULTED      0     0     0  too many errors

I'm not sure how to reconcile this in the face of the above. I'd have to
see the dmesg / messages logs for any non-boot messages for nvme / nda.
For bad drives at work, I typically see something like:

/var/log/messages.0.bz2:Nov  3 02:48:54 c001 kernel: nvme2: Resetting controller due to a timeout.
/var/log/messages.0.bz2:Nov  3 02:48:54 c001 kernel: nvme2: Waiting for reset to complete
/var/log/messages.0.bz2:Nov  3 02:49:05 c001 kernel: nvme2: controller ready did not become 0 within 10500 ms

for drives that just 'hang', which would cause ZFS to drop them out. I'd
see if there's new firmware, or return the drive.
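Something like this should turn those up if they're there (a rough sketch;
adjust the patterns and paths to match however newsyslog rotates and
compresses your logs):

$ dmesg | grep -E 'nvme|nda'
$ grep -E 'nvme[0-9]' /var/log/messages
$ bzcat /var/log/messages.*.bz2 | grep -E 'nvme[0-9]+: (Resetting|Waiting for reset|controller ready)'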
I also see:

nvme8: READ sqid:3 cid:117 nsid:1 lba:1875786352 len:1024
nvme8: nsid:0x1 rsvd2:0 rsvd3:0 mptr:0 prp1:0x40defd000 prp2:0x1395a2400
nvme8: cdw10: 0x6fce3a70 cdw11:0 cdw12:0x3ff cdw13:0 cdw14:0 cdw15:0
nvme8: UNRECOVERED READ ERROR (02/81) crd:0 m:1 dnr:1 p:1 sqid:3 cid:117 cdw0:0
(nda8:nvme8:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=6fce3a70 0 3ff 0 0 0
(nda8:nvme8:0:0:1): CAM status: NVME Status Error
(nda8:nvme8:0:0:1): Error 5, Retries exhausted
g_vfs_done():nda8p8[READ(offset=960402063360, length=1048576)]error = 5

when there's a media error. But the brand of NVMe drives we buy does
report this as an error:

c029.for002.ix# nvmecontrol logpage -p 2 nvme8
SMART/Health Information Log
============================
Critical Warning State:         0x04
 Available spare:               0
 Temperature:                   0
 Device reliability:            1
 Read only:                     0
 Volatile memory backup:        0
[[... but this says the drive has lost data ]]
Power cycles:                   106
Power on hours:                 30250
Unsafe shutdowns:               19
Media errors:                   3
No. error info log entries:     3
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature 1 Transition Count: 0
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   0
Total Time For Temperature 2:   0

so there are 3 media errors. I can read the log page to find the LBA too
(I'm working on enhancing the errors we report for NVMe to include the LBA
of the first error, but that's not there yet).

But since you don't have any media errors, I'd check the history to see if
the nvme drives are resetting (either successfully or not). I don't know
how to get that data from just the drive logs, though.

Warner
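P.S. If a drive ever does log media errors, the LBA should show up in the
standard Error Information log page (log page 1); something like this ought
to dump it with the base system's nvmecontrol (a sketch, not tested against
your particular drive):

$ doas nvmecontrol logpage -p 1 nvme1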

