From nobody Fri Jul 21 03:26:07 2023
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi
MIME-Version: 1.0
References: <CANCZdfokEoRtNp0en=9pjLQSQ+jtmfwH3OOwz1z09VcwWpE+xg@mail.gmail.com>
 <CAOtMX2g4+SDWg9WKbwZcqh4GpRan593O6qtNf7feoVejVK0YyQ@mail.gmail.com>
 <CANCZdfq5qti5uzWLkZaQEpyd5Q255sQeaR_kC_OQinmE9Qcqaw@mail.gmail.com>
 <CAOtMX2iwnpHL6b2-1D4N4Bi4eKoLnGK4=+gUowXGS_gtyDOkig@mail.gmail.com>
 <CANCZdfr-y8HYBb6GCFqZ7LAarxUAGb36Y6j+bo+WiDwUT5uR7A@mail.gmail.com>
 <CANCZdfptEG=+xa3m31Ngre26ZQxZ_Fqsfjmh+tVHgP2XpqhZ7g@mail.gmail.com> <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>
In-Reply-To: <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>
From: Warner Losh <imp@bsdimp.com>
Date: Thu, 20 Jul 2023 21:26:07 -0600
Message-ID: <CANCZdfoed3meq_z90aC=BP7RE_Gk+Oq6K1sptO4E0s6jT_ge6Q@mail.gmail.com>
Subject: Re: ASC/ASCQ Review
To: dgilbert@interlog.com
Cc: Alan Somers <asomers@freebsd.org>, scsi@freebsd.org
Content-Type: multipart/alternative; boundary="000000000000fa60eb0600f6d3dc"

--000000000000fa60eb0600f6d3dc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, Jul 20, 2023, 9:18 PM Douglas Gilbert <dgilbert@interlog.com> wrote:

> On 2023-07-19 11:41, Warner Losh wrote:
> > btw, it also occurs to me that if I do add a 'secondary' table, then you
> > could use it to generate a unique errno and experiment with that w/o
> > affecting the main code until that stuff was mature.
> >
> > I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs
> > that I'd like to tag as 'if trying harder, retry, otherwise fail', because
> > retry needs have changed a lot since CAM was written in the late 90s and
> > at least some of the asc/ascq pairs I'm looking at haven't changed since
> > the initial import. That's based on a tiny sampling of the data I have,
> > though, and is preliminary at best. I may just change it to reflect
> > modern usage.
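The 'if trying harder, retry, otherwise fail' tag can be read as a third action alongside plain retry and plain fail, resolved only once the caller's intent is known. A minimal sketch, assuming hypothetical names: `enum sense_action`, `ACT_RETRY_IF_HARD`, `resolve_action` and the `try_harder` flag are illustrative only and are not CAM identifiers (CAM's real tables use `SS_*` action flags such as SS_RDEF).

```c
#include <stdbool.h>

/*
 * Hypothetical per-ASC/ASCQ actions.  CAM's real tables encode this with
 * SS_* flags (e.g. SS_RDEF); these names are for illustration only.
 */
enum sense_action {
    ACT_FAIL,           /* fail the I/O immediately */
    ACT_RETRY,          /* retry with the normal retry budget */
    ACT_RETRY_IF_HARD   /* retry only if the caller asked to try harder */
};

/*
 * Turn a (possibly conditional) table action into a concrete decision.
 * Returns true when the command should be retried.
 */
bool
resolve_action(enum sense_action act, bool try_harder)
{
    switch (act) {
    case ACT_RETRY:
        return true;
    case ACT_RETRY_IF_HARD:
        return try_harder;
    case ACT_FAIL:
    default:
        return false;
    }
}
```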
>
> Hi,
> If you are looking for up-to-date [20230325] asc/ascq tables in C, you could
> borrow mine at https://github.com/doug-gilbert/sg3_utils in lib/sg_lib_data.c,
> starting at line 745.
> In testing/sg_chk_asc.c is a small test program for checking that the table in
> sg_lib_data.c agrees with the file that T10 supplies:
>       https://www.t10.org/lists/asc-num.txt


Thanks for the pointer. I'd already updated CAM's tables for that...

What I'm doing now is making sure CAM's reactions to the asc/ascq are good
for modern drives... It's a good idea, though, to create a program that
checks our table matches...
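A program that checks our table against T10's list could, once both are loaded into memory, reduce to a lockstep comparison. A sketch under stated assumptions: the `asc_entry` layout and `asc_table_diff` are hypothetical, parsing asc-num.txt is left out because its layout isn't described here, and both arrays are assumed to be sorted by (ASC, ASCQ).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one ASC/ASCQ assignment. */
struct asc_entry {
    uint8_t asc;
    uint8_t ascq;
    const char *desc;
};

/* Order entries by ASC, then ASCQ. */
static int
asc_cmp(const struct asc_entry *a, const struct asc_entry *b)
{
    if (a->asc != b->asc)
        return a->asc < b->asc ? -1 : 1;
    if (a->ascq != b->ascq)
        return a->ascq < b->ascq ? -1 : 1;
    return 0;
}

/*
 * Walk the reference list and our table in lockstep, printing entries
 * missing from ours, extra entries in ours, and description mismatches.
 * Both arrays must be sorted by (ASC, ASCQ).  Returns the discrepancy count.
 */
int
asc_table_diff(const struct asc_entry *ref, size_t nref,
    const struct asc_entry *ours, size_t nours)
{
    size_t i = 0, j = 0;
    int problems = 0;

    while (i < nref || j < nours) {
        int c;

        if (i == nref)
            c = 1;              /* only our entries remain */
        else if (j == nours)
            c = -1;             /* only reference entries remain */
        else
            c = asc_cmp(&ref[i], &ours[j]);

        if (c < 0) {
            printf("missing %02Xh/%02Xh %s\n",
                (unsigned)ref[i].asc, (unsigned)ref[i].ascq, ref[i].desc);
            i++;
            problems++;
        } else if (c > 0) {
            printf("extra   %02Xh/%02Xh %s\n",
                (unsigned)ours[j].asc, (unsigned)ours[j].ascq, ours[j].desc);
            j++;
            problems++;
        } else {
            if (strcmp(ref[i].desc, ours[j].desc) != 0) {
                printf("differs %02Xh/%02Xh: \"%s\" vs \"%s\"\n",
                    (unsigned)ref[i].asc, (unsigned)ref[i].ascq,
                    ref[i].desc, ours[j].desc);
                problems++;
            }
            i++;
            j++;
        }
    }
    return problems;
}
```

A nonzero return value makes it easy to fail a build or test run whenever the two tables drift apart.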

Warner


> Doug Gilbert
>
> > On Fri, Jul 14, 2023 at 5:34 PM Warner Losh <imp@bsdimp.com> wrote:
> >
> >
> >
> >     On Fri, Jul 14, 2023 at 12:31 PM Alan Somers <asomers@freebsd.org> wrote:
> >
> >         On Fri, Jul 14, 2023 at 11:05 AM Warner Losh <imp@bsdimp.com> wrote:
> >          >
> >          >
> >          >
> >          > On Fri, Jul 14, 2023, 11:12 AM Alan Somers <asomers@freebsd.org> wrote:
> >          >>
> >          >> On Thu, Jul 13, 2023 at 12:14 PM Warner Losh <imp@bsdimp.com> wrote:
> >          >> >
> >          >> > Greetings,
> >          >> >
> >          >> > I've been looking closely at failed drives for $WORK lately.
> >          >> > I've noticed that a lot of errors that kinda sound like fatal
> >          >> > errors have SS_RDEF set on them.
> >          >> >
> >          >> > What's the process for evaluating whether those error codes are
> >          >> > worth retrying? There are several errors that we seem to be
> >          >> > seeing (on a preliminary read of the data) before the drive gives
> >          >> > up the ghost altogether. For those cases, I'd like to post more
> >          >> > specific lists. Should I do that here?
> >          >> >
> >          >> > Independent of that, I may want a more aggressive 'fail fast'
> >          >> > policy for my workload than is appropriate in general (we have a
> >          >> > lot of data that's a copy of a copy of a copy, so if we lose it,
> >          >> > we don't care: we'll just delete any files we can't read and get
> >          >> > on with life, though I know others will have a more conservative
> >          >> > attitude towards data that might be precious and unique). I can
> >          >> > set the number of retries lower, and I can do some other hacks
> >          >> > that tell the disk to fail faster, but I think part of the
> >          >> > solution is going to have to be failing for some
> >          >> > sense-key/ASC/ASCQ tuples that we don't want to fail upstream or
> >          >> > in the general case. I was thinking of identifying those and
> >          >> > creating a 'global quirk table' that gets applied after the
> >          >> > drive-specific quirk table, which would let $WORK override the
> >          >> > defaults while letting others keep the current behavior. IMHO,
> >          >> > it would be better to keep these separate rather than in the
> >          >> > global data, for tracking upstream...
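The lookup order described above (drive-specific quirks first, then a site-wide 'global quirk table', then the stock defaults) can be sketched as a first-match search over an ordered list of tables. All type and function names here are hypothetical, not CAM's, and the action values are placeholders rather than recommended policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action codes; CAM itself encodes actions as SS_* flags. */
enum sense_action { ACT_FAIL, ACT_RETRY, ACT_RETRY_IF_HARD };

/* One (sense key, ASC, ASCQ) -> action rule. */
struct sense_entry {
    uint8_t key;
    uint8_t asc;
    uint8_t ascq;
    enum sense_action action;
};

/* A table is an array of rules plus its length. */
struct sense_table {
    const struct sense_entry *entries;
    size_t n;
};

/*
 * Consult the tables in the order given (e.g. drive quirks, then site-wide
 * overrides, then stock defaults) and return the first matching action.
 * With no match anywhere, fall back to an ordinary retry.
 */
enum sense_action
sense_lookup(const struct sense_table *tables, size_t ntables,
    uint8_t key, uint8_t asc, uint8_t ascq)
{
    for (size_t t = 0; t < ntables; t++) {
        for (size_t i = 0; i < tables[t].n; i++) {
            const struct sense_entry *e = &tables[t].entries[i];

            if (e->key == key && e->asc == asc && e->ascq == ascq)
                return e->action;
        }
    }
    return ACT_RETRY;
}
```

Passing only the drive and default tables reproduces the stock behavior, so a site that never installs an override table sees no change.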
> >          >> >
> >          >> > Is that clear, or should I give concrete examples?
> >          >> >
> >          >> > Comments?
> >          >> >
> >          >> > Warner
> >          >>
> >          >> Basically, you want to change the retry counts for certain
> >          >> ASC/ASCQ codes only, on a site-by-site basis?  That sounds
> >          >> reasonable.  Would it be configurable at runtime or only at
> >          >> build time?
> >          >
> >          >
> >          > I'd like to change the default actions. But maybe we just do
> >          > that for everyone and assume modern drives...
> >          >
> >          >> Also, I've been thinking lately that it would be really nice if
> >          >> READ UNRECOVERABLE could be translated to EINTEGRITY instead of
> >          >> EIO.  That would let consumers know that retries are pointless,
> >          >> but that the data is probably healable.
> >          >
> >          >
> >          > Unlikely, unless you've tuned things to not try for long at
> >          > recovery...
> >          >
> >          > But regardless... do you have a concrete example of a use case?
> >          > There are a number of places that map any error to EIO, and I'd
> >          > like a use case before we expand the errors the lower layers
> >          > return...
> >          >
> >          > Warner
> >
> >         My first use-case is a user-space FUSE file system.  It only has
> >         access to errnos, not ASC/ASCQ codes.  If we do as I suggest, then
> >         it could heal a READ UNRECOVERABLE by rewriting the sector, whereas
> >         other EIO errors aren't likely to be healed that way.
> >
> >
> >     Yeah... but READ UNRECOVERABLE is kinda hit or miss...
> >
> >         My second use-case is ZFS.  zfsd treats checksum errors differently
> >         from I/O errors.  A checksum error normally means that a read
> >         returned wrong data.  But I think that READ UNRECOVERABLE should
> >         also count.  After all, that means that the disk's media returned
> >         wrong data which was detected by the disk's own EDC/ECC.  I've
> >         noticed that zfsd seems to fault disks too eagerly when their only
> >         problem is READ UNRECOVERABLE errors.  Mapping it to EINTEGRITY, or
> >         even a new error code, would let zfsd be tuned better.
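The proposed translation amounts to one special case in the sense-to-errno mapping. A minimal sketch, assuming a hypothetical `sense_to_errno` helper (CAM's actual errno selection is driven by its error-action tables and is not reproduced here). It treats MEDIUM ERROR (sense key 3) with ASC 11h/ASCQ 00h, UNRECOVERED READ ERROR, as EINTEGRITY and leaves everything else as EIO. EINTEGRITY is a FreeBSD errno, so the fallback definition exists only to let the sketch compile elsewhere.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EINTEGRITY
/* FreeBSD's value; on other systems this may alias an unrelated errno. */
#define EINTEGRITY 97
#endif

#define SENSE_KEY_MEDIUM_ERROR 0x03     /* hypothetical local name */

/*
 * Hypothetical translation: an unrecovered read error (MEDIUM ERROR,
 * ASC 11h, ASCQ 00h) becomes EINTEGRITY, so a consumer can tell "the media
 * returned data the drive's own ECC rejected" apart from other failures.
 * Every other condition stays EIO.
 */
int
sense_to_errno(uint8_t key, uint8_t asc, uint8_t ascq)
{
    if (key == SENSE_KEY_MEDIUM_ERROR && asc == 0x11 && ascq == 0x00)
        return EINTEGRITY;
    return EIO;
}
```

Matching only ASCQ 00h is a deliberately narrow choice; other ASC 11h variants would each need their own review.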
> >
> >
> >     EINTEGRITY would then mean two different things. UFS returns it when
> >     checksums fail for critical filesystem data. I'm not saying no, per
> >     se, just that it conflates two different errors.
> >
> >     I think both of these use cases would be better served by CAM's
> >     publishing of the errors to devctl today. Here's some example data from
> >     a system I'm looking at:
> >
> >     system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> >       cam_status="0x44b" timeout=30000 CDB="28 00 4e b7 cb a3 00 04 cc 00 "
> >       timestamp=1634739729.312068
> >     system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> >       cam_status="0x44b" timeout=30000 CDB="28 00 20 6b d5 56 00 00 c0 00 "
> >       timestamp=1634739729.585541
> >     system=CAM subsystem=periph type=error device=da36 serial="12345"
> >       cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> >       CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064
> >     system=CAM subsystem=periph type=error device=da36 serial="12345"
> >       cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> >       CDB="28 00 ad 1a 35 96 00 01 5e 00 " timestamp=1642252539.693699
> >     system=CAM subsystem=periph type=error device=da39 serial="12346"
> >       cam_status="0x4cc" scsi_status=2 scsi_sense="72 04 02 00"
> >       CDB="2a 00 01 2b c8 f6 00 07 81 00 " timestamp=1669603144.090835
> >
> >     Here we get the sense key, the ASC and the ASCQ in the scsi_sense data
> >     (I'm currently looking at expanding this to the entire sense buffer,
> >     since it includes how hard the drive tried to read the data for media
> >     and hardware errors).  It doesn't include NVMe data, but does include
> >     ATA data (I'll have to add that data, now that I've noticed it is
> >     missing).  With the sense data and the CDB, you know what kind of
> >     error you got, plus which block didn't read/write correctly. With the
> >     extended sense data, you can find out even more details that are
> >     sense-key dependent...
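Decoding the sample events shows how much the two hex fields already carry. This sketch assumes the four logged scsi_sense bytes are response code, sense key, ASC and ASCQ (which is how the descriptor-format `72` samples read), and it only handles READ(10)/WRITE(10) CDBs (opcodes 28h and 2Ah), where bytes 2-5 hold the big-endian LBA and bytes 7-8 the transfer length in logical blocks. The function and struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse space-separated hex bytes ("28 00 ad ...") into out[]; returns count. */
size_t
parse_hex_bytes(const char *s, uint8_t *out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        char *end;
        unsigned long v = strtoul(s, &end, 16);

        if (end == s || v > 0xff)
            break;              /* no more digits, or not a single byte */
        out[n++] = (uint8_t)v;
        s = end;
    }
    return n;
}

/* Fields of a READ(10) (28h) or WRITE(10) (2Ah) CDB. */
struct rw10 {
    int is_write;
    uint32_t lba;               /* bytes 2-5, big-endian */
    uint16_t nblocks;           /* bytes 7-8, big-endian */
};

/* Returns 0 on success, -1 if the CDB is not a READ(10)/WRITE(10). */
int
decode_rw10(const uint8_t *cdb, size_t len, struct rw10 *out)
{
    if (len < 10 || (cdb[0] != 0x28 && cdb[0] != 0x2a))
        return -1;
    out->is_write = (cdb[0] == 0x2a);
    out->lba = (uint32_t)cdb[2] << 24 | (uint32_t)cdb[3] << 16 |
        (uint32_t)cdb[4] << 8 | cdb[5];
    out->nblocks = (uint16_t)(cdb[7] << 8 | cdb[8]);
    return 0;
}

struct sense_triple {
    uint8_t key, asc, ascq;
};

/* Assumes the 4-byte "response-code key asc ascq" form seen in the samples. */
int
decode_logged_sense(const uint8_t *b, size_t len, struct sense_triple *out)
{
    if (len < 4)
        return -1;
    out->key = b[1] & 0x0f;
    out->asc = b[2];
    out->ascq = b[3];
    return 0;
}
```

Applied to the first type=error event above, this yields a READ of 86 blocks starting at LBA 0xad1a3596 (2904176022), with sense key 3 (MEDIUM ERROR), ASC 11h, ASCQ 00h.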
> >
> >     So I'm unsure that shoehorning our imperfect knowledge of what's
> >     retriable, what's fixable, and what should be written with zeros into
> >     the kernel, and converting that to a separate errno, would give good
> >     results. Having daemons that want to make more nuanced calls about
> >     disks tap into this stream might be the better way to go. One of the
> >     things I'm planning for $WORK is to enable the retry time limit in one
> >     of the mode pages so that we fail faster and can just delete the file
> >     with the 'bad' block. We'd get the same error eventually if we allowed
> >     the full, default error processing to run, but that 'slow path'
> >     processing kills performance for all other users of the drive...  I'm
> >     unsure how well that will work out (and I know I'm lucky that I can
> >     always recover any data for my application, since it's just a cache).
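If the mode page in question is the Read-Write Error Recovery page (page code 01h), which is an assumption here, its RECOVERY TIME LIMIT field is a big-endian millisecond count in bytes 10-11 of the page. A minimal sketch of patching that field in a page buffer previously fetched with MODE SENSE; the helper name is hypothetical, and writing the page back with MODE SELECT (for example through `camcontrol modepage`) is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define RW_ERROR_RECOVERY_PAGE 0x01     /* page code of the assumed target page */

/*
 * Set RECOVERY TIME LIMIT (milliseconds, big-endian, bytes 10-11) in a
 * Read-Write Error Recovery page.  `page` points at the page itself (byte 0
 * holds PS/SPF and the page code), not at the mode parameter header or
 * block descriptors.  Returns 0 on success, -1 if the buffer is not that
 * page or is too short.
 */
int
set_recovery_time_limit(uint8_t *page, size_t len, uint16_t ms)
{
    if (len < 12 || (page[0] & 0x3f) != RW_ERROR_RECOVERY_PAGE ||
        page[1] < 0x0a)
        return -1;
    page[10] = (uint8_t)(ms >> 8);
    page[11] = (uint8_t)(ms & 0xff);
    return 0;
}
```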
> >
> >     I'd be interested to hear what others have to say here, though, since
> >     my focus on this data is through the lens of my rather specialized
> >     application...
> >
> >     Warner
> >
> >     P.S. That was generated with this rule, if you want to play with it...
> >     You'd have to translate absolute disk blocks to a partition and an
> >     offset into the filesystem, then give the filesystem a chance to tell
> >     you what data or metadata that block is used for...
> >
> >     # Disk errors
> >     notify 10 {
> >              match "system"          "CAM";
> >              match "subsystem"       "periph";
> >              match "device"          "[an]?da[0-9]+";
> >              action "logger -t diskerr -p daemon.info $_ timestamp=$timestamp";
> >     };
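The first step of translating a logged block to a partition offset is arithmetic: byte offset within the partition = (LBA - partition start) * logical block size, where the LBA comes from the CDB and is counted in the drive's logical blocks. A sketch with a hypothetical, hand-filled partition table (on FreeBSD, real start and size values could be read from `gpart show`); asking the filesystem what owns that offset is out of scope.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partition record: first LBA and length, in logical blocks. */
struct part {
    const char *name;
    uint64_t start;
    uint64_t nblocks;
};

/*
 * Find the partition containing absolute block `lba` and store the byte
 * offset of that block within the partition in *byte_off.  Returns NULL
 * when the block lies outside every partition (e.g. in the partition
 * table itself or in free space).
 */
const struct part *
lba_to_part(const struct part *parts, size_t nparts, uint64_t lba,
    uint32_t block_size, uint64_t *byte_off)
{
    for (size_t i = 0; i < nparts; i++) {
        if (lba >= parts[i].start && lba - parts[i].start < parts[i].nblocks) {
            *byte_off = (lba - parts[i].start) * block_size;
            return &parts[i];
        }
    }
    return NULL;
}
```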
> >
>
>

--000000000000fa60eb0600f6d3dc--