From nobody Fri Jul 21 03:18:44 2023
Message-ID: <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>
Date: Thu, 20 Jul 2023 23:18:44 -0400
From: Douglas Gilbert <dgilbert@interlog.com>
Reply-To: dgilbert@interlog.com
To: Warner Losh, Alan Somers
Cc: scsi@freebsd.org
Subject: Re: ASC/ASCQ Review
List-Id: SCSI subsystem
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi
Content-Type: text/plain; charset=UTF-8; format=flowed
On 2023-07-19 11:41, Warner Losh wrote:
> btw, it also occurs to me that if I do add a 'secondary' table, then
> you could use it to generate a unique errno and experiment with that
> w/o affecting the main code until that stuff was mature.
>
> I'm not sure I'll do that now, since I've found maybe 10 asc/ascq
> pairs that I'd like to tag as 'if trying harder, retry, otherwise
> fail', since retry needs have changed a lot since CAM was written in
> the late 90s and at least some of the asc/ascq pairs I'm looking at
> haven't changed since the initial import. But that's based on a tiny
> sampling of the data I have and is preliminary at best. I may just
> change it to reflect modern usage.

Hi,

If you are looking for up-to-date [20230325] asc/ascq tables in C, you
could borrow mine at https://github.com/doug-gilbert/sg3_utils in
lib/sg_lib_data.c, starting at line 745. In testing/sg_chk_asc.c there
is a small test program for checking that the table in sg_lib_data.c
agrees with the file that T10 supplies:
https://www.t10.org/lists/asc-num.txt

Doug Gilbert

> On Fri, Jul 14, 2023 at 5:34 PM Warner Losh wrote:
>> On Fri, Jul 14, 2023 at 12:31 PM Alan Somers wrote:
>>> On Fri, Jul 14, 2023 at 11:05 AM Warner Losh wrote:
>>>> On Fri, Jul 14, 2023, 11:12 AM Alan Somers wrote:
>>>>> On Thu, Jul 13, 2023 at 12:14 PM Warner Losh wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I've been looking closely at failed drives for $WORK lately. I've
>>>>>> noticed that a lot of errors that kinda sound like fatal errors
>>>>>> have SS_RDEF set on them.
>>>>>>
>>>>>> What's the process for evaluating whether those error codes are
>>>>>> worth retrying? There are several errors that we seem to be
>>>>>> seeing (preliminary read of the data) before the drive gives up
>>>>>> the ghost altogether. For those cases, I'd like to post more
>>>>>> specific lists. Should I do that here?
>>>>>>
>>>>>> Independent of that, I may want to have a more aggressive 'fail
>>>>>> fast' policy than is appropriate for my work load (we have a lot
>>>>>> of data that's a copy of a copy of a copy, so if we lose it, we
>>>>>> don't care: we'll just delete any files we can't read and get on
>>>>>> with life, though I know others will have a more conservative
>>>>>> attitude towards data that might be precious and unique). I can
>>>>>> set the number of retries lower, and I can do some other hacks
>>>>>> that tell the disk to fail faster, but I think part of the
>>>>>> solution is going to have to be failing for some
>>>>>> sense-code/ASC/ASCQ tuples that we don't want to fail in upstream
>>>>>> or the general case. I was thinking of identifying those and
>>>>>> creating a 'global quirk table' that gets applied after the
>>>>>> drive-specific quirk table, one that would let $WORK override the
>>>>>> defaults while letting others keep the current behavior. IMHO, it
>>>>>> would be better to have these separate rather than in the global
>>>>>> data for tracking upstream...
>>>>>>
>>>>>> Is that clear, or should I give concrete examples?
>>>>>>
>>>>>> Comments?
>>>>>>
>>>>>> Warner
>>>>>
>>>>> Basically, you want to change the retry counts for certain
>>>>> ASC/ASCQ codes only, on a site-by-site basis?  That sounds
>>>>> reasonable.  Would it be configurable at runtime or only at build
>>>>> time?
>>>>
>>>> I'd like to change the default actions. But maybe we just do that
>>>> for everyone and assume modern drives...
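To make the 'global quirk table' idea above concrete, here is a minimal
sketch of what a site-local sense-code override table could look like
in C. Everything in it (the struct, the enum, the lookup function) is a
hypothetical illustration, not CAM's actual error-recovery machinery;
the two sample entries reuse the sense-key/ASC/ASCQ values that appear
in the devctl events quoted later in this thread.

/*
 * Hypothetical sketch of a site-local override table mapping
 * sense-key/ASC/ASCQ tuples to a recovery action.  Names are
 * illustrative only; this is not CAM's real error-recovery code.
 */
#include <stddef.h>
#include <stdint.h>

enum rec_action {
        REC_RETRY,      /* worth retrying with the default count */
        REC_FAIL_FAST,  /* retries are pointless, fail now */
        REC_DEFAULT     /* fall through to the stock table */
};

struct sense_override {
        uint8_t sense_key;      /* e.g. 0x3 == MEDIUM ERROR */
        uint8_t asc;            /* additional sense code */
        uint8_t ascq;           /* additional sense code qualifier */
        enum rec_action action;
};

/* Consulted after the drive-specific quirks, before the defaults. */
static const struct sense_override site_overrides[] = {
        { 0x3, 0x11, 0x00, REC_FAIL_FAST }, /* unrecovered read error */
        { 0x4, 0x02, 0x00, REC_FAIL_FAST }, /* no seek complete */
};

static enum rec_action
lookup_override(uint8_t key, uint8_t asc, uint8_t ascq)
{
        size_t i;

        for (i = 0; i < sizeof(site_overrides) /
            sizeof(site_overrides[0]); i++) {
                const struct sense_override *o = &site_overrides[i];

                if (o->sense_key == key && o->asc == asc &&
                    o->ascq == ascq)
                        return (o->action);
        }
        return (REC_DEFAULT);
}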
>>>>> Also, I've been thinking lately that it would be real nice if
>>>>> READ UNRECOVERABLE could be translated to EINTEGRITY instead of
>>>>> EIO.  That would let consumers know that retries are pointless,
>>>>> but that the data is probably healable.
>>>>
>>>> Unlikely, unless you've tuned things to not try for long at
>>>> recovery...
>>>>
>>>> But regardless... do you have a concrete example of a use case?
>>>> There's a number of places that map any error to EIO. And I'd like
>>>> a use case before we expand the errors the lower layers return...
>>>>
>>>> Warner
>>>
>>> My first use-case is a user-space FUSE file system.  It only has
>>> access to errnos, not ASC/ASCQ codes.  If we do as I suggest, then
>>> it could heal a READ UNRECOVERABLE by rewriting the sector, whereas
>>> other EIO errors aren't likely to be healed that way.
>>
>> Yea... but READ UNRECOVERABLE is kinda hit or miss...
>>
>>> My second use-case is ZFS.  zfsd treats checksum errors differently
>>> from I/O errors.  A checksum error normally means that a read
>>> returned wrong data.  But I think that READ UNRECOVERABLE should
>>> also count.  After all, that means that the disk's media returned
>>> wrong data which was detected by the disk's own EDC/ECC.  I've
>>> noticed that zfsd seems to fault disks too eagerly when their only
>>> problem is READ UNRECOVERABLE errors.  Mapping it to EINTEGRITY, or
>>> even a new error code, would let zfsd be tuned better.
>>
>> EINTEGRITY would then mean two different things. UFS returns it when
>> checksums fail for critical filesystem structures. I'm not saying no,
>> per se, just that it conflates two different errors.
>>
>> I think both of these use cases would be better served by CAM's
>> publishing of the errors to devctl today. Here's some example data
>> from a system I'm looking at:
>>
>> system=CAM subsystem=periph type=timeout device=da36 serial="12345"
>> cam_status="0x44b" timeout=30000 CDB="28 00 4e b7 cb a3 00 04 cc 00 "
>> timestamp=1634739729.312068
>> system=CAM subsystem=periph type=timeout device=da36 serial="12345"
>> cam_status="0x44b" timeout=30000 CDB="28 00 20 6b d5 56 00 00 c0 00 "
>> timestamp=1634739729.585541
>> system=CAM subsystem=periph type=error device=da36 serial="12345"
>> cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
>> CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064
>> system=CAM subsystem=periph type=error device=da36 serial="12345"
>> cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
>> CDB="28 00 ad 1a 35 96 00 01 5e 00 " timestamp=1642252539.693699
>> system=CAM subsystem=periph type=error device=da39 serial="12346"
>> cam_status="0x4cc" scsi_status=2 scsi_sense="72 04 02 00"
>> CDB="2a 00 01 2b c8 f6 00 07 81 00 " timestamp=1669603144.090835
>>
>> Here we get the sense key, the asc and the ascq in the scsi_sense
>> data (I'm currently looking at expanding this to the entire sense
>> buffer, since it includes how hard the drive tried to read the data
>> on media and hardware errors).  It doesn't include nvme data, but
>> does include ata data (I'll have to add that data, now that I've
>> noticed it is missing).  With the sense data and the CDB you know
>> what kind of error you got, plus what block didn't read/write
>> correctly.  With the extended sense data, you can find out even more
>> details that are sense-key dependent...
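As an aside, those scsi_sense strings decode directly: in
descriptor-format sense data (response code 0x72), byte 1 carries the
sense key, byte 2 the ASC and byte 3 the ASCQ, so "72 03 11 00" is
MEDIUM ERROR / unrecovered read error and "72 04 02 00" is HARDWARE
ERROR / no seek complete. A self-contained C sketch of that decoding
(illustrative only, not code from CAM or sg3_utils):

/*
 * Minimal decode of the 4-byte scsi_sense field shown in the devctl
 * events above.  Descriptor-format sense (response code 0x72/0x73)
 * carries sense key, ASC and ASCQ in bytes 1-3.
 */
#include <stdint.h>
#include <stdio.h>

static void
decode_sense(const uint8_t s[4])
{
        uint8_t resp = s[0] & 0x7f;

        if (resp == 0x72 || resp == 0x73) {        /* descriptor format */
                printf("key=0x%x asc=0x%02x ascq=0x%02x\n",
                    s[1] & 0x0f, s[2], s[3]);
        } else if (resp == 0x70 || resp == 0x71) { /* fixed format */
                /*
                 * Fixed format keeps the key in byte 2; ASC/ASCQ live
                 * in bytes 12-13, beyond this 4-byte snippet.
                 */
                printf("key=0x%x (fixed format, ASC/ASCQ not present "
                    "in first 4 bytes)\n", s[2] & 0x0f);
        }
}

int
main(void)
{
        const uint8_t medium_err[4] = { 0x72, 0x03, 0x11, 0x00 };
        const uint8_t hw_err[4]     = { 0x72, 0x04, 0x02, 0x00 };

        decode_sense(medium_err);  /* key=0x3 asc=0x11 ascq=0x00 */
        decode_sense(hw_err);      /* key=0x4 asc=0x02 ascq=0x00 */
        return (0);
}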
>> So I'm unsure that trying to shoehorn our imperfect knowledge of
>> what's retriable, what's fixable, and what should be rewritten with
>> zeros into the kernel, and converting that to a separate errno, would
>> give good results; letting daemons that want to make more nuanced
>> calls about disks tap into this stream might be the better way to go.
>> One of the things I'm planning for $WORK is to enable the retry time
>> limit in one of the mode pages so that we fail faster and can just
>> delete the file with the 'bad' block, which we'd get eventually if we
>> allowed the full, default error processing to run, but that 'slow
>> path' processing kills performance for all other users of the
>> drive...  I'm unsure how well that will work out (and I know I'm
>> lucky that I can always recover any data for my application, since
>> it's just a cache).
>>
>> I'd be interested to hear what others have to say here, though, since
>> my focus on this data is through the lens of my rather specialized
>> application...
>>
>> Warner
>>
>> P.S. That was generated with this rule, if you wanted to play with
>> it...  You'd have to translate absolute disk blocks to a partition
>> and an offset into the filesystem, then give the filesystem a chance
>> to tell you what of its data/metadata that block is used for...
>>
>> # Disk errors
>> notify 10 {
>>         match "system"          "CAM";
>>         match "subsystem"       "periph";
>>         match "device"          "[an]?da[0-9]+";
>>         action "logger -t diskerr -p daemon.info $_ timestamp=$timestamp";
>> };
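For anyone who wants the nuanced-daemon approach without going through
syslog, devd(8) also exposes the raw event stream on a local socket. A
minimal reader is sketched below; it assumes devd's default stream
socket at /var/run/devd.pipe, and a robust version would buffer partial
reads until a newline rather than matching on whole buffers:

/*
 * Sketch: read the same CAM periph events straight from devd's local
 * socket instead of going through logger.  Error handling and event
 * framing are kept minimal for brevity.
 */
#include <sys/socket.h>
#include <sys/un.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        struct sockaddr_un sun;
        char buf[8192];
        ssize_t n;
        int fd;

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
                perror("socket");
                return (1);
        }
        memset(&sun, 0, sizeof(sun));
        sun.sun_family = AF_UNIX;
        strlcpy(sun.sun_path, "/var/run/devd.pipe", sizeof(sun.sun_path));
        if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
                perror("connect");
                return (1);
        }
        /* Events arrive one per newline-terminated line. */
        while ((n = read(fd, buf, sizeof(buf) - 1)) > 0) {
                buf[n] = '\0';
                if (strstr(buf, "system=CAM") != NULL &&
                    strstr(buf, "subsystem=periph") != NULL)
                        fputs(buf, stdout);
        }
        close(fd);
        return (0);
}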