From: Borja Marcos
To: freebsd-scsi@freebsd.org
Subject: Possible CAM regression: Error handling (retries) change between 13.2 and 14
Date: Thu, 15 Feb 2024 06:29:37 +0100
Message-Id: <060129E4-71A8-4F02-B4AE-D4AB788F31A0@sarenet.es>
List-Id: SCSI subsystem
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi

Hi,

I have updated a system from 13.2 to 14 and I have noticed a change in SATA error handling. Although I am replacing the troublesome disk, I am not sure whether this is a regression.

Under 13.2, the server had been showing some SATA errors, each stating that the command would be retried:

Feb 4 20:07:04 micro1 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 28 01 00 40 00 00 00 00 00 00
Feb 4 20:07:04 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Feb 4 20:07:04 micro1 kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Feb 4 20:07:04 micro1 kernel: (ada0:ahcich0:0:0:0): RES: 41 10 28 01 00 40 00 00 00 00 00
Feb 4 20:07:04 micro1 kernel: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain

However, it seems the command was indeed retried. I am not sure whether there is some backplane issue or it is really a disk problem. That said, the ZFS pool was scrubbed monthly and there was never any corruption. Moreover, the server does quite a lot of disk I/O and I haven’t seen any hiccups.

After updating to 14 I have seen a similar pattern, but the error handling has changed:

Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 01 00 40 00 00 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 03 00 40 00 00 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 9f 50 40 5d 01 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 a1 50 40 5d 01 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 02 00 40 00 00 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 9e 50 40 5d 01 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 a0 50 40 5d 01 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Feb 9 16:59:52 micro1 ZFS[4228]: vdev I/O failure, zpool=unpul path=/dev/ada0 offset=270336 size=8192 error=5
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 02 00 40 00 00 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 9e 50 40 5d 01 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 a0 50 40 5d 01 00 00 00 00
Feb 9 16:59:52 micro1 kernel: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed

Now, the questions are:

- Were some errors considered retryable on FreeBSD < 14 that are treated as unretryable as of FreeBSD 14?
- Is this new FreeBSD 14 behavior a bug or a feature? I mean, was it wrong to consider these errors retryable before?
- Or, of course, is this a coincidence and the disk has just begun to show its age right after I updated to FreeBSD 14?

Hardware information:

It is an HP Microserver Gen 8 with its built-in AHCI controller:

ahci0: port 0x10c0-0x10c7,0x10c8-0x10cb,0x10d0-0x10d7,0x10d8-0x10db,0x10e0-0x10ff mem 0xfacd0000-0xfacd07ff irq 17 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier supported
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahcich2: at channel 2 on ahci0
ahcich3: at channel 3 on ahci0
ahcich4: at channel 4 on ahci0
ahcich5: at channel 5 on ahci0
ahciem0: on ahci0

And the affected disk is a 3 TB Western Digital Red:

ada0: ACS-2 ATA SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors)
ada0: quirks=0x1<4K>
ses0: ada0,pass0 in 'Slot 00', SATA Slot: scbus0 target 0

Thanks!

Borja.
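
For anyone comparing the kernel messages above, the ACB bytes can be decoded to see which part of the disk each failing command touched. The following is a minimal Python sketch, not taken from the FreeBSD sources. It assumes the twelve bytes are printed in the order command, features, LBA low/mid/high, device, LBA low/mid/high (exp), features (exp), sector count, sector count (exp), and that for the FPDMA (NCQ) commands the transfer length sits in the two features bytes. The function name decode_fpdma_acb and the dictionary keys are made up for illustration.

```python
SECTOR_SIZE = 512  # logical sector size reported for ada0 in the probe output

# Opcodes as they appear in the log lines above.
COMMANDS = {0x60: "READ_FPDMA_QUEUED", 0x61: "WRITE_FPDMA_QUEUED"}


def decode_fpdma_acb(acb: str) -> dict:
    """Decode the 12 hex bytes of an 'ACB:' log field (assumed byte order)."""
    b = [int(x, 16) for x in acb.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes, got %d" % len(b))

    # 48-bit LBA: bytes 2..4 hold bits 0..23, bytes 6..8 hold bits 24..47.
    lba = (b[2] | (b[3] << 8) | (b[4] << 16)
           | (b[6] << 24) | (b[7] << 32) | (b[8] << 40))

    # For FPDMA commands the sector count is carried in features (low byte)
    # and features_exp (high byte); a value of 0 means 65536 sectors.
    sectors = b[1] | (b[9] << 8)
    if sectors == 0:
        sectors = 65536

    return {
        "command": COMMANDS.get(b[0], "0x%02x" % b[0]),
        "lba": lba,
        "sectors": sectors,
        "byte_offset": lba * SECTOR_SIZE,
        "byte_length": sectors * SECTOR_SIZE,
    }


if __name__ == "__main__":
    for acb in (
        "61 08 28 01 00 40 00 00 00 00 00 00",  # 13.2 write that was retried
        "60 10 10 02 00 40 00 00 00 00 00 00",  # 14 read, unretryable
        "60 10 10 9e 50 40 5d 01 00 00 00 00",  # 14 read, unretryable
    ):
        print(acb, "->", decode_fpdma_acb(acb))
```

Under these assumptions, the READ_FPDMA_QUEUED with ACB 60 10 10 02 ... decodes to LBA 528 and 16 sectors, that is byte offset 270336 and length 8192, which agrees with the offset and size in the ZFS vdev I/O failure message.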