From: Warner Losh
Date: Mon, 3 Jan 2022 09:15:59 -0700
Subject: Re: SSD trim crash
To: lee.matthews.external@stormshield.eu
Cc: freebsd-drivers@freebsd.org
List-Id: Writing device drivers for FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-drivers
In-Reply-To: <823709224.21545967.1641225670315.JavaMail.zimbra@stormshield.eu>

You can set the trim method to nothing w/o that patch:

kern.cam.ada.X.delete_method=DISABLE

There's also a mechanism through quirks that can disable known troublemakers, though to date we've only
disabled those drives that cause data corruption with TRIMs.
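
As a minimal sketch of applying that setting from a program instead of the command line: the helper below only builds the per-unit sysctl name quoted above, and the FreeBSD-only function passes it to sysctlbyname(3). The function names and the choice of unit number are hypothetical, and the write needs root. Treat it as an illustration under those assumptions rather than the required way to set the knob.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "kern.cam.ada.<unit>.delete_method" for a given ada unit.
 * Returns 0 on success, -1 if the buffer is too small.
 * Hypothetical helper: only the sysctl name itself comes from the mail.
 */
int
ada_delete_method_oid(char *buf, size_t len, int unit)
{
	int n = snprintf(buf, len, "kern.cam.ada.%d.delete_method", unit);

	return ((n > 0 && (size_t)n < len) ? 0 : -1);
}

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>

/* Ask CAM to stop issuing deletes (TRIM) to adaN. Needs root. */
int
ada_disable_trim(int unit)
{
	char oid[64];
	static const char method[] = "DISABLE";

	if (ada_delete_method_oid(oid, sizeof(oid), unit) != 0)
		return (-1);
	/* New value is the string "DISABLE", including its terminating NUL. */
	return (sysctlbyname(oid, NULL, NULL, method, sizeof(method)));
}
#endif
```

For unit 0 the command-line equivalent is `sysctl kern.cam.ada.0.delete_method=DISABLE`.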

FreeBSD makes heavy use of TRIMs at times to improve the info the drive's firmware has on what's actually in use.
Some drives love this and their write amplification is reduced (so performance and longevity increase). Other drives,
as you've discovered, implement TRIMs in a sub-optimal manner. I've seen too many TRIMs cause drives to hang,
but it's never been as severe as you are seeing (I have seen it on reboot once or twice, but my usual heartache
with TRIMs has been just really bad performance). Older versions of FreeBSD didn't have TRIM collapsing in any
real sense, so they would send lots and lots of TRIMs to drives, which oftentimes led to poor performance. UFS
and ada have both been enhanced to reduce that in newer versions of FreeBSD.

FreeBSD 11.3 is out of support. You may have slightly better luck with a newer FreeBSD in general (there's
more TRIM management in newer FreeBSDs, especially 13.0 and newer). There's a chance that the improvements
will help, but if the drive's firmware is bad enough, disabling TRIM on the drive models that have problems
may be your only option.

Warner

On Mon, Jan 3, 2022 at 9:01 AM Lee MATTHEWS via freebsd-drivers <freebsd-drivers@freebsd.org> wrote:
Hello,

I have been tasked with working on an intermittent problem that has been affecting some of our products. During in-house testing, we observe that occasionally our system crashes and reboots. After reboot, the system gets blocked in the BIOS and it does not advance any further because the BIOS does not detect the SSD disk. The crash appears to happen randomly during testing, as it can take anything from several hours of continuous testing to a week of running without stop for the crash to occur. The only way to get the system back up and running again is to power down and then up the machine, at which point the disk once again becomes visible to the BIOS.

Our products are based on FreeBSD version 11.3.

The crash appears to be linked to the type of disk that is being used. We have one brand of disk that has never had a problem (even after a month of continuous testing) and others that, after testing, fail and after reboot get blocked in the BIOS.


Logging via the serial console, I have observed the following crash:

--
[2021-12-10 20:56:10] ahcich1: Timeout on slot 5 port 0
[2021-12-10 20:56:10] ahcich1: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd c0 serr 00000000 cmd 0000c517
[2021-12-10 20:56:11] ahcich1: (ada0:ahcich1:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
[2021-12-10 20:56:11] AHCI reset...
[2021-12-10 20:56:11] (ada0:ahcich1:0:0:0): CAM status: Command timeout
[2021-12-10 20:56:11] ahcich1: (ada0:ahcich1:0:SATA connect time=100us status=00000133
[2021-12-10 20:56:11] 0:0): Retrying command
[2021-12-10 20:56:11] ahcich1: AHCI reset: device found
[2021-12-10 20:56:42] ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080)
[2021-12-10 20:57:15] ahcich1: Timeout on slot 6 port 0
[2021-12-10 20:57:15] ahcich1: is 00000000 cs 00000040 ss 00000000 rs 00000040 tfd 80 serr 00000000 cmd 0000c617
[2021-12-10 20:57:15] ahcich1: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
[2021-12-10 20:57:16] AHCI reset...
[2021-12-10 20:57:16] (aprobe0:ahcich1:0:0:0): CAM status: Command timeout
[2021-12-10 20:57:16] ahcich1: (aprobe0:SATA connect time=100us status=00000133
[2021-12-10 20:57:16] ahcich1:0:ahcich1: AHCI reset: device found
[2021-12-10 20:57:16] 0:0): Retrying command
[2021-12-10 20:57:47] ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080)
[2021-12-10 20:58:20] ahcich1: Timeout on slot 7 port 0
[2021-12-10 20:58:20] ahcich1: is 00000000 cs 00000080 ss 00000000 rs 00000080 tfd 80 serr 00000000 cmd 0000c717
[2021-12-10 20:58:20] ahcich1: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
[2021-12-10 20:58:21] AHCI reset...
[2021-12-10 20:58:21] (aprobe0:ahcich1:0:0:0): CAM status: Command timeout
[2021-12-10 20:58:21] ahcich1: (aprobe0:ahcich1:0:SATA connect time=100us status=00000133
[2021-12-10 20:58:21] 0:0): ahcich1: AHCI reset: device found
[2021-12-10 20:58:21] Error 5, Retries exhausted
[2021-12-10 20:58:52] ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080)
[2021-12-10 20:59:25] ahcich1: Timeout on slot 8 port 0
[2021-12-10 20:59:25] ahcich1: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 80 serr 00000000 cmd 0000c817
[2021-12-10 20:59:25] ahcich1: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
[2021-12-10 20:59:26] AHCI reset...
[2021-12-10 20:59:26] (aprobe0:ahcich1:0:0:0): CAM status: Command timeout
[2021-12-10 20:59:26] ahcich1: (aprobe0:SATA connect time=100us status=00000133
[2021-12-10 20:59:26] ahcich1:0:ahcich1: 0:AHCI reset: device found
[2021-12-10 20:59:26] 0): Error 5, Retry was blocked
[2021-12-10 20:59:26] ada0 at ahcich1 bus 0 scbus0 target 0 lun 0
[2021-12-10 20:59:26] ada0: <WDC PC SA530 SDASB8Y256G 40103000> s/n 200914802684 detached
[2021-12-10 20:59:26] g_vfs_done():ufs/log[WRITE(offset=5309202432, length=32768)]error = 6
[2021-12-10 20:59:26] pass0 at ahcich1 bus 0 scbus0 target 0 lun 0
[2021-12-10 20:59:26] g_vfs_done():pass0: <WDC PC SA530 SDASB8Y256G 40103000>ufs/log[WRITE(offset=65027883008, length=16384)] s/n 200914802684error = 6
[2021-12-10 20:59:26]  detached
[2021-12-10 20:59:26] g_vfs_done():(pass0:ahcich1:0:ufs/log[WRITE(offset=65124368384, length=32768)]0:error = 6
[2021-12-10 20:59:26] 0): g_vfs_done():Periph destroyed
[2021-12-10 20:59:26] ufs/log[WRITE(offset=65124368384, length=32768)]error = 6
[2021-12-10 20:59:26] g_vfs_done():ufs/log[WRITE(offset=65126465536, length=32768)]error = 6
[2021-12-10 20:59:26] g_vfs_done():ufs/log[WRITE(offset=65126465536, length=32768)]error = 6
[2021-12-10 20:59:26] g_vfs_done():g_vfs_done():ufs/log[WRITE(offset=24702976, length=512)]ufs/log[WRITE(offset=65126662144, length=16384)]error = 6
[2021-12-10 20:59:27] panic: cannot reassign paging buffer
[2021-12-10 20:59:27] cpuid = 0
[2021-12-10 20:59:27] __HardenedBSD_version = 1100056 __FreeBSD_version = 1103500
[2021-12-10 20:59:27] version = NS-BSD 4.2.0.beta--HBSD #0 : Wed Dec  1 14:00:48 CET 2021
[2021-12-10 20:59:27]     build@BuildFreeBSD-11.3-hardened:/home/build/build/kernel/work-OPTIM/sys/amd64/compile/NETASQ.XL.SMP.HW.RELEASE
[2021-12-10 20:59:27] Uptime: 2d0h46m1s
[2021-12-10 20:59:27] Dumping 2004 out of 8064 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
[2021-12-10 21:00:29] Dump complete
[2021-12-10 21:00:31] þVersion 2.15.1236. Copyright (C) 2012 American Megatrends, Inc.
--

It appears that after sending a SATA TRIM command, the disk stops responding. I've also observed another crash that happens just after a TRIM command.
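
To make the log above easier to read, here is a small sketch that decodes two of its fields: the first two ACB bytes (command and features) and the status byte in "tfd". The bit and opcode values are the standard ATA ones; the macro and function names are local to this example and are not the kernel's. In the log, tfd is c0 (BSY and DRDY) when the TRIM times out and 80 (BSY only) after every reset, which fits a device that stays busy and never accepts the following IDENTIFY.

```c
#include <stdint.h>
#include <stdio.h>

/* ATA status register bits (low byte of the AHCI "tfd" value). */
#define ATA_ST_BSY	0x80	/* device busy */
#define ATA_ST_DRDY	0x40	/* device ready */
#define ATA_ST_DRQ	0x08	/* data request */
#define ATA_ST_ERR	0x01	/* error */

/* ATA opcodes seen in the ACB dumps (first ACB byte). */
#define ATA_CMD_DSM		0x06	/* DATA SET MANAGEMENT */
#define ATA_CMD_IDENTIFY	0xec	/* IDENTIFY DEVICE */
#define ATA_DSM_FEAT_TRIM	0x01	/* TRIM bit in the features byte */

/* Name the command encoded in the first two ACB bytes. */
const char *
acb_command_name(uint8_t command, uint8_t features)
{
	if (command == ATA_CMD_DSM && (features & ATA_DSM_FEAT_TRIM))
		return ("DSM TRIM");
	if (command == ATA_CMD_IDENTIFY)
		return ("IDENTIFY DEVICE");
	return ("other");
}

/* A device that keeps BSY set is not ready for another command. */
int
tfd_is_busy(uint32_t tfd)
{
	return ((tfd & ATA_ST_BSY) != 0);
}

/* Print the status bits that are set in a tfd value. */
void
tfd_print(uint32_t tfd)
{
	printf("tfd %02x:%s%s%s%s\n", (unsigned)(tfd & 0xff),
	    (tfd & ATA_ST_BSY) ? " BSY" : "",
	    (tfd & ATA_ST_DRDY) ? " DRDY" : "",
	    (tfd & ATA_ST_DRQ) ? " DRQ" : "",
	    (tfd & ATA_ST_ERR) ? " ERR" : "");
}
```

Calling `acb_command_name(0x06, 0x01)` and `acb_command_name(0xec, 0x00)` covers the two ACB dumps in the log.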

According to camcontrol, the disk supports the TRIM command:

--
camcontrol identify /dev/ada0
pass0: <WDC PC SA530 SDASB8Y256G 40103000> ACS-4 ATA SATA 3.x device
pass0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)

protocol              ATA/ATAPI-11 SATA 3.x
device model          WDC PC SA530 SDASB8Y256G
firmware revision     40103000
serial number         200914802684
WWN                   5001b448b13c08ca
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         268435455 sectors
LBA48 supported       500118192 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating
Zoned-Device Commands no

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      yes     128/0x80
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              zeroed
Host Protected Area (HPA)      no
--
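
As a rough illustration of what the "DSM - max 512byte blocks ... 8" line implies, the sketch below works out an upper bound on how much one DSM TRIM command can cover. It assumes the ATA range-entry layout (8-byte entries holding a 48-bit LBA and a 16-bit sector count, so 64 entries per 512-byte block). The names are hypothetical, and the driver may apply further limits, so this is not necessarily the value FreeBSD uses.

```c
#include <stdint.h>

#define DSM_BLOCK_BYTES		512u	/* one DSM payload block */
#define DSM_RANGE_ENTRY_BYTES	8u	/* 48-bit LBA + 16-bit count */
#define DSM_RANGE_MAX_SECTORS	65535u	/* largest 16-bit sector count */

/* Range entries that fit in the payload of one DSM command. */
uint64_t
dsm_max_ranges(uint32_t max_dsm_blocks)
{
	return ((uint64_t)max_dsm_blocks *
	    (DSM_BLOCK_BYTES / DSM_RANGE_ENTRY_BYTES));
}

/* Upper bound on the bytes a single DSM TRIM command can cover. */
uint64_t
dsm_max_trim_bytes(uint32_t max_dsm_blocks, uint32_t logical_sector_size)
{
	return (dsm_max_ranges(max_dsm_blocks) * DSM_RANGE_MAX_SECTORS *
	    logical_sector_size);
}
```

With the values reported above (8 blocks, 512-byte logical sectors) this gives 512 range entries and 17179607040 bytes, just under 16 GiB per command.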


I found this patch concerning a similar problem: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222802

I've modified sys/cam/ata/ata_da.c to deactivate the trim functionality.

--
+diff --git sys/cam/ata/ata_da.c sys/cam/ata/ata_da.c
+index cba52c5458be..ad256da0a495 100644
+--- sys/cam/ata/ata_da.c
++++ sys/cam/ata/ata_da.c
+@@ -1798,7 +1798,7 @@ adaregister(struct cam_periph *periph, void *arg)
+       softc->disk->d_flags = DISKFLAG_DIRECT_COMPLETION | DISKFLAG_CANZONE;
+       if (softc->flags & ADA_FLAG_CAN_FLUSHCACHE)
+               softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE;
+-      if (softc->flags & ADA_FLAG_CAN_TRIM) {
++      if (0 /*softc->flags & ADA_FLAG_CAN_TRIM */) {
+               softc->disk->d_flags |= DISKFLAG_CANDELETE;
+               softc->disk->d_delmaxsize = softc->params.secsize *
+                                           ATA_DSM_RANGE_MAX *
--

I've been running a set of tests on one of our products for over two weeks with the trim deactivated (using the above patch) and there have been no crashes.

Could this issue be an internal disk firmware problem?

Is it possible that camcontrol reports that the disk supports TRIM yet in reality it isn't supported, or isn't supported fully?

I've observed that this crash doesn't happen on the first TRIM command. Is it possible that a set of SATA commands destabilizes the disk firmware and causes it to crash?

Is it possible that the timeout isn't long enough for the TRIM command?

Thanks in advance for any help.

Best wishes,
Lee Matthews









