From nobody Mon Feb 19 17:00:33 2024 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Tdph74Rs4z5CC3F for ; Mon, 19 Feb 2024 17:00:39 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-lj1-x22b.google.com (mail-lj1-x22b.google.com [IPv6:2a00:1450:4864:20::22b]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Tdph71cjKz4tpC for ; Mon, 19 Feb 2024 17:00:39 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-lj1-x22b.google.com with SMTP id 38308e7fff4ca-2d228a132acso30808661fa.0 for ; Mon, 19 Feb 2024 09:00:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20230601.gappssmtp.com; s=20230601; t=1708362036; x=1708966836; darn=freebsd.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=0ek4PeJSFSbxIvl0QPEXJipULW56fvFuuS/M82ihHlw=; b=d7W6C4TQ6Y0ZeEogGuWwiLrdJDu3OMmfXyRBcidqmzTtbJwlzvPJ6V2jNNxq/bAvsU /01buCbominULUESfmkqFLHpYYlOXu4uWBHE1fCbgKQMKo3mvyNucc4WAHwaE+anHQJA 9BlDgCHotDSZT5H1/zNKLyRQ5bgxUqCzob2WlwO8dYt7FVYS6CQ0fuwzgfsy9W5+KbJ0 JyInbzU5bwjZ/GFwtHdWCAUbjE36wOXTWiRwjcj6qhwIIPMPSdd+y0RSqZuOwal8krUM R0ubx9WOUboCh0VRRBV6NqShCjf3AtRqx28di+xqE7FI4yXDMctj+HE31f+Jmlepgv5Q LArw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708362036; x=1708966836; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=0ek4PeJSFSbxIvl0QPEXJipULW56fvFuuS/M82ihHlw=; b=EpZ7u2aky4Si8rC0pspaGY5HuYw01r8manYTE50Lb3OI+EAnLm6YyqcTqCfYlnrFQ/ FX/c327NO02Ew4DwEZ//2Kqd/S/lSkiCqWtSRIu0vOuLd4mWKcnCQZb9VaDP87RssXXE gxUXux6kipgskiXcnXbAxa5q121vDe2XOWhrhFxSKt+0MAcyz7hwDO3W9w9PR4ZFGnuN xvLCfl5LIOnKyojNJQ7CMr2VGMaPjUntvzO8bgHU6oIgVsuzghWLLoOBPAtLWIjsm81P faYDOkwPqcy8AlKnkLPMB2qucOeYC8CyKw8uOSvPoyBCIuzR+dBXLV7ts9+UYYILFMVt iLiA== X-Forwarded-Encrypted: i=1; AJvYcCUYC5PtHTtt7mgA84dhG+eHwXSsBa0p13asjhx598makVg3tmDZt9WR97R3isFcicYzN9jyFet1Xke73fIRQSblshQkZRd41S/4Wji4hjVgzA== X-Gm-Message-State: AOJu0Yw4YHb6hyVlY3DMoHX/iA98/TpCBS1EvLLCek1H0lH1rcrErdrb HV8sZcefqxfeFHM2/kTCf1E8wpwudPl7J40WCMCt75ms+bQHGkAVdlkgYfHpkN8o+V5JGoysT8L fB7IlhOYM2fRr+zVwHUUuG8aUWwiepbTza8recg== X-Google-Smtp-Source: AGHT+IFVGKK4guak3Y8xtSI6+4NDKEcb0uGrJFqdwW3thhk4FT5JT7w21g3c6QPwy87eWRj0voi2P5X3vZIN7G2V4pw= X-Received: by 2002:a2e:bc24:0:b0:2d2:427c:9ac5 with SMTP id b36-20020a2ebc24000000b002d2427c9ac5mr1563533ljf.37.1708362035995; Mon, 19 Feb 2024 09:00:35 -0800 (PST) List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 References: <6a73f7bf-dbe9-416b-a6d0-dded14285574@freebsd.org> In-Reply-To: <6a73f7bf-dbe9-416b-a6d0-dded14285574@freebsd.org> From: Warner Losh Date: Mon, 19 Feb 2024 10:00:33 -0700 Message-ID: Subject: Re: git: c01af41c3c8f - main - ata_da: add quirk to disable NCQ TRIM for Samsung 860/870 SSDs To: Andriy Gapon Cc: cglogic , src-committers , dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Content-Type: multipart/alternative; boundary="0000000000004db5760611bf087c" X-Spamd-Bar: ---- X-Rspamd-Queue-Id: 4Tdph71cjKz4tpC X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US] --0000000000004db5760611bf087c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Feb 19, 2024 at 9:40=E2=80=AFAM Andriy Gapon wrot= e: > On 19/02/2024 18:06, cglogic wrote: > > Hello Warner, > > > > Thank you for detailed explanation. Looks like it has no sense for me t= o > worry > > about disabling NCQ TRIM on that old PC. > > > > However it would be great to have possibility to apply such hardware bu= g > > workarounds for affected controller/drive pairs only. > > Or, at least, have possibility to disable such quirks on per drive basi= s > via > > sysctl. I didn't find such a possibility. > > Yeah, we do not have a way to control ata_da (and scsi_da) quirks via > configuration. It would be nice to have it, indeed. > > Also, perhaps what ADA_Q_NCQ_TRIM_BROKEN could do is prefer a different > delete > method by default. That way, NCQ_DSM_TRIM option would not disappear > completely > and it would be possible to set with kern.cam.ada.N.delete_method sysctl. > Yea. In this case, though, the method is known to be bad. This badness might not be apparent until data is corrupted. Things seem to work. This is very dangerous to allow, even with a I know what I'm doing flag. We could do it, but I'm very uneasy given the data corruption that causes us to disable it in the first place... Given what I read https://bugzilla.kernel.org/show_bug.cgi?id=3D203475 we want NCQ Trim disabled entirely for these drives. Sometimes they work, until they don't, then people report similar errors to what Andriy reported. Warner > > On Monday, February 19th, 2024 at 5:18 PM, Warner Losh > wrote: > >> > >> > >> On Mon, Feb 19, 2024, 4:29=E2=80=AFAM cglogic >> > wrote: > >> > >> On Monday, February 19th, 2024 at 1:03 PM, Andriy Gapon > > >> wrote: > >> > >> > On 19/02/2024 12:47, cglogic wrote: > >> > > >> > > Hello Andriy, > >> > > > >> > > I use ZFS with autotrim enabled on Samsung 860 PRO connected t= o > Intel > >> AHCI SATA controller for 9 years without any issue. > >> > > >> > > >> > I think that it was released in 2018? > >> > >> You are right, it's 9 years old computer, SSD was added later. > >> > >> > > >> > > Can I disable this quirk locally or have I revert the patch to > >> continue use NCQ TRIM with this SSD? > >> > > >> > > >> > I don't think that there is a way to disable the quirk, so you'l= l > have > >> to revert. > >> > Note that ATA TRIM is still enabled, it's only NCQ TRIM that got > disabled. > >> > >> From my understanding ATA TRIM is non-queued TRIM as opposed to NC= Q > TRIM. > >> With only ATA TRIM enabled, drive will have increased latency when > large > >> amount of files deleted. > >> > >> However I'm not sure how ZFS handles TRIMs, maybe it groups severa= l > TRIMs > >> and then sends when drive is idle. > >> Another question is how this affects TRIM enabled swap partition. > >> > >> In general, if I'm correct here, disabling NCQ TRIM should not be > an issue > >> for desktop usage, except for large git repository updates, etc. > >> > >> > >> In general, it won't matter a ton. Unless you are seeing stalls becaus= e > of > >> read/write I/O waiting behind trims that have to go to the drive one a= t > a > >> time, the effects are in the noise. If you are seeing that, though, th= e > >> effects can be quite helpful in terms of latency... assuming the drive > is > >> cooperative. The benefits of it vary a lot from drive to drive as well= . > Some > >> early drives still serialize the trims, so the latency benefit is not > so > >> great. Some enterprise drives do trims instantly and keep some interna= l > state > >> that is helped a lot by them. But if NCQ trims are unstable, we likely > should > >> err on the side of disabling. > >> > >> The errors that Andriy was seeing typically are cable or controller > issues. In > >> his case, it was a controller issue (apparently, and surprisingly, > paired with > >> a specific type of drive). Linux has a specific workaround for the > pair, which > >> is unusual. But there's two issues at play. > >> > >> The first issue is that these drives should have NCQ Trim turned off, > >> according to Linux's commit log. This may be firmware revision based o= r > may > >> just be some people are luckier with newer firmware (it's hard to say > from the > >> mixed reports, but they read more like luck than version for the > sampling I > >> did). So Andriy's change is good. The second issue, and one I was > confused on > >> earlier, is that Linux goes a step further and disables NCQ entirely > for these > >> old controllers for 860 and 870 drives. This isn't too surprising. We > have a > >> workaround for PMP for these controllers already.... One can disable > NCQ > >> entirely by setting the tags to 0... Given the specificity of the > problem, and > >> the age of the controllers, I'm inclined to not implement the Linux > workaround > >> for this specific pair. > > > -- > Andriy Gapon > > --0000000000004db5760611bf087c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Mon, Feb 19, 2024 at 9:40=E2=80=AF= AM Andriy Gapon <avg@freebsd.org&= gt; wrote:
On 19= /02/2024 18:06, cglogic wrote:
> Hello Warner,
>
> Thank you for detailed explanation. Looks like it has no sense for me = to worry
> about disabling NCQ TRIM on that old PC.
>
> However it would be great to have possibility to apply such hardware b= ug
> workarounds for affected controller/drive pairs only.
> Or, at least, have possibility to disable such quirks on per drive bas= is via
> sysctl. I didn't find such a possibility.

Yeah, we do not have a way to control ata_da (and scsi_da)=C2=A0 quirks via=
configuration.=C2=A0 It would be nice to have it, indeed.

Also, perhaps what ADA_Q_NCQ_TRIM_BROKEN could do is prefer a different del= ete
method by default.=C2=A0 That way, NCQ_DSM_TRIM option would not disappear = completely
and it would be possible to set with kern.cam.ada.N.delete_method sysctl.

Yea. In this case, though, the method is= known to be bad. This badness might not be apparent
until data i= s corrupted. Things seem to work. This is very dangerous to allow, even wit= h a
I know what I'm doing flag. We could do it, but I'm v= ery uneasy given the data corruption
that causes us to disable it= in the first place...

disabl= ed entirely for these drives. Sometimes they work, until they don't, th= en people report
similar errors to what Andriy reported.

Warner
=C2=A0
> On Monday, February 19th, 2024 at 5:18 PM, Warner Losh <imp@bsdimp.com> wrote:
>>
>>
>> On Mon, Feb 19, 2024, 4:29=E2=80=AFAM cglogic <cglogic@protonmail.com
>> <mailto:cglogic@protonmail.com>> wrote:
>>
>>=C2=A0 =C2=A0 =C2=A0On Monday, February 19th, 2024 at 1:03 PM, Andr= iy Gapon <avg@FreeBSD.org>
>>=C2=A0 =C2=A0 =C2=A0wrote:
>>
>>=C2=A0 =C2=A0 =C2=A0> On 19/02/2024 12:47, cglogic wrote:
>>=C2=A0 =C2=A0 =C2=A0>
>>=C2=A0 =C2=A0 =C2=A0> > Hello Andriy,
>>=C2=A0 =C2=A0 =C2=A0> >
>>=C2=A0 =C2=A0 =C2=A0> > I use ZFS with autotrim enabled on Sa= msung 860 PRO connected to Intel
>>=C2=A0 =C2=A0 =C2=A0AHCI SATA controller for 9 years without any is= sue.
>>=C2=A0 =C2=A0 =C2=A0>
>>=C2=A0 =C2=A0 =C2=A0>
>>=C2=A0 =C2=A0 =C2=A0> I think that it was released in 2018?
>>
>>=C2=A0 =C2=A0 =C2=A0You are right, it's 9 years old computer, S= SD was added later.
>>
>>=C2=A0 =C2=A0 =C2=A0>
>>=C2=A0 =C2=A0 =C2=A0> > Can I disable this quirk locally or h= ave I revert the patch to
>>=C2=A0 =C2=A0 =C2=A0continue use NCQ TRIM with this SSD?
>>=C2=A0 =C2=A0 =C2=A0>
>>=C2=A0 =C2=A0 =C2=A0>
>>=C2=A0 =C2=A0 =C2=A0> I don't think that there is a way to d= isable the quirk, so you'll have
>>=C2=A0 =C2=A0 =C2=A0to revert.
>>=C2=A0 =C2=A0 =C2=A0> Note that ATA TRIM is still enabled, it= 9;s only NCQ TRIM that got disabled.
>>
>>=C2=A0 =C2=A0 =C2=A0From my understanding ATA TRIM is non-queued TR= IM as opposed to NCQ TRIM.
>>=C2=A0 =C2=A0 =C2=A0With only ATA TRIM enabled, drive will have inc= reased latency when large
>>=C2=A0 =C2=A0 =C2=A0amount of files deleted.
>>
>>=C2=A0 =C2=A0 =C2=A0However I'm not sure how ZFS handles TRIMs,= maybe it groups several TRIMs
>>=C2=A0 =C2=A0 =C2=A0and then sends when drive is idle.
>>=C2=A0 =C2=A0 =C2=A0Another question is how this affects TRIM enabl= ed swap partition.
>>
>>=C2=A0 =C2=A0 =C2=A0In general, if I'm correct here, disabling = NCQ TRIM should not be an issue
>>=C2=A0 =C2=A0 =C2=A0for desktop usage, except for large git reposit= ory updates, etc.
>>
>>
>> In general, it won't matter a ton. Unless you are seeing stall= s because of
>> read/write I/O waiting behind trims that have to go to the drive o= ne at a
>> time, the effects are in the noise. If you are seeing that, though= , the
>> effects can be quite helpful in terms of latency... assuming the d= rive is
>> cooperative. The benefits of it vary a lot from drive to drive as = well. Some
>> early drives still serialize the trims, so the latency benefit is = not so
>> great. Some enterprise drives do trims instantly and keep some int= ernal state
>> that is helped a lot by them. But if NCQ trims are unstable, we li= kely should
>> err on the side of disabling.
>>
>> The errors that Andriy was seeing typically are cable or controlle= r issues. In
>> his case, it was a controller issue (apparently, and surprisingly,= paired with
>> a specific type of drive). Linux has a specific workaround for the= pair, which
>> is unusual. But there's two issues at play.
>>
>> The first issue is that these drives should have NCQ Trim turned o= ff,
>> according to Linux's commit log. This may be firmware revision= based or may
>> just be some people are luckier with newer firmware (it's hard= to say from the
>> mixed reports, but they read more like luck than version for the s= ampling I
>> did). So Andriy's change is good. The second issue, and one I = was confused on
>> earlier, is that Linux goes a step further and disables NCQ entire= ly for these
>> old controllers for 860 and 870 drives. This isn't too surpris= ing. We have a
>> workaround for PMP for these controllers already.... One can disab= le NCQ
>> entirely by setting the tags to 0... Given the specificity of the = problem, and
>> the age of the controllers, I'm inclined to not implement the = Linux workaround
>> for this specific pair.


--
Andriy Gapon

--0000000000004db5760611bf087c--