Re: how to tell if TRIM is working

From: Warner Losh <imp_at_bsdimp.com>
Date: Thu, 02 May 2024 14:16:04 UTC
On Thu, May 2, 2024 at 6:30 AM mike tancsa <mike@sentex.net> wrote:

> On 5/1/2024 4:24 PM, Matthew Grooms wrote:
> > On 5/1/24 14:38, mike tancsa wrote:
> >> Kind of struggling to check if TRIM is actually working or not with
> >> my SSDs on RELENG_14 in ZFS.
> >>
> >> On a pool that has almost no files on it (capacity at 0% out of 3TB),
> >> should not
> >>
> >> zpool trim -w <pool> be almost instant after a couple of runs ?
> >> Instead it seems to always take about 10min to complete.
> >>
> >> Looking at the stats,
> >>
> >> kstat.zfs.tortank1.misc.iostats.trim_bytes_failed: 0
> >> kstat.zfs.tortank1.misc.iostats.trim_extents_failed: 0
> >> kstat.zfs.tortank1.misc.iostats.trim_bytes_skipped: 2743435264
> >> kstat.zfs.tortank1.misc.iostats.trim_extents_skipped: 253898
> >> kstat.zfs.tortank1.misc.iostats.trim_bytes_written: 14835526799360
> >> kstat.zfs.tortank1.misc.iostats.trim_extents_written: 1169158
> >>
> >> what and why are bytes being skipped ?
> >>
> >> One of the drives for example
> >
> > I had a hard time seeing evidence of this at the disk level while
> > fiddling with TRIM recently. It appeared that at least some counters
> > are driver and operation specific. For example, the da driver appears
> > to update counters in some paths but not others. I assume that ada is
> > different. There is a bug report for da, but haven't seen any
> > feedback ...
> >
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277673
> >
> > You could try to run gstat with the -d flag during the time period
> > when the delete operations are expected to occur. That should give you
> > an idea of what's happening at the disk level in real time but may not
> > offer more info than you're already seeing.
> >
>
> It *seems* to be doing something.  What I don't understand is why, if I
> run it once, do nothing (no writes / snapshots etc), and then run trim
> again, gstat still shows activity even though there should not be
> anything left to trim?
>
> dT: 1.002s  w: 1.000s
>   L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s      kBps   ms/d  %busy  Name
>      0   1254      0      0    0.0    986   5202    2.0    244   8362733    4.5   55.6  ada0
>     12   1242      0      0    0.0   1012   5218    1.9    206   4972041    6.0   63.3  ada2
>     12   1242      0      0    0.0   1012   5218    1.9    206   4972041    6.0   63.3  ada2p1
>      0   4313      0      0    0.0   1024   5190    0.8   3266   6463815    0.4   62.8  ada3
>      0   1254      0      0    0.0    986   5202    2.0    244   8362733    4.5   55.6  ada0p1
>      0   4238      0      0    0.0    960   4874    0.7   3254   6280362    0.4   59.8  ada5
>      0   4313      0      0    0.0   1024   5190    0.8   3266   6463815    0.4   62.8  ada3p1
>      0   4238      0      0    0.0    960   4874    0.7   3254   6280362    0.4   59.8  ada5p1
> dT: 1.001s  w: 1.000s
>   L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s      kBps   ms/d  %busy  Name
>      2   2381      0      0    0.0   1580   9946    0.9    767   5990286    1.8   70.0  ada0
>      2   2801      0      0    0.0   1540   9782    0.9   1227  11936510    1.0   65.2  ada2
>      2   2801      0      0    0.0   1540   9782    0.9   1227  11936510    1.0   65.2  ada2p1
>      0   2072      0      0    0.0   1529   9566    0.8    509  12549587    2.1   57.0  ada3
>      2   2381      0      0    0.0   1580   9946    0.9    767   5990286    1.8   70.0  ada0p1
>      0   2042      0      0    0.0   1517   9427    0.6    491  12549535    1.9   52.4  ada5
>      0   2072      0      0    0.0   1529   9566    0.8    509  12549587    2.1   57.0  ada3p1
>      0   2042      0      0    0.0   1517   9427    0.6    491  12549535    1.9   52.4  ada5p1
> dT: 1.002s  w: 1.000s
>   L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s      kBps   ms/d  %busy  Name
>      2   1949      0      0    0.0   1094   5926    1.2    827  11267200    1.8   78.8  ada0
>      0   2083      0      0    0.0   1115   6034    0.7    939  16537981    1.4   67.2  ada2
>      0   2083      0      0    0.0   1115   6034    0.7    939  16537981    1.4   67.2  ada2p1
>      2   2525      0      0    0.0   1098   5914    0.8   1399  16021615    1.1   79.3  ada3
>      2   1949      0      0    0.0   1094   5926    1.2    827  11267200    1.8   78.8  ada0p1
>     12   2471      0      0    0.0   1018   5399    1.0   1425  15395566    1.1   80.5  ada5
>      2   2525      0      0    0.0   1098   5914    0.8   1399  16021615    1.1   79.3  ada3p1
>     12   2471      0      0    0.0   1018   5399    1.0   1425  15395566    1.1   80.5  ada5p1
>
> The ultimate problem is that after a while with a lot of writes, the
> disk performance will be toast until I do a manual trim -f of the disk
> :(   This is most notable on consumer WD SSDs.  I haven't done any
> extensive tests with Samsung SSDs to see if there are performance
> penalties or not. It might be that they are just better at masking the
> problem.  I don't see the same issue with ZFS on Linux with the same
> disks / hardware.
>

When trims are fast, you want to send them to the drive as soon as you
know the blocks are freed. UFS always does this (if trim is enabled at all).
ZFS has a lot of knobs to control when / how / if this is done.

vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.trim_max_active: 2
vfs.zfs.trim.queue_limit: 10
vfs.zfs.trim.txg_batch: 32
vfs.zfs.trim.metaslab_skip: 0
vfs.zfs.trim.extent_bytes_min: 32768
vfs.zfs.trim.extent_bytes_max: 134217728
vfs.zfs.l2arc.trim_ahead: 0
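
As a rough cross-check on the skipped counters quoted earlier: as far as
I understand the OpenZFS documentation, extents smaller than
extent_bytes_min are skipped rather than trimmed, so the average skipped
extent should come out below 32768 bytes. A minimal sketch, assuming
sysctl-style "name: value" text on stdin (trim_extent_summary is a
made-up helper name, not an existing tool):

```shell
#!/bin/sh
# Summarize skipped vs. written TRIM extents from sysctl-style
# "name: value" lines on stdin. $1 is the extent_bytes_min to compare
# against (defaults to 32768, the value in the list above).
# Hypothetical helper: it only parses text and never touches the pool.
trim_extent_summary() {
    awk -F': ' -v min="${1:-32768}" '
        $1 ~ /trim_bytes_skipped$/   { bs = $2 }
        $1 ~ /trim_extents_skipped$/ { es = $2 }
        $1 ~ /trim_bytes_written$/   { bw = $2 }
        $1 ~ /trim_extents_written$/ { ew = $2 }
        END {
            if (es > 0) {
                avg = bs / es
                printf "skipped avg_bytes=%.0f below_min=%s\n", avg, ((avg < min + 0) ? "yes" : "no")
            }
            if (ew > 0)
                printf "written avg_bytes=%.0f\n", bw / ew
        }'
}

# On the FreeBSD box itself (not run here):
#   sysctl kstat.zfs.tortank1.misc.iostats | trim_extent_summary
```

With the figures quoted above (2743435264 bytes over 253898 extents) the
skipped average works out to about 10805 bytes, which is under the 32768
minimum and at least consistent with that explanation.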

I've not tried tuning these myself, but you can experiment with them to
see how they affect things.
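
One way to see the effect of a knob is to snapshot the TRIM counters
before and after a run and compare them. A minimal sketch, assuming the
snapshots are saved sysctl output in "name: value" form
(trim_counter_delta is a hypothetical helper name, and the /tmp paths
are just examples):

```shell
#!/bin/sh
# Print how much each TRIM counter changed between two saved sysctl
# snapshots. Args: before-file after-file.
# Hypothetical helper: it only compares text files.
trim_counter_delta() {
    awk -F': ' '
        # First file: remember every counter value by name.
        NR == FNR { before[$1] = $2; next }
        # Second file: print the change for each trim_* counter seen before.
        $1 ~ /\.trim_/ && ($1 in before) {
            printf "%s: %.0f\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}

# On the FreeBSD box itself (not run here). Changing the knob between
# runs assumes it is writable at runtime on your system:
#   sysctl kstat.zfs.tortank1.misc.iostats > /tmp/trim.before
#   zpool trim -w tortank1
#   sysctl kstat.zfs.tortank1.misc.iostats > /tmp/trim.after
#   trim_counter_delta /tmp/trim.before /tmp/trim.after
```

If a second trim of an idle pool still shows a large trim_bytes_written
delta, that matches what gstat was showing at the disk level.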


Warner



> I have an open PR in
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277992 that I think
> might actually have 2 separate problems.
>
>      ---Mike
>
>
> >> e.g. here was one disk in the pool that was taking a long time for
> >> each zpool trim
> >>
> >> # time trim -f /dev/ada1
> >> trim /dev/ada1 offset 0 length 1000204886016
> >> 0.000u 0.057s 1:29.33 0.0%      5+184k 0+0io 0pf+0w
> >> and then if I re-run it
> >> #  time trim -f /dev/ada1
> >> trim /dev/ada1 offset 0 length 1000204886016
> >> 0.000u 0.052s 0:04.15 1.2%      1+52k 0+0io 0pf+0w
> >>
> >> 90 seconds and then 4 seconds after that.
> >>
> >
> > -Matthew
> >
>
>