Re: zfs with operations like rm -rf takes a very long time recently

From: Warner Losh <imp_at_bsdimp.com>
Date: Sun, 16 Oct 2022 16:51:00 UTC
On Sun, Oct 16, 2022 at 10:41 AM void <void@f-m.fm> wrote:

> On Sun, 16 Oct 2022, at 16:07, Alan Somers wrote:
> > Gstat is showing that your disk is fully busy.  It's also showing read
> > latency as high as 128 ms, which is extremely slow.
>
> I ran another gstat with -I 5s and have omitted -d as
> per your last email. How did you calculate latency?
>
> There's zfs-stats installed too. Would pasting some output here be useful?
>
> # gstat -opC -I 5s
>
> timestamp,name,q-depth,total_ops/s,read/s,read-KiB/s,ms/read,write/s,write-KiB/s,ms/write,other/s,ms/other,%busy
> 2022-10-16 17:23:47.080118413,da0,3,0,0,0,0.0,0,0,0.0,0,0.0,0.0
> 2022-10-16 17:23:52.088745392,da0,4,93,45,246,26.4,49,3974,12.7,0,154.4,97.8
> 2022-10-16 17:23:57.167503039,da0,3,82,41,164,25.5,40,3193,17.3,1,153.2,99.2
> 2022-10-16 17:24:02.176759152,da0,3,81,43,176,22.8,38,2704,13.3,0,252.6,98.6
> 2022-10-16 17:24:07.189364538,da0,2,90,35,140,27.9,55,3371,15.9,1,146.9,99.4
> 2022-10-16 17:24:12.190874083,da0,3,95,42,170,23.3,52,3486,13.2,0,167.0,96.6
> 2022-10-16 17:24:17.212769876,da0,4,85,33,134,30.6,52,3508,22.3,1,164.9,100.7
> 2022-10-16 17:24:22.286986167,da0,3,89,42,168,23.2,46,3149,12.7,1,127.3,97.2
> 2022-10-16 17:24:27.295992137,da0,3,88,40,160,25.4,48,3978,15.6,0,147.4,101.3
> 2022-10-16 17:24:32.406662934,da0,2,90,46,254,27.1,44,3554,12.9,0,159.3,97.9
> 2022-10-16 17:24:37.426033077,da0,3,87,42,177,23.3,45,2835,15.4,1,112.8,98.0
> 2022-10-16 17:24:42.455203661,da0,3,95,43,170,23.1,52,3770,11.9,0,153.5,98.8
> 2022-10-16 17:24:47.468741642,da0,3,71,36,144,27.9,35,2510,19.7,1,243.7,99.3
> 2022-10-16 17:24:52.471973107,da0,3,80,34,138,29.1,46,3630,17.6,0,266.6,99.2
> 2022-10-16 17:24:57.490425268,da0,3,79,35,141,28.1,43,3372,18.6,1,297.8,98.3
> 2022-10-16 17:25:02.589730547,da0,4,68,30,123,32.7,36,2318,22.4,3,105.2,97.9
> ^C
>

Your disk is doing ~85-90 IOPS, roughly 40 reads/s and 45 writes/s, with an
average queue depth of about 3 and a 95+% busy factor (which is kinda
misleading, but in this case not so much).
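
(For reference, those figures fall straight out of the CSV columns in your
paste. A quick awk sketch, assuming the gstat run was saved to a file, here
hypothetically called gstat.csv; the field numbers follow the header line,
so $3 is q-depth, $4 total_ops/s, $5 read/s, $7 ms/read, $8 write/s and
$10 ms/write:

# awk -F, '
    # average queue depth, IOPS, read/write rates and latencies for da0
    $2 == "da0" {
        n++; q += $3; ops += $4; r += $5; rl += $7; w += $8; wl += $10
    }
    END {
        if (n) printf "qd %.1f, %.0f ops/s (%.0f rd/s, %.0f wr/s), %.1f ms/rd, %.1f ms/wr\n",
            q/n, ops/n, r/n, w/n, rl/n, wl/n
    }' gstat.csv

The first all-zero sample drags the averages down a touch, but not enough to
matter.)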

So a crapton of traffic to the disk is making the disk slow. Why that's
happening, I'm unsure, but your disk is slow because it's fairly loaded.
I'm not sure what the 'other' operations are that aren't read, write or
trim. At a guess, I'd say that's BIO_FLUSH slowing things down a bit (but
not a huge amount).


> > I suspect a
> > problem with your disk.  FYI, ZFS naturally has a 5-second rhythm
> > (unless you changed vfs.zfs.txg.timeout),
>
> # sysctl vfs.zfs.txg.timeout
> vfs.zfs.txg.timeout: 5
>
> > so gstat's output is
> > sometimes more consistent if you use "-I 5s".  You can also omit "-d"
> > for magnetic HDDs, since they don't have anything like TRIM.
>
> > I suggest checking dmesg to see if there are any messages about errors
> > from da0.  It would also be worth running "smartctl -a /dev/da0", from
> > sysutils/smartmontools.
>
> Unfortunately, no errors seen in either dmesg or smartctl. There was a
> long smartctl self-test relatively recently, and it also came back clean.
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%          39253    -
> # 2  Extended offline    Completed without error       00%          39176    -
> # 3  Short offline       Completed without error       00%          39172    -
>
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
>   2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
>   3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       2595
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1487
>   5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
>   8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
>   9 Power_On_Hours          0x0032   002   002   000    Old_age   Always       -       39254
>  10 Spin_Retry_Count        0x0033   128   100   030    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1263
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       4
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       426
> 193 Load_Cycle_Count        0x0032   029   029   000    Old_age   Always       -       710616
> 194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       54 (Min/Max 16/65)
> 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
> 222 Loaded_Hours            0x0032   075   075   000    Old_age   Always       -       10367
> 223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
> 224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
> 226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       260
> 240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>

The only things that look high here are the load cycle count and maybe the
power-off retract count. With ~40k power-on hours, that's a power-off retract
roughly every 90 hours? That seems a little weird. And the load cycle count
is almost 20 times the power-on hours, suggesting the heads are unloading
every few minutes. That's higher than I'd expect as well. The power cycle
count is also high for the number of hours. Your disk also has about 4.5
years of power-on time, which may mean it's past its expected life. So this
profile suggests possible signs of fatigue.
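
(Rough arithmetic behind those statements, using the raw values from the
table above -- 39254 power-on hours, 710616 load cycles, 426 power-off
retracts, 1263 power cycles:

# awk 'BEGIN {
      poh = 39254; lcc = 710616; por = 426; pcc = 1263
      printf "one load/unload cycle every %.1f minutes\n", 60 / (lcc / poh)  # ~3.3
      printf "one power-off retract every %.0f hours\n", poh / por           # ~92
      printf "one power cycle every %.0f hours\n", poh / pcc                 # ~31
      printf "%.1f years of power-on time\n", poh / 8760                     # ~4.5
  }'

Just a back-of-the-envelope sketch, but it's where the 'every few minutes'
and 'roughly every 90 hours' figures come from.)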

Warner