Re: periodic daily takes a very long time to run (14-stable)

From: Ronald Klop <ronald-lists_at_klop.ws>
Date: Fri, 27 Oct 2023 13:34:35 UTC
From: void <void@f-m.fm>
Date: Friday, 27 October 2023 14:51
To: freebsd-stable@freebsd.org
Subject: Re: periodic daily takes a very long time to run (14-stable)
> 
> On Fri, Oct 27, 2023 at 01:45:24PM +0200, Ronald Klop wrote:
> 
> >Can you run "gstat" or "iostat -x -d 1" to see how busy your disk is?
> >And how much bandwidth it uses.
> 
> when periodic is *not* running, gstat numbers fluctuate between what's indicated below and all zeros:
> 
> dT: 1.011s  w: 1.000s
>   L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>      0     69      0      0    0.0     67    732    0.6   24.3| da0
>      0      0      0      0    0.0      0      0    0.0    0.0| da0p1
>      0      0      0      0    0.0      0      0    0.0    0.0| da0p2
>      0     69      0      0    0.0     67    732    0.6   24.3| da0p3
>      0      0      0      0    0.0      0      0    0.0    0.0| da0p2.eli
>      0     69      0      0    0.0     67    732    0.8   24.4| da0p3.eli
> 
> # iostat -x -d 1
>         extended device statistics  
> device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b  
> da0           15      18    189.4    565.1    47     7   144    27    0  23
> pass0          0       0      0.0      0.0    54     0   154    79    0   0
>                          extended device statistics  
> device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b  
> da0            3      70    172.6    698.0    26     1   129     5    0  33
> pass0          0       0      0.0      0.0     0     0     0     0    0   0
>                          extended device statistics  
> device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b  
> da0            0       0      0.0      0.0     0     0     0     0    0   0
> pass0          0       0      0.0      0.0     0     0     0     0    0   0
> (then all 0s repeatedly)
> 
> When periodic *is* running:
> 
> dT: 1.047s  w: 1.000s
>   L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>      2     89     34    138   88.1     54    783   33.9   95.0| da0
>      0      0      0      0    0.0      0      0    0.0    0.0| da0p1
>      0      0      0      0    0.0      0      0    0.0    0.0| da0p2
>      2     89     34    138   88.1     54    783   34.0   95.0| da0p3
>      0      0      0      0    0.0      0      0    0.0    0.0| da0p2.eli
>      2     90     35    141   86.3     54    783   34.2   96.9| da0p3.eli
> 
> The kBps number varies wildly between 100 and 1200; %busy is 92-102%.
> 
> # iostat -x -d 1
>                          extended device statistics  
> device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b  
> da0           15      18    189.6    564.1    47     7   144    27   10  23
> pass0          0       0      0.0      0.0    54     0   154    79    0   0
> Numbers vary wildly here, too. I wonder if there's a switch for iostat that lets it
> run for, say, 1000 seconds and then produce an average of the values gathered.
> 
> I'll look for one
> -- 


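About the averaging question: as far as I know every iostat report after the first is already an average over the preceding interval, so something like this (flags from memory, the numbers picked arbitrarily) should print one line averaged over 1000 seconds:

$ iostat -x -d -w 1000 -c 2 da0

The first report is the average since boot; the second one covers the 1000-second window.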
What stands out to me is that you do quite a lot of writes to the disk. (I might be mistaken.)
The maximum number of IOPS for a consumer-grade hard disk is around 80. I think this also holds for USB-connected disks.
https://en.wikipedia.org/wiki/IOPS#Mechanical_hard_drives
From the stats you posted it looks like you are almost always doing 50+ writes/second already. That does not leave many IOPS for the find process.
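Rough arithmetic, assuming those numbers hold: with ~80 IOPS total and ~50 writes/s already in flight, find gets maybe 30 reads/s. If the daily run has to stat, say, 300,000 files (a number I am just picking for illustration), that alone is 300000 / 30 = 10000 seconds, almost three hours.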

ZFS tries to bundle writes into a transaction group every 5 seconds to leave room for reads, unless it has too much data to write or a sync request comes in. See "sysctl vfs.zfs.txg.timeout".
#device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b
$ iostat -x  -w 1 da1 | grep da1
da1            6      19     94.0    525.0    15     2   123     8    1  12
da1            0       4      0.0     15.2     0     1   122    41    0  23
da1            0       0      0.0      0.0     0     0     0     0    0   0
da1            0       2      0.0    738.7     0     4    61    23    0   7
da1            0       0      0.0      0.0     0     0     0     0    0   0
da1            0      98      0.0   2850.9     0     1     0     1    1   7
da1            0       4      0.0     16.0     0     1   139    47    0  28
da1            0       0      0.0      0.0     0     0     0     0    0   0
da1            0       0      0.0      0.0     0     0     0     0    0   0
da1            0       0      0.0      0.0     0     0     0     0    0   0
da1            0      80      0.0   2361.3     0     1     0     1    1   6
da1            0       4      0.0     15.3     0     1   125    42    0  24
Every 5th row is a spike in writes on my RPI4. "zpool iostat 1" should give a similar pattern.
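You can check the transaction group timeout mentioned above like this; on a default install it should print 5 (seconds):

$ sysctl vfs.zfs.txg.timeout
vfs.zfs.txg.timeout: 5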

So if my observation is right, it might be interesting to find out what is writing. I had similar issues after the number of jails on my RPI4 increased: they each did a little bit of writing, which added up to quite a lot. My solution was to add an SSD.
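A quick way to see which processes are doing the writing, if I remember the flags correctly, is top in I/O mode:

$ top -m io -o total

It sorts processes by total I/O; watching it for a few minutes while the disk is busy usually points at the culprit.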

Regards,
Ronald.