Re: zfs with operations like rm -rf takes a very long time recently
- In reply to: Mark Millard : "RE: zfs with operations like rm -rf takes a very long time recently"
Date: Sun, 16 Oct 2022 16:27:58 UTC
On 2022-Oct-16, at 08:42, Mark Millard <marklmi@yahoo.com> wrote:

> void <void_at_f-m.fm> wrote on
> Date: Sun, 16 Oct 2022 13:57:04 UTC :
>
>> Has anything recently changed in -current that would make file operations
>> on zfs such as rm -rf *.* very slow?
>>
>> What would I look for and how would I test it?
>>
>> system is FreeBSD 14.0-CURRENT #5 main-n258595-226e41467ee1 on arm64.aarch64
>> using GENERIC-NODEBUG kernel.
>>
>> the zfs is zroot on usb3 on a raspberry pi4 8GB. there appears to be plenty
>> of resources. cpu speed is 2.1GHz. zroot is external usb3 hd.
>>
>> Right now it's rm -rf-ing /var/cache/ccache/* which is 5GB max and it's taken
>> over 10 mins. It was never this slow. No errors in /var/log/messages and none
>> yet in smartd. zpool scrub last ran successfully 3 days ago.
>>
>> last pid:  4324;  load averages:  0.17,  0.10,  0.12   up 0+02:10:34  14:40:55
>> 77 processes:  1 running, 76 sleeping
>> CPU:  1.6% user,  0.0% nice,  1.9% system,  0.2% interrupt, 96.4% idle
>> Mem: 550M Active, 803M Inact, 2224M Wired, 40K Buf, 4239M Free
>> ARC: 1293M Total, 381M MFU, 725M MRU, 1124K Anon, 30M Header, 156M Other
>>      938M Compressed, 1906M Uncompressed, 2.03:1 Ratio
>> Swap: 16G Total, 16G Free
>> Process id to show (+ for all):
>>  PID USERNAME  THR PRI NICE  SIZE   RES STATE   C  TIME  WCPU COMMAND
>> 3871 root        1  20    0   12M 3648K zio->i  2  0:10 0.39% rm
>>  353 _pflogd     1  20    0   13M 2108K bpf     1  0:00 0.00% pflogd
>> 1441 mailnull    1  28    0   25M 9508K select  3  0:01 0.00% exim
>
>
> "The Design and Implementation of the FreeBSD Operating System"
> 2nd Ed. says about ZFS (page 548):
>
> "Like all non-overwriting filesystems, ZFS operates best
> when at least a quarter of its disk pool is free. Write
> throughput becomes poor when the pool gets too full. By
> contrast, UFS can run well to 95 percent full and acceptably
> to 99 percent full."
>
> And page 549 says:
>
> "ZFS was designed to manage and operate enormous filesystems
> easily, which it does well. Its design assumed that it would
> have many fast 64-bit CPUs with large amounts of memory to
> support these enormous filesystems. When these resources are
> available, it works really well. However, it is not designed
> for or well suited to run on resource-constrained systems using
> 32-bit CPUs with less than 8 Gbyte of memory and one small,
> nearly-full disk, which is typical of many embedded systems."
>
> (Note the full-disk part and 8 GiByte being at the low end of
> the RAM size range.)
>
> Page 523 says:
>
> "ZFS takes advantage of the abundant processor power available
> with current multi-core CPUs. Because they are much faster than
> storage, ZFS can afford to checksum everything."
>
> The book is not explicit about RAM subsystem performance
> tradeoffs for ZFS. One property of the RPi4B's is that they
> have very small RAM caches, and one core can saturate the
> memory subsystem if the RAM caches are being fairly
> ineffective overall. In such contexts, multi-core need not
> cut the time things take. (But I've no clue how likely such
> conditions would be for your context.) A cache-busting
> access pattern over much more than a 1 MiByte memory range
> drops the RPi4B performance greatly compared to such an
> access pattern fitting in a 1 MiByte or smaller range -- no
> matter if it is 1 core or more cores that is/are trying to
> be active.
>
>
> Independent of all that, something like:
>
> # gstat -spod
>
> would likely be interesting to monitor during a time-taking
> "rm -fr".
>
> I do not know if the "rm -fr" is deleting a lot of files that
> also have unchanged content in a snapshot. Such files are not
> actually deleted. The information about where the file should
> be visible is adjusted instead, leaving the snapshot copy(ies)
> available for access. Such a removal adds to the disk space
> usage by writing more metadata, without deleting the
> snapshot-related data.

Your:

Filesystem            Size    Used   Avail Capacity  Mounted on
zroot/ROOT/default    863G    146G    717G    17%    /

indicates that it should not be too full.
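
That said, df on one dataset does not show pool-wide fullness,
fragmentation, or space still pinned by snapshots. If you want to
check those directly, something like the following should show them
(standard zpool/zfs options; adjust the pool name if yours differs,
and the recursive listings can be long):

# zpool list zroot
# zfs list -r -t snapshot -o name,used,referenced -s used zroot
# zfs get -r usedbysnapshots zroot

If "usedbysnapshots" grows while the rm runs, the removal is mostly
shifting space from the dataset to its snapshots rather than freeing
it, matching the metadata-write behavior described above.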
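
Also, if you end up repeating one of these slow removals, sampling
the ARC counters next to the gstat output might show whether the ARC
is absorbing any of the read traffic. Just as a sketch (these are the
usual OpenZFS arcstats kstats on FreeBSD; sample before and after, or
in a loop):

# sysctl kstat.zfs.misc.arcstats.size \
         kstat.zfs.misc.arcstats.hits \
         kstat.zfs.misc.arcstats.misses

Misses growing much faster than hits during the rm would point at the
disk being the bottleneck: lots of small random reads that the ARC is
not helping with.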

You published some "gstat -dopC" data. It looks like generally
under, say, 65 reads/sec, with generally under, say, 300 KiBytes/sec
resulting, is enough to keep the (suggestive) %busy figure at over
90, with no writes at the time. It looks like where there is notably
less read activity, there is write activity and/or "other" activity
keeping the %busy figure generally 90+. It sure looks I/O bound.

Page 547, mid first paragraph:

"The result of the sequential writing is that a file can end up
requiring many random accesses when it is later read. ZFS mitigates
the reading cost by dedicating enough memory to the ARC to be able
to keep all actively accessed files resident. ZFS also attempts to
prefetch data when files are being read."

It looks like you are suffering the random accesses from an
ineffective mitigation, and have storage media for which the seek
times (and more) lead to generally under, say, 65 reads/sec (absent
writes/other). (It is harder to say as much about the write and
"other" activity, given the limited data and the mixing of
read/write/other.)

===
Mark Millard
marklmi at yahoo.com