[zfs] recordsize: unexpected increase in disk usage when increasing it
Date: Tue, 18 Jan 2022 13:56:46 UTC
TL;DR: I rsync-ed the same data twice, once with recordsize=128K and once with recordsize=1M, and the allocated size on disk is ~3% bigger with 1M. Why not smaller?

Hello,

I would like some help understanding how disk usage evolves when I change the recordsize. I've read several articles/presentations/forum threads about recordsize in ZFS, and if I try to summarize, I mainly understood that:

- recordsize is the "maximum" size of the "objects" (i.e. "logical blocks") that ZFS creates for both data & metadata; each object is then compressed, allocated to one vdev, split into smaller ("physical", ashift-sized) blocks and written to disk
- increasing recordsize is usually good when storing large files that are not modified, because it limits the number of metadata objects (block pointers), which has a positive effect on performance
- decreasing recordsize is useful for database-like workloads (i.e. small random writes inside existing objects), because it avoids write amplification (read-modify-write of a large object for a small update)

Today, I'm trying to observe the effect of increasing recordsize on *my* data (because I'm also considering setting special_small_blocks and using SSDs as a "special" vdev, but that is neither tested nor discussed here, only recordsize). So I'm running some benchmarks on my "documents" dataset (details in "Notes" below), but the results look really strange to me. When I rsync the same data to a freshly recreated zpool:

A) with recordsize=128K: 226G allocated on disk
B) with recordsize=1M: 232G allocated on disk => bigger than 128K ?!?

I would clearly expect it to be the other way around: a bigger recordsize generates less metadata and therefore less disk usage, and there shouldn't be any overhead because 1M is only a maximum, not a size forced onto every object. I don't mind the increased usage (I can live with a few GB more), but I would like to understand why it happens. I tried to give all the details of my tests below.

Did I do something wrong? Can you explain the increase? Thanks!

===============================================
A) 128K
==========

# zpool destroy bench
# zpool create -o ashift=12 bench /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
# rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
[...]
sent 241,042,476,154 bytes  received 353,838 bytes  81,806,492.45 bytes/sec
total size is 240,982,439,038  speedup is 1.00

# zfs get recordsize bench
NAME   PROPERTY    VALUE  SOURCE
bench  recordsize  128K   default

# zpool list -v bench
NAME                                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
bench                                        2.72T   226G  2.50T        -         -     0%     8%  1.00x  ONLINE  -
  gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 2.72T   226G  2.50T        -         -     0%  8.10%      -  ONLINE

# zfs list bench
NAME   USED  AVAIL  REFER  MOUNTPOINT
bench  226G  2.41T   226G  /bench

# zfs get all bench | egrep "(used|referenced|written)"
bench  used                  226G   -
bench  referenced            226G   -
bench  usedbysnapshots       0B     -
bench  usedbydataset         226G   -
bench  usedbychildren        1.80M  -
bench  usedbyrefreservation  0B     -
bench  written               226G   -
bench  logicalused           226G   -
bench  logicalreferenced     226G   -

# zdb -Lbbbs bench > zpool-bench-rcd128K.zdb
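(Aside, not part of the runs themselves: a quick single-file check would probably also show the effect, independently of the full rsync. The file name below is only a placeholder for any large file from the source dataset; on FreeBSD, du -A prints the apparent size while plain du prints the space actually allocated.)

# zfs create -o recordsize=128K bench/rcd128k
# zfs create -o recordsize=1M bench/rcd1m
# cp /mnt/tank/docs-florent/some-large-file.jpg /bench/rcd128k/
# cp /mnt/tank/docs-florent/some-large-file.jpg /bench/rcd1m/
# sync
# du -Ah /bench/rcd128k/some-large-file.jpg   # apparent (logical) size
# du -h /bench/rcd128k/some-large-file.jpg    # allocated on disk with 128K records
# du -h /bench/rcd1m/some-large-file.jpg      # allocated on disk with 1M records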
===============================================
B) 1M
==========

# zpool destroy bench
# zpool create -o ashift=12 bench /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
# zfs set recordsize=1M bench
# rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
[...]
sent 241,042,476,154 bytes  received 353,830 bytes  80,173,899.88 bytes/sec
total size is 240,982,439,038  speedup is 1.00

# zfs get recordsize bench
NAME   PROPERTY    VALUE  SOURCE
bench  recordsize  1M     local

# zpool list -v bench
NAME                                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
bench                                        2.72T   232G  2.49T        -         -     0%     8%  1.00x  ONLINE  -
  gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 2.72T   232G  2.49T        -         -     0%  8.32%      -  ONLINE

# zfs list bench
NAME   USED  AVAIL  REFER  MOUNTPOINT
bench  232G  2.41T   232G  /bench

# zfs get all bench | egrep "(used|referenced|written)"
bench  used                  232G   -
bench  referenced            232G   -
bench  usedbysnapshots       0B     -
bench  usedbydataset         232G   -
bench  usedbychildren        1.96M  -
bench  usedbyrefreservation  0B     -
bench  written               232G   -
bench  logicalused           232G   -
bench  logicalreferenced     232G   -

# zdb -Lbbbs bench > zpool-bench-rcd1M.zdb

===============================================
Notes:
==========

- the source dataset contains ~50% pictures (raw files and jpg), plus some music, various archived documents, zip files and videos
- no change to the source dataset while testing (cf. the total size logged by rsync)
- I repeated the tests twice (128K, then 1M, then 128K, then 1M) and got the same results
- probably not important here, but: /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 is a WD Red 3TB CMR (WD30EFRX), and /mnt/tank/docs-florent/ is a 128K-recordsize dataset on another zpool that I never tweaked except for ashift=12 (because it uses the same model of Red 3TB)

# zfs --version
zfs-2.0.6-1
zfs-kmod-v2021120100-zfs_a8c7652

# uname -a
FreeBSD xxxxxxxxx 12.2-RELEASE-p11 FreeBSD 12.2-RELEASE-p11 75566f060d4(HEAD) TRUENAS amd64
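P.S. If it helps: to see where the extra ~6G goes, I suppose the two zdb dumps above can be compared with something like the commands below. This is just a sketch: it assumes the dumps contain zdb's usual per-type block accounting table (the header line containing LSIZE/PSIZE/ASIZE), and the amount of context printed may need adjusting.

# grep -B 2 -A 40 "ASIZE" zpool-bench-rcd128K.zdb   # per-type LSIZE/PSIZE/ASIZE accounting
# grep -B 2 -A 40 "ASIZE" zpool-bench-rcd1M.zdb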