From: Rich <rincebrain@gmail.com>
Date: Tue, 18 Jan 2022 09:29:54 -0500
Subject: Re: [zfs] recordsize: unexpected increase of disk usage when increasing it
To: Alan Somers <asomers@freebsd.org>
Cc: Florent Rivoire <florent@rivoire.fr>, freebsd-fs
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

Really? I didn't know it would still trim the tails on files with
compression off.

...
        size    1179648
        parent  34
        links   1
        pflags  40800000004
Indirect blocks:
               0 L1  DVA[0]=<3:c02b96c000:1000> DVA[1]=<3:c810733000:1000> [L1 ZFS plain file] skein lz4 unencrypted LE contiguous unique double size=20000L/1000P birth=35675472L/35675472P fill=2 cksum=5cfba24b351a09aa:8bd9dfef87c5b625:906ed5c3252943db:bed77ce51ad540d4
               0  L0 DVA[0]=<2:a0827db4000:100000> [L0 ZFS plain file] skein uncompressed unencrypted LE contiguous unique single size=100000L/100000P birth=35675472L/35675472P fill=1 cksum=95b06edf60e5f54c:af6f6950775d0863:8fc28b0783fcd9d3:2e44676e48a59360
          100000  L0 DVA[0]=<2:a0827eb4000:100000> [L0 ZFS plain file] skein uncompressed unencrypted LE contiguous unique single size=100000L/100000P birth=35675472L/35675472P fill=1 cksum=62a1f05769528648:8197c8a05ca9f1fb:a750c690124dd2e0:390bddc4314cd4c3

It seems not?
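
(For anyone who wants to repeat this check, a rough sketch - the pool,
dataset and file names are made up and the commands are untested as
written; on ZFS the inode number reported by stat should match the object
id that zdb expects:)

# zfs create -o compression=off -o recordsize=1M tank/tailtest
# dd if=/dev/random of=/tank/tailtest/f bs=128k count=9
  (writes 1179648 bytes: one full 1M record plus a 128K tail)
# zfs list -o name,used,logicalused tank/tailtest
# zdb -dddddd tank/tailtest $(stat -f %i /tank/tailtest/f)
  (then compare the size=...L/...P of the last L0 block against the tail size)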

- Rich

On Tue, Jan 18, 2022 at 9:23 AM Alan Somers <asomers@freebsd.org> wrote:
> On Tue, Jan 18, 2022 at 7:13 AM Rich <rincebrain@gmail.com> wrote:
> >
> > Compression would have made your life better here, and possibly also
> > made it clearer what's going on.
> >
> > All records in a file are going to be the same size pre-compression -
> > so if you set the recordsize to 1M and save a 131.1M file, it's going
> > to take up 132M on disk before compression/raidz overhead/whatnot.
>
> Not true.  ZFS will trim the file's tails even without compression enabled.
>
> >
> > Usually compression saves you from the tail padding actually requiring
> > allocation on disk, which is one reason I encourage everyone to at
> > least use lz4 (or, if you absolutely cannot for some reason, I guess
> > zle should also work for this one case...)
> >
> > But I would say it's probably the sum of last record padding across
> > the whole dataset, if you don't have compression on.
> >
> > - Rich
> >
> > On Tue, Jan 18, 2022 at 8:57 AM Florent Rivoire <florent@rivoire.fr> wrote:
> >>
> >> TLDR: I rsync-ed the same data twice: once with 128K recordsize and
> >> once with 1M, and the allocated size on disk is ~3% bigger with 1M.
> >> Why not smaller?
> >>
> >> Hello,
> >>
> >> I would like some help to understand how the disk usage evolves when
> >> I change the recordsize.
> >>
> >> I've read several articles/presentations/forums about recordsize in
> >> ZFS, and if I try to summarize, I mainly understood that:
> >> - recordsize is the "maximum" size of "objects" (i.e. "logical
> >> blocks") that ZFS will create for both data & metadata; each object
> >> is then compressed, allocated to one vdev, split into smaller
> >> (ashift-sized) "physical" blocks and written to disk
> >> - increasing recordsize is usually good when storing large files that
> >> are not modified, because it limits the number of metadata objects
> >> (block pointers), which has a positive effect on performance
> >> - decreasing recordsize is useful for database-like workloads (i.e.
> >> small random writes inside existing objects), because it avoids write
> >> amplification (read-modify-write of a large object for a small update)
> >>
> >> Today, I'm trying to observe the effect of increasing recordsize for
> >> *my* data (because I'm also considering defining special_small_blocks
> >> & using SSDs as "special", but that's not tested nor discussed here,
> >> just recordsize).
> >> So, I'm doing some benchmarks on my "documents" dataset (details in
> >> "notes" below), but the results are really strange to me.
> >>
> >> When I rsync the same data to a freshly-recreated zpool:
> >> A) with recordsize=128K : 226G allocated on disk
> >> B) with recordsize=1M : 232G allocated on disk => bigger than 128K ?!?
> >>
> >> I would clearly expect the other way around, because a bigger
> >> recordsize generates less metadata, hence smaller disk usage, and
> >> there shouldn't be any overhead because 1M is just a maximum, not a
> >> forced size to allocate for every object.
>
> A common misconception.  The 1M recordsize applies to every newly
> created object, and every object must use the same size for all of its
> records (except possibly the last one).  But objects created before
> you changed the recsize will retain their old recsize, and file tails
> have a flexible recsize.
>
> >> I don't mind the increased usage (I can live with a few GB more), but
> >> I would like to understand why it happens.
>
> You might be seeing the effects of sparsity.  ZFS is smart enough not
> to store file holes (and if any kind of compression is enabled, it
> will find long runs of zeroes and turn them into holes).  If your data
> contains any holes that are >= 128 kB but < 1 MB, then they can be
> stored as holes with a 128 kB recsize but must be stored as long runs
> of zeros with a 1 MB recsize.
>
> However, I would suggest that you don't bother.  With a 128 kB recsize,
> ZFS has something like a 1000:1 ratio of data:metadata.  In other
> words, increasing your recsize can save you at most 0.1% of disk
> space.  Basically, it doesn't matter.  What it _does_ matter for is
> the tradeoff between write amplification and RAM usage.  1000:1 is
> comparable to the disk:RAM ratio of many computers.  And performance is
> more sensitive to metadata access times than data access times.  So
> increasing your recsize can help you keep a greater fraction of your
> metadata in ARC.  OTOH, as you remarked, increasing your recsize will
> also increase write amplification.
>
> So to summarize:
> * Adjust compression settings to save disk space.
> * Adjust recsize to save RAM.
>
> -Alan
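
(A quick back-of-envelope on that 1000:1 figure, under the assumption that
each record costs roughly one 128-byte block pointer of metadata - an
assumption for illustration, not something measured in this thread:)

# echo 'scale=6; 128 * 100 / (128 * 1024)' | bc
0.097656        <- ~0.1% metadata overhead at 128K records
# echo 'scale=6; 128 * 100 / (1024 * 1024)' | bc
0.012207        <- ~0.01% at 1M records

So even in the best case, the metadata saved by going from 128K to 1M is on
the order of 0.1% of the data - a couple hundred MB on a ~226G dataset -
nowhere near the ~6G difference reported above.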

> >> I tried to give all the details of my tests below.
> >> Did I do something wrong? Can you explain the increase?
> >>
> >> Thanks!
> >>
> >>
> >> ===============================================
> >> A) 128K
> >> ==========
> >>
> >> # zpool destroy bench
> >> # zpool create -o ashift=12 bench
> >> /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
> >>
> >> # rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
> >> [...]
> >> sent 241,042,476,154 bytes  received 353,838 bytes  81,806,492.45 bytes/sec
> >> total size is 240,982,439,038  speedup is 1.00
> >>
> >> # zfs get recordsize bench
> >> NAME   PROPERTY    VALUE   SOURCE
> >> bench  recordsize  128K    default
> >>
> >> # zpool list -v bench
> >> NAME                                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> >> bench                                        2.72T   226G  2.50T        -         -     0%     8%  1.00x    ONLINE  -
> >>   gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4  2.72T   226G  2.50T        -         -     0%  8.10%      -    ONLINE
> >>
> >> # zfs list bench
> >> NAME    USED  AVAIL     REFER  MOUNTPOINT
> >> bench   226G  2.41T      226G  /bench
> >>
> >> # zfs get all bench | egrep "(used|referenced|written)"
> >> bench  used                  226G   -
> >> bench  referenced            226G   -
> >> bench  usedbysnapshots       0B     -
> >> bench  usedbydataset         226G   -
> >> bench  usedbychildren        1.80M  -
> >> bench  usedbyrefreservation  0B     -
> >> bench  written               226G   -
> >> bench  logicalused           226G   -
> >> bench  logicalreferenced     226G   -
> >>
> >> # zdb -Lbbbs bench > zpool-bench-rcd128K.zdb
> >>
> >>
> >> ===============================================
> >> B) 1M
> >> ==========
> >>
> >> # zpool destroy bench
> >> # zpool create -o ashift=12 bench
> >> /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
> >> # zfs set recordsize=1M bench
> >>
> >> # rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
> >> [...]
> >> sent 241,042,476,154 bytes  received 353,830 bytes  80,173,899.88 bytes/sec
> >> total size is 240,982,439,038  speedup is 1.00
> >>
> >> # zfs get recordsize bench
> >> NAME   PROPERTY    VALUE   SOURCE
> >> bench  recordsize  1M      local
> >>
> >> # zpool list -v bench
> >> NAME                                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> >> bench                                        2.72T   232G  2.49T        -         -     0%     8%  1.00x    ONLINE  -
> >>   gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4  2.72T   232G  2.49T        -         -     0%  8.32%      -    ONLINE
> >>
> >> # zfs list bench
> >> NAME    USED  AVAIL     REFER  MOUNTPOINT
> >> bench   232G  2.41T      232G  /bench
> >>
> >> # zfs get all bench | egrep "(used|referenced|written)"
> >> bench  used                  232G   -
> >> bench  referenced            232G   -
> >> bench  usedbysnapshots       0B     -
> >> bench  usedbydataset         232G   -
> >> bench  usedbychildren        1.96M  -
> >> bench  usedbyrefreservation  0B     -
> >> bench  written               232G   -
> >> bench  logicalused           232G   -
> >> bench  logicalreferenced     232G   -
> >>
> >> # zdb -Lbbbs bench > zpool-bench-rcd1M.zdb
> >>
> >>
> >> ===============================================
> >> Notes:
> >> ==========
> >>
> >> - the source dataset contains ~50% pictures (raw files and jpg), and
> >> also some music, various archived documents, zip, videos
> >> - no change on the source dataset while testing (cf. the size logged
> >> by rsync)
> >> - I repeated the tests twice (128K, then 1M, then 128K, then 1M), and
> >> got the same results
> >> - probably not important here, but:
> >> /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 is a Red 3TB CMR
> >> (WD30EFRX), and /mnt/tank/docs-florent/ is a 128K-recordsize dataset
> >> on another zpool that I never tweaked except ashift=12 (because it
> >> uses the same model of Red 3TB)
> >>
> >> # zfs --version
> >> zfs-2.0.6-1
> >> zfs-kmod-v2021120100-zfs_a8c7652
> >>
> >> # uname -a
> >> FreeBSD xxxxxxxxx 12.2-RELEASE-p11 FreeBSD 12.2-RELEASE-p11
> >> 75566f060d4(HEAD) TRUENAS  amd64
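
(And one way to sanity-check the "sum of last record padding" theory against
the real dataset - a sketch only, untested as written, assuming no
compression and ignoring sparse files; /bench is the mountpoint from the
tests above, and the stat/awk usage is FreeBSD-flavored:)

# find /bench -type f -print0 | xargs -0 stat -f %z | \
    awk -v rs=1048576 '$1 > rs && $1 % rs { pad += rs - ($1 % rs) }
        END { printf "%.2f GiB worst-case tail padding\n", pad / 2^30 }'

Run it once with rs=1048576 and once with rs=131072; the difference between
the two totals should land in the same ballpark as the 232G - 226G = 6G
delta if tail padding really is the explanation. The block-size histograms
in zpool-bench-rcd128K.zdb and zpool-bench-rcd1M.zdb (from the zdb -Lbbbs
runs above) should tell the same story.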