From: Rich <rincebrain@gmail.com>
Date: Tue, 18 Jan 2022 10:33:03 -0500
Subject: Re: [zfs] recordsize: unexpected increase of disk usage when increasing it
To: alan somers <asomers@gmail.com>
Cc: Alan Somers <asomers@freebsd.org>, Florent Rivoire <florent@rivoire.fr>, freebsd-fs
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

Yeah, that's consistent with my understanding of the behavior - one record
gets packed, as soon as you hit recordsize all subsequent records are
(logically, at least) recordsize, and then compression saves you, or doesn't.

- Rich

On Tue, Jan 18, 2022 at 10:29 AM alan somers <asomers@gmail.com> wrote:
> I think the difference is in whether the file is < 1 record or >= 1
> record.  It looks like the first record is variably-sized but after
> that it's like you say, with compression off it rounds up.
>
> On Tue, Jan 18, 2022 at 8:07 AM Rich <rincebrain@gmail.com> wrote:
> >
> > Nope. I just retried it on my FBSD 13-RELEASE VM, too:
> > # uname -a
> > FreeBSD fbsd13rel 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24
> > 07:33:27 UTC 2021 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
> > # zpool version
> > zfs-2.1.99-683_ga967e54c2
> > zfs-kmod-2.1.99-683_ga967e54c2
> > # zpool get all | grep 'feature@' | grep disabled
> > buildpool  feature@edonr         disabled               local
> > # dd if=/dev/urandom of=/buildpool/testme/2 bs=1179648 count=1
> > 1+0 records in
> > 1+0 records out
> > 1179648 bytes transferred in 0.009827 secs (120041885 bytes/sec)
> > # du -sh /buildpool/testme/2
> > 2.0M    /buildpool/testme/2
> > # zfs get all buildpool/testme | grep -v default
> > NAME              PROPERTY              VALUE                  SOURCE
> > buildpool/testme  type                  filesystem             -
> > buildpool/testme  creation              Tue Jan 18  4:46 2022  -
> > buildpool/testme  used                  4.03M                  -
> > buildpool/testme  available             277G                   -
> > buildpool/testme  referenced            4.03M                  -
> > buildpool/testme  compressratio         1.00x                  -
> > buildpool/testme  mounted               yes                    -
> > buildpool/testme  recordsize            1M                     local
> > buildpool/testme  compression           off                    local
> > buildpool/testme  atime                 off                    inherited from buildpool
> > buildpool/testme  createtxg             15030                  -
> > buildpool/testme  version               5                      -
> > buildpool/testme  utf8only              off                    -
> > buildpool/testme  normalization         none                   -
> > buildpool/testme  casesensitivity       sensitive              -
> > buildpool/testme  guid                  11057815587819738755   -
> > buildpool/testme  usedbysnapshots       0B                     -
> > buildpool/testme  usedbydataset         4.03M                  -
> > buildpool/testme  usedbychildren        0B                     -
> > buildpool/testme  usedbyrefreservation  0B                     -
> > buildpool/testme  objsetid              280                    -
> > buildpool/testme  refcompressratio      1.00x                  -
> > buildpool/testme  written               4.03M                  -
> > buildpool/testme  logicalused           4.01M                  -
> > buildpool/testme  logicalreferenced     4.01M                  -
> >
> > What version are you running?
> >
> > - Rich
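For anyone who wants to reproduce the comparison being discussed, a minimal
sketch (the pool name "tank" and the dataset names below are hypothetical,
not taken from the thread):

# zfs create -o recordsize=1M -o compression=off tank/rs-off
# zfs create -o recordsize=1M -o compression=lz4 tank/rs-lz4
# dd if=/dev/urandom of=/tank/rs-off/f bs=1179648 count=1   # 1 MiB + 128 KiB of random data
# dd if=/dev/urandom of=/tank/rs-lz4/f bs=1179648 count=1
# sync
# du -sh /tank/rs-off/f /tank/rs-lz4/f

If the 128 KiB tail really is written out as a full, untrimmed 1M record, the
compression=off copy should report about 2.0M allocated (as in the transcript
above), while the lz4 copy should compress away the zero padding in the tail
record and report roughly 1.1M.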
> >
> > On Tue, Jan 18, 2022 at 10:00 AM Alan Somers <asomers@freebsd.org> wrote:
> >>
> >> That's not what I get.  Is your pool formatted using a very old
> >> version or something?
> >>
> >> somers@fbsd-head /u/h/somers [1]>
> >> dd if=/dev/random bs=1179648 of=/testpool/food/t/richfile count=1
> >> 1+0 records in
> >> 1+0 records out
> >> 1179648 bytes transferred in 0.003782 secs (311906705 bytes/sec)
> >> somers@fbsd-head /u/h/somers> du -sh /testpool/food/t/richfile
> >> 1.1M    /testpool/food/t/richfile
> >>
> >> On Tue, Jan 18, 2022 at 7:51 AM Rich <rincebrain@gmail.com> wrote:
> >> >
> >> > 2.1M    /workspace/test1M/1
> >> >
> >> > - Rich
> >> >
> >> > On Tue, Jan 18, 2022 at 9:47 AM Alan Somers <asomers@freebsd.org> wrote:
> >> >>
> >> >> Yeah, it does.  Just check "du -sh <FILENAME>".  zdb there is showing
> >> >> you the logical size of the record, but it isn't showing how many disk
> >> >> blocks are actually allocated.
> >> >>
> >> >> On Tue, Jan 18, 2022 at 7:30 AM Rich <rincebrain@gmail.com> wrote:
> >> >> >
> >> >> > Really? I didn't know it would still trim the tails on files with compression off.
> >> >> >
> >> >> > ...
> >> >> >
> >> >> >         size    1179648
> >> >> >         parent  34
> >> >> >         links   1
> >> >> >         pflags  40800000004
> >> >> > Indirect blocks:
> >> >> >                0 L1  DVA[0]=<3:c02b96c000:1000> DVA[1]=<3:c810733000:1000> [L1 ZFS plain file] skein lz4 unencrypted LE contiguous unique double size=20000L/1000P birth=35675472L/35675472P fill=2 cksum=5cfba24b351a09aa:8bd9dfef87c5b625:906ed5c3252943db:bed77ce51ad540d4
> >> >> >                0  L0 DVA[0]=<2:a0827db4000:100000> [L0 ZFS plain file] skein uncompressed unencrypted LE contiguous unique single size=100000L/100000P birth=35675472L/35675472P fill=1 cksum=95b06edf60e5f54c:af6f6950775d0863:8fc28b0783fcd9d3:2e44676e48a59360
> >> >> >           100000  L0 DVA[0]=<2:a0827eb4000:100000> [L0 ZFS plain file] skein uncompressed unencrypted LE contiguous unique single size=100000L/100000P birth=35675472L/35675472P fill=1 cksum=62a1f05769528648:8197c8a05ca9f1fb:a750c690124dd2e0:390bddc4314cd4c3
> >> >> >
> >> >> > It seems not?
> >> >> >
> >> >> > - Rich
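The per-file dump above comes from zdb. A sketch of how to produce one for a
given file (the path and pool/dataset names are placeholders, and the
object-ID step assumes that ZFS reports the object number as the file's inode
number):

# ls -i /pool/dataset/somefile       # the inode number is the ZFS object ID
# zdb -dddddd pool/dataset <objid>   # dump that object, including its L1/L0 block pointers

Fewer -d's print progressively less detail; a block-pointer listing like the
one quoted above needs the higher verbosity levels.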
> >> >> >
> >> >> > On Tue, Jan 18, 2022 at 9:23 AM Alan Somers <asomers@freebsd.org> wrote:
> >> >> >>
> >> >> >> On Tue, Jan 18, 2022 at 7:13 AM Rich <rincebrain@gmail.com> wrote:
> >> >> >> >
> >> >> >> > Compression would have made your life better here, and possibly
> >> >> >> > also made it clearer what's going on.
> >> >> >> >
> >> >> >> > All records in a file are going to be the same size pre-compression -
> >> >> >> > so if you set the recordsize to 1M and save a 131.1M file, it's going
> >> >> >> > to take up 132M on disk before compression/raidz overhead/whatnot.
> >> >> >>
> >> >> >> Not true.  ZFS will trim the file's tails even without compression enabled.
> >> >> >>
> >> >> >> >
> >> >> >> > Usually compression saves you from the tail padding actually requiring
> >> >> >> > allocation on disk, which is one reason I encourage everyone to at
> >> >> >> > least use lz4 (or, if you absolutely cannot for some reason, I guess
> >> >> >> > zle should also work for this one case...)
> >> >> >> >
> >> >> >> > But I would say it's probably the sum of last record padding across
> >> >> >> > the whole dataset, if you don't have compression on.
> >> >> >> >
> >> >> >> > - Rich
> >> >> >> >
> >> >> >> > On Tue, Jan 18, 2022 at 8:57 AM Florent Rivoire <florent@rivoire.fr> wrote:
> >> >> >> >>
> >> >> >> >> TLDR: I rsync-ed the same data twice: once with 128K recordsize and
> >> >> >> >> once with 1M, and the allocated size on disk is ~3% bigger with 1M.
> >> >> >> >> Why not smaller?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Hello,
> >> >> >> >>
> >> >> >> >> I would like some help to understand how the disk usage evolves when I
> >> >> >> >> change the recordsize.
> >> >> >> >>
> >> >> >> >> I've read several articles/presentations/forums about recordsize in
> >> >> >> >> ZFS, and if I try to summarize, I mainly understood that:
> >> >> >> >> - recordsize is the "maximum" size of "objects" (so "logical blocks")
> >> >> >> >> that zfs will create for both data & metadata, then each object is
> >> >> >> >> compressed and allocated to one vdev, split into smaller (ashift
> >> >> >> >> size) "physical" blocks and written on disks
> >> >> >> >> - increasing recordsize is usually good when storing large files that
> >> >> >> >> are not modified, because it limits the number of metadata objects
> >> >> >> >> (block-pointers), which has a positive effect on performance
> >> >> >> >> - decreasing recordsize is useful for "database-like" workloads (ie:
> >> >> >> >> small random writes inside existing objects), because it avoids write
> >> >> >> >> amplification (read-modify-write a large object for a small update)
> >> >> >> >>
> >> >> >> >> Today, I'm trying to observe the effect of increasing recordsize for
> >> >> >> >> *my* data (because I'm also considering defining special_small_blocks
> >> >> >> >> & using SSDs as "special", but not tested nor discussed here, just
> >> >> >> >> recordsize).
> >> >> >> >> So, I'm doing some benchmarks on my "documents" dataset (details in
> >> >> >> >> "notes" below), but the results are really strange to me.
> >> >> >> >>
> >> >> >> >> When I rsync the same data to a freshly-recreated zpool:
> >> >> >> >> A) with recordsize=128K : 226G allocated on disk
> >> >> >> >> B) with recordsize=1M : 232G allocated on disk => bigger than 128K ?!?
> >> >> >> >>
> >> >> >> >> I would clearly expect the other way around, because bigger recordsize
> >> >> >> >> generates less metadata so smaller disk usage, and there shouldn't be
> >> >> >> >> any overhead because 1M is just a maximum and not a forced size to
> >> >> >> >> allocate for every object.
> >> >> >>
> >> >> >> A common misconception.  The 1M recordsize applies to every newly
> >> >> >> created object, and every object must use the same size for all of its
> >> >> >> records (except possibly the last one).  But objects created before
> >> >> >> you changed the recsize will retain their old recsize, and file tails
> >> >> >> have a flexible recsize.
> >> >> >>
> >> >> >> >> I don't mind the increased usage (I can live with a few GB more), but
> >> >> >> >> I would like to understand why it happens.
> >> >> >>
> >> >> >> You might be seeing the effects of sparsity.  ZFS is smart enough not
> >> >> >> to store file holes (and if any kind of compression is enabled, it
> >> >> >> will find long runs of zeroes and turn them into holes).  If your data
> >> >> >> contains any holes that are >= 128 kB but < 1MB, then they can be
> >> >> >> stored as holes with a 128 kB recsize but must be stored as long runs
> >> >> >> of zeros with a 1MB recsize.
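A quick way to see the hole behavior Alan describes, as a sketch with a
hypothetical dataset name:

# zfs create -o compression=lz4 tank/holetest
# dd if=/dev/zero of=/tank/holetest/zeros bs=1M count=16   # 16 MiB of zeros
# sync
# ls -lh /tank/holetest/zeros   # logical size: 16M
# du -sh /tank/holetest/zeros   # allocated: almost nothing, the all-zero records became holes

With compression=off the same file should allocate the full 16M, because the
zero-filled records are written out instead of being detected and stored as
holes.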
> >> >> >>
> >> >> >> However, I would suggest that you don't bother.  With a 128kB recsize,
> >> >> >> ZFS has something like a 1000:1 ratio of data:metadata.  In other
> >> >> >> words, increasing your recsize can save you at most 0.1% of disk
> >> >> >> space.  Basically, it doesn't matter.  What it _does_ matter for is
> >> >> >> the tradeoff between write amplification and RAM usage.  1000:1 is
> >> >> >> comparable to the disk:ram of many computers.  And performance is more
> >> >> >> sensitive to metadata access times than data access times.  So
> >> >> >> increasing your recsize can help you keep a greater fraction of your
> >> >> >> metadata in ARC.  OTOH, as you remarked, increasing your recsize will
> >> >> >> also increase write amplification.
> >> >> >>
> >> >> >> So to summarize:
> >> >> >> * Adjust compression settings to save disk space.
> >> >> >> * Adjust recsize to save RAM.
> >> >> >>
> >> >> >> -Alan
> >> >> >>
> >> >> >> >>
> >> >> >> >> I tried to give all the details of my tests below.
> >> >> >> >> Did I do something wrong? Can you explain the increase?
> >> >> >> >>
> >> >> >> >> Thanks!
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> ===============================================
> >> >> >> >> A) 128K
> >> >> >> >> ==========
> >> >> >> >>
> >> >> >> >> # zpool destroy bench
> >> >> >> >> # zpool create -o ashift=12 bench
> >> >> >> >> /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
> >> >> >> >>
> >> >> >> >> # rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
> >> >> >> >> [...]
> >> >> >> >> sent 241,042,476,154 bytes  received 353,838 bytes  81,806,492.45 bytes/sec
> >> >> >> >> total size is 240,982,439,038  speedup is 1.00
> >> >> >> >>
> >> >> >> >> # zfs get recordsize bench
> >> >> >> >> NAME   PROPERTY    VALUE    SOURCE
> >> >> >> >> bench  recordsize  128K     default
> >> >> >> >>
> >> >> >> >> # zpool list -v bench
> >> >> >> >> NAME                                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> >> >> >> >> bench                                        2.72T   226G  2.50T        -         -     0%     8%  1.00x    ONLINE  -
> >> >> >> >>   gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4  2.72T   226G  2.50T        -         -     0%  8.10%      -    ONLINE
> >> >> >> >>
> >> >> >> >> # zfs list bench
> >> >> >> >> NAME    USED  AVAIL     REFER  MOUNTPOINT
> >> >> >> >> bench   226G  2.41T      226G  /bench
> >> >> >> >>
> >> >> >> >> # zfs get all bench |egrep "(used|referenced|written)"
> >> >> >> >> bench  used                  226G                   -
> >> >> >> >> bench  referenced            226G                   -
> >> >> >> >> bench  usedbysnapshots       0B                     -
> >> >> >> >> bench  usedbydataset         226G                   -
> >> >> >> >> bench  usedbychildren        1.80M                  -
> >> >> >> >> bench  usedbyrefreservation  0B                     -
> >> >> >> >> bench  written               226G                   -
> >> >> >> >> bench  logicalused           226G                   -
> >> >> >> >> bench  logicalreferenced     226G                   -
> >> >> >> >>
> >> >> >> >> # zdb -Lbbbs bench > zpool-bench-rcd128K.zdb
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> ===============================================
> >> >> >> >> B) 1M
> >> >> >> >> ==========
> >> >> >> >>
> >> >> >> >> # zpool destroy bench
> >> >> >> >> # zpool create -o ashift=12 bench
> >> >> >> >> /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
> >> >> >> >> # zfs set recordsize=1M bench
> >> >> >> >>
> >> >> >> >> # rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
> >> >> >> >> [...]
> >> >> >> >> sent 241,042,476,154 bytes  received 353,830 bytes  80,173,899.88 bytes/sec
> >> >> >> >> total size is 240,982,439,038  speedup is 1.00
> >> >> >> >>
> >> >> >> >> # zfs get recordsize bench
> >> >> >> >> NAME   PROPERTY    VALUE    SOURCE
> >> >> >> >> bench  recordsize  1M       local
> >> >> >> >>
> >> >> >> >> # zpool list -v bench
> >> >> >> >> NAME                                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> >> >> >> >> bench                                        2.72T   232G  2.49T        -         -     0%     8%  1.00x    ONLINE  -
> >> >> >> >>   gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4  2.72T   232G  2.49T        -         -     0%  8.32%      -    ONLINE
> >> >> >> >>
> >> >> >> >> # zfs list bench
> >> >> >> >> NAME    USED  AVAIL     REFER  MOUNTPOINT
> >> >> >> >> bench   232G  2.41T      232G  /bench
> >> >> >> >>
> >> >> >> >> # zfs get all bench |egrep "(used|referenced|written)"
> >> >> >> >> bench  used                  232G                   -
> >> >> >> >> bench  referenced            232G                   -
> >> >> >> >> bench  usedbysnapshots       0B                     -
> >> >> >> >> bench  usedbydataset         232G                   -
> >> >> >> >> bench  usedbychildren        1.96M                  -
> >> >> >> >> bench  usedbyrefreservation  0B                     -
> >> >> >> >> bench  written               232G                   -
> >> >> >> >> bench  logicalused           232G                   -
> >> >> >> >> bench  logicalreferenced     232G                   -
> >> >> >> >>
> >> >> >> >> # zdb -Lbbbs bench > zpool-bench-rcd1M.zdb
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> ===============================================
> >> >> >> >> Notes:
> >> >> >> >> ==========
> >> >> >> >>
> >> >> >> >> - the source dataset contains ~50% of pictures (raw files and jpg),
> >> >> >> >> and also some music, various archived documents, zip, videos
> >> >> >> >> - no change on the source dataset while testing (cf. size logged by rsync)
> >> >> >> >> - I repeated the tests twice (128K, then 1M, then 128K, then 1M), and
> >> >> >> >> same results
> >> >> >> >> - probably not important here, but:
> >> >> >> >> /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 is a Red 3TB CMR
> >> >> >> >> (WD30EFRX), and /mnt/tank/docs-florent/ is a 128K-recordsize dataset
> >> >> >> >> on another zpool that I never tweaked except ashift=12 (because using
> >> >> >> >> the same model of Red 3TB)
> >> >> >> >>
> >> >> >> >> # zfs --version
> >> >> >> >> zfs-2.0.6-1
> >> >> >> >> zfs-kmod-v2021120100-zfs_a8c7652
> >> >> >> >>
> >> >> >> >> # uname -a
> >> >> >> >> FreeBSD xxxxxxxxx 12.2-RELEASE-p11 FreeBSD 12.2-RELEASE-p11
> >> >> >> >> 75566f060d4(HEAD) TRUENAS  amd64
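A rough way to test the "sum of last record padding" theory against the ~6G
difference above is to add up, for every file, how far its size falls short of
a multiple of the recordsize. A sketch using FreeBSD stat(1) and awk (worst
case: it assumes every tail of a file larger than one record occupies a full
record, and it ignores ashift rounding and files smaller than one record):

# find /bench -type f -print0 | xargs -0 stat -f '%z' | awk '
    $1 > 1048576 { pad1M  += (1048576 - $1 % 1048576) % 1048576 }
    $1 > 131072  { pad128 += (131072  - $1 % 131072)  % 131072 }
    END { printf "worst-case tail padding: %.1f GiB at 128K, %.1f GiB at 1M\n",
          pad128 / 2^30, pad1M / 2^30 }'

If the 1M figure comes out roughly 6 GiB larger than the 128K figure, untrimmed
and uncompressed tail records are enough to explain the difference; if not,
something else (for example the sparsity effect described earlier in the
thread) is contributing.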