Re: zfs diff
- In reply to: Eugene M. Zheganin: "zfs diff"
Date: Thu, 13 Feb 2025 08:49:58 UTC
Hi,

Two points.

1. zfs diff knows which blocks have changed, but it has to look up the filename (or multiple filenames) that each block belongs to. This is a computationally heavy operation and can take a very long time.

2. A snapshot can consist of only metadata changes, such as atime updates. So although the snapshot contains a few KB or MB of changes, the diff doesn't show any changed data.

You might try rsync to find the differences between snapshots. This generates a list of the changes between two snapshots on my disk:

# time rsync -an -v --stats --delete /data/jails/_ports/.zfs/snapshot/repl-2025-02-12_00-00/ /data/jails/_ports/.zfs/snapshot/test/
sending incremental file list
./
MOVED
UPDATING
.git/
.git/FETCH_HEAD
... [snip long list of files] ...
x11/xwayland-satellite/distinfo
x11/yakuake/
x11/yakuake/distinfo

Number of files: 349,690 (reg: 283,943, dir: 65,236, link: 511)
Number of created files: 145 (reg: 120, dir: 25)
Number of deleted files: 267 (reg: 207, dir: 60)
Number of regular files transferred: 2,342
Total file size: 6,873,671,941 bytes
Total transferred file size: 43,376,750 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 524,271
File list generation time: 0.006 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10,499,359
Total bytes received: 92,348

sent 10,499,359 bytes  received 92,348 bytes  282,445.52 bytes/sec
total size is 6,873,671,941  speedup is 648.97 (DRY RUN)

real 0m37.316s
user 0m27.882s
sys 0m33.888s

This is O(#files), because rsync checks every file. zfs diff might only look at changed blocks, but it then needs to do an O(#changes * #files) lookup [1]. So which one is faster depends on your situation.

NB: searching for "zfs diff takes too long" turns up quite a lot of information about the speed of zfs diff.
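If the goal is a list of created or modified files rather than a human-readable log, the rsync dry run can be asked for itemized output (-i, --itemize-changes) and the result reduced to something resembling zfs diff output. The following is only a rough sketch under a few assumptions: the function name rsync_to_difflist is made up for illustration, it relies on the itemized change string being separated from the path by spaces, it skips attribute-only changes (lines starting with "."), and it will mangle file names that begin with a space.

```shell
#!/bin/sh
# Hypothetical helper: read `rsync -an -i --delete OLD/ NEW/` output on stdin
# and print one "<type><TAB><path>" line per change:
#   +  created    M  content changed    -  deleted
# Attribute-only changes (change string starting with ".") are skipped.
rsync_to_difflist() {
    while IFS= read -r line; do
        # Itemized lines always have a change string, then spaces, then the path.
        case "$line" in
            *' '*) ;;
            *) continue ;;
        esac
        flag=${line%% *}                  # change string, e.g. ">f.st......"
        path=${line#* }                   # everything after the first space
        path=${path#"${path%%[! ]*}"}     # drop any remaining padding spaces
        case "$flag" in
            '*deleting')
                printf '%s\t%s\n' '-' "$path" ;;
            '>'[fdLDS]+*|'<'[fdLDS]+*|c[fdLDS]+*)
                # A "+" right after the file type marks a newly created item.
                printf '%s\t%s\n' '+' "$path" ;;
            '>'[fdLDS]*|'<'[fdLDS]*|c[fdLDS]*)
                printf '%s\t%s\n' 'M' "$path" ;;
        esac
    done
}

# Example use (snapshot paths taken from the run above; -v and --stats are
# left out so that only itemized lines are produced):
#   rsync -an -i --delete \
#       /data/jails/_ports/.zfs/snapshot/repl-2025-02-12_00-00/ \
#       /data/jails/_ports/.zfs/snapshot/test/ | rsync_to_difflist
```

Keeping only the "+" and "M" lines would give the list of new or modified files to process. This still walks every file, so the O(#files) cost is unchanged.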
[1]
https://zfsonlinux.topicbox.com/groups/zfs-discuss/T3d7c034221b1220a-Mac8fdaa32ad829183baad855
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1881748
https://github.com/openzfs/zfs/pull/12837
https://github.com/openzfs/zfs/pull/10391
https://github.com/openzfs/zfs/issues/6920

Regards,
Ronald.

From: "Eugene M. Zheganin" <eugene@zheganin.net>
Date: Wednesday, 12 February 2025 17:41
To: freebsd-stable@FreeBSD.org
Subject: zfs diff

> Hello,
>
> I have a 13.2-RELEASE-p3 system with a large storage attached:
>
> ===Cut===
>
> NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
> tank    135T   122T  12.8T        -         -   66%  90%  1.27x  ONLINE  -
> zroot  31.5G  27.8G  3.74G        -         -   70%  88%  1.00x  ONLINE  -
>
> ===Cut===
>
> In order to process some newly incoming files I'd like to use the zfs diff functionality to get the list of the files created or modified. So I wrote a simple script (/root/periodic/zfsdiff) diffing two dataset snapshots between today and yesterday. Most of these launches do merely work. But not all of them.
> Some (like 15%) just are waiting for something infinitely, while seemingly doing nothing:
>
> ===Cut===
>
> 39935  -  I   18101:52,29 zfs diff tank/data/tank2@2025-01-20 tank/data/tank2@2025-01-21
> 46118  -  Is      0:00,00 /bin/sh /root/periodic/zfsdiff
> 46126  -  I     354:34,75 zfs diff tank/data/tank0@2025-02-03 tank/data/tank0@2025-02-04
> 49620  -  I    2155:14,42 zfs diff tank/data/tank1@2025-02-10 tank/data/tank1@2025-02-11
> 53243  -  Is      0:00,00 /bin/sh /root/periodic/zfsdiff
> 53255  -  I    3607:34,83 zfs diff tank/data/tank0@2025-02-09 tank/data/tank0@2025-02-10
> 56849  -  Is      0:00,00 /bin/sh /root/periodic/zfsdiff
> 59725  -  I    3630:23,01 zfs diff tank/data/tank2@2025-01-27 tank/data/tank2@2025-01-28
> 65460  -  I    1425:25,55 zfs diff tank/data/tank1@2025-02-03 tank/data/tank1@2025-02-04
> 82371  -  I     111:25,63 zfs diff tank/data/tank3@2025-02-11 tank/data/tank3@2025-02-12
> 98172  -  Is      0:00,00 /bin/sh /root/periodic/zfsdiff
> 98223  -  I    4792:11,99 zfs diff tank/data/tank3@2025-02-04 tank/data/tank3@2025-02-05
> 40589  2  IN  18108:48,07 zfs diff tank/data/tank2@2025-01-20 tank/data/tank2@2025-01-21
> 28649  6  I+    471:24,81 zfs diff tank/data/tank1@2025-02-03
>
> ===Cut===
>
> Surprisingly, this has little to no correlation to the size of the snapshot, for instance I have the relatively small snapshot diff that fails to process (notice the idle process above):
>
> ===Cut===
>
> tank/data/tank1@2025-02-03  31.6M  -  16.0T  -
> tank/data/tank1@2025-02-04  32.5M  -  16.0T  -
>
> ===Cut===
>
> Also, some of these leave no output, without any traces of the script killed or crashed which is very suspicious as well. You could say that this probably means there were no changes, but the snapshot size thinks there were some.
>
> Is there any trick there ? Does this look like a race condition, do I have to run these sequentially, like one diff at a time ? Can those interfere with only their fellow diffs, or also with snapshot creation ?
>
> Thanks.
> Eugene.