Re: data, metadata, backup, and archive integrity and correction

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Sat, 24 Sep 2022 01:58:24 UTC
On 9/23/22 16:37, Ralf Mardorf wrote:
> On Fri, 2022-09-23 at 15:42 -0700, David Christensen wrote:
>> All versions of the photograph file opened correctly with a viewer.
>> All photographs looked the same on the screen.  But, at least one file
>> is corrupt.  Which file(s)?  I never  figured it out.  I kept all
>> versions of the file.  (And, I have kept all camera media.)
> 
> Hi David,
> 
> I'm a digital photographer newbie. I started digital photography in
> 2020. For developing photos and more editing, graphic art, I'm using a
> Linux desktop machine, but most of the times an iPad Pro. I copy all my
> cam's SD data to my Linux desktop PC, my iPad Pro and to at least two
> USB HDDs (ext4 and hfs+ without journaling) in the first place. Before I
> format a SD again, I take a look at all copied photos using a viewer.
> The photo backup/archiving is completely "decoupled" from all other
> backup/archiving. Non-destructive editing of photos, done on different
> machines, in my experiences results in chaos. Way before verifying a
> probably corrupted backup, I already lose control. For example, it's
> already impossible to gain control over naming files of edited photos.
> Sharing edited photos among apps running on iPad OS already is a PITA,
> let alone sharing photos among operating systems.
> 
> I've got tons of unneeded duplicates of some photos. Deleting a
> duplicated photo might render separately stored meta-data useless.


Two of the features that attracted me to ZFS were de-duplication and
compression.  They work great for filesystem copy backups (e.g. rsync).
Here are the backups of my daily driver root filesystem:


2022-09-23 18:23:47 toor@f3 ~
# du -m -s /var/local/backup/laalaa.tracy.holgerdanske.com/
4781	/var/local/backup/laalaa.tracy.holgerdanske.com/

2022-09-23 18:26:30 toor@f3 ~
# ls /var/local/backup/laalaa.tracy.holgerdanske.com/.zfs/snapshot/ | wc -l
      203

2022-09-23 18:16:03 toor@f3 ~
# du -m -c -s /var/local/backup/laalaa.tracy.holgerdanske.com/.zfs/snapshot/*
<snip>
984122	total

2022-09-23 18:31:21 toor@f3 ~
# zfs get all p3/backup/laalaa.tracy.holgerdanske.com | sort | egrep 'compress|used|dedup'
p3/backup/laalaa.tracy.holgerdanske.com  compression           lz4     inherited from p3
p3/backup/laalaa.tracy.holgerdanske.com  compressratio         2.16x   -
p3/backup/laalaa.tracy.holgerdanske.com  dedup                 verify  inherited from p3/backup
p3/backup/laalaa.tracy.holgerdanske.com  logicalused           52.9G   -
p3/backup/laalaa.tracy.holgerdanske.com  refcompressratio      1.84x   -
p3/backup/laalaa.tracy.holgerdanske.com  used                  25.3G   -
p3/backup/laalaa.tracy.holgerdanske.com  usedbychildren        0       -
p3/backup/laalaa.tracy.holgerdanske.com  usedbydataset         4.62G   -
p3/backup/laalaa.tracy.holgerdanske.com  usedbyrefreservation  0       -
p3/backup/laalaa.tracy.holgerdanske.com  usedbysnapshots       20.6G   -


So: 4.62G source filesystem size (usedbydataset), 203 backups, 961G
apparent size of the backups (984122 MiB per du), 52.9G de-duplicated
size of the backups (logicalused), and 25.3G compressed and
de-duplicated size of the backups (used).  ZFS de-duplication and
compression of the backups therefore provided a savings of about 38:1
(961G / 25.3G).  Without ZFS, I would have far fewer backups.
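
For anyone who wants to check the arithmetic, here is a minimal Python
sketch.  The figures are copied from the du and zfs output above; the
variable and function names are my own, and I am assuming du -m reports
MiB and that the "G" in the zfs output means GiB:

```python
MIB_PER_GIB = 1024

# Figures copied from the session output above.
apparent_mib = 984122   # "total" line of du -m -c -s over all snapshots
logical_gib = 52.9      # zfs "logicalused" property
used_gib = 25.3         # zfs "used" property


def savings_ratio(before_gib: float, after_gib: float) -> float:
    """Return how many times larger 'before' is than 'after'."""
    return before_gib / after_gib


# Assumption: du -m counts in MiB, so divide by 1024 to get GiB.
apparent_gib = apparent_mib / MIB_PER_GIB

print(f"apparent size:       {apparent_gib:.0f}G")
print(f"apparent vs logical: {savings_ratio(apparent_gib, logical_gib):.1f}:1")
print(f"logical vs used:     {savings_ratio(logical_gib, used_gib):.2f}:1")
print(f"apparent vs used:    {savings_ratio(apparent_gib, used_gib):.0f}:1")
```

The last line is the overall 38:1 figure; the two lines before it split
that into the part that comes before compression and the part that
comes from compression.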


But de-duplication and compression of other data is debatable.
Photograph files are already compressed, so ZFS compression will be
useless.  10 copies of the exact same photograph file should
de-duplicate nicely.  But opening a photograph file in an editor, making
some changes, saving as another file, and repeating 8 more times is
likely to result in 10 files all with different blocks, so ZFS
de-duplication will be useless.
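
A rough way to illustrate the difference is to hash fixed-size records
and count how many are unique.  The Python sketch below is only a model,
not how ZFS itself is implemented: the 128 KiB record size is an
assumption (the ZFS default recordsize), random bytes stand in for
already-compressed photograph data, and each "edit" is modelled as
entirely new data, which is the worst case:

```python
import hashlib
import os

RECORD_SIZE = 128 * 1024  # assumption: ZFS default recordsize


def dedup_ratio(files, record_size=RECORD_SIZE):
    """Return total records / unique records across the given byte strings."""
    total = 0
    unique = set()
    for data in files:
        for offset in range(0, len(data), record_size):
            record = data[offset:offset + record_size]
            unique.add(hashlib.sha256(record).digest())
            total += 1
    return total / len(unique)


# Hypothetical stand-in for one photograph file: 4 records of random bytes.
original = os.urandom(4 * RECORD_SIZE)

# 10 exact copies of the same file.
copies = [original] * 10

# 10 edited versions, modelled as completely different data per file.
edits = [os.urandom(4 * RECORD_SIZE) for _ in range(10)]

print(dedup_ratio(copies))  # 10.0
print(dedup_ratio(edits))   # 1.0
```

Running the same function over real edited photograph files (read in as
bytes) would show how close they come to the worst case.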


David