data, metadata, backup, and archive integrity and correction

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Wed, 21 Sep 2022 10:58:47 UTC
On 9/21/22 00:33, Ralf Mardorf wrote:
 > On Tue, 2022-09-20 at 12:00 -0700, David Christensen wrote:
 >> For off-line back up disks, I find mobile racks to be more reliable
 >> than USB/ Firewire/ eSATA:
 >
 > Hi,
 >
 > I tested a lot of casings and started using casings that have both
 > USB 3 (<= 5 Gbit/s) and eSATA (<= 3 Gbit/s) ports and that are powered
 > by their own power supply. I don't know if everything is powered by
 > the casing's power supply; parts might still be bus-powered. The
 > casings' firmware has no enforced power-saving feature, so the drives
 > are always spinning, the heads never park, and the drives are always
 > ready for action. USB was reliable with those casings for years, and
 > it is still almost reliable. However, "was reliable" + "still is
 > almost reliable" = unreliable.
 >
 > In my experience, eSATA (<= 3 Gbit/s) is reliable, but way too slow.
 >
 > I have never used a mobile rack, but it is something I am considering
 > for the future, too. Unfortunately, I rotate the external drives not
 > only to back up data from a tower/desktop PC that can hold a mobile
 > rack. I'm also using the drives with iPadOS, which can only access an
 > external drive via USB.
 >
 > It's not possible to completely abandon USB drives. Once data has been
 > saved via USB and verified, it's safe. If restoring data from a USB
 > drive fails, it's still possible to remove the HDD from the casing and
 > connect it by SATA. The casings I'm using provide eSATA, so I don't
 > even need to open the casing.
 >
 > Conclusion: USB drives are a PITA. Most don't even fit the category
 > "was reliable" + "still is almost reliable"; they are often completely
 > useless, working only for Windows users who move a few GiB every now
 > and then, and for users who never verify their archives. Many users
 > notice that their archives are corrupted only when they try to restore
 > data from an archive, because they never listed the content after
 > creating an archive with exit status 0. Exit status 0 from tar when
 > creating an archive doesn't guarantee that the archive isn't corrupted;
 > it only says that no error was noticed, not that no error happened.
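Ralf's point about exit status 0 suggests reading every archive back after creating it. A minimal sketch in Python using the standard tarfile module (the function name verify_tar is hypothetical): it reads each member to the end, which catches truncation, damaged tar headers (they carry their own checksums), and decompression errors in a compressed archive. It does not prove the contents match the originals, because plain tar does not checksum member data.

```python
from __future__ import annotations

import lzma
import sys
import tarfile
import zlib


def verify_tar(path: str) -> tuple[bool, str]:
    """Read every member of a tar archive to the end; return (ok, message)."""
    try:
        # "r:*" lets tarfile detect gzip, bzip2, or xz compression itself.
        with tarfile.open(path, mode="r:*") as tar:
            count = 0
            for member in tar:
                if member.isfile():
                    f = tar.extractfile(member)
                    # Read in 1 MiB chunks so large members do not fill memory.
                    while f.read(1 << 20):
                        pass
                count += 1
        return True, f"{count} members read"
    except (tarfile.TarError, EOFError, OSError, zlib.error, lzma.LZMAError) as exc:
        # Truncation, bad header checksums, and decompression failures land here.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    for archive in sys.argv[1:]:
        ok, message = verify_tar(archive)
        print(f"{'OK ' if ok else 'BAD'} {archive}: {message}")
```

Run as a script, it checks each archive path given on the command line. Comparing against checksums of the original files is still needed to catch corruption that happened before the archive was written.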


Integrity checking of data and metadata is important -- both for live 
data and, especially, for backups and archives.  When corruption is 
detected (e.g. damaged optical media, bad blocks/ cells, "bit rot"), a 
correction mechanism is desirable.


Traditional filesystems (UFS, ext4), volumes (geom, LVM), RAID (geom,
md), etc., may detect failing or failed drives, but may not detect
all forms of corruption (e.g. a drive returning wrong data without
reporting an error).


I use Debian Stable on desktops/ workstations and have read about the 
Linux dm-integrity layer, but dm-integrity does not seem to be fully 
integrated into Debian (yet).


AIUI, ZFS and btrfs both implement integrity checking of data and
metadata, and can automatically correct corruption if redundancy is 
provisioned.  I tried btrfs on Debian and found it to be lacking.  ZFS 
is mature and fully integrated on FreeBSD.  I migrated my servers to 
FreeBSD and ZFS.
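The detect-and-correct behavior can be illustrated with a toy two-way mirror in Python. This is a simplified sketch under stated assumptions, not how ZFS is implemented: the class MirroredStore is hypothetical, SHA-256 stands in for whatever checksum algorithm a pool actually uses, and the checksum is stored apart from the data, loosely following how ZFS keeps a block's checksum in its parent block pointer. A read returns the first copy that matches its checksum and rewrites any copy that does not.

```python
import hashlib


class MirroredStore:
    """Toy two-way mirror with per-block checksums (illustration only)."""

    def __init__(self) -> None:
        # Two "disks", each a mapping from block number to bytes.
        self.disks = [{}, {}]
        # Checksums live apart from the data, so a damaged copy cannot vouch for itself.
        self.checksums = {}
        self.repairs = 0

    def write(self, block: int, data: bytes) -> None:
        self.checksums[block] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block] = data

    def read(self, block: int) -> bytes:
        expected = self.checksums[block]
        good = None
        # Use the first copy whose checksum matches.
        for disk in self.disks:
            data = disk.get(block)
            if data is not None and hashlib.sha256(data).digest() == expected:
                good = data
                break
        if good is None:
            raise OSError(f"block {block}: no copy matches its checksum")
        # Self-heal: overwrite every copy that differs from the verified one.
        for disk in self.disks:
            if disk.get(block) != good:
                disk[block] = good
                self.repairs += 1
        return good


if __name__ == "__main__":
    store = MirroredStore()
    store.write(0, b"important data")
    store.disks[1][0] = b"importent data"  # simulate bit rot on one side
    print(store.read(0), "repairs:", store.repairs)
```

Without the second copy, the same checksum still detects the damage, but the read can only fail; that is the difference redundancy makes.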


MD5/ SHA256 checksum files are multi-platform, but only cover the data
and only provide pass/fail detection for whole files.  mtree(8) adds
integrity checking for most Unix metadata, but I am unsure if mtree(8)
covers ACLs.  mtree(8) needs a feature for incrementally updating a
specification to be practical on large data stores.  mtree(8) is not
well supported on platforms other than the BSDs.  I have written
scripts when I wanted these kinds of checks.
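Such a script might look like this minimal Python sketch (build_spec and sha256_file are hypothetical names). It records an mtree-like specification: file type, permission bits, owner, group, size, mtime, symlink target, and a SHA-256 digest for regular files. It does not record ACLs or extended attributes. Passing a previous specification gives a crude incremental update: files whose size and mtime are unchanged reuse the old digest, which is fast but, by design, will not notice bit rot in those files, so a full verification should rebuild without it.

```python
import hashlib
import os
import stat
from typing import Optional


def sha256_file(path: str) -> str:
    """Hash a file in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_spec(root: str, previous: Optional[dict] = None) -> dict:
    """Map each path below root (relative to root) to its metadata and digest."""
    previous = previous or {}
    spec = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.lstat(full)  # do not follow symlinks
            entry = {
                "type": stat.S_IFMT(st.st_mode),
                "mode": stat.S_IMODE(st.st_mode),
                "uid": st.st_uid,
                "gid": st.st_gid,
                "size": st.st_size,
                "mtime_ns": st.st_mtime_ns,
            }
            if stat.S_ISLNK(st.st_mode):
                entry["link"] = os.readlink(full)
            elif stat.S_ISREG(st.st_mode):
                old = previous.get(rel, {})
                if (old.get("sha256") and old.get("size") == st.st_size
                        and old.get("mtime_ns") == st.st_mtime_ns):
                    # Incremental update: trust size + mtime, skip re-reading.
                    entry["sha256"] = old["sha256"]
                else:
                    entry["sha256"] = sha256_file(full)
            spec[rel] = entry
    return spec
```

The resulting dict can be saved with json.dump and kept alongside the backup it describes.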


Restoring backups is a worthwhile exercise.  But then you need to
validate the restored copy against the original.  If the original has
changed over time, you need something like saved mtree(8)
specifications.  If the validation fails, how do you correct it?
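Validating a restore then reduces to comparing two specifications. A minimal Python sketch, assuming each specification has already been loaded as a dict mapping a relative path to a dict of attributes (a hypothetical format; real mtree(8) specification files would need to be parsed first). Attributes a restore is not expected to preserve, such as mtime, can be passed in ignore. This only detects problems, but the resulting list at least names the files to re-fetch from another backup copy.

```python
def compare_specs(saved: dict, restored: dict,
                  ignore: frozenset = frozenset()) -> list:
    """Return human-readable differences between two specifications."""
    problems = []
    for path in sorted(saved.keys() - restored.keys()):
        problems.append(f"missing: {path}")
    for path in sorted(restored.keys() - saved.keys()):
        problems.append(f"extra: {path}")
    for path in sorted(saved.keys() & restored.keys()):
        old, new = saved[path], restored[path]
        # Compare every attribute present on either side, minus ignored ones.
        for key in sorted((old.keys() | new.keys()) - ignore):
            if old.get(key) != new.get(key):
                problems.append(f"{path}: {key} {old.get(key)!r} != {new.get(key)!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical specifications with shortened placeholder digests.
    saved = {"a.txt": {"size": 3, "sha256": "d1"}, "b.txt": {"size": 5, "sha256": "d2"}}
    restored = {"a.txt": {"size": 3, "sha256": "d9"}}
    for line in compare_specs(saved, restored):
        print(line)
```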


lzip(1) is a compressor (commonly paired with tar) with integrity and
correction features.  I need to evaluate it.


ZFS replication involves producing and consuming a replication stream. 
The replication stream can be saved to a file and consumed later, by the
same computer and/or by one or more other computers.  This provides new
possibilities for backup and restore.  I need to explore them.


David