Re: filesystem full showing -29G

From: Edward Sanford Sutton, III <mirror176_at_hotmail.com>
Date: Sat, 26 Oct 2024 08:14:55 UTC
On 10/25/24 23:12, Matthias Apitz wrote:
> Hello,
> 
> I've a bunch of external USB disks to where I copy every month the
> following three files, last on September 20:
> 
> -r--------  1 root wheel   70G 20 sept. 10:34 guru-20240920.tar.gz
> -r--r-----  1 root wheel   62B 20 sept. 10:44 guru-20240920.tar.gz.md5
> -r--r-----  1 root wheel   59M 20 sept. 10:52 guru-20240920.tar.gz.lst
> 
> The suffixes say what they contain, esp. the MD5 hash of the tar
> archive.
> 
> The yesterday's copy ended up with no space left on device. The curious
> thing is that it shows -29G:

   That should likely only have happened if the transfer was done as 
root or if limits were adjusted after the transfer. Unless you saw an 
error during the transfer, you can expect it went through all the way. 
You will need to delete data if you need the space back.

> # df -kh /mnt/backups/
> Filesystem    Size    Used   Avail Capacity  Mounted on
> /dev/da0p4    652G    629G    -29G   105%    /mnt
> 
> How is this possible?

   I presume this is UFS, though people also use other choices like 
exFAT, NTFS, and even ZFS on external drives. Knowing the filesystem 
would help in knowing what properties it has.
   Filesystems usually have reserved free space. Performing activities 
as root will normally allow you to exceed that reserve, but even then 
it's common that a little room is kept for the filesystem's own 
housekeeping.
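   For example, if this really is UFS on /dev/da0p4 as the df output 
suggests, tunefs can print the reserved percentage (8% by default on a 
freshly created filesystem):

   # tunefs -p /dev/da0p4

Look for the "minimum percentage of free space" line. df subtracts that 
reserve when it computes Avail, which is why writes made as root into 
the reserve show up as a negative Avail and a capacity above 100%.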
   Filesystems like ZFS also require available space to write to when a 
delete takes place; if ZFS becomes completely full, it can be difficult 
or even impossible to get it unstuck unless the pool can be grown 
(larger partition/disk). Even root is not supposed to be allowed to use 
that final amount of free space. Following OpenZFS bug reports shows 
that users do still sometimes reach such a condition, and if it occurs 
then detailed bug reports are important.
   Separately, ZFS in particular can lead to unexpected results when 
using external tools that are not ZFS-aware, like df; they make 
assumptions that are not true on ZFS, such as the size of the partition 
not changing.
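   If the drive does turn out to be ZFS, trust the pool's own accounting 
rather than df; roughly (with "backuppool" standing in for whatever the 
pool is actually named):

   # zpool list backuppool
   # zfs list -o space backuppool

That reports space per pool and per dataset, including what is held by 
snapshots, which df has no way of showing.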

> I will later re-calculate the MD5 hash of the last tar archive guru-20240920.tar.gz
> and compare it with what is stored in guru-20240920.tar.gz.md5

   If you are using MD5 as a verification against corruption, gzip 
already contains a CRC32 checksum, though a second check with a second 
algorithm can further reduce the chance of different data having the 
same hash. If the goal is to detect tampering, there are other, more 
secure algorithms you might consider in place of MD5, or you could use 
public/private key signing.
   If you have the source, you can try comparing the source and 
destination sizes with `ls -l` (without the -h parameter) to get a 
precise byte count. You could also use a tool like jdupes to compare 
source vs. destination byte for byte.
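   As a concrete sketch (the source path below is a placeholder for 
wherever the original copy lives):

   # gzip -t /mnt/backups/guru-20240920.tar.gz
   # md5 /mnt/backups/guru-20240920.tar.gz
   # ls -l /path/to/source/guru-20240920.tar.gz \
        /mnt/backups/guru-20240920.tar.gz
   # cmp /path/to/source/guru-20240920.tar.gz \
        /mnt/backups/guru-20240920.tar.gz

gzip -t checks the archive against its built-in CRC32, md5 produces the 
hash to compare with the stored .md5 file, and cmp (in the base system) 
does a byte-for-byte comparison of the single pair of files, much like 
jdupes would.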

   Some separate considerations to make backups and restores faster 
while using fewer resources:
   Unless you need compatibility with older systems that have limited 
archive format support, you could consider compressors that make 
smaller archives more quickly. With gzip you may be CPU-bottlenecked 
while writing the compressed archive. During a copy, your USB drives 
are limited either by the drive speed or by the USB speed for reading 
and writing; any additional compression ratio means the data moves to 
and from the drives faster. Using zstd will usually let you compress 
data smaller while using less CPU, and extraction should run much 
faster.
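   A rough example with bsdtar from the base system (archive name and 
source path are placeholders):

   # tar --zstd -cf guru-20241025.tar.zst /path/to/guru

or, to pick the compression level and thread count explicitly:

   # tar -cf - /path/to/guru | zstd -T0 -15 > guru-20241025.tar.zst

zstd -T0 uses all available cores for compression; decompressing a 
single zstd stream is still effectively single-threaded, as mentioned 
further down.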
   If these archives are created and copied regularly and hold mostly 
the same data with just incremental changes, you could evaluate whether 
archival tools that support incremental archiving, such as zpaqfranz, 
are better suited. Such tools can store and transfer only what has 
changed, keep multiple revisions without taking the full space for each 
copy, and deduplicate data if multiple copies of the same file end up 
in the archive.
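   If I remember the zpaqfranz syntax correctly, repeating something 
like

   # zpaqfranz a /mnt/backups/guru.zpaq /path/to/guru

each month appends a new deduplicated version to the same archive, and 
the l (list) command shows the stored versions you can extract from.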
   If you are using ZFS for both the original and the backup, you could 
use ZFS replication, which has several differences. You can perform 
incremental transfers so that only the changed blocks within the 
filesystem are read and transferred. ZFS-compressed files can be 
transferred without decompressing/recompressing, or you can recompress 
to make them smaller on the destination; restoring will then have less 
data to read from disk, so it restores faster, and again the data can 
be transferred back with or without recompressing, so your source disk 
can also get the additional space savings.
   Higher compression likely requires a bit more RAM when files are 
read, but on the other hand the ZFS ARC (the RAM cache) holds the 
compressed version of the files, so your cache can hold more file data. 
ZFS with zstd compression benefits from one thread per block, which 
effectively gives multithreaded decompression, something the standalone 
zstd program does not yet offer. If you receive the stream into a ZFS 
pool instead of storing the replication stream in a file, you will have 
quick access to any file in the backup without having to 
decompress/extract anything more than what you need.
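   A minimal sketch of that, assuming the data lives in a dataset named 
zroot/guru and the USB pool is named backuppool (both placeholders), 
and that an initial full send has already been done:

   # zfs snapshot zroot/guru@2024-10-25
   # zfs send -c -i @2024-09-20 zroot/guru@2024-10-25 | \
        zfs receive backuppool/guru

The -i makes it an incremental send from last month's snapshot and -c 
keeps blocks compressed as they are stored on disk, so only changed, 
still-compressed data crosses the USB bus.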
   Unfortunately, compressing each ZFS record (usually 128k) separately 
usually performs worse than compressing the files as a whole with a 
comparable compressor, and archiving multiple (preferably similar) 
files together gives even better results.
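   You can see what ratio ZFS actually achieved on a dataset, and the 
record size it compresses at, with something like:

   # zfs get compressratio,recordsize zroot/guru

(dataset name again a placeholder) and compare that against the size of 
the equivalent .tar.gz or .tar.zst.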

> Thanks
> 
> 	matthias
>