Re: du measures in 4K blocks resulting in inaccuracies

From: Dewayne Geraghty <dewayne_at_heuristicsystems.com.au>
Date: Thu, 07 Nov 2024 04:46:09 UTC

On 7/11/2024 12:24 pm, Greg 'groggy' Lehey wrote:
> On Thursday,  7 November 2024 at 12:01:36 +1100, Dewayne Geraghty wrote:
>> I'm trying to size a disk.  Unfortunately /usr/bin/du is misleading.  An
>> example referencing two files, one 60 Bytes, other 968 Bytes:
>>
>> unset BLOCKSIZE
>> echo ;         ls -l /projectx/adm/README /projectx/adm/gen_pw.sh
>> echo "1 ---" ; ls -lh /projectx/adm/README
>> echo "2 ---" ; du /projectx/adm/README
>> echo "3 ---" ; du -ckh /projectx/adm/README /projectx/adm/gen_pw.sh
>>
>> -rw-r-----  1 sysman  wheel   60 Jul 28  2023 /projectx/adm/README  #60B
>> -rwx------  1 sysman  wheel  968 Jul 28  2023 /projectx/adm/gen_pw.sh
>> 1 ---
>> -rw-r-----  1 sysman  wheel    60B Jul 28  2023 /projectx/adm/README
>> 2 ---
>> 8       /projectx/adm/README    <<< 8 sectors
>> 3 ---
>> 4.0K    /projectx/adm/README    <<< min count is 4K, so sectorsize?
>> 4.0K    /projectx/adm/gen_pw.sh
>> 8.0K    total			<<< Expect at most 2K
>>
>> # diskinfo -v /dev/ada2p3
>> /dev/ada2p3
>>          512             # sectorsize
>>
>>
>> Perhaps my understanding is wrong, so to authority "man du"
>> -k      Display block counts in 1024-byte (1 kiB) blocks.  (incorrect)
> 
> What is incorrect about that?

Hmm.  Upon reflection I can see how I missed it.  I didn't factor in
that du reports allocated space, which for a file smaller than the
fragment size is one whole fragment, as set by newfs.  For example, the
disk I used has a sector size of 512 and a block/fragment size of
32k/4k, so the minimum reported size is 4.0K.

When I checked a memory disk that holds very many small files
(newfs'ed with -b 8k -f 1k), du reported the smallest file as
1    /m/audit/current

Hence my misunderstanding.
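
The rounding described above can be sketched in plain sh arithmetic.
This is a model only, assuming a small UFS file is allocated in whole
fragments and that the fragment size is a multiple of 1024; du_kib is a
hypothetical helper name, not part of du(1), and the 60-byte size used
for the memory-disk case is an illustrative value.

```sh
#!/bin/sh
# Model only: estimate the 1K-block count du -k would show for a small
# file on UFS, assuming allocation is rounded up to whole fragments.
# du_kib is a hypothetical helper, not part of du(1).
du_kib() {
    size=$1   # file size in bytes, as shown by ls -l
    frag=$2   # fragment size in bytes (the newfs -f value)
    frags=$(( (size + frag - 1) / frag ))   # round up to whole fragments
    echo $(( frags * frag / 1024 ))         # convert bytes to KiB
}

du_kib 60 4096    # README on the 32k/4k file system
du_kib 968 4096   # gen_pw.sh on the same file system
du_kib 60 1024    # an assumed 60-byte file on the -b 8k -f 1k memory disk
```

Under those assumptions the three calls print 4, 4 and 1, matching the
4.0K entries and the "1" that du reported.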

> 
>> -h      "Human-readable" output.  Use unit suffixes: Byte, Kilobyte,
>>            Megabyte, Gigabyte, Terabyte and Petabyte based on powers of
>>            1024.    (?  4K)
>>
>> 4.1k    /projectx/adm/README   # <<< 4.1K??!
>> 4.1k    /projectx/adm/gen_pw.sh
>>
>> What am I missing?  Should the doc reflect the minimum reporting size is 4K?
> 
> No, du is reporting correctly.  From the man page:

But the actual file is 60B, so 4.0K would be expected, not 4.1K.

> 
>     The du utility displays the file system block usage for each file
>     argument
> 

Yes.  I didn't take that bit in.  Though for small files I think it's
the fragment size.  Please refer below for a large file example.

> It doesn't describe the size of the files in those blocks.  That's
> particularly the case for files with holes in them, where it can show
> "sizes" that are much less than the file size.
> 
> You don't say what file system you're using, but it looks like UFS.
> By default, a UFS 2 file system has 32 kB blocks and 4 kB fragments.
> Files less than 4 kB in size allocate one fragment, and that's what
> you're seeing.
> 
> The real question is: what are you trying to do?  "Size a disk"
> suggests that you really do want to know how much storage is being
> used.  In this case, du is your friend.  Your 60 byte file really does
> use 4 kB on disk.
> 

Thank you, you've answered the question: the minimum size reported by
du is the fragment size of the file system, for each file smaller than
the fragment size.

Yes, I really did want to make a sizing estimate based on fragments 
used. I hadn't realised that du was being helpful. :)

I was misled by references to 1K and/or block sizes.

Yes, it's UFS2; apologies for being tardy in my description (zfs doesn't
provide multi-labels per my mac_biba need, so it's out of mind). ;)

Large files are a bit confusing:
The file ./src-ports-migration.tar.xz is 11787028 bytes, but
du reports it as 11552 (1K blocks), while
11787028 / 1024 = 11510 (rounded down).

I think I need to read the code to understand; perhaps the inode
occupies the extra 42?  Thank you for providing some clues. :)
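
One possible accounting for the gap, offered as a sketch and not as a
reading of the du or UFS source: it assumes the UFS2 defaults of 32 KiB
blocks, 12 direct block pointers and 8-byte block pointers, that a file
too large for the direct pointers is stored in whole blocks (no
fragments), and that its single indirect block is counted in the usage.
ufs2_large_kib is a hypothetical helper name.

```sh
#!/bin/sh
# Model only, under the assumptions stated above; ufs2_large_kib is a
# hypothetical helper, not an existing tool.
ufs2_large_kib() {
    size=$1                    # file size in bytes
    bsize=$2                   # file system block size in bytes
    ndaddr=12                  # assumed number of direct block pointers
    nindir=$(( bsize / 8 ))    # assumed pointers per indirect block
    blocks=$(( (size + bsize - 1) / bsize ))   # whole data blocks
    # The sketch only covers files needing exactly one indirect block.
    if [ "$blocks" -le "$ndaddr" ] || [ "$blocks" -gt $(( ndaddr + nindir )) ]; then
        echo "outside the range this sketch models" >&2
        return 1
    fi
    echo $(( (blocks + 1) * bsize / 1024 ))    # data blocks + 1 indirect
}

ufs2_large_kib 11787028 32768
```

With these assumptions the file needs 360 data blocks plus one indirect
block, 361 x 32 = 11552, which agrees with the figure du printed.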

Kind regards, Dewayne.