Re: du measures in 4K blocks resulting in inaccuracies

From: Greg 'groggy' Lehey <grog_at_freebsd.org>
Date: Fri, 08 Nov 2024 01:08:45 UTC

On Thursday,  7 November 2024 at 15:46:09 +1100, Dewayne Geraghty wrote:
> On 7/11/2024 12:24 pm, Greg 'groggy' Lehey wrote:
>> On Thursday,  7 November 2024 at 12:01:36 +1100, Dewayne Geraghty wrote:
>>> What am I missing?  Should the doc reflect the minimum reporting size is 4K?
>>
>> No, du is reporting correctly.  From the man page: ...

Oh.  Yes, I missed the 4.1k instead of 4.0K, along with this in the
original message:

> but even --si gives
> 4.1k    /projectx/adm/README   # <<< 4.1K??!
> 4.1k    /projectx/adm/gen_pw.sh

"Even" --si.  I had never heard of it.  It doesn't look BSD-like at
all, and I see it was added in August 2017.  Maybe it's a Linuxism.

In any case, the man page is your friend there too:

     --si    "Human-readable" output.  Use unit suffixes: Byte, Kilobyte,
             Megabyte, Gigabyte, Terabyte and Petabyte based on powers of
             1000.

And that works as described:

  === grog@hydra (/dev/pts/11) /var/tmp 558 -> echo foo > bar
  === grog@hydra (/dev/pts/11) /var/tmp 559 -> ls -l bar
  -rw-r--r--  1 grog  wheel  4  8 Nov 10:34 bar
  === grog@hydra (/dev/pts/11) /var/tmp 560 -> du -sk bar
  4       bar
  === grog@hydra (/dev/pts/11) /var/tmp 561 -> du --si bar
  4.1k    bar

Note the distinction between k (1000 bytes) and K (1024 bytes).
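
To make the difference concrete, here is a rough sketch of the scaling
in sh.  human_size is a made-up helper for illustration, not part of
du(1), and it only mimics the division, not du's real formatting code:

```shell
# Hypothetical helper: scale a byte count by a divisor and append a suffix.
# $1 = size in bytes, $2 = divisor (1000 or 1024), $3 = unit suffix
human_size() {
    awk -v b="$1" -v d="$2" -v s="$3" 'BEGIN { printf "%.1f%s\n", b / d, s }'
}

# One 4096 byte fragment, as allocated for the 4 byte file above.
human_size 4096 1000 k    # powers of 1000 (--si): prints 4.1k
human_size 4096 1024 K    # powers of 1024 (-h):   prints 4.0K
```

So the same 4096 bytes of allocated space show up as 4.1k or 4.0K,
depending only on the divisor.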

> Large files are a bit confusing:
> The file ./src-ports-migration.tar.xz is 11787028, but
> du reports as 11552 while
> 11787028 / 1024 = 11510.
>
> I think I need to read the code to understand, perhaps the inode
> occupies 42?

Interesting.  No, the inode is not counted in the du output (and on
UFS2 it's only 256 bytes long).  But it seems that the indirect blocks
are, and I can't find that documented anywhere.  Consider:

  === grog@hydra (/dev/pts/35) /var/tmp 3 -> dd if=/dev/zero of=baz bs=384k count=1; ls -l baz; du -sk baz
  393216 bytes transferred in 0.000208 secs (1893921588 bytes/sec)
  -rw-r--r--  1 grog  wheel  393216  8 Nov 11:59 baz
  384     baz
  === grog@hydra (/dev/pts/35) /var/tmp 4 -> dd if=/dev/zero of=baz bs=385k count=1; ls -l baz; du -sk baz
  394240 bytes transferred in 0.000375 secs (1052038886 bytes/sec)
  -rw-r--r--  1 grog  wheel  394240  8 Nov 11:59 baz
  448     baz

The first 12 block pointers (to data blocks of 32 kB each) are in the
inode.  Beyond that they're in indirect blocks.  So a 385 kB file has
13 data blocks (416 kB) and one indirect block (32 kB), 448 kB in
all.  This continues for all larger files, though at some point second
and third level indirect blocks are assigned, giving steps of 64 or
96 kB.  Your 11787028 byte file works out the same way: 360 data
blocks (11520 kB) plus one indirect block (32 kB) gives 11552.
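
The numbers in this thread can be reproduced with a small sh sketch.
It is only a model, not what du or the kernel actually run, and it
rests on assumptions: 32768 byte blocks with 4096 byte fragments (what
the 4 kB result for the 4 byte file suggests), 12 direct pointers, and
a file small enough to need only the first indirect block.  ufs_kb is
a made-up name:

```shell
# Hypothetical model of the space du -k reports for a file of $1 bytes.
# Assumed layout: 32768 byte blocks, 4096 byte fragments, 12 direct block
# pointers in the inode.  Only the first (single) indirect block is
# modelled, so this does not hold once a second-level block is needed.
ufs_kb() {
    size=$1
    bsize=32768
    fsize=4096
    ndaddr=12
    if [ "$size" -le $(( ndaddr * bsize )) ]; then
        # Direct blocks only: the tail is rounded up to whole fragments.
        bytes=$(( (size + fsize - 1) / fsize * fsize ))
    else
        # Past the direct pointers: whole data blocks plus one indirect block.
        data=$(( (size + bsize - 1) / bsize ))
        bytes=$(( (data + 1) * bsize ))
    fi
    echo $(( bytes / 1024 ))
}

ufs_kb 4           # 4      (the "echo foo > bar" file)
ufs_kb 393216      # 384    (12 data blocks, no indirect block)
ufs_kb 394240      # 448    (13 data blocks + 1 indirect block)
ufs_kb 11787028    # 11552  (360 data blocks + 1 indirect block)
```

The jump from 384 to 448 for one extra kilobyte is the 13th data block
plus the indirect block that is needed to point to it.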

Thanks for bringing this to my attention.  I'll try to find a way to
document it.

Greg