Re: SEEK_HOLE at EOF

From: alan somers <asomers_at_gmail.com>
Date: Fri, 05 Apr 2024 14:13:18 UTC
On Fri, Apr 5, 2024 at 7:54 AM Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
>
> --------
> Alan Somers writes:
> > On Thu, Apr 4, 2024 at 11:43 PM Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
>
> > > Just two minor quibbles:
> > >
> > > If the file position is EOF, then you /are/ "beyond the end of the file"
> > > because a read(2) would not be able to return any data.
> >
> > Do you distinguish between "at EOF" and "beyond EOF"?  And does it not
> > trouble you that calling SEEK_HOLE from the beginning of the "virtual
> > hole at EOF" will return ENXIO, even though calling SEEK_HOLE from the
> > beginning of any real hole will return the current offset?
>
> EOF is where the file ends and there's no "hole" there, because there's
> no more file on the other side of that "hole".
>
> When you stand on a cliff, the ocean is not "a hole in the landscape",
> it's where the landscape ends.

Except there is a hole at EOF, a virtual hole.  The draft spec
specifically says "all seekable files shall have a virtual hole
starting at the current size of the file".
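
To make the virtual hole concrete, here is a minimal sketch using
Python's os.lseek, which passes SEEK_HOLE straight through to lseek(2).
The helper name first_hole is made up for illustration.  On a file with
no real holes, SEEK_HOLE from offset 0 should return the file size,
which is the virtual hole.  What it returns when called exactly at EOF
is the point under discussion, so the helper maps ENXIO to None rather
than assuming either answer.

```python
import errno
import os


def first_hole(path, offset=0):
    """Return the offset of the first hole at or after `offset`.

    Returns None if lseek(2) fails with ENXIO.  (Hypothetical helper,
    not part of any library.)
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, offset, os.SEEK_HOLE)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return None
        raise
    finally:
        os.close(fd)
```

Calling first_hole(path, size) with size equal to the file's length
shows which of the two behaviors a given system implements: None for
ENXIO, or size if the virtual hole is reported from its own start.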

>
> > > And returning ENXIO is more informative than returning the size of the
> > > file, since it atomically tells you that there are no more holes.
> >
> > Ahh, that's a good point.  It's the first point I've heard in favor of
> > this option.  Are you aware of any applications that need to know
> > that?
>
> No, but that should not get in the way of good syscall architecture :-)
>
> It might be useful for archivers which try to be smart about sparse files.

I imagine that most archivers would work like this:
ofs = 0
loop {
    let start = lseek(fd, ofs, SEEK_DATA);
    if ENXIO {
        // No more data regions
        break
    }
    // Search from start, not ofs: if ofs lies in a hole, SEEK_HOLE
    // would return ofs itself
    let end = lseek(fd, start, SEEK_HOLE);
    // Thanks to the virtual hole, we should never have ENXIO here
    assert!(!ENXIO)
    copy(fd, start, end - start, ...)
    ofs = end
}
truncate(output_file, fd.fsize)

Since archivers really only care about data regions, not holes, I
don't think that they would usually call SEEK_HOLE at EOF.
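
For anyone who wants to try this, a runnable sketch of that loop in
Python follows.  The function names data_extents and copy_sparse and
the chunk parameter are assumptions made for illustration, not taken
from any real archiver.  The sketch looks for the end of each data
region starting from `start`, and it makes no attempt to handle a file
that changes while it is being copied.

```python
import errno
import os


def data_extents(fd):
    """Yield (start, end) for each data region of an open file."""
    offset = 0
    while True:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                return  # no data at or after offset
            raise
        # The virtual hole at EOF means this should find a hole
        # for any start that lies inside a data region.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def copy_sparse(src_path, dst_path, chunk=1 << 20):
    """Copy only the data regions, then fix up the output size."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for start, end in data_extents(src):
            pos = start
            while pos < end:
                buf = os.pread(src, min(chunk, end - pos), pos)
                if not buf:
                    break  # source shrank underneath us
                view = memoryview(buf)
                while view:
                    n = os.pwrite(dst, view, pos)
                    view = view[n:]
                    pos += n
        # Recreate any trailing hole by setting the final size.
        os.ftruncate(dst, os.fstat(src).st_size)
    finally:
        os.close(src)
        os.close(dst)
```

Note that SEEK_HOLE is only ever called from inside a data region
here, never at EOF, so the loop behaves the same whichever way the EOF
question is settled.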

>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.