RFC: What should a copy_file_range(2) syscall do by default?
Rick Macklem
rmacklem at uoguelph.ca
Sat Jun 22 16:02:01 UTC 2019
Hi,
sef@ made this comment on phabricator. I don't believe phabricator is the correct
place for "big picture" discussions, so I'm posting it here (I'm assuming sef@ doesn't
mind, since the phabricator comments are public).
sef@ wrote:
> This much work in the kernel for what //should// be user-space makes me twitchy...
> but there is lots of precedent for it, so I obviously have to get with the times.
>
> I've done a quick review of the code; it seems most of the complexity is in the
> hole-detection. I'm also annoyed that linux used size_t for the amount to copy, when
> off_t would have been more appropriate. But not much to do about that now.
>
> Having a default implementation means that user-space can't fall back if it's not
> supported, and do it better (e.g., parallel I/O). Should we also have a pathconf for
> the feature?
>
> WRT your question on -fs, I have no objections to this working cross-filesystem,
> although I think I might ask to have a flag to fail in that case.
Well, all I am interested in is a system call/VOP call so the NFSv4.2 client can do
a file copy locally on the NFS server instead of doing Reads/Writes across the wire.
The current code has gotten fairly complex, so I'd like to ask: how complex should
this syscall/VOP call be?
The variants I can think of are:
0) - Don't do it at all.
1) - The syscall could just do a VOP_COPY_FILE_RANGE() and return whatever error
it returns.
--> This implies an error return for all file systems for now, with support for
NFSv4.2 mounts being added later (FreeBSD 13, hopefully).
2) - The syscall could fall back on a simple copy loop, but not try to deal with holes.
--> The Linux man page mentions using copy_file_range(2) in a loop with
lseek(SEEK_DATA)/lseek(SEEK_HOLE) for sparse files. This suggests that
the Linux fallback code doesn't try to handle holes.
3) - The current patch which tries to handle holes and copy the entire byte range
in one call.
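For reference, the loop the Linux man page describes for sparse files might look
roughly like the userland sketch below. It is not the code from the patch under
review. sparse_copy() and copy_chunk() are hypothetical names, and the set of errno
values treated as "not supported" (ENOSYS, EXDEV, EOPNOTSUPP) is an assumption. The
sketch also assumes a libc that exposes copy_file_range(2) and SEEK_DATA/SEEK_HOLE,
and a destination file that starts out empty.

```c
#define _GNU_SOURCE	/* glibc: exposes copy_file_range() and SEEK_DATA/SEEK_HOLE */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to len bytes with copy_file_range(2) and
 * fall back to pread/pwrite when the call looks unsupported.
 * The errno values that trigger the fallback are an assumption.
 * Returns bytes copied, 0 at EOF, or -1 with errno set.
 */
ssize_t
copy_chunk(int infd, off_t *inoff, int outfd, off_t *outoff, size_t len)
{
	char buf[65536];
	ssize_t n, w;
	size_t done;

	n = copy_file_range(infd, inoff, outfd, outoff, len, 0);
	if (n != -1 ||
	    (errno != ENOSYS && errno != EXDEV && errno != EOPNOTSUPP))
		return (n);

	/* Userland fallback: one buffer's worth of plain read/write. */
	if (len > sizeof(buf))
		len = sizeof(buf);
	n = pread(infd, buf, len, *inoff);
	if (n <= 0)
		return (n);
	for (done = 0; done < (size_t)n; done += (size_t)w) {
		w = pwrite(outfd, buf + done, (size_t)n - done,
		    *outoff + (off_t)done);
		if (w == -1)
			return (-1);
	}
	*inoff += n;
	*outoff += n;
	return (n);
}

/*
 * Hypothetical helper: copy only the data regions of infd into outfd,
 * leaving holes unwritten. Moves infd's file offset as a side effect.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
sparse_copy(int infd, int outfd)
{
	off_t end, data, hole, inoff, outoff;
	ssize_t n;

	end = lseek(infd, 0, SEEK_END);
	if (end == -1)
		return (-1);

	for (data = 0; data < end; data = hole) {
		data = lseek(infd, data, SEEK_DATA);
		if (data == -1) {
			if (errno == ENXIO)
				break;	/* only a trailing hole is left */
			return (-1);
		}
		hole = lseek(infd, data, SEEK_HOLE);
		if (hole == -1)
			return (-1);

		/* Copy the data region [data, hole) at the same offsets. */
		inoff = outoff = data;
		while (inoff < hole) {
			n = copy_chunk(infd, &inoff, outfd, &outoff,
			    (size_t)(hole - inoff));
			if (n == -1)
				return (-1);
			if (n == 0)
				break;	/* file shrank underneath us */
		}
	}

	/* Set the final length so that a trailing hole is preserved. */
	if (ftruncate(outfd, end) == -1)
		return (-1);
	return (0);
}
```

A caller would open both files and call sparse_copy(infd, outfd). The pread/pwrite
path in copy_chunk() is the kind of userland fallback sef@ refers to; on a file
system without hole support, lseek(SEEK_DATA)/lseek(SEEK_HOLE) typically report the
whole file as one data region, so the loop degrades to a plain copy.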
As sef@ mentions, there is also the question of handling copying across multiple
file systems. I asked about this before and I only got the one response, which was
"do it". I have seen a discussion of adding cross-mount to the syscall for Linux, but
I don't know if/when the Linux one might support that. (They have not created
a "flag" option for this, as far as I've seen.)
Copying across file systems comes without additional complexity for #2 and #3 above.
Linux discussions have talked about improved performance for local file systems
based on reduced # of system calls, but I have not seen any data to show what,
if any, performance improvement has been observed. (The slow hardware I have
to test on won't be useful for performance evaluation.)
So, what do others think w.r.t. the above?
rick