RFC: copy_file_range(3)

Wed Sep 23 01:18:21 UTC 2020

Alan Somers wrote:
[lots of stuff snipped]
> 1) In order to quickly respond to a signal, a program must use a modest len with
> copy_file_range
For the programs you have mentioned, I think the only signal handling would
be termination (<ctrl>C or SIGTERM if you prefer).
I'm not sure what a reasonable response time for this is.
I'd like to hear comments from others.
- 1sec, less than 1sec, a few seconds, ...

> 2) If a hole is larger than len, that will cause vn_generic_copy_file_range to
> truncate the output file to the middle of the hole.  Then, in the next invocation, 
> truncate it again to a larger size.
> 3) The result is a file that is not as sparse as the original.
Yes. So, the trick is to use the largest "len" you can live with, given how long you
are willing to wait for signal processing.
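
The trade-off above can be sketched as a loop that calls copy_file_range(2)
with a bounded "len" and checks a flag set by the signal handler between
calls. This is only an illustration, not one of the attached programs:
copy_file_chunked(), install_stop_handler() and stop_requested are names made
up here, and it assumes a system that provides copy_file_range(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes copy_file_range() on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag: set by the handler, checked between chunks. */
volatile sig_atomic_t stop_requested;

void
on_stop_signal(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Catch SIGINT/SIGTERM so the copy loop can stop at a chunk boundary. */
int
install_stop_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_stop_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return (-1);
    return (sigaction(SIGTERM, &sa, NULL));
}

/*
 * Copy src to dst with at most max_len bytes per copy_file_range() call.
 * Returns the number of bytes copied, or -1 with errno set (EINTR when a
 * stop was requested).
 */
off_t
copy_file_chunked(const char *src, const char *dst, size_t max_len)
{
    off_t total = 0;
    ssize_t n = 0;
    int infd, outfd, saved_errno;

    assert(max_len > 0);
    if ((infd = open(src, O_RDONLY)) == -1)
        return (-1);
    if ((outfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        saved_errno = errno;
        close(infd);
        errno = saved_errno;
        return (-1);
    }
    for (;;) {
        if (stop_requested) {
            errno = EINTR;
            n = -1;
            break;
        }
        /* NULL offsets: use and advance the descriptors' file offsets. */
        n = copy_file_range(infd, NULL, outfd, NULL, max_len, 0);
        if (n == -1 && errno == EINTR)
            continue;       /* re-check the flag at the top of the loop */
        if (n <= 0)
            break;          /* 0 is EOF, -1 is an error */
        total += n;
    }
    saved_errno = errno;
    close(infd);
    close(outfd);
    errno = saved_errno;
    return (n == -1 ? -1 : total);
}
```

The worst-case wait for a termination signal is then roughly the time taken by
one call of max_len bytes, which is the knob being discussed here.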

> For example, on UFS:
> $ truncate -s 1g sparsefile
Not a very interesting sparse file. I wrote a little program to create one.
> $ cp sparsefile sparsefile2
> $ du -sh sparsefile*
>  96K sparsefile
>  32M sparsefile2
>
> My idea for a userland wrapper would solve this problem by using 
> SEEK_HOLE/SEEK_DATA to copy holes in their entirety, and use copy_file_range for
> everything else with a modest len.  Alternatively, we could eliminate the need for
> the wrapper by enabling copy_file_range for every file system, and making 
> vn_generic_copy_file_range interruptible, so copy_file_range can be called with 
> large len without penalizing signal handling performance.

Well, I ran some quick benchmarks using the attached programs, plus "cp" both
before and with your copy_file_range() patch.
copya - Does what I think your plan is above, with a limit of 2Mbytes for "len".
copyb - Just uses copy_file_range() with 128Mbytes for "len".
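
For readers without the attachments, the hole-aware approach described for
copya might look roughly like the sketch below. It is not the attached
program: copy_skip_holes() is a name made up here, and it assumes the output
file starts out empty and that the file system supports SEEK_DATA/SEEK_HOLE.
With alternating 32Kbyte data blocks and holes in a 1Gbyte file there are
16384 data regions, so a loop of this shape issues one SEEK_DATA, one
SEEK_HOLE and one copy per region, which is consistent with the
16384copy+32768seek RPC counts reported for copya.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* copy_file_range(), SEEK_DATA, SEEK_HOLE on Linux */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy infd to an empty outfd, copying only the data regions and leaving
 * the holes unwritten. Each copy_file_range() call is limited to max_len
 * bytes. Returns 0 on success, -1 with errno set on failure.
 */
int
copy_skip_holes(int infd, int outfd, size_t max_len)
{
    off_t data, hole, end, in_off, out_off;
    size_t want;
    ssize_t n;

    assert(max_len > 0);
    if ((end = lseek(infd, 0, SEEK_END)) == -1)
        return (-1);
    for (hole = 0; hole < end; ) {
        /* Find the next data region at or after the previous one's end. */
        data = lseek(infd, hole, SEEK_DATA);
        if (data == -1) {
            if (errno == ENXIO)
                break;      /* only a hole remains before EOF */
            return (-1);
        }
        if ((hole = lseek(infd, data, SEEK_HOLE)) == -1)
            return (-1);
        /* Copy [data, hole) at the same offset in the output file. */
        in_off = out_off = data;
        while (in_off < hole) {
            want = (size_t)(hole - in_off);
            if (want > max_len)
                want = max_len;
            n = copy_file_range(infd, &in_off, outfd, &out_off, want, 0);
            if (n == -1) {
                if (errno == EINTR)
                    continue;
                return (-1);
            }
            if (n == 0)     /* source shrank underneath us */
                return (ftruncate(outfd, out_off));
        }
    }
    /* Set the final size so that a trailing hole is kept as a hole. */
    return (ftruncate(outfd, end));
}
```

Because holes are never passed to copy_file_range(), a small max_len cannot
split a hole across two calls; the price is the extra lseek(2) calls per data
region.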

I first created the sparse file with createsparse.c. It is admittedly a worst case,
creating alternating holes and data blocks of the minimum size supported by
the file system. (I ran it on a UFS file system created with defaults, so the minimum
hole size is 32Kbytes.)
The file is 1Gbyte in size with an Allocation size of 524576 ("ls -ls").
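
A worst-case file of this kind could be produced along the lines of the
sketch below. It is not the attached createsparse.c: create_alternating() is
a hypothetical helper, and whether the real program starts with a data block
or with a hole is not stated, so starting with data is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create path with alternating data blocks and holes of block_size bytes,
 * starting with a data block, and a total size of file_size bytes.
 * Returns 0 on success, -1 on failure.
 */
int
create_alternating(const char *path, off_t file_size, size_t block_size)
{
    off_t off, step = (off_t)block_size;
    char *buf;
    int fd, saved_errno, rc = -1;

    assert(block_size > 0 && file_size >= 0);
    if ((buf = malloc(block_size)) == NULL)
        return (-1);
    memset(buf, 'x', block_size);   /* non-zero, so clearly data */
    if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        saved_errno = errno;
        free(buf);
        errno = saved_errno;
        return (-1);
    }
    /* Write every other block; the ranges never written become holes. */
    for (off = 0; off + step <= file_size; off += 2 * step) {
        /* A short write is treated as a failure in this sketch. */
        if (pwrite(fd, buf, block_size, off) != (ssize_t)block_size)
            goto out;
    }
    /* Extend to the full size so that a trailing hole is included. */
    if (ftruncate(fd, file_size) == -1)
        goto out;
    rc = 0;
out:
    saved_errno = errno;
    close(fd);
    free(buf);
    errno = saved_errno;
    return (rc);
}
```

Under these assumptions, create_alternating("sparsefile", (off_t)1 << 30, 32768)
would give a 1Gbyte file with 32Kbyte data blocks and holes, on a file system
whose minimum hole size is 32Kbytes.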

I then ran copya, copyb, old-cp and new-cp. For NFS, I redid the mount before
each copy to avoid data caching in the client.
Here's what I got:
                 Elapsed time   #RPCs                   Allocation size ("ls -ls" on server)
NFSv4.2
copya            39.7sec        16384copy+32768seek     524576
copyb            10.2sec        104copy                 524576
old-cp           21.9sec        16384read+16384write    1048864
new-cp           10.5sec        1024copy                524576

NFSv4.1
copya            21.8sec        16384read+16384write    1048864
copyb            21.0sec        16384read+16384write    1048864
old-cp           21.8sec        16384read+16384write    1048864
new-cp           21.4sec        16384read+16384write    1048864

Local on the UFS file system
copya            9.2sec         n/a                     524576
copyb            8.0sec         n/a                     524576
old-cp           15.9sec        n/a                     1048864
new-cp           7.9sec         n/a                     524576

So, for a NFSv4.2 mount, using SEEK_DATA/SEEK_HOLE is definitely
a performance hit, due to all the RPC rtts.
Your patched "cp" does fine, although a larger "len" reduces the
RPC count against the server.
All variants using copy_file_range() retain the holes.
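
Whether a copy kept its holes can also be checked from a program by comparing
st_size with st_blocks from stat(2). The sketch below is an illustration only:
file_usage() and print_usage() are hypothetical names, and it assumes
st_blocks counts 512-byte units, as it does on FreeBSD and Linux. The
"ls -ls" figures in the table depend on the BLOCKSIZE setting, so this reports
bytes instead.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Report the logical size and the allocated space of path, both in bytes.
 * Returns 0 on success, -1 with errno set if stat(2) fails.
 */
int
file_usage(const char *path, off_t *size, off_t *allocated)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return (-1);
    *size = sb.st_size;
    /* Assumption: st_blocks is in 512-byte units. */
    *allocated = (off_t)sb.st_blocks * 512;
    return (0);
}

/* Print one line per file, so a source and its copy can be compared. */
int
print_usage(const char *path)
{
    off_t size, allocated;

    if (file_usage(path, &size, &allocated) == -1) {
        perror(path);
        return (-1);
    }
    printf("%s: size %lld bytes, allocated %lld bytes\n", path,
        (long long)size, (long long)allocated);
    return (0);
}
```

A copy that retained the holes should report about the same allocated space as
the original; one that filled them in should report an allocation close to its
size.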

For NFSv4.1, it (not surprisingly) doesn't matter, since only NFSv4.2
supports SEEK_DATA/SEEK_HOLE and VOP_COPY_FILE_RANGE().

For UFS, everything using copy_file_range() works pretty well and
retains the holes.
Although "copya" is guaranteed to retain the holes, it does run noticeably
slower than the others. Not sure why. Do the extra SEEK_DATA/SEEK_HOLE
syscalls cost that much?

The limitation of not using SEEK_DATA/SEEK_HOLE is that you will not
retain holes that straddle the byte range copied by two subsequent
copy_file_range(2) calls.
--> This can be minimized by using a large "len", but that large "len"
      results in slower response to signal handling.

I've attached the little programs, so you can play with them.
(Maybe try different sparse schemes/sizes? It might be fun to
 make the holes/blocks some random multiple of hole size up
 to a limit?)

rick
ps: In case he isn't reading hackers these days, I've added kib@
      as a cc. He might know why UFS is 15% slower when
      SEEK_HOLE/SEEK_DATA is used.

-Alan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: copyb.c
URL: <http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20200923/75fb5f75/attachment.c>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: createsparse.c
URL: <http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20200923/75fb5f75/attachment-0001.c>