RFC: copy_file_range(3)
Rick Macklem
rmacklem at uoguelph.ca
Wed Sep 23 15:08:10 UTC 2020
Rick Macklem wrote:
>Alan Somers wrote:
>[lots of stuff snipped]
>>1) In order to quickly respond to a signal, a program must use a modest len with copy_file_range
>For the programs you have mentioned, I think the only signal handling would
>be termination (<ctrl>C or SIGTERM if you prefer).
>I'm not sure what is a reasonable response time for this.
>I'd like to hear comments from others?
>- 1sec, less than 1sec, a few seconds, ...
>
>> 2) If a hole is larger than len, that will cause vn_generic_copy_file_range to
>> truncate the output file to the middle of the hole. Then, in the next invocation,
>> truncate it again to a larger size.
>> 3) The result is a file that is not as sparse as the original.
>Yes. So, the trick is to use the largest "len" you can live with, given how long you
>are willing to wait for signal processing.
>
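To make the trade-off concrete, here is a minimal sketch of the chunked-copy loop being discussed (the function name and structure are mine, not taken from copya/copyb; on Linux the _GNU_SOURCE define is needed, on FreeBSD it is not). The chunk size bounds how long one copy_file_range(2) call can run before a pending SIGINT/SIGTERM gets a chance to be acted on:

```c
#define _GNU_SOURCE		/* for copy_file_range() on Linux; harmless on FreeBSD */
#include <fcntl.h>
#include <unistd.h>

/*
 * Copy infd to outfd in modest chunks.  Each copy_file_range(2) call
 * copies at most "chunk" bytes, so the worst-case delay before a
 * signal is handled is roughly the time to copy one chunk.
 */
static int
chunked_copy(int infd, int outfd, size_t chunk)
{
	ssize_t ret;

	for (;;) {
		ret = copy_file_range(infd, NULL, outfd, NULL, chunk, 0);
		if (ret == 0)
			return (0);	/* EOF reached */
		if (ret < 0)
			return (-1);	/* EINTR, EIO, ...; caller checks errno */
	}
}
```

A larger "chunk" means fewer syscalls (and, over NFSv4.2, fewer Copy RPCs) at the cost of slower signal response.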
>> For example, on UFS:
>> $ truncate -s 1g sparsefile
>Not a very interesting sparse file. I wrote a little program to create one.
>> $ cp sparsefile sparsefile2
>> $ du -sh sparsefile*
>> 96K sparsefile
>> 32M sparsefile2
Btw, this happens because, at least for UFS (not sure about other file
systems), if you grow a file's size via VOP_SETATTR() of size, it allocates a
block at the new EOF, even though no data has been written there.
--> This results in one block being allocated at the end of the range used
for a copy_file_range() call, if that file offset is within a hole.
--> The larger the "len" argument, the less frequently it will occur.
>>
>> My idea for a userland wrapper would solve this problem by using
>> SEEK_HOLE/SEEK_DATA to copy holes in their entirety, and use copy_file_range for
>> everything else with a modest len. Alternatively, we could eliminate the need for
>> the wrapper by enabling copy_file_range for every file system, and making
>> vn_generic_copy_file_range interruptible, so copy_file_range can be called with
>> large len without penalizing signal handling performance.
>
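For reference, a rough sketch of what I understand the wrapper to be (the function name and error handling are mine; copya does roughly this with a 2Mbyte "maxlen"). Holes are recreated by skipping the output offset rather than copying through them:

```c
#define _GNU_SOURCE		/* for copy_file_range() on Linux; harmless on FreeBSD */
#include <fcntl.h>
#include <unistd.h>

/*
 * Copy only the data segments of a sparse infd, recreating each hole
 * in outfd by advancing the output offset instead of letting
 * copy_file_range(2) walk through the hole.
 */
static int
sparse_copy(int infd, int outfd, size_t maxlen)
{
	off_t data, hole, inoff, outoff, insize;
	ssize_t ret;
	size_t len;

	inoff = outoff = 0;
	for (;;) {
		data = lseek(infd, inoff, SEEK_DATA);
		if (data < 0)
			break;		/* ENXIO: no data past inoff */
		hole = lseek(infd, data, SEEK_HOLE);
		if (hole < 0)
			return (-1);
		outoff += data - inoff;	/* skip the hole in the output */
		inoff = data;
		while (inoff < hole) {
			len = (size_t)(hole - inoff);
			if (len > maxlen)
				len = maxlen;
			ret = copy_file_range(infd, &inoff, outfd, &outoff,
			    len, 0);
			if (ret <= 0)
				return (-1);
		}
	}
	/* Preserve a trailing hole by extending the output file. */
	insize = lseek(infd, 0, SEEK_END);
	if (insize < 0)
		return (-1);
	return (ftruncate(outfd, outoff + (insize - inoff)));
}
```

Note this is the variant that pays one SEEK_DATA plus one SEEK_HOLE round trip per data segment, which is where the NFSv4.2 RPC counts below come from.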
>Well, I ran some quick benchmarks using the attached programs, plus "cp" both
>before and with your copy_file_range() patch.
>copya - Does what I think your plan is above, with a limit of 2Mbytes for "len".
>copyb - Just uses copy_file_range() with 128Mbytes for "len".
>
>I first created the sparse file with createsparse.c. It is admittedly a worst case,
>creating alternating holes and data blocks of the minimum size supported by
>the file system. (I ran it on a UFS file system created with defaults, so the minimum
>hole size is 32Kbytes.)
>The file is 1Gbyte in size with an Allocation size of 524576 ("ls -ls").
>
>I then ran copya, copyb, old-cp and new-cp. For NFS, I redid the mount before
>each copy to avoid data caching in the client.
>Here's what I got:
> Elapsed time #RPCs Allocation size ("ls -ls" on server)
>NFSv4.2
>copya 39.7sec 16384copy+32768seek 524576
>copyb 10.2sec 104copy 524576
When I ran the tests I had vfs.nfs.maxcopyrange set to 128Mbytes on the
server. However it was still the default of 10Mbytes on the client,
so this test run used 10Mbytes per Copy. (I wondered why it did 104 Copies?)
With both set to 128Mbytes I got:
copyb 10.0sec 8copy 524576
>old-cp 21.9sec 16384read+16384write 1048864
>new-cp 10.5sec 1024copy 524576
>
>NFSv4.1
>copya 21.8sec 16384read+16384write 1048864
>copyb 21.0sec 16384read+16384write 1048864
>old-cp 21.8sec 16384read+16384write 1048864
>new-cp 21.4sec 16384read+16384write 1048864
>
>Local on the UFS file system
>copya 9.2sec n/a 524576
This turns out to be just variability in the test. I get 7.9sec->9.2sec
for runs of all three of copya, copyb and new-cp for UFS.
I think it is caching related, since I wasn't unmounting/remounting the
UFS file system between test runs.
>copyb 8.0sec n/a 524576
>old-cp 15.9sec n/a 1048864
>new-cp 7.9sec n/a 524576
>
>So, for a NFSv4.2 mount, using SEEK_DATA/SEEK_HOLE is definitely
>a performance hit, due to all the RPC rtts.
>Your patched "cp" does fine, although a larger "len" reduces the
>RPC count against the server.
>All variants using copy_file_range() retain the holes.
>
>For NFSv4.1, it (not surprisingly) doesn't matter, since only NFSv4.2
>supports SEEK_DATA/SEEK_HOLE and VOP_COPY_FILE_RANGE().
>
>For UFS, everything using copy_file_range() works pretty well and
>retains the holes.
>Although "copya" is guaranteed to retain the holes, it does run noticeably
>slower than the others. Not sure why. Do the extra SEEK_DATA/SEEK_HOLE
>syscalls cost that much?
Ignore this. It was just variability in the test runs.
>The limitation of not using SEEK_DATA/SEEK_HOLE is that you will not
>retain holes that straddle the byte range copied by two subsequent
>copy_file_range(2) calls.
This statement is misleading. These holes are partially retained, but there
will be a block allocated (at least for UFS) at the boundary, due to the property
of growing a file via VOP_SETATTR(size) noted above.
>--> This can be minimized by using a large "len", but that large "len"
> results in slower response to signal handling.
I'm going to play with "len" today and come up with some numbers
w.r.t. signal handling response time vs the copy_file_range() "len" argument.
>I've attached the little programs, so you can play with them.
>(Maybe try different sparse schemes/sizes? It might be fun to
> make the holes/blocks some random multiple of hole size up
> to a limit?)
>
>rick
>ps: In case he isn't reading hackers these days, I've added kib@
> as a cc. He might know why UFS is 15% slower when
> SEEK_HOLE/SEEK_DATA is used.
rick