Re: SEEK_DATA/SEEK_HOLE with vnode locked

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Sun, 21 Aug 2022 22:19:56 UTC
Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Sun, Aug 21, 2022 at 12:02:48AM +0000, Rick Macklem wrote:
> > Just to summarize this...
> > I was able to do a VOP_SEEK() which would be called with a
> > LK_SHARED locked vnode and it seemed to work fine.
> >
> > However, ReadPlus (which is like Read, but allows for
> > holes to be represented as <offset, length> in the reply
> > instead of a stream of 0 bytes) seems to be a performance
> > dud.
> >
> > I was surprised how poorly it performed compared to ordinary
> > Read. Typically it would take 60% longer to read a file. I tried
> > sparse and non-sparse files of various sizes and they always
> > took longer. (If I disabled SEEK_DATA/SEEK_HOLE in the server
> > code, so it never actually did holes, it worked comparably to
> > regular Read, so somehow the overhead of doing SEEK_DATA/SEEK_HOLE
> > was a big performance hit. It was using LK_SHARED locks, so
> > it wasn't serializing the reads, but I don't really know why it
> > performed so poorly?)
> What filesystem did you use on the server?
The 60% slower was for tests like this with UFS:
- I created a file with a 1Gbyte hole, followed by 1Gbyte of data.
- Then I read the file with "time dd if=<file> of=/dev/null bs=10M"
  after remounting over NFS (to avoid NFS client caching).
Here's the elapsed time for 4 runs for a UFS exported fs:
Read                              ReadPlus
20.4, 4.3, 4.6, 4.3            18.7, 7.6, 7.7, 7.3
(The first run was right after booting, so there was nothing
 cached within UFS.)
--> So, as you can see, it took about 60% longer via ReadPlus.

Now, what about the same test on an exported ZFS fs:
Read                                ReadPlus
6.4, 5.7, 5.6, 5.4                110.8, 113.3, 110.7, 110.9
--> Yep, only about 20 times as long (roughly 2000% longer).

For a kernel build over NFS, it took about 70% longer
when on a ZFS exported fs (I can't remember the UFS
number, but it was significantly longer.)

So, yes, ZFS is a lot worse, but UFS is bad enough that
I can't imagine anyone using ReadPlus instead of ordinary
Read?

LANs have gobs of bandwidth these days. WANs might
benefit from the lack of long streams of 0 bytes, but some
(like my little DSL modem for my internet connection) will
compress them out anyhow, I think?

> >
> > Anyhow, unless the performance issue gets resolved, there is
> > no reason to commit the code to FreeBSD's main.
> > (NFSv4.2 operations, like ReadPlus, are all optional and are not
> >  required for an RFC conformant implementation.)
> 
> Why not commit?  It might make sense to add it, but guard under some
> knob.
Committing it with a "never use this, the performance is terrible"
caveat doesn't make a lot of sense to me, unless the ZFS
performance issue were somehow resolved.

I am now actually concerned about copy_file_range(2), which uses
SEEK_HOLE/SEEK_DATA. There is a patch under review that at least
increases the blocksize for ZFS, but the effect of disabling the use of
SEEK_HOLE/SEEK_DATA in copy_file_range(2) also needs to be
explored.
--> Retaining holes as unallocated regions is nice, but at the very
    least, it could compare va_size with va_bytes to decide if there
    are holes worth looking for.

rick