Re: git: 5b353925ff61 - main - vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Fri, 28 Jul 2023 00:17:51 UTC
On 7/25/23, Konstantin Belousov <kib@freebsd.org> wrote:
> The branch main has been updated by kib:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=5b353925ff61b9ddb97bb453ba75278b578ed7d9
>
> commit 5b353925ff61b9ddb97bb453ba75278b578ed7d9
> Author:     Konstantin Belousov <kib@FreeBSD.org>
> AuthorDate: 2023-07-23 15:55:50 +0000
> Commit:     Konstantin Belousov <kib@FreeBSD.org>
> CommitDate: 2023-07-24 22:02:59 +0000
>
>     vnode read(2)/write(2): acquire rangelock regardless of
> do_vn_io_fault()
>
>     To ensure atomicity of reads against parallel writes and truncates,
>     vnode lock was not enough at least since introduction of vn_io_fault().
>     That code only take rangelock when it was possible that vn_read() and
>     vn_write() could drop the vnode lock.
>
>     At least since the introduction of VOP_READ_PGCACHE() which generally
>     does not lock the vnode at all, rangelocks become required even
>     for filesystems that do not need vn_io_fault() workaround.  For
>     instance, tmpfs.
>

Is there a bug with pgcache reads disabled (as in when the vnode lock
is held for reads?)

Note this patch adds 2 lock trips which were previously not present,
which has to slow things down single-threaded, but I did not bother
measuring that part.

As this adds to vnode-wide *lock* acquires this has to very negatively
affect scalability.

This time around I ran: ./readseek3_processes -t 10 (10 workers
reading from *disjoint* offsets from the same vnode. this in principle
can scale perfectly)

I observed a 90% drop in performance:
before: total:25723459 ops/s
after: total:2455794 ops/s

Going to an unpatched kernel and disabling pgcache reads instead:
disabled: total:6522480 ops/s

or about 2.6x of performance of the current kernel

In other words I think the thing to do here is to revert the patch and
instead flip pgcache reads to off by default until a better fix can be
implemented.

>     PR:     272678
>     Analyzed and reviewed by:       Andrew Gierth
> <andrew@tao11.riddles.org.uk>
>     Sponsored by:   The FreeBSD Foundation
>     MFC after:      1 week
>     Differential revision:  https://reviews.freebsd.org/D41158
> ---
>  sys/kern/vfs_vnops.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
> index 83e95731d7c4..306840ff0357 100644
> --- a/sys/kern/vfs_vnops.c
> +++ b/sys/kern/vfs_vnops.c
> @@ -1443,6 +1443,7 @@ vn_io_fault(struct file *fp, struct uio *uio, struct
> ucred *active_cred,
>  	void *rl_cookie;
>  	struct vn_io_fault_args args;
>  	int error;
> +	bool rl_locked;
>
>  	doio = uio->uio_rw == UIO_READ ? vn_read : vn_write;
>  	vp = fp->f_vnode;
> @@ -1465,12 +1466,7 @@ vn_io_fault(struct file *fp, struct uio *uio, struct
> ucred *active_cred,
>  	}
>
>  	foffset_lock_uio(fp, uio, flags);
> -	if (do_vn_io_fault(vp, uio)) {
> -		args.kind = VN_IO_FAULT_FOP;
> -		args.args.fop_args.fp = fp;
> -		args.args.fop_args.doio = doio;
> -		args.cred = active_cred;
> -		args.flags = flags | FOF_OFFSET;
> +	if (vp->v_type == VREG) {
>  		if (uio->uio_rw == UIO_READ) {
>  			rl_cookie = vn_rangelock_rlock(vp, uio->uio_offset,
>  			    uio->uio_offset + uio->uio_resid);
> @@ -1482,11 +1478,22 @@ vn_io_fault(struct file *fp, struct uio *uio, struct
> ucred *active_cred,
>  			rl_cookie = vn_rangelock_wlock(vp, uio->uio_offset,
>  			    uio->uio_offset + uio->uio_resid);
>  		}
> +		rl_locked = true;
> +	} else {
> +		rl_locked = false;
> +	}
> +	if (do_vn_io_fault(vp, uio)) {
> +		args.kind = VN_IO_FAULT_FOP;
> +		args.args.fop_args.fp = fp;
> +		args.args.fop_args.doio = doio;
> +		args.cred = active_cred;
> +		args.flags = flags | FOF_OFFSET;
>  		error = vn_io_fault1(vp, uio, &args, td);
> -		vn_rangelock_unlock(vp, rl_cookie);
>  	} else {
>  		error = doio(fp, uio, active_cred, flags | FOF_OFFSET, td);
>  	}
> +	if (rl_locked)
> +		vn_rangelock_unlock(vp, rl_cookie);
>  	foffset_unlock_uio(fp, uio, flags);
>  	return (error);
>  }
>


-- 
Mateusz Guzik <mjguzik gmail.com>