Re: git: 5b353925ff61 - main - vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()
- Reply: Konstantin Belousov : "Re: git: 5b353925ff61 - main - vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()"
- In reply to: Konstantin Belousov : "git: 5b353925ff61 - main - vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Fri, 28 Jul 2023 00:17:51 UTC
On 7/25/23, Konstantin Belousov <kib@freebsd.org> wrote: > The branch main has been updated by kib: > > URL: > https://cgit.FreeBSD.org/src/commit/?id=5b353925ff61b9ddb97bb453ba75278b578ed7d9 > > commit 5b353925ff61b9ddb97bb453ba75278b578ed7d9 > Author: Konstantin Belousov <kib@FreeBSD.org> > AuthorDate: 2023-07-23 15:55:50 +0000 > Commit: Konstantin Belousov <kib@FreeBSD.org> > CommitDate: 2023-07-24 22:02:59 +0000 > > vnode read(2)/write(2): acquire rangelock regardless of > do_vn_io_fault() > > To ensure atomicity of reads against parallel writes and truncates, > vnode lock was not enough at least since introduction of vn_io_fault(). > That code only take rangelock when it was possible that vn_read() and > vn_write() could drop the vnode lock. > > At least since the introduction of VOP_READ_PGCACHE() which generally > does not lock the vnode at all, rangelocks become required even > for filesystems that do not need vn_io_fault() workaround. For > instance, tmpfs. > Is there a bug with pgcache reads disabled (as in when the vnode lock is held for reads?) Note this patch adds 2 lock trips which were previously not present, which has to slow things down single-threaded, but I did not bother measuring that part. As this adds to vnode-wide *lock* acquires this has to very negatively affect scalability. This time around I ran: ./readseek3_processes -t 10 (10 workers reading from *disjoint* offsets from the same vnode. this in principle can scale perfectly) I observed a 90% drop in performance: before: total:25723459 ops/s after: total:2455794 ops/s Going to an unpatched kernel and disabling pgcache reads instead: disabled: total:6522480 ops/s or about 2.6x of performance of the current kernel In other words I think the thing to do here is to revert the patch and instead flip pgcache reads to off by default until a better fix can be implemented. > PR: 272678 > Analyzed and reviewed by: Andrew Gierth > <andrew@tao11.riddles.org.uk> > Sponsored by: The FreeBSD Foundation > MFC after: 1 week > Differential revision: https://reviews.freebsd.org/D41158 > --- > sys/kern/vfs_vnops.c | 21 ++++++++++++++------- > 1 file changed, 14 insertions(+), 7 deletions(-) > > diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c > index 83e95731d7c4..306840ff0357 100644 > --- a/sys/kern/vfs_vnops.c > +++ b/sys/kern/vfs_vnops.c > @@ -1443,6 +1443,7 @@ vn_io_fault(struct file *fp, struct uio *uio, struct > ucred *active_cred, > void *rl_cookie; > struct vn_io_fault_args args; > int error; > + bool rl_locked; > > doio = uio->uio_rw == UIO_READ ? vn_read : vn_write; > vp = fp->f_vnode; > @@ -1465,12 +1466,7 @@ vn_io_fault(struct file *fp, struct uio *uio, struct > ucred *active_cred, > } > > foffset_lock_uio(fp, uio, flags); > - if (do_vn_io_fault(vp, uio)) { > - args.kind = VN_IO_FAULT_FOP; > - args.args.fop_args.fp = fp; > - args.args.fop_args.doio = doio; > - args.cred = active_cred; > - args.flags = flags | FOF_OFFSET; > + if (vp->v_type == VREG) { > if (uio->uio_rw == UIO_READ) { > rl_cookie = vn_rangelock_rlock(vp, uio->uio_offset, > uio->uio_offset + uio->uio_resid); > @@ -1482,11 +1478,22 @@ vn_io_fault(struct file *fp, struct uio *uio, struct > ucred *active_cred, > rl_cookie = vn_rangelock_wlock(vp, uio->uio_offset, > uio->uio_offset + uio->uio_resid); > } > + rl_locked = true; > + } else { > + rl_locked = false; > + } > + if (do_vn_io_fault(vp, uio)) { > + args.kind = VN_IO_FAULT_FOP; > + args.args.fop_args.fp = fp; > + args.args.fop_args.doio = doio; > + args.cred = active_cred; > + args.flags = flags | FOF_OFFSET; > error = vn_io_fault1(vp, uio, &args, td); > - vn_rangelock_unlock(vp, rl_cookie); > } else { > error = doio(fp, uio, active_cred, flags | FOF_OFFSET, td); > } > + if (rl_locked) > + vn_rangelock_unlock(vp, rl_cookie); > foffset_unlock_uio(fp, uio, flags); > return (error); > } > -- Mateusz Guzik <mjguzik gmail.com>