Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove
Date: Fri, 01 Oct 2021 03:59:11 UTC
On Thu, Sep 30, 2021 at 9:53 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Thu, Sep 30, 2021 at 09:33:24PM -0600, Alan Somers wrote:
> > On Thu, Sep 30, 2021 at 9:08 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >
> > > On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote:
> > > > I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the
> > > > deadlock described in the comments above vn_io_fault_doio [^1]. I can
> > > > reproduce the deadlock readily enough, and the fix seems simple enough
> > > > in other filesystems [^2][^3]. But when I try to apply the same fix
> > > > to fusefs, the deadlock changes into a livelock. vfs_bio_getpages
> > > > loops infinitely because it reaches the "redo = true" state. But on
> > > > looping, it never attempts to read from fusefs again. Instead,
> > > > breadn_flags returns 0 without ever calling bufstrategy, from which I
> > > > infer that getblkx returned a block with B_CACHE set. Despite that,
> > > > at least one of the requested pages in vfs_bio_getpages fails the
> > > > vm_page_all_valid(ma[i]) check. Debugging further is wandering
> > > > outside my area of expertise. Could somebody please give me a tip?
> > > > What is supposed to mark those pages as valid? Are there any other
> > > > undocumented conditions needed to use vn_io_fault_uiomove that msdosfs
> > > > and nfscl just happened to already meet?
> > > >
> > > > Grateful for any help,
> > > > -Alan
> > > >
> > > > [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340
> > > > [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2
> > > > [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b
> > >
> > > Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer
> > > with B_CACHE set but some pages still not fully valid (except the last
> > > page, which is allowed to have an invalid tail at EOF).
> > >
> > > When the buffer strategy read returns, vfs_vmio_iodone() updates the
> > > pages' validity bitmap according to the b_resid field of the buffer.
> > > Look there; it may be that you did not set it properly. Practically,
> > > both b_resid and b_bcount must be correctly set after I/O.
> > >
> > > In fact, after writing out all that, I realized that I am confused
> > > by your question. vn_io_fault_uiomove() needs to be used from
> > > VOP_READ/VOP_WRITE. If a filesystem utilizes the buffer cache, then
> > > there is a VOP_STRATEGY() implementation that fulfills the buffer cache
> > > requests for buffer reads and writes. VOP_READ and VOP_STRATEGY simply
> > > occur at very different layers of the I/O stack. Typically, VOP_READ()
> > > does bread(), which might trigger VOP_STRATEGY() to get the buffer, and
> > > then it performs vn_io_fault() to move data from the locked buffer to
> > > userspace.
> > >
> > > The fact that your addition of vn_io_fault breaks something in
> > > VOP_STRATEGY() does not make sense.
> >
> > Ahh, that last piece of information is useful. In fusefs, both
> > VOP_READ and VOP_STRATEGY can end up going through the same path,
> > depending on cache settings, O_DIRECT, etc. And during a VOP_STRATEGY
> > read, fusefs needs to use uiomove in order to move data not into the
> > user's buffer, but from the fuse daemon into the kernel. But that's
> > not the cause of the livelock, because whether I use uiomove or
> > vn_io_fault_uiomove during VOP_STRATEGY, I still get the same livelock.
> > I'll check b_bcount and b_resid tomorrow.
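
To make that check concrete, here is the invariant I understand
vfs_vmio_iodone() to expect, written out as a sketch I can compare
fusefs against. This is illustrative code only, not the actual fusefs
completion path; the function name and the "transferred" parameter are
made up:

    #include <sys/param.h>
    #include <sys/bio.h>
    #include <sys/buf.h>

    /*
     * vfs_vmio_iodone() derives page validity from b_bcount - b_resid.
     * A short read that leaves b_resid == 0 would mark never-filled
     * pages fully valid, and the resulting B_CACHE buffer would then
     * satisfy the vfs_bio_getpages() retry without issuing new I/O --
     * which matches the livelock symptom above.
     */
    static void
    example_read_done(struct buf *bp, ssize_t transferred, int error)
    {
            if (error != 0) {
                    bp->b_ioflags |= BIO_ERROR;
                    bp->b_error = error;
                    bp->b_resid = bp->b_bcount;     /* nothing was moved */
            } else {
                    /* b_resid is the part of b_bcount NOT transferred. */
                    bp->b_resid = bp->b_bcount - transferred;
            }
            bufdone(bp);
    }

If fusefs leaves b_resid == 0 after a short read from the daemon, that
would explain both the B_CACHE buffer and the not-fully-valid pages.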
>
> But the uiomove call from VOP_STRATEGY() to copy user data into a kernel
> buffer is not prepared for vn_io_fault. I suspect that what happens
> there is the following:
> - the top level of the syscall, like read(2), does vn_read()
> - vn_read() checks conditions and goes through vn_io_fault, calling
>   VOP_READ(); there it prepares prefaulted held pages _for the userspace
>   buffer from read(2)_
> - your VOP_READ() calls into VOP_STRATEGY(), which tries to copy data
>   into the kernel. If you use vn_io_fault_uiomove() at this point, it
>   wrongly consumes held pages for an unrelated userspace buffer
>
> Even if there is some additional bug with b_resid, I suspect that the
> end result is a mess anyway. You should not use vn_io_fault for
> recursive accesses to userspace, only for VOP_READ/VOP_WRITE level
> accesses.

How would vn_io_fault_uiomove "wrongly consume held pages"? Are they
attached to the uio? Because the call that copies from userspace
(/dev/fuse) into the kernel doesn't have access to the "struct buf".
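
My best guess from skimming sys/kern/vfs_vnops.c is that the held pages
hang off the thread rather than the uio, which would explain how an
unrelated uio can consume them. A paraphrased sketch of the dispatch
check (the helper name is mine, not the kernel's, and the actual copy
path through td->td_ma and uiomove_fromphys() is elided):

    #include <sys/param.h>
    #include <sys/proc.h>
    #include <sys/uio.h>

    /*
     * vn_io_fault() wires the pages backing the original read(2)/write(2)
     * buffer into td->td_ma and sets TDP_UIOHELD on the thread. Nothing
     * in the uio itself records this, so any vn_io_fault_uiomove() on the
     * same thread -- including one for my /dev/fuse copy -- would be
     * routed through those held pages.
     */
    static bool
    uses_thread_held_pages(const struct uio *uio)
    {
            struct thread *td = curthread;

            return ((td->td_pflags & TDP_UIOHELD) != 0 &&
                uio->uio_segflg == UIO_USERSPACE);
    }

If that reading is right, the /dev/fuse copy inside VOP_STRATEGY() gets
matched against pages held for the original read(2) buffer, which is
exactly the wrong consumption you describe. Can you confirm?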