Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Fri, 01 Oct 2021 03:07:47 UTC
On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote:
> I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the
> deadlock described in the comments above vn_io_fault_doio [^1].  I can
> reproduce the deadlock readily enough, and the fix seems simple enough
> in other filesystems [^2][^3].  But when I try to apply the same fix
> to fusefs, the deadlock changes into a livelock.  vfs_bio_getpages
> loops infinitely because it reaches the "redo = true" state.  But on
> looping, it never attempts to read from fusefs again.  Instead,
> breadn_flags returns 0 without ever calling bufstrategy, from which I
> infer that getblkx returned a block with B_CACHE set.  Despite that,
> at least one of the requested pages in vfs_bio_getpages fails the
> vm_page_all_valid(ma[i]) check.  Debugging further is wandering
> outside my areas of expertise.  Could somebody please give me a tip?
> What is supposed to mark those pages as valid?  Are there any other
> undocumented conditions needed to use vn_io_fault_uiomove that msdosfs
> and nfscl just happened to already meet?
> 
> Grateful for any help,
> -Alan
> 
> [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340
> [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2
> [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b

Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer
with B_CACHE set but some pages still not fully valid (except the last
page that is allowed to have invalid tail at EOF).

When buffer strategy read returns, vfs_vmio_iodone() updates the pages
validity bitmap according to the b_resid field of the buffer.  Look there,
might be you did not set it properly.  Practically, both b_resid and b_bcount
must be correctly set after io.

In fact, after writing out all that, I realized that I am confused
by your question. vn_io_fault_uiomove() needs to be used from
VOP_READ/VOP_WRITE. If filesystem utilizes buffer cache, then there is
a VOP_STRATEGY() implementation that fullfils the buffer cache requests
for buffers reads and writes. VOP_READ and VOP_STRATEGY simply occurs
at very different layers of the io stack. Typically, VOP_READ() does
bread() which might trigger VOP_STRATEGY() to get the buffer, and then
it performs vn_io_fault() to move data from locked buffer to userspace.

The fact that your addition of vn_io_fault breaks something in VOP_STRATEGY()
does not make sense.