Re: swap_pager: cannot allocate bio

In reply to: Chris Ross : "Re: swap_pager: cannot allocate bio"
From: Mark Johnston <markj_at_freebsd.org>
Date: Tue, 18 Jan 2022 20:29:48 UTC
On Fri, Dec 31, 2021 at 09:08:48PM -0500, Chris Ross wrote:
> 
> 
> > On Nov 25, 2021, at 00:18, Chris Ross <cross+freebsd@distal.com> wrote:
> >>> On Nov 20, 2021, at 13:23, Mark Johnston <markj@freebsd.org> wrote:
> >>> 
> >>> Here is a patch which tries to address the proximate cause of the
> >>> problem.  
> 
> > 
> > The system is still cooking along, running the job that previously was
> > causing it to get stuck after 48ish hours.  It’s been running more than
> > 80 hours now, so a definite improvement.
> > 
> > zfs-stats reports things as stable, 80% cache hit ratio, 64GB max arc,
> > ~85% of that currently as “adaptive target”.  If there’s anything you
> > would like to get from the system, data-wise, let me know.  Happy to
> > share, and help get this fix, or a different better fix if needed, into the
> > tree.
> 
> Hello all.  Just was curious if anyone had a different solution for the
> problem I was seeing, or if not, if the patch from Mark that I manually
> applied can be integrated to the tree for current, and MFR’d to 13.
> 
> Thank you.  Please update me as to the current status of this issue,
> so I don’t update and lose functionality at some later point.  :-)

Sorry for the delay.  I submitted a pull request to openzfs which fixes
the problem in a different way:
https://github.com/openzfs/zfs/pull/12985#pullrequestreview-855857099
I expect it will be merged in some form soon, and merged into FreeBSD
within the next several weeks.

I still see some problems around low memory handling on NUMA systems
when the ARC consumes most of RAM, but these aren't particularly related
to the deadlock.  More specifically, under severe memory pressure the
page daemon will shrink UMA caches asynchronously, but the pages freed
this way are not counted as frees by the page daemon, which may thus
conclude that it's not making progress and trigger an OOM kill.  Further
exacerbating the problem is that the default value of the ARC's
grow_retry constant was changed from 60s to 5s with the OpenZFS import,
and 5s is smaller than the default lowmem period (10s).  This makes it
easy for the page
daemon to fall behind its target, causing the integral term of its PID
controller to grow quite large.  The page daemon then goes into
overdrive even though the instantaneous magnitude of the domain's page
shortage is quite small.
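
To illustrate that last point, here is a toy sketch of how the integral
term of a PI controller can dominate its output when a small error
persists.  This is not the kernel's PID controller code: the function
name, the gains (kp, ki) and the shortage values are all invented for
illustration, and the derivative term is left out for brevity.

```python
# Toy PI controller showing integral windup.
# ASSUMPTIONS: simulate(), kp, ki and the shortage values are hypothetical
# and are not taken from the FreeBSD page daemon.

def simulate(shortages, kp=1.0, ki=0.5):
    """Return one (error, integral, output) tuple per control interval."""
    integral = 0.0
    history = []
    for error in shortages:
        # The integral term accumulates every past shortage.
        integral += error
        # The output combines the instantaneous error and the accumulated one.
        output = kp * error + ki * integral
        history.append((error, integral, output))
    return history


# A domain that stays slightly short of its target for 20 intervals.
history = simulate([100.0] * 20)

first_error, _, first_output = history[0]
last_error, _, last_output = history[-1]
print(f"interval 1:  error={first_error:.0f} output={first_output:.0f}")
print(f"interval 20: error={last_error:.0f} output={last_output:.0f}")
```

In this sketch the instantaneous error is the same in the first and last
interval, but the output is several times larger at the end because of
the accumulated integral.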