small patch for pageout. Comments?

Thu Nov 30 18:49:23 UTC 2017

On Thu, Nov 30, 2017 at 11:37:35AM -0700, Warner Losh wrote:
> On Thu, Nov 30, 2017 at 10:34 AM, Larry McVoy <lm at mcvoy.com> wrote:
> 
> > In a recent numa meeting that Scott called, Jeff suggested a small
> > patch to the pageout daemon (included below).
> >
> > It's rather dramatic the difference it makes for me.  If I arrange to
> > thrash the crap out of memory, without this patch the kernel is so
> > borked with all the processes in disk wait that I can't kill them,
> > I can't reboot, my only option is to power off.
> >
> > With the patch there is still some borkage: the kernel is randomly
> > killing processes because it's out of memory. It should kill one of my
> > processes, since they're causing the problem, but it doesn't; it killed
> > random stuff like dhclient, getty (which logged me out), etc.
> >
> > But the system is responsive.
> >
> > What the patch does is say "if we have more than one core, don't sleep
> > in pageout, just keep running until we've freed enough memory".
> >
> > Comments?
> >
> 
> Just to confirm why this patch works.
> 
> For UP systems, we have to pause here to allow work to complete; otherwise
> we can't switch to the threads that complete the I/Os. For MP, however, we
> can continue to schedule more work because that work can be completed on
> other CPUs. This parallelism greatly increases the pageout rate, allowing
> the system to keep up better when some ass-hat process (or processes) is
> thrashing memory.

Yep.
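To make the UP/MP distinction concrete, here is a toy user-space model of the pageout loop. It is a sketch under stated assumptions, not the patch itself: `pageout_should_pause()` and `simulate_pageout()` are hypothetical names (not FreeBSD kernel APIs), and the model assumes each pass of the loop queues a fixed batch of pages.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the pageout policy discussed above.
 * None of these names are FreeBSD kernel APIs; they only illustrate the
 * "pause on UP, keep scanning on MP" decision.
 */

/* Should the daemon pause after queueing a batch of pageout work? */
bool pageout_should_pause(int ncpus)
{
    assert(ncpus >= 1);
    /* On a uniprocessor, I/O completion can only run if we yield the CPU. */
    return ncpus == 1;
}

/*
 * Simulate reclaiming `target` pages in batches of `batch` pages.
 * Returns how many times the daemon paused before reaching the target.
 */
int simulate_pageout(int ncpus, int target, int batch)
{
    int freed = 0;
    int pauses = 0;

    assert(batch > 0);
    while (freed < target) {
        freed += batch;          /* queue one batch of page frees/writes */
        if (freed >= target)
            break;               /* shortage resolved, stop scanning */
        if (pageout_should_pause(ncpus))
            pauses++;            /* UP: sleep so I/O completions can run */
        /* MP: loop again immediately; other CPUs complete the I/O */
    }
    return pauses;
}
```

For example, `simulate_pageout(1, 100, 10)` pauses 9 times before reaching the target, while `simulate_pageout(4, 100, 10)` never pauses.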

> I'm pretty sure the UP case was also designed to not flood the lower layers
> with work, starving other consumers. Does this result in undue flooding, and
> would we get better results if we could schedule up to the right amount of
> work rather than flooding in the MP case?

I dunno if there is a "right amount".  I could make it a little smarter by
keeping track of how many pages we freed and sleeping if we freed none in a
scan (which seems really unlikely).
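A sketch of that refinement, again as a hypothetical user-space model rather than kernel code: on MP the daemon keeps scanning back-to-back while scans make progress, and sleeps only when a scan frees nothing. `pageout_should_sleep()` and `count_sleeps()` are illustrative names, and the per-scan results are supplied by the caller instead of coming from a real page queue.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: on MP, keep scanning while scans make progress,
 * but sleep when a full scan frees nothing. Not FreeBSD kernel APIs.
 */
bool pageout_should_sleep(int ncpus, int freed_this_scan)
{
    assert(ncpus >= 1 && freed_this_scan >= 0);
    if (ncpus == 1)
        return true;              /* UP: always yield, as before */
    return freed_this_scan == 0;  /* MP: yield only when a scan made no progress */
}

/*
 * Run scans whose per-scan results are given in `freed[]` until `target`
 * pages have been freed or the scans run out. Returns how many sleeps occurred.
 */
int count_sleeps(int ncpus, const int *freed, int nscans, int target)
{
    int total = 0;
    int sleeps = 0;

    for (int i = 0; i < nscans && total < target; i++) {
        total += freed[i];
        /* Only consider sleeping if the shortage is not yet resolved. */
        if (total < target && pageout_should_sleep(ncpus, freed[i]))
            sleeps++;
    }
    return sleeps;
}
```

With per-scan results of `{5, 0, 5, 5}` and a target of 15, an MP system sleeps once (after the empty scan), while a UP system sleeps after each of the first three scans.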

All I know for sure is that without this you can lock up the system to
the point it takes a power cycle to unwedge it.  With this the system
is responsive.

Rather than worrying about the smartness, I'd argue this is an improvement,
ship it, and then I can go look at how the system decides to kill processes
(because that's currently busted).