small patch for pageout. Comments?

Thu Nov 30 18:37:37 UTC 2017

On Thu, Nov 30, 2017 at 10:34 AM, Larry McVoy <lm at mcvoy.com> wrote:

> In a recent numa meeting that Scott called, Jeff suggested a small
> patch to the pageout daemon (included below).
>
> The difference it makes for me is rather dramatic.  If I arrange to
> thrash the crap out of memory, without this patch the kernel is so
> borked, with all the processes in disk wait, that I can't kill them
> and I can't reboot; my only option is to power off.
>
> With the patch there is still some borkage: the kernel is randomly
> killing processes because it is out of memory.  It should kill one of my
> processes that is causing the problem, but it doesn't; it killed
> random stuff like dhclient, getty (which logged me out), etc.
>
> But the system is responsive.
>
> What the patch does is say "if we have more than one core, don't sleep
> in pageout, just keep running until we freed enough mem".
>
> Comments?
>

Just to confirm why this patch works:

For UP systems, we have to pause here to allow work to complete; otherwise
we never switch to the threads that complete the I/Os. For MP, however, we
can keep scheduling more work because that work can be completed on
other CPUs. This parallelism greatly increases the pageout rate, allowing
the system to keep up better when some ass-hat process (or processes) is
thrashing memory.

I'm pretty sure the UP case was also designed to not flood the lower layers
with work, starving other consumers. Does this result in undue flooding, and
would we get better results if we could schedule up to the right amount of
work rather than flooding in the MP case?

Warner

> --lm
>
> diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
> index 4ecae5ad5fd..f59a09e96e2 100644
> --- a/sys/vm/vm_pageout.c
> +++ b/sys/vm/vm_pageout.c
> @@ -1815,10 +1815,18 @@ vm_pageout_worker(void *arg)
>                          * (page reclamation) scan, then increase the level
>                          * and scan again now.  Otherwise, sleep a bit and
>                          * try again later.
> +                        * LM: per discussions with the numa team, don't
> +                        * sleep if we have at least 2 cpus, just keep
> +                        * scanning.  This makes a HUGE difference when
> +                        * the system is thrashing on memory, it's the
> +                        * difference between usable and borked.
>                          */
>                         mtx_unlock(&vm_page_queue_free_mtx);
> -                       if (pass >= 1)
> -                               pause("psleep", hz / VM_INACT_SCAN_RATE);
> +                       if (pass >= 1) {
> +                               if (mp_ncpus < 2) {
> +                                       pause("psleep", hz / VM_INACT_SCAN_RATE);
> +                               }
> +                       }
>                         pass++;
>                 } else {
>                         /*
> _______________________________________________
> freebsd-arch at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
>