The out-of-swap killer makes poor choices
Alan Somers
asomers at freebsd.org
Tue Feb 23 21:20:34 UTC 2021
On Tue, Feb 23, 2021 at 2:11 PM Konstantin Belousov <kostikbel at gmail.com>
wrote:
> On Tue, Feb 23, 2021 at 01:49:49PM -0700, Alan Somers wrote:
> > To me it's always seemed like the out-of-swap killer kills the wrong
> > process. Oh, it does the right thing with a trivial while(1) {malloc()}
> > test program, but not with real workloads. To summarize the logic in
> > vm_pageout_oom:
> >
> > * Don't kill system, protected, or killed processes
> > * Don't kill processes with a thread that isn't running or suspended
> > * Kill whichever process is using the most swap or swap + ram, depending on
> > the shortage variable. On ties, kill the newest one.
> >
> > This algorithm probably made sense in the days when computers had much more
> > swap than RAM. But now it leads to several problems:
> >
> > * It's almost guaranteed to do the wrong thing when shortage ==
> > VM_OOM_SWAPZ and there is little or no swap configured. If no swap is
> > configured, it will kill the newest running or suspended process. If a
> > little bit is configured, it will probably kill some idle process, like
> > zfsd, that is swapped out because it doesn't run very often.
> >
> > * Even if multiple GB of swap are configured, the OOM killer is still
> > biased towards killing idle processes when shortage == VM_OOM_SWAPZ. Most
> > often, the process responsible for an out-of-memory condition is not idle,
> > and is consuming large amounts of RAM.
> >
> > * It ignores RLIMIT_RSS. We consider that rlimit when deciding whether to
> > move a process from RAM to swap.
> >
> > * The "out of swap space" kernel message doesn't specify whether the
> > process was killed because of insufficient swap or RAM (the shortage
> > variable)
> >
> > I propose the following changes:
> >
> > * Incorporate shortage into the "out of swap space" message.
> OK with me, though I'm not sure users could take any action based on the
> distinction.
>
> > * When walking the process list, if any process exceeds its RLIMIT_RSS,
> > choose it immediately, without bothering to compare it to older processes.
> RSS was never supposed to be a limit on how many pages are resident.
> It only provides a preference for paging out the process's pages more
> aggressively.
>
> Or, to put it differently, RSS is not supposed to be the working set size
> in the VMS/NT sense.
>
Sure, but given that we must kill _something_, preferentially killing a
process that was specifically limited sounds better than killing a process
that wasn't, wouldn't you agree?
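
The proposed selection rule could be sketched in userland C roughly as
follows. This is only an illustration under stated assumptions, not the
code in vm_pageout_oom: struct proc_info, its fields, and pick_victim()
are hypothetical names, the list is assumed to be walked newest first,
and the system/protected/killed filtering is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process snapshot; this is not the kernel's struct proc. */
struct proc_info {
	int      pid;
	uint64_t rss_pages;   /* pages currently resident in RAM */
	uint64_t swap_pages;  /* pages currently held in swap */
	uint64_t rss_limit;   /* RLIMIT_RSS in pages; UINT64_MAX = unlimited */
};

/*
 * Pick a victim from procs[0..n-1], assumed ordered newest first.
 * Returns the index of the chosen process, or -1 if the list is empty.
 */
static int
pick_victim(const struct proc_info *procs, size_t n)
{
	int victim = -1;
	uint64_t biggest = 0;

	assert(procs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Over its RSS limit: choose it without looking further. */
		if (procs[i].rss_pages > procs[i].rss_limit)
			return ((int)i);

		/* Otherwise always rank by RAM + swap. */
		uint64_t size = procs[i].rss_pages + procs[i].swap_pages;

		/* Strict '>' keeps the newest process on ties. */
		if (victim == -1 || size > biggest) {
			victim = (int)i;
			biggest = size;
		}
	}
	return (victim);
}
```

Returning early on the first process over its limit matches "without
bothering to compare it to older processes"; everything else falls back
to the largest RAM + swap total.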
>
> > * Always consider the sum of a process's RAM + swap, regardless of the
> > shortage variable.
> >
> > Does this make sense? Am I missing something about shortage ==
> > VM_OOM_SWAPZ? I don't understand why you would ever want to exclude
> > processes' RAM usage. That logic was added in revision
> > 2025d69ba7a68a5af173007a8072c45ad797ea23, but I don't understand the
> > rationale.
>
> SWAPZ means that the swap zone is exhausted. In this case, killing a process
> that does not use swap would not free any space in the zone. Similarly,
> we should select the process with the largest swap use (== metadata kept in
> the swap zone) to free something in the swap zone.
>
But killing a process that does not use swap could reduce the need for more
swap by other processes. How many cases are there where a process needs
more swap and won't settle for RAM instead?
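
To make the disagreement concrete, here is a small userland C sketch of
the two ranking metrics being discussed. It is an illustration only:
enum shortage, struct usage, and the three functions are hypothetical
stand-ins, not kernel definitions, and the list is assumed to be ordered
newest first so that ties go to the newest process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's shortage values. */
enum shortage { SHORTAGE_MEM, SHORTAGE_SWAPZ };

/* Hypothetical per-process usage, in pages. */
struct usage {
	uint64_t ram_pages;
	uint64_t swap_pages;
};

/* Metric as summarized in this thread: swap only when the swap zone is short. */
static uint64_t
size_current(const struct usage *u, enum shortage s)
{
	if (s == SHORTAGE_SWAPZ)
		return (u->swap_pages);
	return (u->swap_pages + u->ram_pages);
}

/* Proposed metric: always RAM + swap, regardless of shortage. */
static uint64_t
size_proposed(const struct usage *u)
{
	return (u->swap_pages + u->ram_pages);
}

/* Index of the largest entry; list is newest first, so ties keep the newest. */
static int
largest(const uint64_t *sizes, size_t n)
{
	int best = -1;

	assert(sizes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (best == -1 || sizes[i] > sizes[(size_t)best])
			best = (int)i;
	return (best);
}
```

With no swap configured, every size_current() value under SHORTAGE_SWAPZ
is zero, so largest() falls through to the newest process. With a little
swap, a mostly swapped-out idle daemon outranks a process holding
thousands of resident pages. size_proposed() picks the large RAM user in
both cases, though it does not by itself guarantee that the kill frees
any swap zone metadata.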
>
> In other words, such a kill might not be enough and could require more and
> more rounds of OOM, especially on a machine with very little swap configured.
>