swap space issues

Mon Jul 13 08:42:17 UTC 2020

On Mon, 13 Jul 2020 00:45:36 -0500 Scott Bennett bennett at sdf.org said

> Don Wilde <dwilde1 at gmail.com> wrote:
> 
> >
> > On 7/11/20 11:28 PM, Scott Bennett via freebsd-stable wrote:
> > >       I have read this entire thread to date with growing dismay, and I
> > > thank Donald Wilde for reporting his ongoing troubles, although they
> > > spoil my hopes that the kernel's memory management bugs that first became
> > > apparent in 11.2-RELEASE (and -STABLE around the same time) were not
> > > propagated into 12.x.  A recent update to stable/12 source tree made it
> > > finally possible for me to build 12.1-STABLE under 11.4-PRERELEASE, and I
> > > was just about to install the upgrade when this thread appeared.
> > Spoiler alert. Since I gave up on Synth, I haven't had a single swap 
> > issue. It does appear to be one particular port that drove it nuts 
> > (apparently, one of the 'Google performance' bits, with a 
> > mismatched-brackets problem). I have rebuilt the machine several times, 
> > but that's more for my sense of tidiness than anything.
> >
> > I've got a little Crystal script that walks the installed packages and 
> > ports and updates them with system() calls.
> > The machine is very slow, but it's not swapping at all.
> 
>     That's good.  I use portmaster, but not often at present because a
> "portmaster -a" run can only be done two or three times per boot before real
> memory is locked down to the extent that the system is no longer functional
> (i.e., even a scrub of ZFS pools comes to a halt in mid scrub due to lack of
> a sufficient supply of free page frames).
>     The build procedures of certain ports consistently get killed by the OOM
> killer, along with much collateral damage.  I've noticed that lang/golang and
> lang/rust are prime examples now, although both used to build without
> problems.
> >
> > It is quite usable now with 12-STABLE.
> 
>     I don't see any good reason to go through the hassle and lost time of an
> upgrade across a major release boundary if I still won't have a production OS
> afterward.  I'm already dealing with a graphics stack rendered unsafe to use
> by the ongoing churn in X11 code.  (See PR #247441, kindly filed for me by Pau
> Amma.)
> > >
> > >       On Fri, 26 Jun 2020 03:55:04 -0700 : Donald Wilde <dwilde1 at gmail.com>
> > > wrote:
> > >
> > >> On 6/26/20, Peter Jeremy <peter at rulingia.com> wrote:
> > >>>
> > [snip]
> > >>> I strongly suggest you don't have more than one swap device on spinning
> > >>> rust - the VM system will stripe I/O across the available devices and
> > >>> that will give particularly poor results when it has to seek between the
> > >>> partitions.
> > >       True.  The only reason I can think of to use more than one swapping/
> > > paging area on the same device for the same OS instance is for emergencies
> > > > or highly unusual, temporary situations in which more space is needed
> > > > until those situations conclude, and even in such situations, if the
> > > > space can be found on another device, it should be placed there.
> > > > Interleaving of swap space across multiple devices is intended as a
> > > > performance enhancement akin to striping (a.k.a. RAID0), although the
> > > > virtual memory isn't necessarily always actually striped across those
> > > > devices.  Adding a paging area on the same device as an existing one is
> > > > an abhorrent situation, as Peter Jeremy noted, and it should be
> > > > eliminated via swapoff(8) as soon as the extraordinary situation has
> > > > passed.  N.B. the GENERIC kernel sets a limit of four swap devices,
> > > > although it can be rebuilt with a different limit.
> > That's good data, Scott, thanks! The only reason I got into that 
> > situation of trying to add another swap device was that it was crashing 
> > with OO swap messages.
> 
>     I don't recall you posting those messages, but it sounds like exactly the
> *temporary* situation in which adding an inappropriately placed paging area
> can be used long enough to get you out of a bind without a reboot, even though
> performance will probably suffer until you have removed it again.  Poor
> performance is usually preferable to no performance if it is only temporary.
>     One cautionary note in such situations, though, applies to remote paging
> areas.  Sparse files allocated on the remote system should not be used as
> paging areas.  For example, I discovered the hard way (i.e., the problem was
> not documented) that SunOS would crash if a sparse file via NFS were added as
> a paging area and the SunOS system tried to write a page out to an unallocated
> region of the file, which was essentially all of the file at first.
> 
> > >> My intent is to make this machine function -- getting the bear
> > >> dancing. How deftly she dances is less important than that she dances
> > >> at all. My for-real boxen will have real HP and real cores and RAM.
> > >>
> > >>> Also, you can't actually use 64GB swap with 4GB RAM.  If you look back
> > >>> through your boot messages, I expect you'll find messages like:
> > > >>> warning: total configured swap (524288 pages) exceeds maximum recommended
> > > >>> amount (498848 pages).
> > >>> warning: increase kern.maxswzone or reduce amount of swap.
> > >       Also true.  Unfortunately, no guidance whatsoever is provided to advise
> > > > system administrators who need more space as to how to increase the
> > > > relevant table sizes and limits.  However, that is a documentation bug,
> > > > not a code bug.
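
For anyone wanting to put the page counts in that warning into more familiar
units, here is a minimal sketch. It assumes 4 KiB pages (the usual size on
amd64); the helper name pages_to_mib is only illustrative and is not part of
any FreeBSD tool.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on amd64


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB for a given page size in bytes."""
    return pages * page_size / (1024 * 1024)


# Page counts copied from the example warning quoted above.
configured = pages_to_mib(524288)    # "total configured swap"
recommended = pages_to_mib(498848)   # "maximum recommended amount"

print(f"configured:  {configured:.1f} MiB")
print(f"recommended: {recommended:.1f} MiB")
print(f"excess:      {configured - recommended:.1f} MiB")
```

Under that assumption the example message describes 2048 MiB of configured
swap against a recommended ceiling a little under that, so the numbers in the
sample warning are illustrative rather than taken from a 64GB setup.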
> > I've got both my kern.max* and CCACHE set up mostly correctly. 
> > Everything builds and runs well, although I've found that it's helpful 
> > to only use -j3 while building, not -j4 which would be appropriate for 
> > my HAMMER i3. I'd much rather have the bear *dancing* than running into 
> > walls. :D
> 
>     I have encountered many ports where MAKE_JOBS_UNSAFE should have been set,
> but hadn't been.  If you have installed ports-mgmt/portconf, you can set this
> on a per-port basis as you encounter these ports.  There are others that fail
> to build with MAKE_JOBS_NUMBER >= 4, but will build just fine with
> MAKE_JOBS_NUMBER=3 or 2.  However, such failures to build are usually timing
> problems where one process tries to put a file into a directory that doesn't
> exist yet or to read a file that hasn't yet been created.  These are not
> situations involving the OOM killer.
> If you'd like the lines from my /usr/local/etc/ports.conf file for those I've
> encountered to date, just email me privately for them.
> 
> > >> Yes, as I posted, those were part of the failure stream from the synth
> > >> program. When I had kern.maxswzone increased, it got through boot
> > >> without complaining.
> > >>
> > >>> or maybe:
> > >>> WARNING: reducing swap size to maximum of xxxxMB per unit
> > >> The warnings were there, in the as-it-failed complaints.
> > >>
> > > >>> The absolute limit on swap space is vm.swap_maxpages pages but the
> > > >>> realistic limit is about half that.  By default the realistic limit is
> > > >>> about 4×RAM (on 64-bit architectures), but this can be adjusted via
> > > >>> kern.maxswzone (which defines the #bytes of RAM to allocate to swzone
> > > >>> structures - the actual space allocated is vm.swzone).
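
As a rough illustration of that rule of thumb, here is a small sketch that
checks a planned swap size against roughly 4 times RAM. The factor of 4 is
taken from the paragraph above and applies only to the default settings; the
function names are hypothetical, and GB is treated as GiB for simplicity.

```python
GIB = 1024 ** 3


def realistic_swap_limit(ram_bytes, factor=4):
    """Rule-of-thumb ceiling on usable swap: about `factor` times RAM."""
    return factor * ram_bytes


def swap_fits(swap_bytes, ram_bytes, factor=4):
    """True if the planned swap is within the rule-of-thumb ceiling."""
    return swap_bytes <= realistic_swap_limit(ram_bytes, factor)


ram = 4 * GIB     # the 4GB machine discussed in this thread
swap = 64 * GIB   # the 64GB of swap that was configured

print("ceiling (GiB):", realistic_swap_limit(ram) // GIB)
print("64 GiB fits:  ", swap_fits(swap, ram))
```

With default settings this puts the ceiling for a 4GB machine at about 16GB,
which is why 64GB of swap draws the boot-time warnings unless kern.maxswzone
is raised.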
> > >>>
> > > >>> As a further piece of arcana, vm.pageout_oom_seq is a count that
> > > >>> controls the number of passes before the pageout daemon gives up and
> > > >>> starts killing processes when it can't free up enough RAM.  "out of
> > > >>> swap space" messages
> 
>     Yeah, those messages are half truth and half lie.  The true part is that
> the processes mentioned have indeed been killed.  The lie is that the system
> is out of swap space.  (I have seen these messages issued with as little as
> 217 MB in use out of 24 GB available on my system.)  The kernel might not
> always provide all relevant information in error messages, but it should
> *never* LIE to us.
> 
> > >>> generally mean that this number is too low, rather than there being a
> > >>> shortage of swap - particularly if your swap device is rather slow.
> > >>>
> > >> Thanks, Peter!
> > >       A second round of thanks to Peter Jeremy for pointing out this sysctl
> > > > variable (vm.pageout_oom_seq), although thus far I have yet to see that
> > > > it is actually effective in working around the memory management bugs.
> > > > I have added the following lines to /etc/sysctl.conf.
> > > >
> > > > # Because FreeBSD 11.{2,3,4} tie up page frames unnecessarily, set value high
> > > #vm.pageout_wakeup_thresh=14124 # Default value
> > > vm.pageout_wakeup_thresh=112640 # 410 MB
> >
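
To sanity-check threshold values like the ones quoted above, a short sketch
follows. It assumes the setting is a page count and that pages are 4 KiB; the
parse_sysctl helper is hypothetical and only handles simple name=value lines
with trailing # comments.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# The two lines quoted from /etc/sysctl.conf above.
SYSCTL_LINES = """\
#vm.pageout_wakeup_thresh=14124 # Default value
vm.pageout_wakeup_thresh=112640 # 410 MB
"""


def parse_sysctl(text):
    """Return {name: int_value} for active (uncommented) name=value lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = int(value.strip())
    return settings


settings = parse_sysctl(SYSCTL_LINES)
for name, pages in settings.items():
    mib = pages * PAGE_SIZE / 2**20
    print(f"{name} = {pages} pages = {mib:.0f} MiB")
```

Under those assumptions 112640 pages works out to 440 MiB, so the "410 MB" in
the quoted comment may be approximate or based on a different calculation.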
> > [snip]
> >
> > I do totally agree that these are crucial issues for both operation and 
> > documentation, although my issues stemmed from bad _userland_ stack 
> > control.
> 
>     Yes, this is a frequent problem I've observed in the attitudes of
> programmers who never experienced working with a real-memory-only OS.  They
> often lack any awareness of wasteful memory usage, ordering of array accesses,
> locality of reference issues, etc., resulting in truly ridiculous amounts of
> bloat and lost performance, not to mention the failures to perform at all such
> as you encountered.
> In their minds, virtual memory frees them from all concerns about these
> issues, so their schoolteachers, now brought up the same way, don't even teach
> them about such things and perhaps still don't know about them themselves.
Feeling the same way. C++ IMHO was the beginning of the end -- abstraction /
objects do not lead to a better understanding of what you're doing, if you've
never worked on "bare metal" (at the "chip" level). Those w/o knowledge of
assembler never really fully understand what they're doing.
Sorry. Couldn't resist.

>     Another problem, especially with programmers whose memories have not yet
> accumulated many painful experiences, is the attraction toward newer, more
> exciting features accompanied by a disinterest in tracking down and fixing
> existing bugs, even fairly critical bugs.  This problem, if left unchecked by
> management, can lead to terrible predicaments like the one FreeBSD is in now,
> namely, having no production releases being supported.  DragonFly BSD, NetBSD,
> and OpenBSD do not, AFAIK, suffer from this predicament at present.  They are
> behind to varying degrees in terms of newer, more exciting features, but at
> least they appear to work.  For example, sdf.org has well over 70,000 users
> and runs quite a few servers to do so.  It runs
> 
> NetBSD miku 8.1_STABLE NetBSD 8.1_STABLE (GENERIC) #0: Wed Sep 11 03:47:45
> UTC 2019  root at ol:/sdf/sys/NetBSD-8/sys/arch/amd64/compile/GENERIC amd64
> 
> at present.  (miku.sdf.org is one of the servers.)  Its uptime is currently
> 306 days.  They run several VMs of FreeBSD, OpenBSD, LINUX, and possibly
> others on some of the servers.  ZFS appeared in NetBSD 9.0.  I don't know the
> sysadmin's reasons for not upgrading to it so far, but I suspect they have to
> do with the number of systems to upgrade, the fact that it is a .0 release,
> and that root on ZFS and ZFS boot environments are not yet supported, as used
> to be the case with FreeBSD.  I'm not ready to switch to NetBSD quite yet and
> would not enjoy doing so, but it has been a steadily improving alternative to
> FreeBSD of late, and if FreeBSD does not release a production system in the
> meantime, NetBSD may become a better choice for many of us who want to run a
> production OS.  It also offers an alternative to Micro$lop for the so-called
> "Internet of Things", which no other FOSS OS does, AFAIK, although I don't
> know enough about LINUX to be sure.
> >
> > Those who live on -CURRENT are used to OOPS, but the rest of us get paid 
> > not to have them.
> 
>     I've been using -STABLE for the last several major releases, but because
> of the vast numbers of conflicts and failures buried throughout the ports tree
> and the horrendous amount of time it takes to rebuild most of my installed
> ports I am considering surrendering to using -RELEASE and using quarterly
> packages, in spite of the loss of features that doing so entails.  That would
> still not deal with the dependency conflicts or the installation of
> identically named files by different ports, but it would reduce the time spent
> on building ports that fail to install.
> >
> > I am happy with what the Core Team gives us, AND of course we want 
> > ['more','better','faster','STABLE']. :D
> >
>     As Mark Linimon pointed out, the Core Team only does that indirectly.
> However, it is the Core Team's job to give firm direction or redirection to
> those who do the designing and coding to avoid regressions, avoid ignoring the
> introduction of bugs, especially those that render a system unfit for
> production use, enhance testing, and so on.
> 
> 
>                                  Scott Bennett, Comm. ASMELG, CFIAG
> **********************************************************************
> * Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
> *--------------------------------------------------------------------*
> * "A well regulated and disciplined militia, is at all times a good  *
> * objection to the introduction of that bane of all free governments *
> * -- a standing army."                                               *
> *    -- Gov. John Hancock, New York Journal, 28 January 1790         *
> **********************************************************************

--Chris