watchdogd, jemalloc, and mlockall
Ian Lepore
freebsd at damnhippie.dyndns.org
Sun Nov 4 17:02:10 UTC 2012
On Sun, 2012-11-04 at 09:36 -0700, Warner Losh wrote:
> On Nov 3, 2012, at 12:50 PM, Ian Lepore wrote:
>
> > On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote:
> >> On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote:
> >>> In an attempt to un-hijack the thread about memory usage increase
> >>> between 6.4 and 9.x, I'm starting a new thread here related to my recent
> >>> discovery that watchdogd uses a lot more memory since it began using
> >>> mlockall(2).
> >>>
> >>> I tried statically linking watchdogd and it made a small difference in
> >>> RSS, presumably because it doesn't wire down all of libc and libm.
> >>>
> >>> VSZ RSS
> >>> 10236 10164 Dynamic
> >>> 8624 8636 Static
> >>>
> >>> Those numbers are from ps -u on an arm platform. I just updated the PR
> >>> (bin/173332) with some procstat -v output comparing with/without
> >>> mlockall().
> >>>
> >>> It appears that the bulk of the new RSS bloat comes from jemalloc
> >>> allocating vmspace in 8MB chunks. With mlockall(MCL_FUTURE) in effect
> >>> that leads to wiring 8MB to satisfy what probably amounts to a few
> >>> hundred bytes of malloc'd memory.
> >>>
> >>> It would probably also be a good idea to remove the floating point from
> >>> watchdogd to avoid wiring all of libm. The floating point is used just
> >>> to turn the timeout-in-seconds into a power-of-two-nanoseconds value.
> >>> There's probably a reasonably efficient way to do that without calling
> >>> log(), considering that it only happens once at program startup.
> >>
> >> No, I propose to add a switch to turn on/off the mlockall() call.
> >> I have no opinion on the default value of the suggested switch.
> >
> > In a patch I submitted along with the PR, I added code to query the
> > vm.swap_enabled sysctl and only call mlockall() when swapping is
> > enabled.
> >
> > Nobody yet has said anything about what seems to me to be the real
> > problem here: jemalloc grabs 8MB at a time even if you only need to
> > malloc a few bytes, and there appears to be no way to control that
> > behavior. Or maybe there's a knob in there that didn't jump out at me
> > on a quick glance through the header files.
>
> Isn't that only for non-production builds?
>
> Warner
I just realized the implication of what you asked. I think it must be
that jemalloc always allocates big chunks of vmspace at a time (unless
tuned to do otherwise; I haven't looked into the tuning stuff yet), but
when MALLOC_PRODUCTION isn't defined it also touches all the pages
within that allocated space, presumably to fill them with known byte
patterns or other debugging info.
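[For what it's worth, the chunk size is tunable at run time in the jemalloc 3.x series; assuming the MALLOC_CONF mechanism and its lg_chunk option (the base-2 log of the chunk size), something like the following config fragment would shrink the 8MB chunks to 1MB. -ed.]

```shell
# Hedged example: ask jemalloc 3.x for 2^20-byte (1 MiB) chunks via
# the lg_chunk option before starting watchdogd.
MALLOC_CONF="lg_chunk:20" /usr/sbin/watchdogd
```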
-- Ian
More information about the freebsd-hackers mailing list