All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

Mark Martinec Mark.Martinec+freebsd at ijs.si
Wed Aug 1 07:12:15 UTC 2018


> On Tue, Jul 31, 2018 at 11:54:29PM +0200, Mark Martinec wrote:
>> I have now upgraded this host from 11.1-RELEASE-p11 to 11.2-RELEASE
>> and the situation has not improved. I have also turned off all services.
>> ZFS is still leaking memory at about 30 MB per hour until the host
>> runs out of memory and swap space and crashes, unless I reboot it
>> every four days.
>> 
>> Any advice before I try to get rid of that faulted disk with a pool
>> (or downgrade to 10.3, which was stable)?

2018-08-01 00:09, Mark Johnston wrote:
> If you're able to use dtrace, it would be useful to try tracking
> allocations with the solaris tag:
> 
> # dtrace -n 'dtmalloc::solaris:malloc {@allocs[stack(), args[3]] = count()} dtmalloc::solaris:free {@frees[stack(), args[3]] = count();}'
> 
> Try letting that run for one minute, then kill it and paste the output.
> Ideally the host will be as close to idle as possible while still
> demonstrating the leak.
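A buffer is normally freed from a different code path than the one that allocated it, so the @allocs and @frees aggregations cannot be matched stack for stack. A minimal sketch of how the two could be compared, assuming their output has already been parsed into Python dicts keyed by the same (stack, args[3]) pairs the one-liner uses, with args[3] taken as the allocation size. The parsing step is not shown, and the function names and sample data are hypothetical: net the counts per size, then list the allocating stacks for the sizes that keep growing.

```python
from collections import Counter

# Hypothetical input: @allocs and @frees already parsed into dicts
# mapping (stack, size) -> count. Parsing dtrace output is not shown.


def leaking_sizes(allocs, frees):
    """Net allocation count per size, for sizes with more mallocs than frees.

    The stack part of each key is ignored, because a buffer is normally
    freed from a different call path than the one that allocated it.
    """
    net = Counter()
    for (_stack, size), n in allocs.items():
        net[size] += n
    for (_stack, size), n in frees.items():
        net[size] -= n
    return {size: n for size, n in net.items() if n > 0}


def suspect_stacks(allocs, sizes):
    """Allocating stacks for the given sizes, busiest first."""
    hits = [(n, stack, size) for (stack, size), n in allocs.items() if size in sizes]
    return sorted(hits, reverse=True)


# Made-up sample data, for illustration only.
allocs = {("stack_A", 64): 10, ("stack_B", 256): 3}
frees = {("free_path_X", 64): 10, ("free_path_Y", 256): 1}
surplus = leaking_sizes(allocs, frees)
print(surplus)                          # {256: 2}
print(suspect_stacks(allocs, surplus))  # [(3, 'stack_B', 256)]
```

Matching on size alone can hide a leak if two call paths share a size, so this only narrows the search.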

Good and bad news:

The suggested dtrace command bails out:

# dtrace -n 'dtmalloc::solaris:malloc {@allocs[stack(), args[3]] = count()} dtmalloc::solaris:free {@frees[stack(), args[3]] = count();}'
dtrace: description 'dtmalloc::solaris:malloc ' matched 2 probes
Assertion failed: (buf->dtbd_timestamp >= first_timestamp), file /usr/src/cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c, line 3330.
Abort trap

But I did get one step further in localizing the culprit.

I noticed that the "solaris" malloc count goes up in sync with the
polls of the 'telegraf' monitoring service, which among other things
has a ZFS plugin that monitors the ZFS pools and the ARC. This plugin
periodically runs 'zpool list -Hp'.

So after stopping telegraf (and the other remaining services),
'vmstat -m' shows that the InUse count for "solaris" goes up by
about 552 every time I run 'zpool list -Hp':

# (while true; do zpool list -Hp >/dev/null; vmstat -m | \
     fgrep solaris; sleep 1; done) | awk '{print $2-a; a=$2}'
6664427
541
552
552
552
552
552
552
552
552
556
548
552
552
552
552
552
552
552
552
552
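The same measurement can be scripted. A minimal sketch in Python, assuming a FreeBSD host with `zpool` and `vmstat` on the PATH; the function names are hypothetical. It matches the Type column exactly instead of using fgrep, and it takes a baseline reading first rather than printing the absolute count as the first line, as the awk one-liner does (the 6664427 above).

```python
import subprocess
import time


def solaris_inuse(vmstat_m_text):
    """Return the InUse column of the "solaris" row of `vmstat -m` output."""
    for line in vmstat_m_text.splitlines():
        fields = line.split()
        # Exact match on the Type column (fgrep would also match substrings).
        if fields and fields[0] == "solaris":
            return int(fields[1])
    raise ValueError('no "solaris" row in vmstat -m output')


def deltas(samples):
    """Growth between consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def measure(iterations=10, pause=1.0):
    """Run `zpool list -Hp` repeatedly and record the solaris InUse count.

    The first reading is a baseline taken before any zpool call, so each
    delta corresponds to exactly one `zpool list -Hp` invocation.
    """
    def read():
        out = subprocess.run(["vmstat", "-m"], capture_output=True,
                             text=True, check=True).stdout
        return solaris_inuse(out)

    readings = [read()]
    for _ in range(iterations):
        subprocess.run(["zpool", "list", "-Hp"], stdout=subprocess.DEVNULL)
        readings.append(read())
        time.sleep(pause)
    return deltas(readings)
```

On the affected host, print(measure()) would print one delta per 'zpool list -Hp' call.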

# zpool list -Hp
floki   68719476736     37354102272     31365374464     -       -       49%     54      1.00x   ONLINE  -
stuff   -       -       -       -       -       -       -       -       UNAVAIL -
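The telegraf ZFS plugin has to parse this tab-separated output on every poll. The sketch below is not telegraf's code; it is a hypothetical Python helper showing the simplest such parse, reporting pools that are not ONLINE (here, 'stuff'). It assumes the default 'zpool list' columns, with HEALTH next to last and ALTROOT last, as the output above suggests; -H separates the fields with single tabs.

```python
def unhealthy_pools(zpool_list_hp):
    """Return (name, health) for every pool whose HEALTH is not ONLINE.

    Assumes the default column set of `zpool list`, where HEALTH is the
    next-to-last column and ALTROOT the last; -H separates columns with tabs.
    """
    bad = []
    for line in zpool_list_hp.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        name, health = fields[0], fields[-2]
        if health != "ONLINE":
            bad.append((name, health))
    return bad


# The two rows shown above, with the tabs restored.
sample = ("floki\t68719476736\t37354102272\t31365374464\t-\t-\t49%\t54\t1.00x\tONLINE\t-\n"
          "stuff\t-\t-\t-\t-\t-\t-\t-\t-\tUNAVAIL\t-\n")
print(unhealthy_pools(sample))  # [('stuff', 'UNAVAIL')]
```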


   Mark


More information about the freebsd-stable mailing list