10.0-RC1: bad mbuf leak?
Adam McDougall
mcdouga9 at egr.msu.edu
Wed Jan 8 20:57:08 UTC 2014
On 01/08/2014 15:45, Mark Felder wrote:
> On Wed, Jan 8, 2014, at 14:32, Adam McDougall wrote:
>> On 01/06/2014 13:32, Mark Felder wrote:
>>> It's not looking promising. mbuf usage is really high again. I haven't
>>> hit the point where the system is unavailable on the network but it
>>> appears to be approaching.
>>>
>>> root at skeletor:/usr/home/feld # netstat -m
>>> 4093391/3109/4096500 mbufs in use (current/cache/total)
>>> 1025/1725/2750/1017354 mbuf clusters in use (current/cache/total/max)
>>> 1025/1725 mbuf+clusters out of packet secondary zone in use (current/cache)
>>> 0/492/492/508677 4k (page size) jumbo clusters in use (current/cache/total/max)
>>> 0/0/0/150719 9k jumbo clusters in use (current/cache/total/max)
>>> 0/0/0/84779 16k jumbo clusters in use (current/cache/total/max)
>>> 1025397K/6195K/1031593K bytes allocated to network (current/cache/total)
>>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>>> 0 requests for sfbufs denied
>>> 0 requests for sfbufs delayed
>>> 0 requests for I/O initiated by sendfile
>>>
>>> root at skeletor:/usr/home/feld # vmstat -z | grep mbuf
>>> mbuf_packet:      256, 6511065,    1025,  1725,  9153363,   0,   0
>>> mbuf:             256, 6511065, 4092367,  1383, 74246554,   0,   0
>>> mbuf_cluster:    2048, 1017354,    2750,     0,     2750,   0,   0
>>> mbuf_jumbo_page: 4096,  508677,       0,   492,  2655317,   0,   0
>>> mbuf_jumbo_9k:   9216,  150719,       0,     0,        0,   0,   0
>>> mbuf_jumbo_16k: 16384,   84779,       0,     0,        0,   0,   0
>>> mbuf_ext_refcnt:    4,       0,       0,     0,        0,   0,   0
>>>
>>> root at skeletor:/usr/home/feld # uptime
>>> 12:30PM up 15:05, 1 user, load averages: 0.24, 0.23, 0.27
>>>
>>> root at skeletor:/usr/home/feld # uname -a
>>> FreeBSD skeletor.feld.me 10.0-PRERELEASE FreeBSD 10.0-PRERELEASE #17
>>> r260339M: Sun Jan 5 21:23:10 CST 2014
>>
>> Can you try your NFS mounts directly from within the jails, or stop one
>> or more jails for a night and see if it becomes stable? Anything else
>> unusual besides the jails/nullfs, such as pf, ipfw, NAT, or VIMAGE? My
>> systems running 10 seem fine, including the one running poudriere
>> builds, which uses jails and I think nullfs, but not NFS. Do mbufs go
>> up when you generate NFS traffic?
>>
>
> You can't do NFS mounts from within a jail, which is why I have to do it
> this way.
>
> Nothing else unusual. Very few services running. The box sits mostly
> idle and the traffic is light -- watching some TV shows (the jail runs
> Plex Media Server). I haven't been able to locate a reason for the mbufs
> to go up, but I often wake up in the morning, after it has been doing
> nothing all night, to find it has made a large jump in mbufs used. When
> I'm running an 11-CURRENT kernel, these problems do not exist.
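
For reference, a minimal sketch of the layout being described -- NFS
mounted on the host and then nullfs-mounted into the jail -- assuming a
placeholder export fileserver:/export/media and a jail rooted at
/usr/jails/plex (neither path is from the actual setup):

  # on the host: mount the NFS export somewhere outside the jail tree
  mount -t nfs fileserver:/export/media /mnt/media
  # then loop that mountpoint into the jail's root with nullfs
  mount -t nullfs /mnt/media /usr/jails/plex/media
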
Can you have a script run some stats like netstat -m every few minutes
during the night to see if it happens at a particular time? I'm
wondering if the system's nightly periodic scripts are crawling the
mountpoints and causing this. Alternatively, regarding the NFS mounts
and jails: with a reasonable amount of work, could you replace the
nullfs/NFS setup with temporary NFS mounts made outside of the jails
but mounted on the jail root fs?
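
A minimal sketch of the overnight logging idea, assuming a made-up log
file at /var/log/mbuf-watch.log (run it on the host, e.g. from a tmux
session):

  #!/bin/sh
  # record mbuf statistics with a timestamp every five minutes
  while :; do
          date >> /var/log/mbuf-watch.log
          netstat -m >> /var/log/mbuf-watch.log
          vmstat -z | grep mbuf >> /var/log/mbuf-watch.log
          sleep 300
  done

Comparing the logged snapshots afterwards should show whether the jump
lines up with the nightly periodic run or with something else.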
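The second suggestion -- a temporary NFS mount made from the host
directly onto a point under the jail's root, skipping the nullfs layer --
would look roughly like this, again with placeholder paths:

  # on the host, mount the export straight into the jail's tree
  mount -t nfs fileserver:/export/media /usr/jails/plex/media
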