Sudden mbuf demand increase and shortage under the load (igb issue?)
Maxim Sobolev
sobomax at FreeBSD.org
Tue Feb 16 18:11:06 UTC 2010
OK, here is some new data that I think rules out any issues with the
applications. Following Alfred's suggestion, I wrote a script that runs
every second and outputs some system statistics:
date
netstat -m
vmstat -i
ps -axl
pstat -T
vmstat -z
sysctl -a
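A minimal sketch of what such a collection script could look like,
assuming a POSIX sh. The function names, the header lines and the log
path are hypothetical and are not taken from the actual script; only
the command list comes from the message above.

```shell
#!/bin/sh
# Commands from the list above, one per line.
STAT_CMDS='netstat -m
vmstat -i
ps -axl
pstat -T
vmstat -z
sysctl -a'

# Print one snapshot: a timestamp line, then each command's output
# under its own header. An optional first argument overrides the
# command list (one command per line).
snapshot() {
    cmds=${1:-$STAT_CMDS}
    echo "##### $(date '+%Y-%m-%d %H:%M:%S')"
    printf '%s\n' "$cmds" | while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        echo "=== $cmd ==="
        # Keep going if a command fails or is missing; stdin is
        # redirected so a command cannot swallow the command list.
        sh -c "$cmd" 2>&1 </dev/null || echo "(command failed: $cmd)"
    done
}

# Append a snapshot to the log roughly once per second.
# The default log path is an assumption for illustration.
collect_forever() {
    log=${1:-/var/log/sysstats.log}
    while :; do
        snapshot >> "$log"
        sleep 1
    done
}
```

Calling collect_forever with a log path starts the loop. Since each
snapshot takes some time to run, the real interval will be somewhat
longer than one second even on a healthy system.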
The problem hit us again several times today, and upon investigating
the log I found that the increase in mbuf usage happened in one step,
going from the normal 10% to 100% between two script runs. What is more
interesting is that those two consecutive runs were about 2 minutes
apart (instead of 1 second as they should be), and when inspecting the
cron logs I noticed the same time gap there. I ruled out VM starvation
as a cause of the delay because the system has plenty of free memory.
The incoming network traffic was not sufficient to starve VM so quickly
either: it was about 7MB/sec at that time, so even if all receivers had
stopped draining their buffers it should have taken at least 1-2
seconds to fill up the mbuf cache and create demand for additional
kernel memory. The failure would likely have been more gradual, and I
should have seen it build up in the debug log.
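One way to find such gaps in a long log automatically is sketched
below. It assumes, hypothetically, that each script run starts with a
line of the form "TS <epoch-seconds>" (for example written with
echo "TS $(date +%s)"); the actual log uses plain date output, so this
marker format and the find_gaps name are illustrative only.

```shell
#!/bin/sh
# Report gaps between consecutive runs that exceed a threshold.
# $1 = log file, $2 = largest acceptable gap in seconds (default 5).
# Only lines whose first field is "TS" are considered; everything
# else in the log is ignored.
find_gaps() {
    awk -v max="${2:-5}" '
        $1 == "TS" {
            if (prev != "" && $2 - prev > max)
                printf "gap of %d s between %d and %d\n", $2 - prev, prev, $2
            prev = $2
        }
    ' "$1"
}
```

With a threshold of a few seconds, a 2-minute stall like the one
described above would show up as a single reported gap, while normal
one-second spacing produces no output.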
So it looks like a kernel issue of some sort, which causes all userland
activity to cease for 2 minutes when the system reaches a certain load.
The mbuf build-up is only a by-product of this, not really a cause.
igb(4) is the primary suspect now, since we have other machines under
more load that do not have this problem, and we don't have anything
else using this driver. The chip is the following:
igb0@pci0:5:0:0: class=0x020000 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
igb1@pci0:5:0:1: class=0x020000 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
The hardware in question is a new HP DL160G6. I have also checked the
IPMI logs and sensors and have not found any issues there either. No
sensor reported out-of-range values, and the chassis temperature is
within normal limits.
I am not sure how to debug this problem further. We are now looking
into installing an external non-igb card in the server to see if it
solves the issue.
I have the whole log if anyone wants to take a closer peek.
Regards,
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
MSN: sales at sippysoft.com
Skype: SippySoft