svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys
Robert Watson
rwatson at FreeBSD.org
Mon Dec 3 09:38:03 UTC 2012
On Mon, 3 Dec 2012, Maxim Sobolev wrote:
>>> We are also in a quite mbuf-hungry environment; it's not 10GigE, but we
>>> are dealing with forwarding voice traffic, which consists predominantly
>>> of very small packets (20-40 bytes). So we have a lot of small packets
>>> in flight, which uses a lot of mbufs.
>>>
>>> What happens, however, is that the network stack consistently locks up
>>> after we put more than 16-18MB/sec onto it, which corresponds to about
>>> 350-400 Kpps.
>>
>> Can you drop into kdb? Do you have any backtrace to see where or how it
>> locks up?
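For completeness, one way to get into the kernel debugger on a live box and
grab such a backtrace is via the debug.kdb.enter sysctl; this is a rough
sketch, assuming a kernel with KDB/DDB compiled in (as GENERIC has) and
console access:

    # Drop into DDB from a root shell (needs KDB/DDB in the kernel and a console).
    sysctl debug.kdb.enter=1

    # At the db> prompt:
    #   trace      - backtrace of the current thread
    #   ps         - process/thread listing
    #   show uma   - per-zone allocator statistics
    #   continue   - resume the system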
>
> Unfortunately it's hardly an option in production, unless we can reproduce
> the issue on the test machine. It is not locking up per se, but all
> network-related activity ceases. We can still get in through the kvm console.
Could you share the results of vmstat -z and netstat -m for the box?
(FYI, if you do find yourself in DDB, "show uma" is essentially the same as
"vmstat -z".)
Robert
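For anyone reading along later, a quick sketch of how that data can be
collected on the affected box (standard commands; the grep filter is just a
convenience):

    # Per-zone UMA statistics; the mbuf-related zones are the interesting ones.
    vmstat -z | grep -i mbuf

    # Mbuf and cluster usage summary for the whole stack.
    netstat -m

    # From DDB instead, "show uma" reports the same per-zone data as vmstat -z.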
>
>>> This is way lower than any nmbclusters/maxusers limits we have
>>> (1.5m/1500).
>>>
>>> With about half of that critical load right now, we see something along
>>> these lines:
>>>
>>> 66365/71953/138318/1597440 mbuf clusters in use (current/cache/total/max)
>>> 149617K/187910K/337528K bytes allocated to network (current/cache/total)
>>>
>>> Machine has 24GB of ram.
>>>
>>> vm.kmem_map_free: 24886267904
>>> vm.kmem_map_size: 70615040
>>> vm.kmem_size_scale: 1
>>> vm.kmem_size_max: 329853485875
>>> vm.kmem_size_min: 0
>>> vm.kmem_size: 24956903424
>>>
>>> So my question is whether there are some other limits that can cause
>>> mbuf starvation if the number of allocated clusters grows to more than
>>> 200-250k? I am curious how this works in a dynamic system - since no
>>> memory is pre-allocated for mbufs, what happens if the network load
>>> increases gradually while the system is running? Is it possible to get
>>> to ENOMEM eventually, with all memory already taken by other pools?
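The limits being referred to can be inspected at run time with sysctl; a
rough sketch, assuming the usual 8.x/9.x sysctl names (the jumbo-cluster
knobs may differ slightly between releases):

    # Cluster limits; kern.ipc.nmbclusters is the 1.5m figure quoted above.
    sysctl kern.ipc.nmbclusters
    sysctl kern.ipc.nmbjumbop kern.ipc.nmbjumbo9 kern.ipc.nmbjumbo16

    # maxusers, from which the default nmbclusters is derived at boot.
    sysctl kern.maxusers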
>>
>> Yes, mbuf allocation is not guaranteed and can fail before the limit is
>> reached. What may happen is that an RX DMA ring refill fails and the
>> driver wedges. This would be a driver bug.
>>
>> Can you give more information on the NICs and drivers you use?
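Whether such allocation failures are actually happening can usually be read
straight off the existing counters, and a wedged RX path shows up as input
traffic stopping while the rest of the box stays responsive; a rough sketch
(the exact wording of the counters varies a little between releases):

    # "denied"/"delayed" request counters in the mbuf summary indicate
    # allocations that could not be satisfied.
    netstat -m | grep -iE 'denied|delayed'

    # Per-interface packet rates once per second; a wedged RX path shows
    # input packets dropping to zero while the rest of the box stays up.
    netstat -w 1 -I igb1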
>
> All of them use various incarnations of the Intel GigE chips, mostly igb(4),
> but we've seen the same behaviour with em(4) as well.
>
> Both 8.2 and 8.3 are affected. We have not been able to confirm if 9.1 has
> the same issue.
>
> igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.1> port
> 0xec00-0xec1f mem
> 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbe9c000-0xfbe9ffff irq 40 at
> device 0.1 on pci10
> igb1: Using MSIX interrupts with 9 vectors
> igb1: Ethernet address: 00:30:48:cf:bb:1d
> igb1: [ITHREAD]
> igb1: Bound queue 0 to cpu 8
> igb1: [ITHREAD]
> igb1: Bound queue 1 to cpu 9
> igb1: [ITHREAD]
> igb1: Bound queue 2 to cpu 10
> igb1: [ITHREAD]
> igb1: Bound queue 3 to cpu 11
> igb1: [ITHREAD]
> igb1: Bound queue 4 to cpu 12
> igb1: [ITHREAD]
> igb1: Bound queue 5 to cpu 13
> igb1: [ITHREAD]
> igb1: Bound queue 6 to cpu 14
> igb1: [ITHREAD]
> igb1: Bound queue 7 to cpu 15
> igb1: [ITHREAD]
>
> igb1 at pci0:10:0:1: class=0x020000 card=0x10c915d9 chip=0x10c98086
> rev=0x01 hdr=0x00
> vendor = 'Intel Corporation'
> class = network
> subclass = ethernet
>
> -Maxim
>