Sudden mbuf demand increase and shortage under load

Miroslav Lachman 000.fbsd at quip.cz
Tue Feb 16 11:29:56 UTC 2010


Maxim Sobolev wrote:
> Sergey Babkin wrote:
>> Maxim Sobolev wrote:
>>> Hi,
>>>
>>> Our company has a FreeBSD-based product that consists of numerous
>>> interconnected processes and does some high-PPS UDP processing
>>> (30-50K PPS is not uncommon). We are seeing some strange periodic
>>> failures under load on several such systems, which usually manifest
>>> as IPC (even over unix domain sockets) suddenly either breaking down
>>> or pausing and only recovering some time later (5-10 minutes). The
>>> only sign of failure I managed to find was an increase in "requests
>>> for mbufs denied" in netstat -m output and the total number of mbuf
>>> clusters rising up to the limit (nmbclusters).
>>
>> As a simple idea: UDP is not flow-controlled, so potentially
>> nothing stops an application from sending packets as fast as it
>> can. If that is faster than the network card can process, they
>> will start accumulating. So this might be worth a try as a way
>> to reproduce the problem and see whether the system has a
>> safeguard against it or not.
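>>
>> A minimal sketch of such a blast sender (hypothetical test
>> program; the address, port, and payload size are placeholders):
>>
>>   #include <arpa/inet.h>
>>   #include <netinet/in.h>
>>   #include <string.h>
>>   #include <sys/socket.h>
>>
>>   int
>>   main(void)
>>   {
>>       struct sockaddr_in sin;
>>       char payload[1024];
>>       int s;
>>
>>       s = socket(AF_INET, SOCK_DGRAM, 0);
>>       if (s < 0)
>>           return (1);
>>       memset(&sin, 0, sizeof(sin));
>>       sin.sin_family = AF_INET;
>>       sin.sin_port = htons(9000);                   /* placeholder */
>>       sin.sin_addr.s_addr = inet_addr("192.0.2.1"); /* placeholder */
>>       memset(payload, 0, sizeof(payload));
>>       /* UDP gives no backpressure, so nothing slows this loop. */
>>       for (;;)
>>           (void)sendto(s, payload, sizeof(payload), 0,
>>               (struct sockaddr *)&sin, sizeof(sin));
>>   }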
>>
>> Another possibility: what happens if a process is bound to
>> a UDP socket but doesn't actually read the data from it?
>> FreeBSD used to be pretty good at this, simply throwing away
>> data beyond a certain limit, whereas SVR4 would run out of
>> network memory. But that might have changed, so it might be
>> worth a look too.
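>>
>> A quick way to test that second case (hypothetical sketch; the
>> port is a placeholder) is to bind a socket and never read from
>> it, then flood that port from another host and watch netstat -m:
>>
>>   #include <netinet/in.h>
>>   #include <string.h>
>>   #include <sys/socket.h>
>>   #include <unistd.h>
>>
>>   int
>>   main(void)
>>   {
>>       struct sockaddr_in sin;
>>       int s;
>>
>>       s = socket(AF_INET, SOCK_DGRAM, 0);
>>       memset(&sin, 0, sizeof(sin));
>>       sin.sin_family = AF_INET;
>>       sin.sin_port = htons(9000);          /* placeholder */
>>       sin.sin_addr.s_addr = htonl(INADDR_ANY);
>>       if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
>>           return (1);
>>       /* Never call recvfrom(); the receive buffer fills up and
>>        * the kernel should start dropping beyond its limit. */
>>       for (;;)
>>           sleep(3600);
>>   }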
>
> Thanks. Yes, the latter could actually be the case. The former is
> less likely, since the system doesn't generate much traffic by
> itself, but rather relays what it receives from the network in a
> pretty much 1:1 ratio. It could happen, though, if the output path
> somehow stalled. However, netstat -I igb0 shows zero Oerrs, which I
> guess means that we can rule that out too, unless there is some bug
> in the driver.
>
> So we are looking for potential issues that can cause the UDP
> forwarding application to stall and not dequeue packets in time. So
> far we have identified some culprits in the application logic that
> can cause such stalls in the unlikely event of gettimeofday() time
> going backwards. I've seen some messages from ntpd around the time
> of the problem, although it's unclear whether those are a result of
> the mbuf shortage or could indicate the root cause. We've also added
> some debug output to catch any abnormalities in the processing times.
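>
> For reference, the kind of guard we are looking at (a sketch only;
> the helper name is made up): measuring intervals with
> clock_gettime(CLOCK_MONOTONIC), which ntpd cannot step backwards,
> instead of gettimeofday():
>
>   #include <time.h>
>
>   /* Interval measurement immune to wall-clock steps. */
>   static double
>   elapsed_sec(const struct timespec *start)
>   {
>       struct timespec now;
>
>       clock_gettime(CLOCK_MONOTONIC, &now);
>       return ((now.tv_sec - start->tv_sec) +
>           (now.tv_nsec - start->tv_nsec) / 1e9);
>   }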
>
> In any case, I am a little bit surprised at how easily FreeBSD can
> let mbuf storage overflow. I'd expect it to be more aggressive in
> dropping things received from the network once one application
> stalls. Combined with the fact that we apparently use shared storage
> for different kinds of network activity, and perhaps IPC too, this
> gives an easy opportunity for DoS attacks. To me, separate limits
> for separate protocols, or even for classes of traffic (e.g.
> local/remote), would make much sense.
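>
> For what it's worth, the shared limit in question can be read
> programmatically as well as via netstat -m (a sketch; this is the
> standard kern.ipc.nmbclusters sysctl):
>
>   #include <sys/types.h>
>   #include <sys/sysctl.h>
>   #include <stdio.h>
>
>   int
>   main(void)
>   {
>       int nmbclusters;
>       size_t len = sizeof(nmbclusters);
>
>       /* One global cluster limit, shared by all protocols. */
>       if (sysctlbyname("kern.ipc.nmbclusters", &nmbclusters,
>           &len, NULL, 0) == 0)
>           printf("nmbclusters: %d\n", nmbclusters);
>       return (0);
>   }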

Can it be related to this issue somehow?

http://lists.freebsd.org/pipermail/freebsd-current/2009-August/011013.html
http://lists.freebsd.org/pipermail/freebsd-current/2009-August/010740.html

It was tested on FreeBSD 8: high UDP traffic on igb interfaces emits
"GET BUF: dmamap load failure - 12" messages and later results in a
kernel panic.
We have not received any response to this report.

Miroslav Lachman

