dummynet dropping too many packets

Tue Oct 6 15:28:43 UTC 2009

Eugene Grosbein wrote:
> On Tue, Oct 06, 2009 at 06:14:58PM +0500, rihad wrote:
> 
>>> No, generally handles much more. Please show your ipfw rule(s)
>>> containing 'tablearg'.
>> 01031         x            x allow ip from any to any
>> 01040         x            x skipto 1100 ip from table(127) to any out recv bce0 xmit bce1
>> 01060         x            x pipe tablearg ip from any to table(0) out recv bce0 xmit bce1
>> 01070         x            x allow ip from any to table(0) out recv bce0 xmit bce1
>> 01100         x            x pipe tablearg ip from any to table(2) out
>> 65535         x            x allow ip from any to any
>>
>> table(127) contains country-wide ISPs' netblocks (under 100 entries).
>> table(0) and table(2) contain same user IP addresses, but different pipe 
>> IDs - normally around 3-4k entries each.
>>
>> Now please pay special attention to rule 1031. I've added it to bypass 
>> dummynet and stop packets from being dropped for now. Normally the rule 
>> isn't there.
>>
>> As I found out today after rebooting, drops only start occurring when 
>> the number of entries in table(0) exceeds 2000 or so (please see my 
>> previous email). Maybe it's a coincidence - I don't know. Global traffic 
>> load doesn't matter - it was approximately the same before and after the 
>> drops (around 450 mbit/s).
> 
> It's possible that looking up a pipe by its number is inefficient
> and the firewall holds its lock for too long while searching for the
> pipe, just a guess. And packets start to drop, eh?
> 
> Try setting net.isr.direct to 0 and raising net.inet.ip.intr_queue_maxlen
> to a large value.
> This way, one of your cores may run bce's thread, enqueue incoming
> packets and return to work immediately. The rest of processing may be
> performed by another kernel thread, hopefully using another core.
> Just to see if this changes anything. top -S should help here too.
> 
Since this is a remote production box, I'm really scared of toggling 
on/off flags like this that I've never used before, particularly under 
heavy traffic load: they're all too likely to lock up the whole system. 
I might try this tomorrow morning, though, first on another, less 
critical box. P.S.: I've just tested toggling the flag on my virtual 
machine, and it went fine.
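
For what it's worth, here is a minimal sketch of how the change could be 
made reversible, assuming a FreeBSD release that still has 
net.isr.direct. The helper names (save_and_apply, restore_saved), the 
state file path and the default queue length of 4096 are my own 
placeholders, not recommended values.

```sh
#!/bin/sh
# Sketch only: save the current sysctl values before changing them,
# so they can be put back quickly if the box misbehaves.
# STATE_FILE, save_and_apply and restore_saved are hypothetical names.
STATE_FILE=${STATE_FILE:-/tmp/netisr-sysctl.saved}

save_and_apply() {
    # Record the current values in name=value form.
    {
        echo "net.isr.direct=$(sysctl -n net.isr.direct)"
        echo "net.inet.ip.intr_queue_maxlen=$(sysctl -n net.inet.ip.intr_queue_maxlen)"
    } > "$STATE_FILE"

    # Switch to queued dispatch and enlarge the IP input queue.
    # 4096 is an assumed default; pass another value as $1.
    sysctl net.isr.direct=0
    sysctl net.inet.ip.intr_queue_maxlen="${1:-4096}"
}

restore_saved() {
    # Re-apply every saved name=value pair.
    while read -r setting; do
        sysctl "$setting"
    done < "$STATE_FILE"
}
```

Running save_and_apply on the less critical box first, with 
restore_saved ready in another session, would keep the rollback to a 
single command.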

I don't think net.inet.ip.intr_queue_maxlen is relevant to this problem, 
as net.inet.ip.intr_queue_drops normally stays at zero or very close to 
it.
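
A rough way to double-check that claim is to sample the counter twice 
and look at the difference. This is only a sketch: read_drops and 
drops_delta are made-up helper names, and the 10-second default interval 
is arbitrary.

```sh
#!/bin/sh
# Sketch only: report how much net.inet.ip.intr_queue_drops grows
# over a sampling interval. read_drops and drops_delta are
# hypothetical helper names.

read_drops() {
    # Print the current value of the drop counter.
    sysctl -n net.inet.ip.intr_queue_drops
}

drops_delta() {
    # $1 is the sampling interval in seconds (assumed default: 10).
    dd_interval=${1:-10}
    dd_before=$(read_drops)
    sleep "$dd_interval"
    dd_after=$(read_drops)
    echo $((dd_after - dd_before))
}
```

If drops_delta keeps printing 0 while dummynet is dropping packets, the 
IP input queue is unlikely to be where they are lost.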