dummynet dropping too many packets
rihad
rihad at mail.ru
Wed Oct 7 09:22:57 UTC 2009
Robert Watson wrote:
> On Wed, 7 Oct 2009, rihad wrote:
>
>> rihad wrote:
>>> I've yet to test how this direct=0 improves extensive dummynet drops.
>>
>> Ooops... After a couple of minutes, suddenly:
>>
>> net.inet.ip.intr_queue_drops: 1284
>>
>> Bumped it up a bit.
>
> Yes, I was going to suggest that moving to deferred dispatch has
> probably simply moved the drops to a new spot, the queue between the
> ithreads and the netisr thread. In your setup, how many network
> interfaces are in use, and what drivers?
>
bce -- Broadcom NetXtreme II (BCM5706/BCM5708) PCI/PCIe Gigabit Ethernet
adapter driver; device bce is compiled into a 7.1-RELEASE-p8 kernel.
Two network cards: bce0 takes ~400-500 Mbit/s of input and bce1 carries
the output, i.e. the box acts as a smart router. It has two quad-core
CPUs.
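(To spell out the knobs in question: the deferred-dispatch experiment
earlier in the thread was simply

    sysctl net.isr.direct=0

and "bumped it up a bit" refers to the IP input queue length, along the
lines of

    sysctl net.inet.ip.intr_queue_maxlen=2048

where 2048 is only an illustrative value, not necessarily what I set;
net.inet.ip.intr_queue_drops is the counter to watch to see whether the
new limit suffices.)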
Now, the probability of drops (as monitored by netstat -s's "output
packets dropped due to no bufs, etc." counter) is definitely a function
of both traffic load and the number of entries in an ipfw table. I've
just decreased the size of the two tables from ~2600 to ~1800 entries
each and the drops went away instantly, even though the traffic passing
through the box didn't decrease; it even increased a bit, since fewer
clients are now being shaped (luckily, "ipfw pipe tablearg" passes
packets that fail the table lookup through untouched).
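For reference, the shaping setup is of this general shape (the
addresses, table and pipe numbers here are invented for illustration,
not my actual config):

    # per-client bandwidth limit; the table value selects the pipe
    ipfw pipe 10 config bw 512Kbit/s
    ipfw table 1 add 192.0.2.10/32 10
    ipfw add 100 pipe tablearg ip from any to 'table(1)' out via bce1

The table value (10) becomes the pipe number via tablearg, and a
destination missing from table(1) doesn't match the rule at all, which
is why packets for the removed clients go out unshaped.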
> If what's happening is that you're maxing out a CPU then moving to
> multiple netisrs might help if your card supports generating flow IDs,
> but most lower-end cards don't. I have patches to generate those flow
> IDs in software rather than hardware, but there are some downsides to
> doing so, not least that it takes cache line misses on the packet that
> generally make up a lot of the cost of processing the packet.
>
> My experience with most reasonable cards is that letting them do the
> work distribution with RSS and using multiple ithreads is a more
> performant strategy than software work distribution on current
> systems, though.
>
So should we prefer a bunch of expensive, high-quality 10-gigabit
cards? Are there any you would recommend?
> Someone has probably asked for this already, but -- could you send a
> snapshot of the top -SH output in the steady state? Let top run for a
> few minutes and then copy/paste the first 10-20 lines into an e-mail.
>
Sure. Mind you, there are now only ~1800 entries in each of the two
ipfw tables, so the drops have stopped. But it only takes another
200-300 entries for the dropping to start again.
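("Drops" here means the netstat -s counter quoted earlier; I watch it
with nothing fancier than

    netstat -s | grep 'output packets dropped'

run a few seconds apart to see whether it is still climbing.)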
155 processes: 10 running, 129 sleeping, 16 waiting
CPU: 2.4% user, 0.0% nice, 2.0% system, 9.3% interrupt, 86.2% idle
Mem: 1691M Active, 1491M Inact, 454M Wired, 130M Cache, 214M Buf, 170M Free
Swap: 2048M Total, 12K Used, 2048M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
15 root 171 ki31 0K 16K CPU3 3 22.4H 97.85% idle: cpu3
14 root 171 ki31 0K 16K CPU4 4 23.0H 96.29% idle: cpu4
12 root 171 ki31 0K 16K CPU6 6 23.8H 94.58% idle: cpu6
16 root 171 ki31 0K 16K CPU2 2 22.5H 90.72% idle: cpu2
13 root 171 ki31 0K 16K CPU5 5 23.4H 90.58% idle: cpu5
18 root 171 ki31 0K 16K RUN 0 20.3H 85.60% idle: cpu0
17 root 171 ki31 0K 16K CPU1 1 910:03 78.37% idle: cpu1
11 root 171 ki31 0K 16K CPU7 7 23.8H 65.62% idle: cpu7
21 root -44 - 0K 16K CPU7 7 19:03 48.34% swi1: net
29 root -68 - 0K 16K WAIT 1 515:49 19.63% irq256: bce0
31 root -68 - 0K 16K WAIT 2 56:05 5.52% irq257: bce1
19 root -32 - 0K 16K WAIT 5 50:05 3.86% swi4: clock sio
983 flowtools 44 0 12112K 6440K select 0 13:20 0.15% flow-capture
465 root -68 - 0K 16K - 3 51:19 0.00% dummynet
3 root -8 - 0K 16K - 1 7:41 0.00% g_up
4 root -8 - 0K 16K - 2 7:14 0.00% g_down
30 root -64 - 0K 16K WAIT 6 5:30 0.00% irq16: mfi0