NET.ISR and CPU utilization performance w/ HP DL 585 using
FreeBSD 7.1 Beta2
Won De Erick
won.derick at yahoo.com
Sun Nov 16 03:18:49 PST 2008
> ----- Original Message ----
> From: Jeremy Chadwick <koitsu at FreeBSD.org>
> To: Won De Erick <won.derick at yahoo.com>
> Cc: rwatson at freebsd.org; freebsd-hackers at freebsd.org
> Sent: Saturday, November 15, 2008 10:16:31 PM
> Subject: Re: NET.ISR and CPU utilization performance w/ HP DL 585 using FreeBSD 7.1 Beta2
>
> On Sat, Nov 15, 2008 at 04:59:16AM -0800, Won De Erick wrote:
> > Hello,
> >
> > I tested an HP DL 585 (16 CPUs, w/ built-in Broadcom NICs) running FreeBSD 7.1-BETA2 under heavy TCP traffic.
> >
> > SCENARIO A: Bombarded w/ TCP traffic:
> >
> > When net.isr.direct=1,
> >
> > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> > 52 root 1 -68 - 0K 16K CPU11 b 38:43 95.36% irq32: bce1
> > 51 root 1 -68 - 0K 16K CPU10 a 25:50 85.16% irq31: bce0
> > 16 root 1 171 ki31 0K 16K RUN a 65:39 15.97% idle: cpu10
> > 28 root 1 -32 - 0K 16K WAIT 8 12:28 5.18% swi4: clock sio
> > 15 root 1 171 ki31 0K 16K RUN b 52:46 3.76% idle: cpu11
> > 45 root 1 -64 - 0K 16K WAIT 7 7:29 1.17% irq17: uhci0
> > 47 root 1 -64 - 0K 16K WAIT 6 1:11 0.10% irq16: ciss0
> > 27 root 1 -44 - 0K 16K WAIT 0 28:52 0.00% swi1: net
> >
> > When net.isr.direct=0,
> >
> > 16 root 1 171 ki31 0K 16K CPU10 a 106:46 92.58% idle: cpu10
> > 19 root 1 171 ki31 0K 16K CPU7 7 133:37 89.16% idle: cpu7
> > 27 root 1 -44 - 0K 16K WAIT 0 52:20 76.37% swi1: net
> > 25 root 1 171 ki31 0K 16K RUN 1 132:30 70.26% idle: cpu1
> > 26 root 1 171 ki31 0K 16K CPU0 0 111:58 64.36% idle: cpu0
> > 15 root 1 171 ki31 0K 16K CPU11 b 81:09 57.76% idle: cpu11
> > 52 root 1 -68 - 0K 16K WAIT b 64:00 42.97% irq32: bce1
> > 51 root 1 -68 - 0K 16K WAIT a 38:22 12.26% irq31: bce0
> > 45 root 1 -64 - 0K 16K WAIT 7 11:31 12.06% irq17: uhci0
> > 47 root 1 -64 - 0K 16K WAIT 6 1:54 3.66% irq16: ciss0
> > 28 root 1 -32 - 0K 16K WAIT 8 16:01 0.00% swi4: clock sio
> >
> > Overall CPU utilization has dropped significantly, but I noticed that swi1 has taken over CPU0 with high utilization when net.isr.direct=0.
> > What does this mean?
> >
> > SCENARIO B: Bombarded w/ more TCP traffic:
> >
> > Worse, the box became unresponsive (it can't be pinged and is inaccessible through SSH) after more traffic was added while keeping net.isr.direct=0.
> > This may be due to the 100% utilization of CPU0 by swi1: net (see the first line of the result below). Judging from the net.isr.direct=1 result, the bce interrupt threads and swi1 seem to compete with each other.
> > The rest of the CPUs are sitting pretty (100% idle). Can you shed some light on this?
> >
> > When net.isr.direct=0:
> > 27 root 1 -44 - 0K 16K CPU0 0 5:45 100.00% swi1: net
> > 11 root 1 171 ki31 0K 16K CPU15 0 0:00 100.00% idle: cpu15
> > 13 root 1 171 ki31 0K 16K CPU13 0 0:00 100.00% idle: cpu13
> > 17 root 1 171 ki31 0K 16K CPU9 0 0:00 100.00% idle: cpu9
> > 18 root 1 171 ki31 0K 16K CPU8 0 0:00 100.00% idle: cpu8
> > 21 root 1 171 ki31 0K 16K CPU5 5 146:17 99.17% idle: cpu5
> > 22 root 1 171 ki31 0K 16K CPU4 4 146:17 99.07% idle: cpu4
> > 14 root 1 171 ki31 0K 16K CPU12 0 0:00 99.07% idle: cpu12
> > 16 root 1 171 ki31 0K 16K CPU10 a 109:33 98.88% idle: cpu10
> > 15 root 1 171 ki31 0K 16K CPU11 b 86:36 93.55% idle: cpu11
> > 52 root 1 -68 - 0K 16K WAIT b 59:42 13.87% irq32: bce1
> >
> > When net.isr.direct=1,
> > 52 root 1 -68 - 0K 16K CPU11 b 55:04 97.66% irq32: bce1
> > 51 root 1 -68 - 0K 16K CPU10 a 33:52 73.88% irq31: bce0
> > 16 root 1 171 ki31 0K 16K RUN a 102:42 26.86% idle: cpu10
> > 15 root 1 171 ki31 0K 16K RUN b 81:20 3.17% idle: cpu11
> > 28 root 1 -32 - 0K 16K WAIT e 13:40 0.00% swi4: clock sio
> >
> > With regard to bandwidth, in all the scenarios above the result is extremely low (several hundred Mb/s was expected). Why?
The result below applies to scenario B above only.
> >
> > - iface Rx Tx Total
> > ==============================================================================
> > bce0: 4.69 Mb/s 10.49 Mb/s 15.18 Mb/s
> > bce1: 20.66 Mb/s 4.68 Mb/s 25.34 Mb/s
> > lo0: 0.00 b/s 0.00 b/s 0.00 b/s
> > ------------------------------------------------------------------------------
> > total: 25.35 Mb/s 15.17 Mb/s 40.52 Mb/s
> >
> >
> > Thanks,
> >
> > Won
>
> And does this behaviour change if you use some other brand of NIC?
With an Intel Pro NIC (82571):
When net.isr.direct=1,
49 root 1 -68 - 0K 16K CPU12 c 6:50 100.00% em0 taskq
15 root 1 171 ki31 0K 16K CPU11 b 5:47 100.00% idle: cpu11
50 root 1 -68 - 0K 16K CPU13 d 6:15 86.96% em1 taskq
25 root 1 171 ki31 0K 16K CPU1 1 9:27 79.79% idle: cpu1
28 root 1 -32 - 0K 16K WAIT 1 1:33 22.75% swi4: clock sio
13 root 1 171 ki31 0K 16K RUN d 4:14 12.26% idle: cpu13
14 root 1 171 ki31 0K 16K RUN c 3:37 0.00% idle: cpu12
The em0 and em1 taskqueues show high CPU utilization, and netstat reports packet errors:
# netstat -I em0 -w 1 -d
input (em0) output
packets errs bytes packets errs bytes colls drops
15258 3066 22748316 18468 0 4886567 0 0
15461 3096 22783724 18379 0 5350130 0 0
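In case it helps with the diagnosis, interrupt rates and mbuf usage during these runs can also be sampled with the standard tools, vmstat -i (per-interrupt counts and rates) and netstat -m (mbuf and cluster usage); I have not included that output here:
# vmstat -i
# netstat -m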
When net.isr.direct=0,
12 root 1 171 ki31 0K 16K CPU14 e 22:28 100.00% idle: cpu14
20 root 1 171 ki31 0K 16K CPU6 6 24:32 97.85% idle: cpu6
25 root 1 171 ki31 0K 16K RUN 1 21:51 96.97% idle: cpu1
27 root 1 -44 - 0K 16K CPU2 2 5:12 91.55% swi1: net
13 root 1 171 ki31 0K 16K CPU13 d 11:04 86.96% idle: cpu13
14 root 1 171 ki31 0K 16K CPU12 c 10:51 81.59% idle: cpu12
49 root 1 -68 - 0K 16K CPU12 c 13:48 22.17% em0 taskq
24 root 1 171 ki31 0K 16K RUN 2 19:16 12.16% idle: cpu2
50 root 1 -68 - 0K 16K - d 13:34 11.87% em1 taskq
28 root 1 -32 - 0K 16K WAIT 3 3:48 0.00% swi4: clock sio
swi1: net is taking high CPU utilization this time, but without packet errors:
# netstat -I em0 -w 1 -d
input (em0) output
packets errs bytes packets errs bytes colls drops
4275 0 5528012 24878 0 24162198 0 0
4317 0 5585954 24880 0 24066583 0 0
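For reference, net.isr.direct is a runtime sysctl, so the two modes compared above can be switched on the fly; the rest of the net.isr node can be listed the same way to look at the netisr queue counters (the exact counter names vary by release, so I just list the whole node):
# sysctl net.isr.direct=1
# sysctl net.isr.direct=0
# sysctl net.isr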
Is this related to context switching in FreeBSD 7.x? I noticed no significant difference between enabling and disabling net.isr.direct on FreeBSD 6.2.
Also, would enabling device polling make any significant difference?
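(By device polling I mean the usual DEVICE_POLLING setup, roughly as sketched below; the HZ value is just a commonly suggested figure, not something specific to this box:)

options DEVICE_POLLING      # kernel configuration file
options HZ=1000             # clock rate often recommended together with polling

and then, after rebooting into the new kernel, enable it per interface:
# ifconfig em0 polling
# ifconfig em1 polling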
>
> --
> | Jeremy Chadwick jdc at parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, USA |
> | Making life hard for others since 1977. PGP: 4BD6C0CB |