em driver input errors
alexpalias-bsdnet at yahoo.com
Wed Aug 19 12:52:27 UTC 2009
Greetings.
--- On Mon, 8/17/09, Дмитрий Замураев <gigabyte.tmn at gmail.com> wrote:
> From: Дмитрий Замураев <gigabyte.tmn at gmail.com>
> Subject: RE: em driver input errors
> To: alexpalias-bsdnet at yahoo.com
> Cc: freebsd-net at freebsd.org
> Date: Monday, August 17, 2009, 6:17 PM
>
>
> >/boot/loader.conf:
> >hw.em.rxd=4096
> >hw.em.txd=4096
> Why are you using these values? Try the defaults (without these lines in loader.conf).
As said in my original email, I was getting way more errors with the defaults.
> > Without the above we were seeing way more errors; now they are reduced, but still come in bursts of over 1000 errors on em0.
> > Still seeing errors, after some searching the mailing lists we also added:
> > # the four lines below are repeated for em1, em2, em3
> >dev.em.0.rx_int_delay=0
> >dev.em.0.rx_abs_int_delay=0
> >dev.em.0.tx_int_delay=0
> >dev.em.0.tx_abs_int_delay=0
> Try increasing rx_int_delay to 600 and rx_abs_int_delay to 1000; leave tx_*_delay unchanged, at the defaults (100?).
Thanks for the suggestion.
From a "clean" box:
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
I reset all the values (errors still appearing), then tried your suggestion (rx_int_delay=600, rx_abs_int_delay=1000). This reduced the number of interrupts for em0 from about 7200/sec to around 6500/sec. After some time, I started getting errors again, which prompted me to try this as well:
dev.em.0.tx_int_delay=600
dev.em.0.tx_abs_int_delay=1000
That is, I used your suggested values for tx too. Now em0 is seeing about 1800 interrupts/second, which is way better, but after some time I saw errors again...
From the output of "netstat -nI em0 -w 5":
            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
     87267     0   50372599     106931     0   81598993     0
     86496     0   50990332     105467     0   80064657     0
     81726  3056   49876613      99080     0   73273640     0
     90425     0   59172531     105299     0   77110096     0
    120292     0   70369292     109597     0   78626248     0
... a few minutes pass with zero errors ...
     89646     0   56951878     111240     0   86493393     0
     86031     0   53549721     108695     0   83592747     0
     77760  3054   48505562      96912     0   73185576     0
     87508     0   56116394     106094     0   79130608     0
     89031     0   56490982     103039     0   77398567     0
What's interesting is that I'm seeing errors at around 80k packets per 5 seconds (about 16k packets/s), but no errors at 120k packets per 5 seconds (24 kpps).
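To make that comparison concrete, here is a minimal sketch that converts the input counters from the netstat output above into per-second rates. The numbers are copied from the listing; the 5-second interval comes from the "-w 5" flag. The helper name per_second is only illustrative.

```python
# Rows copied from the netstat output above: (input packets, input errs)
# per 5-second interval. Only the input side is used here.
INTERVAL_S = 5

samples = [
    (87267, 0),
    (86496, 0),
    (81726, 3056),
    (90425, 0),
    (120292, 0),
    (89646, 0),
    (86031, 0),
    (77760, 3054),
    (87508, 0),
    (89031, 0),
]


def per_second(count, interval_s=INTERVAL_S):
    """Convert a per-interval counter into a per-second rate."""
    return count / interval_s


rates = [(per_second(packets), errs) for packets, errs in samples]
error_rates = [pps for pps, errs in rates if errs > 0]
clean_rates = [pps for pps, errs in rates if errs == 0]

for pps, errs in rates:
    flag = "  <-- errors" if errs else ""
    print(f"{pps:10.1f} pps  {errs:5d} errs{flag}")

print("highest error-free rate (pps):", max(clean_rates))
print("rates of intervals with errors (pps):", error_rates)
```

Both intervals with errors fall well below the busiest error-free interval, which is what suggests the bursts are not simply a function of packet rate.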
Currently, I've set the delay to 600 and abs_delay to 1000 on all interfaces (em0, em1, em2, em3), thus reducing the number of interrupts.
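For reference, a small sketch that prints the full set of settings this amounts to. The values and the em0 to em3 numbering are from this message; the function name em_delay_settings is hypothetical.

```python
# Delay values taken from this message; units 0-3 correspond to em0..em3.
DELAYS = {
    "rx_int_delay": 600,
    "rx_abs_int_delay": 1000,
    "tx_int_delay": 600,
    "tx_abs_int_delay": 1000,
}


def em_delay_settings(units=range(4), delays=DELAYS):
    """Return 'dev.em.N.key=value' strings, one per interface and tunable."""
    return [
        f"dev.em.{unit}.{key}={value}"
        for unit in units
        for key, value in delays.items()
    ]


for line in em_delay_settings():
    print(line)
```

Each printed line is in the name=value form that sysctl(8) accepts on the command line and that /etc/sysctl.conf uses.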
I'm currently seeing (in systat -vmstat 2):
Around 1800 irqs/s for em0, 1800 for em1, 1800 for em2, under 10/s for em3
Around 2000 irqs/s for cpu0:time, 2000 more for cpu1:time, 2000 for cpu2:time and 2000 for cpu3:time.
Interrupts total (as reported by systat): around 13500/second. I would estimate the old IRQ load at around 30000-35000/second, which doesn't seem like too much to me for a dual Xeon machine.
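As a rough cross-check of those totals, the per-source figures can simply be summed. This is a sketch only: the current rates are the approximate systat readings above, and the "before" figure assumes em1 and em2 were each at the roughly 7200/s measured earlier for em0, which was not actually measured.

```python
# Approximate per-source interrupt rates (per second) as read from systat.
current = {
    "em0": 1800, "em1": 1800, "em2": 1800,
    "em3": 10,  # reported as "under 10/s"
    "cpu0:time": 2000, "cpu1:time": 2000,
    "cpu2:time": 2000, "cpu3:time": 2000,
}
total_now = sum(current.values())
print("estimated current total:", total_now)

# ASSUMPTION: before the delay change, em1 and em2 interrupted at the same
# ~7200/s that was observed on em0. Only the em0 figure is a measurement.
before = dict(current, em0=7200, em1=7200, em2=7200)
total_before = sum(before.values())
print("estimated total before the change:", total_before)
```

The first sum lands close to the 13500/second that systat reports; the second comes out near the low end of the 30000-35000 estimate, under the stated assumption.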
> >kern.ipc.nmbclusters=655360
> No need. See netstat -m
Thanks, but as I said, I did try almost *EVERYTHING* I could without rebooting. Including this.
Speaking of which, I did compile the kernel with "options DEVICE_POLLING", but enabling polling only made the errors appear more often, and in greater numbers.
> P.S. Change the copper cable and turn off flow control (if it is on).
There are 4 em interfaces on this machine, with new cat6 cables. There are 2 more em interfaces on another machine that was seeing the same errors (the old router), on different cables, and 2 more em interfaces on another machine that's in production, also with new cables. Of the input errors (as shown by sysctl dev.em.0.stats=1, then reading dmesg), only 2 are due to CRC errors, as opposed to around 2,500,000 from other causes. I tend to feel the cable isn't the problem.
Flow control is off, I just checked. I forgot about that one, thanks for reminding me.
Thank you for your help
Alex