em driver input errors
Barney Cordoba
barney_cordoba at yahoo.com
Sun Aug 2 16:54:42 UTC 2009
--- On Sat, 8/1/09, alexpalias-bsdnet at yahoo.com <alexpalias-bsdnet at yahoo.com> wrote:
> From: alexpalias-bsdnet at yahoo.com <alexpalias-bsdnet at yahoo.com>
> Subject: em driver input errors
> To: freebsd-net at freebsd.org
> Date: Saturday, August 1, 2009, 9:05 AM
> Good day
>
> I'm running a FreeBSD 7.2 router and I am seeing a lot of
> input errors on one of the em interfaces (em0), coupled with
> (at approximately the same times) much fewer errors on em1
> and em2. Monitoring is done with SNMP from another
> machine, and the CPU load as reported via SNMP is mostly
> below 30%, with a couple of spikes up to 35%.
>
> Software description:
>
> - FreeBSD 7.2-RELEASE-p2, amd64
> - bsnmpd with modules: hostres and (from ports) snmp_ucd
> - quagga 0.99.12 (running only zebra and bgpd)
> - netgraph (ng_ether and ng_netflow)
>
> Hardware description:
>
> - Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
> - 2 x built-in gigabit interfaces (em0, em1)
> - 1 x dual-port gigabit interface, PCI-X (em2, em3) [see
> pciconf near the end]
>
>
> The machine receives the global routing table ("netstat -nr
> | wc -l" gives 289115 currently).
>
> All of the em interfaces are just configured "up", with
> various vlan interfaces on them. Note that I use "kpps" to
> mean "thousands of packets per second", sorry if that's the
> wrong shorthand.
>
> - em0 sees a traffic of 10...22 kpps in, and 15...35 kpps
> out. In bits, it's 30...120Mbits/s in, and
> 100...210Mbits/s out. Vlans configured are vlan100 and
> vlan200, and most of the traffic is on vlan100 (vlan200 sees
> 4kpps in / 0.5kpps out maximum, with the average at about
> one third of this). em0 is the external interface, and its
> traffic corresponds to the sum of traffic through em1 and
> em2
>
> - em1 has 5 vlans, and sees about 22kpps in / 11kpps out
> (maximum)
>
> - em2 has a single VLAN, and sees about 4...13kpps both in
> and out (almost equal in/out during most of the day)
>
> - em3 is a backup interface, with 2 VLANS, and is the only
> one which has seen no errors.
>
> Only the vlans on em0 are analyzed by ng_netflow, and the
> errors I'm seeing have started appearing days before
> netgraph was even loaded in the kernel.
>
> Tuning done:
>
> /boot/loader.conf:
> hw.em.rxd=4096
> hw.em.txd=4096
>
> Without the above we were seeing way more errors, now they
> are reduced, but still come in bursts of over 1000 errors on
> em0.
>
> /etc/sysctl.conf:
> net.inet.ip.fastforwarding=1
> dev.em.0.rx_processing_limit=300
> dev.em.1.rx_processing_limit=300
> dev.em.2.rx_processing_limit=300
> dev.em.3.rx_processing_limit=300
>
> Still seeing errors, after some searching the mailing lists
> we also added:
>
> # the four lines below are repeated for em1, em2, em3
> dev.em.0.rx_int_delay=0
> dev.em.0.rx_abs_int_delay=0
> dev.em.0.tx_int_delay=0
> dev.em.0.tx_abs_int_delay=0
>
> Still getting errors, so I also added:
>
> net.inet.ip.intr_queue_maxlen=4096
> net.route.netisr_maxqlen=1024
>
> and
>
> kern.ipc.nmbclusters=655360
>
>
> Also tried with rx_processing_limit set to -1 on all em
> interfaces, still getting errors.
>
> Looking at the shape of the error and packet graphs, there
> seems to be a correlation between the number of packets per
> second on em0 and the height of the error "spikes" on the
> error graph. These spikes are spread throughout the day,
> with spaces (zones with no errors) of various lengths (10
> minutes ... 2 hours spaces within the last 24 hours), but
> sometimes there are errors even in the lowest kpps times of
> the day.
>
> em0 and em1 error times are correlated, with all errors on
> the graph for em0 having a smaller corresponding error spike
> on em1 at the same time, and sometimes an error spike on
> em2.
>
> The old router was seeing about the same traffic, and had
> em0, em1, re0 and re1 network cards, and was only seeing
> errors on the em cards. It was running
> 7.2-PRERELEASE/i386
>
>
> Any suggestions would be greatly appreciated. Please note
> that this is a live router, and I can't reboot it (unless
> absolutely necessary). Tuning that can be applied without
> rebooting will be tried first.
>
> Here are some more details:
>
> Trimmed output of netstat -ni (sorry if there are line
> breaks):
> Name   Mtu Network  Address                 Ipkts    Ierrs       Opkts Oerrs Coll
> em0   1500 <Link#1> 00:14:22:xx:xx:xx 19744458839 15494721 24284439443     0    0
> em1   1500 <Link#2> 00:14:22:xx:xx:xx 12832245469   123181 10105031790     0    0
> em2   1500 <Link#3> 00:04:23:xx:xx:xx 12082552403    10964 10339416865     0    0
> em3   1500 <Link#4> 00:04:23:xx:xx:xx    79912337        0    48178737     0    0
>
> Relevant part of pciconf -vl:
>
> em0@pci0:6:7:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82541EI Gigabit Ethernet Controller'
>     class      = network
>     subclass   = ethernet
> em1@pci0:7:8:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82541EI Gigabit Ethernet Controller'
>     class      = network
>     subclass   = ethernet
> em2@pci0:9:4:0: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class      = network
>     subclass   = ethernet
> em3@pci0:9:4:1: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class      = network
>     subclass   = ethernet
>
> Kernel messages after sysctl dev.em.0.stats=1:
> (note that I've removed the lines which only showed zeros
> in the second and third outputs)
>
> em0: Excessive collisions = 0
> em0: Sequence errors = 0
> em0: Defer count = 0
> em0: Missed Packets = 15435312
> em0: Receive No Buffers = 16446113
> em0: Receive Length Errors = 0
> em0: Receive errors = 1
> em0: Crc errors = 2
> em0: Alignment errors = 0
> em0: Collision/Carrier extension errors = 0
> em0: RX overruns = 96826
> em0: watchdog timeouts = 0
> em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
> em0: XON Rcvd = 0
> em0: XON Xmtd = 0
> em0: XOFF Rcvd = 0
> em0: XOFF Xmtd = 0
> em0: Good Packets Rcvd = 19002068797
> em0: Good Packets Xmtd = 23168462599
> em0: TSO Contexts Xmtd = 0
> em0: TSO Contexts Failed = 0
>
> [later]
> em0: Excessive collisions = 0
> em0: Missed Packets = 15459111
> em0: Receive No Buffers = 16447082
> em0: Receive errors = 1
> em0: Crc errors = 2
> em0: RX overruns = 96835
> em0: Good Packets Rcvd = 19165047284
> em0: Good Packets Xmtd = 23386976960
>
> [later]
> em0: Excessive collisions = 0
> em0: Missed Packets = 15470583
> em0: Receive No Buffers = 16447686
> em0: Receive errors = 1
> em0: Crc errors = 2
> em0: RX overruns = 96840
> em0: Good Packets Rcvd = 19255466068
> em0: Good Packets Xmtd = 23519004546
>
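For a sense of scale, the three stats dumps quoted above can be turned into per-interval loss figures. This is only a rough sketch: the counter values are copied from the quoted output, the dictionary keys and the helper name interval_loss are made up for illustration, and the time between dumps was not given, so no per-second rate can be derived.

```python
# Counters copied from the three "dev.em.0.stats=1" dumps quoted above.
samples = [
    {"missed": 15435312, "good_rcvd": 19002068797},
    {"missed": 15459111, "good_rcvd": 19165047284},
    {"missed": 15470583, "good_rcvd": 19255466068},
]


def interval_loss(prev, cur):
    """Return (missed, good, loss fraction) for one interval between dumps."""
    missed = cur["missed"] - prev["missed"]
    good = cur["good_rcvd"] - prev["good_rcvd"]
    return missed, good, missed / (missed + good)


# Compare each dump with the one that follows it.
intervals = [interval_loss(a, b) for a, b in zip(samples, samples[1:])]
for missed, good, frac in intervals:
    print(f"missed={missed} good={good} loss={frac:.4%}")
```

Averaged over an interval the loss fraction looks small, which fits the description of errors arriving in short bursts rather than continuously.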
Note that "most" PCI-X motherboards wire the onboard NICs at 32 bits and
33 MHz, apparently because it's easier to do so. It's likely that your
add-on card is running at 64 bits and 133 MHz.
32-bit/33 MHz isn't really fast enough to manage gigabit traffic flows, as
its max burst is only about 1 Gb/s (32 bits x 33 MHz), shared between
receive and transmit, so you really can't use those ports for any sort
of primary traffic flow. Check with your MB manufacturer, as they usually
don't advertise it.
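The arithmetic behind those figures, as a quick sketch. It assumes nominal clocks of 33 and 133 MHz and one full-width transfer per clock, and it ignores arbitration and descriptor overhead, so real throughput will be lower. The function name pci_peak_gbps is just for illustration.

```python
def pci_peak_gbps(width_bits, clock_mhz):
    """Theoretical peak bus bandwidth in Gb/s: width_bits moved once per clock."""
    return width_bits * clock_mhz / 1000.0


onboard = pci_peak_gbps(32, 33)   # 32-bit/33 MHz, as suspected for the onboard NICs
addon = pci_peak_gbps(64, 133)    # 64-bit/133 MHz PCI-X, as suspected for the add-on card

# PCI is a shared, half-duplex bus, so receive and transmit DMA
# draw on the same budget.
print(f"32-bit/33 MHz:  {onboard:.3f} Gb/s peak")
print(f"64-bit/133 MHz: {addon:.3f} Gb/s peak")
```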
Barney