Intel em0: watchdog timeout
Kirk Davis
kirk.davis at epsb.ca
Mon Feb 22 20:46:35 UTC 2010
From: Jack Vogel [mailto:jfvogel at gmail.com]
Try `sysctl dev.em.0.stats=1` and the same for em2. You're right, though, it doesn't look like any system mbuf failures.
Does this need to be done in loader.conf? It doesn't seem to take from the command line.
# sysctl dev.em.2.stats=1
dev.em.2.stats: -1 -> -1
# sysctl dev.em.2.stats
dev.em.2.stats: -1
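(If this driver behaves like other em(4) versions of that era, the -1 result is expected: stats is a write-only trigger sysctl rather than a stored value, so it always reads back -1 and the actual counters are printed to the kernel message buffer. A minimal sketch, assuming that behaviour:

# sysctl dev.em.2.stats=1
# dmesg | tail -30

The statistics show up in dmesg, not in the sysctl value.)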
7.2 seems to be a stable base OS and driver; 8 is better in some respects, but has not been without its reported problems. I leave the choice to you.
Without more data I am not sure what is causing the watchdog.
Yes, I am having trouble tracking it down. I upped the mbufs to 65536 just to see if it made any difference, but it is still happening.
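(For reference, a minimal sketch of that change, assuming it was set as a boot-time tunable; on 7.x the limit can, as far as I know, also be raised at runtime via sysctl:

# in /boot/loader.conf
kern.ipc.nmbclusters="65536"

# or, at runtime:
# sysctl kern.ipc.nmbclusters=65536
)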
############ SET NMBCLUSTERS TO 65536 ##########################
Feb 22 12:45:21 inet-gw kernel: em0: watchdog timeout -- resetting
Feb 22 12:45:21 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:25 inet-gw kernel: em0: link state changed to UP
Feb 22 12:45:25 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:28 inet-gw kernel: em0: link state changed to UP
Feb 22 12:45:29 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:31 inet-gw kernel: em0: link state changed to UP
# netstat -m
8183/6037/14220 mbufs in use (current/cache/total)
7160/3598/10758/65536 mbuf clusters in use (current/cache/total/max)
7160/3592 mbuf+clusters out of packet secondary zone in use (current/cache)
0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
16365K/9121K/25487K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
I guess I will have to build up the new server with 7.3 on it and see if the newer driver makes any difference.
---- Kirk
On Mon, Feb 22, 2010 at 10:55 AM, Kirk Davis <kirk.davis at epsb.ca> wrote:
I have a backup server sitting here that I am going to load 7.3-RC1 onto and test with. It is the exact duplicate hardware, so that should help with the upgraded driver. Does it make sense to go to 8.0?
Here is the mbuf usage on this server. I'm not sure exactly how to read this, but it seems to look OK.
# netstat -m
8181/5904/14085 mbufs in use (current/cache/total)
7159/3471/10630/25600 mbuf clusters in use (current/cache/total/max)
7159/3465 mbuf+clusters out of packet secondary zone in use (current/cache)
0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
16363K/8834K/25197K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
---- Kirk
________________________________
From: Jack Vogel [mailto:jfvogel at gmail.com]
Sent: Monday, February 22, 2010 11:43 AM
To: Kirk Davis
Cc: freebsd-net at freebsd.org
Subject: Re: Intel em0: watchdog timeout
With the increased load you might be running out of mbufs more easily; would suggest you increase the mbuf pool.
This is an old, old driver now; you might consider going to something a bit more recent.
Jack
On Mon, Feb 22, 2010 at 10:14 AM, Kirk Davis <kirk.davis at epsb.ca> wrote:
Hi,
I have a FreeBSD server running Quagga as a BGP router. It has a number of interfaces in it, both bce and em. The most heavily used interfaces have started giving me watchdog timeout errors just in the last week. We normally sustain about 300 Mb/s on both of these interfaces, but in the last week this is now up to 380 Mb/s.
This is an Intel PRO/1000 PT dual-interface PCI-E card. There are two of them in the server. The server is a Dell 2950.
Searching the mailing list and checking on Google has not turned up much. Since this is our main router, it is difficult to test with. I have seen one message that suggests setting hw.em.rxd=1024 and hw.em.txd=1024 in loader.conf, and another that suggested turning off [...], but none of this has made any difference.
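(For reference, those tunables would go in /boot/loader.conf along these lines; a sketch, with the values taken from the message mentioned above:

hw.em.rxd="1024"    # receive descriptors per em(4) ring
hw.em.txd="1024"    # transmit descriptors per em(4) ring
)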
The odd thing is that this just started. This box has been up and running fine for a while. The only thing different on our network has been an increase in the bandwidth.
Any idea where I go from here to troubleshoot this?
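(A few data points that may help narrow it down, assuming the usual 7.x tools; dev.em.X.debug, like stats, is a write-only trigger sysctl if this driver version exposes it:

# vmstat -i                                per-device interrupt rates
# netstat -m                               mbuf/cluster usage and denied requests
# sysctl dev.em.0.debug=1 ; dmesg | tail   dump driver debug state, if supported
)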
# uname -a
FreeBSD inet-gw.epsb.ca 7.1-STABLE FreeBSD 7.1-STABLE #3: Mon Mar 23 16:08:53 MDT 2009 root at inet-gw-test.epsb.ca:/usr/obj/usr/src/sys/DELL2950 amd64
# tail /var/log/messages
Feb 19 12:26:04 inet-gw kernel: em0: watchdog timeout -- resetting
Feb 19 12:26:04 inet-gw kernel: em0: link state changed to DOWN
Feb 19 12:26:07 inet-gw kernel: em0: link state changed to UP
Feb 19 12:26:08 inet-gw kernel: em0: link state changed to DOWN
Feb 19 12:26:10 inet-gw kernel: em0: link state changed to UP
Feb 19 14:44:20 inet-gw kernel: em0: watchdog timeout -- resetting
Feb 19 14:44:20 inet-gw kernel: em0: link state changed to DOWN
Feb 19 14:44:23 inet-gw kernel: em0: link state changed to UP
Feb 19 15:05:03 inet-gw kernel: em2: watchdog timeout -- resetting
Feb 19 15:05:03 inet-gw kernel: em2: link state changed to DOWN
Feb 19 15:05:05 inet-gw kernel: em2: link state changed to UP
Feb 19 15:07:39 inet-gw kernel: em2: watchdog timeout -- resetting
Feb 19 15:07:39 inet-gw kernel: em2: link state changed to DOWN
Feb 19 15:07:42 inet-gw kernel: em2: link state changed to UP
# from /var/run/dmesg.boot
em0: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0xdce0-0xdcff mem 0xd5ee0000-0xd5efffff,0xd5ec0000-0xd5edffff irq 17 at device 0.0 on pci8
em0: Using MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:15:17:a6:ae:94
em2: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0xcce0-0xccff mem 0xde3e0000-0xde3fffff,0xde3c0000-0xde3dffff irq 16 at device 0.0 on pci10
em2: Using MSI interrupt
em2: [FILTER]
em2: Ethernet address: 00:15:17:a6:af:d6
# pciconf -lv
em0 at pci0:8:0:0: class=0x020000 card=0x135e8086 chip=0x105e8086 rev=0x06 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'PRO/1000 PT'
    class    = network
    subclass = ethernet
em2 at pci0:10:0:0: class=0x020000 card=0x135e8086 chip=0x105e8086 rev=0x06 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'PRO/1000 PT'
    class    = network
    subclass = ethernet
# netstat -bdhI em2 2
            input          (em2)           output
   packets  errs      bytes    packets  errs      bytes colls drops
       65K     0        72M        51K     0       9.4M     0     0
       69K     0        78M        52K     0       8.5M     0     0
       76K     0        88M        55K     0        11M     0     0
       74K     0        85M        54K     0        10M     0     0
       78K     0        91M        56K     0       9.0M     0     0
       75K     0        86M        54K     0       8.7M     0     0
       74K     0        85M        54K     0       9.2M     0     0
       75K     0        86M        56K     0        10M     0     0
       78K     0        88M        55K     0        12M     0     0
       78K     0        90M        58K     0        12M     0     0
       76K     0        87M        54K     0        10M     0     0
       79K     0        91M        56K     0        10M     0     0
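(As a sanity check on the load: roughly 88M input bytes per 2-second sample is about 44 MB/s, i.e. around 350 Mb/s, which lines up with the ~380 Mb/s sustained rate described above.)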
---- Kirk
--------------------------------------------------------------------------------
Kirk Davis
Senior Network Analyst, ITS
Edmonton Public Schools
One Kingsway Ave.
Edmonton, Alberta, Canada
T5H 4G9
_______________________________________________
freebsd-net at freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"