Intel em0: watchdog timeout

Mon Feb 22 20:46:35 UTC 2010

	From: Jack Vogel [mailto:jfvogel at gmail.com] 

	Try `sysctl dev.em.0.stats=1` and em.2, you're right though,
	doesn't look like any system mbuf failures.

Does this need to be done in loader.conf?  It doesn't seem to take from
the command line.
# sysctl dev.em.2.stats=1   
dev.em.2.stats: -1 -> -1

# sysctl dev.em.2.stats
dev.em.2.stats: -1

	7.2 seems to be a stable base OS and driver, 8 is better in some
	respects, but has not been without its reported problems. I leave
	the choice to you.

	Without more data I am not sure what is causing the watchdog.

Yes, I am having trouble tracking it down.  I raised the mbuf cluster
limit (nmbclusters) to 65536 just to see if it made any difference, but
it is still happening.

############ SET NMBCLUSTERS TO 65536 ##########################
Feb 22 12:45:21 inet-gw kernel: em0: watchdog timeout -- resetting
Feb 22 12:45:21 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:25 inet-gw kernel: em0: link state changed to UP
Feb 22 12:45:25 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:28 inet-gw kernel: em0: link state changed to UP
Feb 22 12:45:29 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:31 inet-gw kernel: em0: link state changed to UP
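
For anyone comparing logs from several boxes, a rough way to tally these events per interface is sketched below. It is only a minimal sketch: the regular expression assumes the message format shown in the excerpt above, and the names `EVENT` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Lines copied from the /var/log/messages excerpt above.
LOG = """\
Feb 22 12:45:21 inet-gw kernel: em0: watchdog timeout -- resetting
Feb 22 12:45:21 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:25 inet-gw kernel: em0: link state changed to UP
Feb 22 12:45:25 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:28 inet-gw kernel: em0: link state changed to UP
Feb 22 12:45:29 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:31 inet-gw kernel: em0: link state changed to UP
"""

# Assumed message format: "kernel: <ifname>: <event text>".
EVENT = re.compile(
    r"kernel: (?P<ifname>\w+): "
    r"(?P<event>watchdog timeout|link state changed to (?:UP|DOWN))"
)

def summarize(text):
    """Count watchdog resets and link transitions per interface."""
    counts = Counter()
    for m in EVENT.finditer(text):
        counts[(m.group("ifname"), m.group("event"))] += 1
    return counts

summary = summarize(LOG)
for (ifname, event), n in sorted(summary.items()):
    print(f"{ifname}: {event}: {n}")
```

Run against a full log, this makes it easier to see whether each watchdog reset is followed by one link flap or several, as in the excerpt here.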

# netstat -m
8183/6037/14220 mbufs in use (current/cache/total)
7160/3598/10758/65536 mbuf clusters in use (current/cache/total/max)
7160/3592 mbuf+clusters out of packet secondary zone in use (current/cache)
0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
16365K/9121K/25487K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
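
The lines that matter most for mbuf exhaustion are the "denied" counters and the cluster total against its max. A minimal sketch of checking them mechanically follows; it assumes the `netstat -m` line layout shown above, and the helper names `denied_requests` and `cluster_usage` are hypothetical.

```python
import re

# Sample lines copied from the `netstat -m` output above.
NETSTAT_M = """\
7160/3598/10758/65536 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
"""

def denied_requests(text):
    """Return {"<kind>:<label>": count} for each slash-separated denied counter."""
    denied = {}
    for line in text.splitlines():
        m = re.match(r"([\d/]+) requests for (.+?) denied \((.+)\)", line)
        if not m:
            continue
        counts = [int(c) for c in m.group(1).split("/")]
        labels = m.group(3).split("/")
        for label, count in zip(labels, counts):
            denied[f"{m.group(2)}:{label}"] = count
    return denied

def cluster_usage(text):
    """Return (total, max) from the 'mbuf clusters in use' line, or None."""
    m = re.search(r"^\d+/\d+/(\d+)/(\d+) mbuf clusters in use", text, re.M)
    return (int(m.group(1)), int(m.group(2))) if m else None

result = denied_requests(NETSTAT_M)
total, limit = cluster_usage(NETSTAT_M)
print(result)
print(f"clusters: {total}/{limit} ({100 * total / limit:.1f}% of limit)")
```

With every denied counter at zero and the cluster total well under the limit, this output does not point to mbuf starvation, which matches what Jack said.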

I guess I will have to build up the new server with 7.3 on it and see if
the newer driver makes any difference.

---- Kirk

	On Mon, Feb 22, 2010 at 10:55 AM, Kirk Davis <kirk.davis at epsb.ca> wrote:

		I have a backup server sitting here that I am going to load
		7.3-RC1 onto and test with.  It is exact duplicate hardware, so
		that should help with testing the upgraded driver.  Does it make
		sense to go to 8.0?

		Here is the mbuf usage on this server.  I'm not sure exactly how
		to read this, but it seems to look OK.
		# netstat -m
		8181/5904/14085 mbufs in use (current/cache/total)
		7159/3471/10630/25600 mbuf clusters in use (current/cache/total/max)
		7159/3465 mbuf+clusters out of packet secondary zone in use (current/cache)
		0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
		0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
		0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
		16363K/8834K/25197K bytes allocated to network (current/cache/total)
		0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
		0/0/0 requests for jumbo clusters denied (4k/9k/16k)
		0/0/0 sfbufs in use (current/peak/max)
		0 requests for sfbufs denied
		0 requests for sfbufs delayed
		0 requests for I/O initiated by sendfile
		0 calls to protocol drain routines

		---- Kirk

________________________________

		From: Jack Vogel [mailto:jfvogel at gmail.com] 
		Sent: Monday, February 22, 2010 11:43 AM
		To: Kirk Davis
		Cc: freebsd-net at freebsd.org
		Subject: Re: Intel em0: watchdog timeout

		With the increased load you might be running out of mbufs more
		easily, would suggest you increase the mbuf pool.

		This is an old old driver now, you might consider going to
		something a bit more recent.

		Jack

		On Mon, Feb 22, 2010 at 10:14 AM, Kirk Davis <kirk.davis at epsb.ca> wrote:

			Hi,
			       I have a FreeBSD server running Quagga as a BGP router.  It has
			a number of interfaces in it, both bce and em.  The most heavily used
			interfaces started giving me watchdog timeout errors just in the
			last week.  We normally sustain about 300Mb/s on both of these
			interfaces, but in the last week this is now up to 380Mb/s.

			       This is an Intel Pro/1000 PT dual-interface PCI-E card.  There are
			two of them in the server.  The server is a Dell 2950.

			       Searching the mailing list and checking on Google has not turned
			up much.  Since this is our main router it is difficult to test with.  I
			have seen one message that suggests trying to set hw.em.rxd=1024 and
			hw.em.txd=1024 in loader.conf and another that suggested turning off
			but none of this has made any difference.

			       The odd thing is that this just started.  This box has been up
			and running fine for a while.  The only thing different on our network
			has been an increase in the bandwidth.

			       Any idea where I go from here to troubleshoot this?

			# uname -a
			FreeBSD inet-gw.epsb.ca 7.1-STABLE FreeBSD 7.1-STABLE #3: Mon Mar 23 16:08:53 MDT 2009     root at inet-gw-test.epsb.ca:/usr/obj/usr/src/sys/DELL2950  amd64

			# tail /var/log/messages
			Feb 19 12:26:04 inet-gw kernel: em0: watchdog timeout -- resetting
			Feb 19 12:26:04 inet-gw kernel: em0: link state changed to DOWN
			Feb 19 12:26:07 inet-gw kernel: em0: link state changed to UP
			Feb 19 12:26:08 inet-gw kernel: em0: link state changed to DOWN
			Feb 19 12:26:10 inet-gw kernel: em0: link state changed to UP
			Feb 19 14:44:20 inet-gw kernel: em0: watchdog timeout -- resetting
			Feb 19 14:44:20 inet-gw kernel: em0: link state changed to DOWN
			Feb 19 14:44:23 inet-gw kernel: em0: link state changed to UP
			Feb 19 15:05:03 inet-gw kernel: em2: watchdog timeout -- resetting
			Feb 19 15:05:03 inet-gw kernel: em2: link state changed to DOWN
			Feb 19 15:05:05 inet-gw kernel: em2: link state changed to UP
			Feb 19 15:07:39 inet-gw kernel: em2: watchdog timeout -- resetting
			Feb 19 15:07:39 inet-gw kernel: em2: link state changed to DOWN
			Feb 19 15:07:42 inet-gw kernel: em2: link state changed to UP

			# from /var/run/dmesg.boot
			em0: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0xdce0-0xdcff mem 0xd5ee0000-0xd5efffff,0xd5ec0000-0xd5edffff irq 17 at device 0.0 on pci8
			em0: Using MSI interrupt
			em0: [FILTER]
			em0: Ethernet address: 00:15:17:a6:ae:94
			em2: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0xcce0-0xccff mem 0xde3e0000-0xde3fffff,0xde3c0000-0xde3dffff irq 16 at device 0.0 on pci10
			em2: Using MSI interrupt
			em2: [FILTER]
			em2: Ethernet address: 00:15:17:a6:af:d6

			# pciconf -lv
			em0 at pci0:8:0:0:  class=0x020000 card=0x135e8086 chip=0x105e8086 rev=0x06 hdr=0x00
			   vendor     = 'Intel Corporation'
			   device     = 'PRO/1000 PT'
			   class      = network
			   subclass   = ethernet
			em2 at pci0:10:0:0: class=0x020000 card=0x135e8086 chip=0x105e8086 rev=0x06 hdr=0x00
			   vendor     = 'Intel Corporation'
			   device     = 'PRO/1000 PT'
			   class      = network
			   subclass   = ethernet

			# netstat -bdhI em2 2
			           input          (em2)           output
			  packets  errs      bytes    packets  errs      bytes colls drops
			      65K     0        72M        51K     0       9.4M     0     0
			      69K     0        78M        52K     0       8.5M     0     0
			      76K     0        88M        55K     0        11M     0     0
			      74K     0        85M        54K     0        10M     0     0
			      78K     0        91M        56K     0       9.0M     0     0
			      75K     0        86M        54K     0       8.7M     0     0
			      74K     0        85M        54K     0       9.2M     0     0
			      75K     0        86M        56K     0        10M     0     0
			      78K     0        88M        55K     0        12M     0     0
			      78K     0        90M        58K     0        12M     0     0
			      76K     0        87M        54K     0        10M     0     0
			      79K     0        91M        56K     0        10M     0     0
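
To relate these counters to the Mb/s figures mentioned earlier, the byte columns can be converted to a bit rate. The sketch below is only an approximation under two assumptions that are not stated in the thread: that each row is a total for the 2-second interval, and that the `-h` suffixes are decimal (K = 10^3, M = 10^6). If netstat scales by 1024 instead, the results come out roughly 5% higher at the M scale. The function names are made up for illustration.

```python
# Input/output "bytes" values copied from the first three rows above.
INTERVAL_SECONDS = 2
SAMPLES = [("72M", "9.4M"), ("78M", "8.5M"), ("88M", "11M")]

# Assumption: decimal suffixes. Swap in powers of 1024 if your netstat
# humanizes with binary multiples.
SUFFIX = {"K": 1e3, "M": 1e6, "G": 1e9}

def to_bytes(value):
    """Convert a humanized count such as '9.4M' to a float byte count."""
    if value[-1] in SUFFIX:
        return float(value[:-1]) * SUFFIX[value[-1]]
    return float(value)

def mbit_per_sec(value, seconds=INTERVAL_SECONDS):
    """Bytes counted over `seconds` expressed as megabits per second."""
    return to_bytes(value) * 8 / seconds / 1e6

rates = [(mbit_per_sec(rx), mbit_per_sec(tx)) for rx, tx in SAMPLES]
for rx, tx in rates:
    print(f"in {rx:.0f} Mb/s, out {tx:.1f} Mb/s")
```

Under those assumptions the inbound side of em2 works out to roughly 290 to 365 Mb/s across the rows shown, which is in line with the 300 to 380 Mb/s quoted in the message.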

			---- Kirk

			--------------------------------------------------------------------------------
			Kirk Davis
			Senior Network Analyst, ITS
			Edmonton Public Schools
			One Kingsway Ave.
			Edmonton, Alberta, Canada
			T5H 4G9
