6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
Mike Andrews
mandrews at bit0.com
Tue Jan 16 18:04:24 UTC 2007
I have a strange issue with em0 watchdog timeouts that I think is not the
same as the ones everyone was having during the 6.2 beta cycle...
I have six systems, each with two Intel GigE ports onboard:
Systems A and B: Supermicro PDSMi+
Systems C and D: Supermicro PDSMi (without the plus)
System E: Tyan S2730U3GN
System F: Supermicro X5DPA-GG
On each system:
em0 is connected to a Cisco Catalyst 2960G layer 2 gigabit ethernet switch.
em1 is connected to a Foundry Serveriron XL layer 4-7 fast ethernet switch.
All six run FreeBSD 6.2-RELEASE i386, even though the first four are
capable of running amd64. They all have 2 GB of memory, except E which
has 4 GB. The kernel configs are all identical, and are not that far from
GENERIC + SMP.
Several times a day, em0 will go down, give a watchdog timeout error on
the console, then come right back up on its own a few seconds later. But
here's the weird twist: it ONLY happens on systems A and B, and ONLY when
running at gigabit speed. If I knock the two switch ports down to 100
meg, the problem goes away.
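
To put numbers on "several times a day" and compare hosts, something like the following could tally the timeouts from syslog. This is only a minimal sketch: the line format, the host names (sysA, sysB), and the exact message text in SAMPLE are assumptions for illustration, not output captured from these machines, and the function name is made up.

```python
import re
from collections import Counter

# Assumed syslog-style layout: "Mon DD HH:MM:SS host kernel: emN: watchdog timeout ..."
# The exact driver message text may differ on a real system.
WATCHDOG_RE = re.compile(
    r"^(\w{3}\s+\d+)\s+[\d:]+\s+(\S+)\s+kernel:\s+(em\d+): watchdog timeout"
)

def count_watchdog_timeouts(lines):
    """Count watchdog timeout messages per (host, interface, day)."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            day, host, nic = m.groups()
            # Collapse the padding syslog uses for single-digit days ("Jan  6").
            day = " ".join(day.split())
            counts[(host, nic, day)] += 1
    return counts

# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "Jan 16 09:12:01 sysA kernel: em0: watchdog timeout -- resetting",
    "Jan 16 09:12:04 sysA kernel: em0: link state changed to UP",
    "Jan 16 14:40:37 sysA kernel: em0: watchdog timeout -- resetting",
    "Jan 16 15:02:10 sysB kernel: em0: watchdog timeout -- resetting",
]

if __name__ == "__main__":
    for (host, nic, day), n in sorted(count_watchdog_timeouts(SAMPLE).items()):
        print(f"{host} {nic} {day}: {n}")
```

On a real box the same function could be fed the lines of /var/log/messages instead of SAMPLE. Comparing the per-day counts before and after forcing the switch ports to 100 meg would show whether the workaround really takes the count to zero.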
The other four systems C thru F never have watchdog timeout issues; they
always work perfectly even at gigabit speed.
So I'm trying to figure out if there are any other obvious hardware
differences between the plus and non-plus version of the PDSMi that would
be causing issues on the plus version. Fortunately, at the moment we are
not (yet) pushing anywhere near even 100 meg worth of traffic through
these ports, so it's a tolerable workaround... just kinda annoying. :)
The chipset is a bit different: the PDSMi uses the Intel E7230 chipset for
Pentium D servers, whereas the PDSMi+ uses the E3000, which adds Core 2 Duo
support. But apparently the NIC chips are identical: 82573V for em0 and
82573L for em1. The BIOS is identical too, so the chipsets must be pretty
similar. Nothing shares an IRQ with the NICs. (USB is disabled in the
BIOS.) They do have different disk systems; A and B are SATA gmirror
setups, while C and D use LSI Megaraid SCSI cards for their mirrors.
I have tried the obvious step of swapping the cables out. No difference at all.
I have NOT yet tried a different gigabit switch.
Hopefully that's enough detail to start; I can get into more specifics as
needed. (Kernel configs, dmesg output, IRQ details, disk details, IPMI,
running apps, serial console access if needed...)