FreeBSD 10 network flapping, ix driver unreliable?
Kevin Bowling
kevin.bowling at kev009.com
Mon Feb 17 21:41:40 UTC 2014
On 2/16/2014 9:04 PM, George Neville-Neil wrote:
>
> On Feb 15, 2014, at 21:32 , Kevin Bowling <kevin.bowling at kev009.com> wrote:
>
>> On 2/15/2014 4:43 PM, George Neville-Neil wrote:
>>>
>>> On Feb 15, 2014, at 15:14 , Kevin Bowling <kevin.bowling at kev009.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have FreeBSD 10.0-RELEASE installed on two Dell C6100 nodes. Each node has an Intel X520-DA2 dual port 10gig card. One of the ports on each go to a switch using direct attach coaxial cables. The other port is directly connected between the two nodes (think crossover in twisted pair terminology) again using direct attach coaxial cables.
>>>>
>>>> On both machines, and on both ports (including the "crossover"), the links flap several times per day.
>>>>
>>>> I've pasted the output of lspci -vv and dmesg here:
>>>> https://gist.github.com/kev009/9024442
>>>>
>>>> There's nothing outstanding about the setup otherwise. I suspected some interaction with the switch initially but the "crossover" has eliminated that suspicion.
>>>>
>>>> It seems the ix driver is not very reliable under common conditions, i.e. https://forums.freebsd.org/viewtopic.php?f=7&t=44570 and a search of this list. Any recommendations or tests?
>>>>
>>>
>>> Can you post (to your gist link) the output of sysctl dev.ix ?
>>
>> Hi George,
>>
>> sysctl info added to gist link. ix0 has been up for around 27 days. ix1 for about 24hrs.
>>
>
> I think this has something to do with it.
>
> dev.ix.0.mac_stats.local_faults: 314
> dev.ix.0.mac_stats.remote_faults: 41
>
> The device is seeing errors at the MAC layer, which I don’t think a driver bug would
> cause, though there is always the possibility of a misconfiguration causing flapping.
> Can you try different cables?
>
> When you hook it to the switch does the switch give better diagnostics? Reading
> over the Intel 82599 chip manual is not, shall we say, illuminating,
> "Number of faults in the local MAC. This register is valid only when the link speed is 10 Gb/s."
Appreciate your help. This led me to find some new info, although it
doesn't entirely answer what local_faults are for me:
http://grouper.ieee.org/groups/802/3/ae/public/nov00/taborek_2_1100.pdf

I may have spoken too soon: the "crossover" ix1 seems to be holding
steady, so the local and remote faults there must have occurred during
negotiation while I was bringing up the interfaces.
On the other system's ix0, the faults are almost all local and quite a
bit more frequent:
dev.ix.0.mac_stats.local_faults: 10752
dev.ix.0.mac_stats.remote_faults: 2
I then noticed the switch had mandatory flow control on both send and
receive for 10gig, but the FreeBSD box was only negotiating receive flow
control. I disabled both on the switch and rebooted but am still seeing
some increments of local_faults.
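
To tell whether the counters are still climbing rather than eyeballing two pastes, something like the following could be used. This is only a minimal sketch in Python, assuming the counter names shown above (dev.ix.N.mac_stats.local_faults and remote_faults); the helper names parse_faults, fault_deltas and snapshot are made up for illustration and are not part of any FreeBSD tool.

```python
import re
import subprocess

# Matches lines such as "dev.ix.0.mac_stats.local_faults: 314".
FAULT_RE = re.compile(
    r"^dev\.ix\.(\d+)\.mac_stats\.(local_faults|remote_faults):\s*(\d+)$"
)


def parse_faults(sysctl_text):
    """Return {(unit, counter_name): value} from `sysctl dev.ix` output."""
    counters = {}
    for line in sysctl_text.splitlines():
        match = FAULT_RE.match(line.strip())
        if match:
            unit, name, value = match.groups()
            counters[(int(unit), name)] = int(value)
    return counters


def fault_deltas(before, after):
    """Return how much each counter grew between two snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}


def snapshot():
    """Hypothetical helper: take one snapshot on the FreeBSD host itself."""
    output = subprocess.run(
        ["sysctl", "dev.ix"], capture_output=True, text=True, check=True
    ).stdout
    return parse_faults(output)
```

Taking one snapshot(), waiting a while, taking another and passing both to fault_deltas() shows which port and which counter moved in that interval, which can then be lined up against the link up/down messages in dmesg.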

Could it be a switch STP problem? The switch is a Cisco 4948-10GE. The
config looks like the following, which is working well on some copper
gigabit interfaces:

spanning-tree mode pvst
spanning-tree portfast default
spanning-tree extend system-id
!
interface TenGigabitEthernet1/49
 switchport trunk encapsulation dot1q
 switchport mode trunk
 spanning-tree portfast trunk
!
interface TenGigabitEthernet1/50
 switchport trunk encapsulation dot1q
 switchport mode trunk
 flowcontrol receive desired
 flowcontrol send desired
 spanning-tree portfast trunk
!

It will be hard for me to source SFPs and fiber, but I can try to see if
it's a physical layer problem. In the meantime I might try imaging one
of the systems with a different OS to see whether the problem persists.

Regards,
Kevin Bowling