ng_one2many v.s. AFT (NIC Fault Tolerance/Fail Over/Redundancy
Revisited)
Brian A. Seklecki
lavalamp at spiritual-machines.org
Wed Feb 15 18:03:20 PST 2006
FYI, to bring this thread back to the list
---------- Forwarded message ----------
Date: Wed, 15 Feb 2006 20:53:59 -0500 (EST)
From: Brian A. Seklecki <lavalamp at spiritual-machines.org>
To: Jonathan Donaldson <donaldson at cisco.com>, glebius at freebsd.org,
glebius at cell.sick.ru
Cc: jks at clickcom.com, Brian J. Creasy <bcreasy at collaborativefusion.com>,
Chad Ziccardi <cz at digitalfreaks.org>, Danny Howard <dannyman at toldme.com>,
Brad Bendy <brad at shockwebhost.com>
Subject: Re: ng_one2many v.s. AFT (NIC Fault Tolerance/Fail Over/Redundancy
Revisited) (fwd)
On Wed, 15 Feb 2006, Jonathan Donaldson wrote:
> Take a look here:
>
> http://www.freebsd.org/cgi/getmsg.cgi?fetch=607312+0+/usr/local/www/db/text/2004/cvs-all/20041128.cvs-all
>
Yeah, I see it now. Sorry. I'm CC'ing the developer who committed the changes
and the MFC.
The man page needs to be updated, and it should mention your caveat.
I got caught by your caveat with the one-link-down-at-boot.
However, the code begins to work once the down link is brought up, just as it
would if both links had been active at boot, which is good.
Where I got tripped up was this quote: "The node listens to flow
control message from many hooks, and considers link failed if NGM_LINK_IS_DOWN
is received."
I interpreted "flow control messages" as something on the wire, like an
STP (802.1D) BPDU.
Apparently, it's really an in-kernel event related to the new Ethernet
link-state code in 6.x, or maybe just glorified poll()'ing.
Either way, it works well. Sorry for jumping the gun.
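To illustrate the distinction above (a link-state notification delivered
inside the kernel rather than a heartbeat on the wire), here is a minimal
user-space sketch in C. It is not the ng_one2many source: struct o2m_model,
the o2m_* functions and MAX_LINKS are hypothetical names made up for
illustration, and only the two failAlg values are taken from the header
excerpt quoted further down. It assumes round-robin transmit and that the
node simply marks a link up or down when it is notified.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror the ng_one2many.h excerpt quoted in this thread. */
#define FAIL_MANUAL 1 /* NG_ONE2MANY_FAIL_MANUAL: links set by hand */
#define FAIL_NOTIFY 2 /* NG_ONE2MANY_FAIL_NOTIFY: react to link events */

#define MAX_LINKS 4   /* hypothetical, kept small for illustration */

/* Hypothetical model of the node state; NOT the kernel structure. */
struct o2m_model {
    int fail_alg;
    size_t num_links;
    unsigned char link_up[MAX_LINKS];
    size_t next; /* round-robin cursor */
};

/* Start with every link considered up. */
void o2m_init(struct o2m_model *m, int fail_alg, size_t num_links)
{
    assert(num_links > 0 && num_links <= MAX_LINKS);
    m->fail_alg = fail_alg;
    m->num_links = num_links;
    m->next = 0;
    for (size_t i = 0; i < num_links; i++)
        m->link_up[i] = 1;
}

/*
 * Models a link up/down notification arriving for one "many" link.
 * Assumption: it is acted on only when the fail algorithm is NOTIFY.
 */
void o2m_link_event(struct o2m_model *m, size_t link, int is_up)
{
    assert(link < m->num_links);
    if (m->fail_alg != FAIL_NOTIFY)
        return;
    m->link_up[link] = is_up ? 1 : 0;
}

/* Round-robin over links marked up; returns -1 if none is usable. */
int o2m_pick_link(struct o2m_model *m)
{
    for (size_t tried = 0; tried < m->num_links; tried++) {
        size_t cand = m->next;
        m->next = (m->next + 1) % m->num_links;
        if (m->link_up[cand])
            return (int)cand;
    }
    return -1;
}
```

In this model, FAIL_MANUAL ignores the notification, so round-robin keeps
handing every other frame to the dead link, which matches the 50% packet
loss discussed in this thread. With FAIL_NOTIFY the dead link is skipped
until it is reported up again.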
~lava
P.S., in 7.0-CURRENT, there appears to be an import of the OpenBSD bridge(4) to
replace the old-school "options BRIDGE" code, and this one is STP (802.1D)
aware. When 7.x becomes a production release, I suspect I'll end up using that
instead, since it works so well with NetBSD/OpenBSD for HA Ethernet; plus I'd
rather have a PVST+ Cisco switch make the packet forwarding decisions >:}
~lava
> and then look here:
>
> http://fxr.watson.org/fxr/source/netgraph/ng_one2many.h?v=RELENG6
>
>
> /* Algorithms for detecting link failure (XXX only one so far) */
> #define NG_ONE2MANY_FAIL_MANUAL 1 /* use enabledLinks[] array */
> #define NG_ONE2MANY_FAIL_NOTIFY 2 /* listen to flow control msgs */
>
>
> so set your fail alg to 2 and see if you see the messages and failover...
>
>
>
> On Feb 15, 2006, at 8:11 PM, Brian A. Seklecki wrote:
>
>> On Thu, 12 Jan 2006, Brian J. Creasy wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Brian A. Seklecki wrote:
>>> |
>>> | Jonathan's comments suggest that we may need to move to 6.x on the
>>> | production cluster.
>>> |
>>> | 6.x has been upgraded from a technology release to stable, and our goal
>>> | is stability.
>>> |
>>> | Brian: What are your thoughts so far on the 6.x experience?
>>>
>>> no complaints here.. though, i have it running only on my laptop and
>>
>> ....Okay.
>>
>> | <jonathan> As of Freebsd 6_0 (which is at RC1 now), the NG_ONE2MANY does
>> | support the failure of a link which does not end up with 50% packet
>> | loss. There is new code in the One2Many module that xmits a layer 2 "I'm
>> | alive" broadcast out all links, as long as this is picked up on the
>> | other links, then all interfaces are considered alive. If one of the
>> | packets is not received, then after 2 x heartbeat duration that link is
>> | considered "down". I have tested this in the 6.0 code and it works with
>> | one caveat. When the server is brought up, both interfaces must be
>> | connected and live, or for some reason, the failure algorithm never
>> | seems to kick in. I saw exactly what you saw in 5.4 and newer with
>> | regards to the 50% packet loss.</jonathan>
>>
>> Jonathan:
>>
>> I'm not sure where you got the info about this. According to the
>> NG_ONE2MANY(4) page in CVS -rHEAD (-CURRENT):
>>
>> "Currently, the valid settings for the xmitAlg field are
>> NG_ONE2MANY_XMIT_ROUNDROBIN (default) or NG_ONE2MANY_XMIT_ALL. The only
>> valid setting for failAlg is NG_ONE2MANY_FAIL_MANUAL; this is also the
>> default setting."
>>
>> I have 6.1-BETA1 on a box right now and I've got my config set up for
>> NG_ONE2MANY_XMIT_ROUNDROBIN + NG_ONE2MANY_FAIL_NOTIFY, and I don't see any
>> layer 2 heartbeat-related traffic (watching via tcpdump(8) on another
>> machine in the same segment).
>>
>> Can you share what you saw?
>>
>> ~lava
>>
>>> |> mission critical environment).
>>> |> - Xmit-All causes twice as much load to be placed on the switch
>>> |> fabric and switch CPU.
>>> |>
>>> |
>>> | <jonathan> As of Freebsd 6_0 (which is at RC1 now), the NG_ONE2MANY does
>>> | support the failure of a link which does not end up with 50% packet
>>> | loss. There is new code in the One2Many module that xmits a layer 2 "I'm
>>> | alive" broadcast out all links, as long as this is picked up on the
>>> | other links, then all interfaces are considered alive. If one of the
>>> | packets is not received, then after 2 x heartbeat duration that link is
>>> | considered "down". I have tested this in the 6.0 code and it works with
>>> | one caveat. When the server is brought up, both interfaces must be
>>> | connected and live, or for some reason, the failure algorithm never
>>> | seems to kick in. I saw exactly what you saw in 5.4 and newer with
>>> | regards to the 50% packet loss.</jonathan>
>>> |
>>> |
>>> |> What ng_one2many needs is an "Active-Standby" XMIT algorithm (STP BOFH's
>>> |> will think BLOCKING/FORWARDING). It could even be used on top of
>>> |> other NetGraph nodes like ng_fec or possibly (hopefully) ng_802.3ad >:}
>>> |>
>>> |
>>>
>>> - --
>>> Brian J. Creasy
>>> Collaborative Fusion, Inc.
>>> 412.422.3463 x4020 bcreasy at collaborativefusion.com
>>>
>>> pgp public key:
>>> ~ http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x5F94E004
>>>
>>> ****************************************************************
>>> IMPORTANT: This message contains confidential information
>>> and is intended only for the individual named. If the reader of
>>> this message is not an intended recipient (or the individual
>>> responsible for the delivery of this message to an intended
>>> recipient), please be advised that any re-use, dissemination,
>>> distribution or copying of this message is prohibited. Please
>>> notify the sender immediately by e-mail if you have received
>>> this e-mail by mistake and delete this e-mail from your system.
>>> E-mail transmission cannot be guaranteed to be secure or
>>> error-free as information could be intercepted, corrupted, lost,
>>> destroyed, arrive late or incomplete, or contain viruses. The
>>> sender therefore does not accept liability for any errors or
>>> omissions in the contents of this message, which arise as a
>>> result of e-mail transmission.
>>> ****************************************************************
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2 (FreeBSD)
>>> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>>>
>>> iD8DBQFDxmXvDgwDm1+U4AQRAr3GAJ42+HcJFO595aZvljztWCkd+NWgvACeMQiu
>>> ILXLchBGR90TZTZHjn6DVCY=
>>> =68DY
>>> -----END PGP SIGNATURE-----
>>>
>>
>> l8*
>> -lava
>>
>> x.25 - minix - bitnet - plan9 - 110 bps - ASR 33 - base8
>>
>
> Thanks,
> Jonathan
> -------------------------------------------------------------
> Jonathan Donaldson
> Technical Lead
>
> Cisco Systems - CV2BU
> 4690 E. Fulton St C-210
> Ada, MI 49301
>
> Office: +1-972-813-5251
> Cell: +1-616-301-4277
> eMail: donaldson at cisco.com
>
>
l8*
-lava
x.25 - minix - bitnet - plan9 - 110 bps - ASR 33 - base8