cvs commit: src/sys/conf files options src/sys/net radix.c radix.h route.c route.h rtsock.c src/sys/netinet in_proto.c ip_output.c src/sys/netinet6 in6_proto.c in6_src.c nd6_nbr.c

Thu Apr 17 11:12:23 UTC 2008

Andre Oppermann wrote:
> In OpenBSD multipath one has to install a multipath route explicitly with
> the -mpath modifier to route(8) and for daemons with RTF_MPATH in the
> routing message.  Multipath routes also retain this flag during their
> lifetime.  If not set, the normal one-route-only behavior is kept.  This
> allows all non-mpath aware programs to continue to work.
>
> I think this is the model to follow.  Also for inter-BSD compatibility.
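
To make the explicit model concrete, here is a minimal sketch of the add
path on a toy table. It is not OpenBSD's code. DEMO_RTF_MPATH, its value,
struct demo_fib and demo_route_add() are all made up for illustration,
and the rule that both the existing entry and the new request must carry
the flag is my assumption about how "explicit" would be enforced.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value for this sketch only; not taken from any header. */
#define DEMO_RTF_MPATH  0x1u
#define DEMO_MAX_ROUTES 16

struct demo_route {
    uint32_t prefix;    /* network, host byte order */
    uint8_t  plen;      /* prefix length */
    uint32_t gateway;   /* next hop */
    unsigned flags;     /* keeps DEMO_RTF_MPATH for the route's lifetime */
};

struct demo_fib {
    struct demo_route r[DEMO_MAX_ROUTES];
    size_t n;
};

/*
 * Add a route.  Returns 0 or an errno value.
 * A second next hop for an existing prefix is accepted only if both the
 * existing entry and the new request carry DEMO_RTF_MPATH (assumption).
 */
static int
demo_route_add(struct demo_fib *fib, uint32_t prefix, uint8_t plen,
    uint32_t gateway, unsigned flags)
{
    size_t i;

    for (i = 0; i < fib->n; i++) {
        const struct demo_route *rt = &fib->r[i];

        if (rt->prefix != prefix || rt->plen != plen)
            continue;
        if (rt->gateway == gateway)
            return (EEXIST);    /* exact duplicate */
        if (!(flags & DEMO_RTF_MPATH) || !(rt->flags & DEMO_RTF_MPATH))
            return (EEXIST);    /* legacy one-route-only behaviour */
    }
    if (fib->n == DEMO_MAX_ROUTES)
        return (ENOBUFS);
    fib->r[fib->n++] = (struct demo_route){ prefix, plen, gateway, flags };
    return (0);
}
```

A consumer that knows nothing about multipath never sets the flag, so it
keeps seeing EEXIST on a duplicate prefix exactly as it does today.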

I think it's fair to do this until PF_ROUTE itself evolves. The API has 
known limitations, and it has had its fair share of bugs from its use of 
legacy structures, lack of bounds checking etc., which have accumulated 
patches over the years.

There is only 1 bit, RTF_PROTO1, which can be used to tell if a FIB 
entry in PF_ROUTE was dynamically added or not. Everything out there 
overloads this bit, so there is no way to stop daemons from clobbering 
each other.
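
A toy illustration of the clobbering, under stated assumptions: the table,
DEMO_RTF_PROTO1 and both flush functions are invented for this sketch, and
the "owner" field is hypothetical since PF_ROUTE carries nothing like it.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the single "dynamically added" bit; value is illustrative. */
#define DEMO_RTF_PROTO1 0x8000u

struct demo_rt {
    uint32_t prefix;
    uint8_t  plen;
    unsigned flags;
    int      owner;     /* hypothetical owner id, not part of PF_ROUTE */
    int      live;      /* 1 while installed */
};

/* A routed-style cleanup: remove every route marked as dynamic. */
static size_t
demo_flush_by_flag(struct demo_rt *tbl, size_t n)
{
    size_t i, removed = 0;

    for (i = 0; i < n; i++) {
        if (tbl[i].live && (tbl[i].flags & DEMO_RTF_PROTO1)) {
            tbl[i].live = 0;
            removed++;
        }
    }
    return (removed);
}

/* What cleanup could look like if each route carried an owner tag. */
static size_t
demo_flush_by_owner(struct demo_rt *tbl, size_t n, int owner)
{
    size_t i, removed = 0;

    for (i = 0; i < n; i++) {
        if (tbl[i].live && tbl[i].owner == owner) {
            tbl[i].live = 0;
            removed++;
        }
    }
    return (removed);
}
```

With two daemons both setting the bit, demo_flush_by_flag() run by one of
them also removes the other's routes; only the owner-based variant can
tell them apart.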

At the moment, anything written to follow the model of e.g. routed is 
very likely to clobber other consumers, because routed is written to 
assume it controls the whole table.

This is not good, considering our implementation of IRDP (Internet 
Router Discovery, RFC 1256) uses the routed code base.

> Yes.  Let me explain.  There are two approaches here: The Quagga/Zebra
> approach where all routing protocol daemons communicate with a central
> daemon that is the single point of contact to the kernel.  The other
> approach is the OpenBGPD/OpenOSPFD approach where each daemon runs on
> its own (because most of the time there is little to no overlap) and
> does its own routing table manipulations.

XORP uses the first approach. It tries to do some side-stepping to 
prevent clobbering other consumers, by e.g. inferring if a route would 
have been created automatically for an interface's subnet address and 
leaving other routes alone.
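
Roughly, the inference amounts to a check like the one below. This is a
sketch of the idea only, not XORP's actual code; demo_mask() and
demo_is_connected_route() are hypothetical names, and it handles IPv4 in
host byte order only.

```c
#include <stdint.h>

/* Netmask for a prefix length of 0..32, host byte order. */
static uint32_t
demo_mask(unsigned plen)
{
    return (plen == 0 ? 0 : 0xffffffffu << (32 - plen));
}

/*
 * Return 1 if the route (rt_prefix/rt_plen) is exactly the subnet that
 * the kernel would have created for an interface address if_addr/if_plen.
 */
static int
demo_is_connected_route(uint32_t rt_prefix, unsigned rt_plen,
    uint32_t if_addr, unsigned if_plen)
{
    if (rt_plen > 32 || if_plen > 32)
        return (0);
    return (rt_plen == if_plen &&
        rt_prefix == (if_addr & demo_mask(if_plen)));
}
```

A route that matches is assumed to belong to the kernel and is left alone.
The check cannot distinguish a static route someone deliberately added for
the same subnet, which is part of why this is side-stepping and not a fix.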

There is code to restore the previous FIB contents on exit; however, 
because RTF_PROTO1 is the only ownership hint available (see above), 
this is not foolproof.

>   The second approach is a
> bit tricky at the moment as the routing socket is not really intended
> for operating in this way and the daemons have to be aware of each
> other in certain ways.

OK, let me just say: I don't believe kernel FIBs should be used as RIBs.
    I am definitely in favour of making changes which allow daemons to 
interoperate, but I don't believe we should be doing things which 
encourage people to use the kernel FIB as a place to exchange routes 
between processes.
    It may be acceptable as a "quick hack" for testing something, but it 
really isn't the way to do it on an embedded box (racking up those 
context switches and dirty pages), and the problem with allowing quick 
hacks is that they tend to get perpetuated as kludges.
    After all, if something "kinda" works, people will keep doing it.

    However as you quite correctly point out, when you introduce 
multipath into the kernel FIB, you need to make sure there is no 
collision between old consumers (who know nothing of multiple next-hops) 
and new consumers (who will be checking and using multiple next-hop 
information).

>
> Ideally, and this is what Claudio says as well, we should end up with
> the following functionality:
>
>  - equal cost multipath where one prefix can have multiple next-hops.
>  - ecmp should be explicit with the RTF_MPATH flag.
>  - a hierarchy of multiple prefixes where the one with the highest
>    priority carries the traffic (possibly with ecmp).
>  - the hierarchy should have a number of precedence levels (interface
>    route, static route, IGP route, EGP route, other).
>  - within those precedence levels it should have further subdivision
>    to prefer OSPF over RIP in the IGP category for example.
>  - a change/delete applies to a specific precedence level if specified.
>  - routing socket filters on reading so that routing daemons can
>    select which precedence levels they want to track (IGP doesn't
>    have to track EGP route changes for example).
>
> With this functionality a number of independent but complementary routing
> daemons can work together in a useful and, more importantly, standardized
> way.
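
For what it's worth, the selection and filtering rules in that list could
be sketched as below. Everything here is hypothetical: the enum, the
ordering of its values, the "sub" field and the bitmask filter are my own
illustration of the proposal, not an existing kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Precedence levels from the list above; a lower value is preferred. */
enum demo_level {
    DEMO_IFACE, DEMO_STATIC, DEMO_IGP, DEMO_EGP, DEMO_OTHER
};

struct demo_cand {
    enum demo_level level;
    uint8_t  sub;       /* subdivision within a level, e.g. OSPF < RIP */
    uint32_t gateway;   /* next hop */
};

/*
 * Return the index of the candidate that should carry traffic for one
 * prefix, or -1 if there are none.  Candidates that tie on level and sub
 * would form the ecmp set; this sketch just keeps the first one.
 */
static int
demo_best(const struct demo_cand *c, size_t n)
{
    size_t i;
    int best = -1;

    for (i = 0; i < n; i++) {
        if (best < 0 ||
            c[i].level < c[best].level ||
            (c[i].level == c[best].level && c[i].sub < c[best].sub))
            best = (int)i;
    }
    return (best);
}

/* Read-side filter: one bit per precedence level a daemon wants to see. */
#define DEMO_LEVEL_BIT(l)   (1u << (l))

static int
demo_wants(unsigned filter, enum demo_level level)
{
    return ((filter & DEMO_LEVEL_BIT(level)) != 0);
}
```

An IGP daemon would set only the DEMO_IGP bit (and perhaps DEMO_IFACE and
DEMO_STATIC) in its filter and never be woken for EGP churn.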

Linux implements a form of update filtering on the rtnetlink socket.

Re the rest of it:
    This is pretty much what Microsoft does -- up to a point. The API 
for the RIB there isn't really that open, you certainly can't just port 
a Linux or BSD daemon and expect it to work -- it's totally different.
    But they do implement explicit tagging of who owns which route.
    You can see all of these routes with the "route print" or "netsh ip 
show routes" commands, however routes added with either of these CLI 
tools go into a separate RIB table of their own -- you aren't modifying 
the final table.
    To an extent, you can modify the admin distance controlling which 
entries go where in NT, although some things stay hard-coded.

    I actually wrote a DLL for Windows which drops in like any other 
routing protocol, and allows you to inject routes into its RIB manager 
using the BSD routing socket. It uses an NT named pipe for this. Coding 
it wasn't easy as they don't provide a completely working example of how 
to do it -- but their code will get you say 80% of the way there.

    Because the BSD PF_ROUTE message format has no support for 
multipath, this DLL has to do some side-stepping of its own to make 
sure the routes we plumb into TCPIP.SYS using the RTM API are removed 
properly.

later
BMS