cvs commit: src/sys/conf files options src/sys/net radix.c radix.h route.c route.h rtsock.c src/sys/netinet in_proto.c ip_output.c src/sys/netinet6 in6_proto.c in6_src.c nd6_nbr.c
Bruce M. Simpson
bms at FreeBSD.org
Thu Apr 17 11:12:23 UTC 2008
Andre Oppermann wrote:
> In OpenBSD multipath one has to install a multipath route explicitly
> with the -mpath modifier to route(8) and for daemons with RTF_MPATH in
> the routing message. Multipath routes also retain this flag during
> their lifetime. If not set, the normal one-route-only behavior is kept.
> This allows all non-mpath aware programs to continue to work.
>
> I think this is the model to follow. Also for inter-BSD compatibility.
I think it's fair to do this until such time as PF_ROUTE itself may
evolve. The API has limitations, as we know, and it has had its fair share
of bugs from its use of legacy structures, lack of bounds checking
etc., which have accumulated patches over the years.
There is only one bit, RTF_PROTO1, which can be used to tell if a FIB
entry in PF_ROUTE was dynamically added or not. Everything out there
overloads this bit, so there is no way to stop daemons from clobbering
each other.
At the moment, anything written to follow the model of e.g. routed is
very likely to clobber other consumers, because it is written to assume
it controls the whole table.
This is not good, considering our implementation of IRDP (ICMP
Router Discovery Protocol, RFC 1256) uses the routed code base.
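
To make the clobbering concrete, here is a rough sketch in Python. It is a
toy model, not kernel code: the Fib class, the daemon names and the
flush_dynamic() cleanup are all made up for illustration. Only the
RTF_PROTO1 value is taken from <net/route.h>. The point is that one flag
bit cannot say which daemon owns a route.

```python
# Toy model only; the real FIB lives in the kernel and is reached via PF_ROUTE.
RTF_PROTO1 = 0x8000  # flag value as defined in BSD <net/route.h>


class Fib:
    """Hypothetical forwarding table: prefix -> (next_hop, flags)."""

    def __init__(self):
        self.routes = {}

    def add(self, prefix, next_hop, flags=0):
        # A later add for the same prefix replaces the earlier entry.
        self.routes[prefix] = (next_hop, flags)

    def flush_dynamic(self):
        """Cleanup as done by a daemon that assumes it owns the table:
        remove every route carrying RTF_PROTO1, whoever installed it."""
        doomed = [p for p, (_, f) in self.routes.items() if f & RTF_PROTO1]
        for prefix in doomed:
            del self.routes[prefix]
        return doomed


fib = Fib()
fib.add("10.0.0.0/8", "192.0.2.1")                   # static route, no flag
fib.add("172.16.0.0/12", "192.0.2.2", RTF_PROTO1)    # hypothetical daemon A
fib.add("192.168.0.0/16", "192.0.2.3", RTF_PROTO1)   # hypothetical daemon B

# Daemon A exits and removes "its" routes; daemon B's route goes with them.
removed = fib.flush_dynamic()
```

After the flush only the static route is left, even though daemon B never
asked for its route to be withdrawn.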
> Yes. Let me explain. There are two approaches here: The Quagga/Zebra
> approach where all routing protocol daemons communicate with a central
> daemon that is the single point of contact to the kernel. The other
> approach is the OpenBGPD/OpenOSPFD approach where each daemon runs on
> its own (because most of the time there is little to no overlap) and
> does its own routing table manipulations.
XORP uses the first approach. It tries to do some side-stepping to
prevent clobbering other consumers, by e.g. inferring if a route would
have been created automatically for an interface's subnet address and
leaving other routes alone.
There is code to restore the previous FIB contents on exit; however,
because of the RTF_PROTO1 limitation described above, this is not foolproof.
> The second approach is a
> bit tricky at the moment as the routing socket is not really intended
> for operating in this way and the daemons have to be aware of each
> other in certain ways.
OK, let me just say: I don't believe kernel FIBs should be used as RIBs.
I am definitely in favour of making changes which allow daemons to
interoperate, but I don't believe we should be doing things which
encourage people to use the kernel FIB as a place to exchange routes
between processes.
It might do as a "quick hack" for testing something, but it
really isn't the way to do it on an embedded box (racking up those
context switches and dirty pages), and the problem with allowing quick
hacks is that they tend to get perpetuated as kludges.
After all, if something "kinda" works, people will keep doing it.
However, as you quite correctly point out, when you introduce
multipath into the kernel FIB, you need to make sure there is no
collision between old consumers (who know nothing of multiple next-hops)
and new consumers (who will be checking and using multiple next-hop
information).
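
A minimal sketch of the explicit-flag model, again as a Python toy rather
than anything resembling the kernel radix code. The RouteTable class, its
method names and the RTF_MPATH value used here are assumptions for
illustration only. Raising FileExistsError stands in for the EEXIST a
consumer gets when it adds a route that is already there.

```python
import errno

RTF_MPATH = 0x1  # placeholder bit for this sketch; not the kernel's value


class RouteTable:
    """Hypothetical table: prefix -> list of (next_hop, flags)."""

    def __init__(self):
        self.entries = {}

    def add(self, prefix, next_hop, flags=0):
        existing = self.entries.setdefault(prefix, [])
        is_new_hop = all(nh != next_hop for nh, _ in existing)
        all_mpath = all(f & RTF_MPATH for _, f in existing)
        # A second next hop is accepted only if every route for the prefix,
        # including the new one, was explicitly marked as multipath.
        if not existing or (flags & RTF_MPATH and all_mpath and is_new_hop):
            existing.append((next_hop, flags))
            return
        raise FileExistsError(errno.EEXIST, "route already exists", prefix)

    def lookup(self, prefix):
        """Legacy view: one next hop, as old consumers expect."""
        return self.entries[prefix][0][0]

    def next_hops(self, prefix):
        """Multipath-aware view: every next hop for the prefix."""
        return [nh for nh, _ in self.entries[prefix]]


table = RouteTable()
table.add("10.0.0.0/8", "192.0.2.1", RTF_MPATH)
table.add("10.0.0.0/8", "192.0.2.2", RTF_MPATH)   # accepted: both are mpath
table.add("172.16.0.0/12", "192.0.2.1")           # ordinary single route
try:
    table.add("172.16.0.0/12", "192.0.2.2", RTF_MPATH)
    rejected = False
except FileExistsError:
    rejected = True  # the old-style route is left alone
```

An old consumer that never sets the flag sees the same one-route-only
behaviour it always did, which is the property that matters here.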
>
> Ideally, and this is what Claudio says as well, we should end up with
> the following functionality:
>
> - equal cost multipath where one prefix can have multiple next-hops.
> - ecmp should be explicit with the RTF_MPATH flag.
> - a hierarchy of multiple prefixes where the one with the highest
> priority carries the traffic (possibly with ecmp).
> - the hierarchy should have a number of precedence levels (interface
> route, static route, IGP route, EGP route, other).
> - within those precedence levels it should have further subdivision
> to prefer OSPF over RIP in the IGP category for example.
> - a change/delete applies to a specific precedence level if specified.
> - routing socket filters on reading so that routing daemons can
> select which precedence levels they want to track (IGP doesn't
> have to track EGP route changes for example).
>
> With this functionality a number of independent but complementary routing
> daemons can work together in a useful and, more importantly,
> standardized way.
Linux implements a form of update filtering on the rtnetlink socket.
Re the rest of it:
This is pretty much what Microsoft does -- up to a point. The API
for the RIB there isn't really that open, you certainly can't just port
a Linux or BSD daemon and expect it to work -- it's totally different.
But they do implement explicit tagging of who owns which route.
You can see all of these routes with the "route print" or "netsh ip
show routes" commands, however routes added with either of these CLI
tools go into a separate RIB table of their own -- you aren't modifying
the final table.
To an extent, you can modify the admin distance controlling which
entries go where in NT, although some things stay hard-coded.
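
The general idea of owner tagging plus admin distance can be sketched as
follows. This is a Python toy, not the NT RTM API and not anything in our
tree: the Rib class, the owner names and the distance values are invented
for illustration, with lower distance winning.

```python
from collections import defaultdict

# Hypothetical admin distances, lower wins. The values are made up; only
# the ordering (interface, static, IGP, EGP) follows the list quoted above.
ADMIN_DISTANCE = {"connected": 0, "static": 10, "ospf": 20, "rip": 30, "bgp": 40}


class Rib:
    """Toy RIB: every candidate route is tagged with the owner that
    announced it, and the best candidate per prefix feeds the FIB."""

    def __init__(self):
        self.candidates = defaultdict(dict)  # prefix -> {owner: next_hop}

    def announce(self, owner, prefix, next_hop):
        self.candidates[prefix][owner] = next_hop

    def withdraw(self, owner, prefix):
        # Only the named owner's candidate is touched.
        self.candidates[prefix].pop(owner, None)
        if not self.candidates[prefix]:
            del self.candidates[prefix]

    def fib(self):
        """Return prefix -> (owner, next_hop) for the winning candidates."""
        best = {}
        for prefix, by_owner in self.candidates.items():
            owner = min(by_owner, key=ADMIN_DISTANCE.__getitem__)
            best[prefix] = (owner, by_owner[owner])
        return best


rib = Rib()
rib.announce("rip", "10.0.0.0/8", "192.0.2.9")
rib.announce("ospf", "10.0.0.0/8", "192.0.2.1")
before = rib.fib()["10.0.0.0/8"]    # OSPF is preferred over RIP
rib.withdraw("ospf", "10.0.0.0/8")
after = rib.fib()["10.0.0.0/8"]     # RIP's candidate survives and takes over
```

Because each candidate carries its owner, a withdraw by one protocol cannot
remove another protocol's route, and the losing candidate is still around
to take over when the winner goes away.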
I actually wrote a DLL for Windows which drops in like any other
routing protocol, and allows you to inject routes into its RIB manager
using the BSD routing socket. It uses an NT named pipe for this. Coding
it wasn't easy as they don't provide a completely working example of how
to do it -- but their code will get you say 80% of the way there.
The fact that the BSD PF_ROUTE message format doesn't have support
for multipath means that this DLL has to do some side-stepping of its
own to make sure the routes we plumb into TCPIP.SYS using the RTM API
are removed properly.
later
BMS