how to read dynamic data structures from the kernel (was Re:
reading routing table)
Bruce M. Simpson
bms at FreeBSD.org
Tue Sep 2 14:55:56 UTC 2008
Luigi Rizzo wrote:
> do you know if any of the *BSD kernels implements some good mechanism
> to access a dynamic kernel data structure (e.g. the routing tree/trie,
> or even a list or hash table) without the flaws of the two approaches
> i indicate above ?
>
Hahaha. I ran into an isomorphic problem with Net-SNMP at work last week.
There's a need to export the BGP routing table via SNMP. Of course
doing this in our framework at work requires some IPC calls which always
require a select() (or WaitForMultipleObjects()) based continuation.
Net-SNMP doesn't support continuations at the table iterator level,
so somehow, we need to implement an iterator which can accommodate our
blocking IPC mechanism.
[No, we don't use threads, and that would actually create more
problems than it solves -- running single-threaded with continuations
lets us run lock free, and we rely on the OS's IPC primitives to
serialize our code. Works just fine for us so far...]
So we would end up caching the whole primary key range in the SNMP
sub-agent on a table OID access, a technique which would allow us to
defer the IPC calls provided we walk the entire range of the iterator
and cache the keys -- but even THAT is far too much data for the BGP
table, which is a trie with ~250,000 entries. I hate SNMP GETNEXT.
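To make the key-caching idea concrete, here is a rough sketch in plain C. It is not Net-SNMP code: key_cache and key_cache_next are made-up names, and the keys are flattened to 32-bit integers purely for illustration. Once the sorted keys are cached, GETNEXT is a binary search for the first key strictly greater than the one the manager sent, so no IPC is needed per step. The catch is the one already noted: the cache grows linearly with the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache: sorted, unique primary keys captured in one walk. */
struct key_cache {
    const uint32_t *keys;   /* ascending order */
    size_t          nkeys;
};

/*
 * GETNEXT semantics: return the index of the first cached key strictly
 * greater than 'after', or nkeys if there is no such key.
 */
static size_t
key_cache_next(const struct key_cache *kc, uint32_t after)
{
    size_t lo = 0, hi;

    assert(kc != NULL);
    hi = kc->nkeys;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (kc->keys[mid] <= after)
            lo = mid + 1;   /* answer lies to the right of mid */
        else
            hi = mid;       /* mid is a candidate; keep looking left */
    }
    return lo;
}
```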
Back to the FreeBSD kernel, though.
If you look at in_mcast.c, particularly in p4 bms_netdev, this is
what happens for the per-socket multicast source filters -- there is the
linearization of an RB-tree for setsourcefilter().
This is fine for something with a limit of ~256 entries per socket
(why RB for something so small? this is for space vs time -- and also it
has to merge into a larger filter list in the IGMPv3 paths.)
And the lock granularity is per-socket. However it doesn't do for
something as big as a BGP routing table.
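For illustration, the "linearize under the lock, then copy out" pattern looks roughly like the sketch below. This is not the in_mcast.c code: a plain binary search tree stands in for the RB-tree, filter_lock is a toy stand-in for the per-socket lock, and every name here is made up. The point is that the lock is held for the whole in-order walk, which is tolerable for a few hundred entries and not for a full routing table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy lock: only tracks whether it is held (stand-in for a real lock). */
struct filter_lock { int held; };

static void
filter_lock_acquire(struct filter_lock *l) { assert(!l->held); l->held = 1; }

static void
filter_lock_release(struct filter_lock *l) { assert(l->held); l->held = 0; }

/* Plain BST node standing in for an RB-tree node. */
struct src_node {
    uint32_t         addr;
    struct src_node *left, *right;
};

struct src_filter {
    struct filter_lock lock;
    struct src_node   *root;
};

/* In-order walk; writes at most 'cap' entries, returns the running count. */
static size_t
linearize(const struct src_node *n, uint32_t *out, size_t cap, size_t pos)
{
    if (n == NULL)
        return pos;
    pos = linearize(n->left, out, cap, pos);
    if (pos < cap)
        out[pos] = n->addr;
    pos++;
    return linearize(n->right, out, cap, pos);
}

/*
 * Snapshot the whole tree into a flat array while holding the lock.
 * Returns the total number of entries; a result > cap means truncation.
 */
static size_t
filter_snapshot(struct src_filter *f, uint32_t *out, size_t cap)
{
    size_t n;

    filter_lock_acquire(&f->lock);
    n = linearize(f->root, out, cap, 0);
    filter_lock_release(&f->lock);
    return n;
}
```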
C++ lends itself well to expressing these kinds of smart-pointer
idioms, though.
I'm thinking perhaps we need the notion of a sysctl iterator, which
allocates a token for walking a shared data structure, and is able to
guarantee that the token maps to a valid pointer for the same entry,
until its 'advance pointer' operation is called.
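A minimal userland sketch of what such a token could look like, assuming a singly linked list and pin counts. Nothing like this exists in sysctl today; all the names are hypothetical, and the sketch is single-threaded, so a real version would need a lock around each operation. The idea: a token pins the entry it is parked on, a removed entry that is pinned is only marked dead, and it is unlinked and freed when the last token moves off it.

```c
#include <assert.h>
#include <stdlib.h>

struct entry {
    int           key;
    int           pins;   /* number of tokens parked on this entry */
    int           dead;   /* logically deleted, awaiting the last unpin */
    struct entry *next;
};

struct table { struct entry *head; };
struct token { struct table *t; struct entry *cur; };

/* Physically unlink and free an entry that is known to be on the list. */
static void
unlink_free(struct table *t, struct entry *e)
{
    struct entry **pp = &t->head;

    while (*pp != e)
        pp = &(*pp)->next;
    *pp = e->next;
    free(e);
}

static void
unpin(struct table *t, struct entry *e)
{
    assert(e->pins > 0);
    if (--e->pins == 0 && e->dead)
        unlink_free(t, e);
}

/* Insert at the head of the list; returns NULL on allocation failure. */
static struct entry *
table_insert(struct table *t, int key)
{
    struct entry *e = calloc(1, sizeof(*e));

    if (e == NULL)
        return NULL;
    e->key = key;
    e->next = t->head;
    t->head = e;
    return e;
}

/* Remove: immediate if nobody is parked here, deferred otherwise. */
static void
table_remove(struct table *t, struct entry *e)
{
    e->dead = 1;
    if (e->pins == 0)
        unlink_free(t, e);
}

static struct entry *
first_live(struct entry *e)
{
    while (e != NULL && e->dead)
        e = e->next;
    return e;
}

static struct entry *
token_begin(struct token *tk, struct table *t)
{
    tk->t = t;
    tk->cur = first_live(t->head);
    if (tk->cur != NULL)
        tk->cur->pins++;
    return tk->cur;
}

/* The 'advance pointer' operation: pin the successor, then drop the old pin. */
static struct entry *
token_advance(struct token *tk)
{
    struct entry *old = tk->cur, *nxt;

    if (old == NULL)
        return NULL;
    nxt = first_live(old->next);
    if (nxt != NULL)
        nxt->pins++;
    unpin(tk->t, old);      /* may free 'old'; 'nxt' is already pinned */
    tk->cur = nxt;
    return nxt;
}
```

Because dead entries stay linked until unpinned, the parked entry's next pointer always refers to something still on the list. The obvious cost is that a walker which never advances keeps its entry pinned indefinitely, so a real interface would need some way to expire tokens.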
Question is, who's going to pull the trigger?
cheers
BMS
P.S. I'm REALLY getting fed up with the lack of openness and
transparency largely inherent in doing work in p4.
Come one come all -- we shouldn't need accounts for folk to see and
contribute what's going on, and the stagnation is getting silly. FreeBSD
development should not be a committer or chum-of-committer in-crowd.