Network device driver KPI/ABI and TOE
Robert Watson
rwatson at FreeBSD.org
Sun Jan 6 05:47:26 PST 2008
Dear all,
Last month, Kip Macy committed support for TCP offload to the FreeBSD CVS
repository for the Chelsio 10Gbps device driver. We've had interest from
other vendors in supporting TOE on FreeBSD, although it remains unclear as yet
which will end up supporting it. This e-mail is about how we want to treat
the TOE interface with respect to third party device driver support, and more
specifically to propose that we not consider the TOE interface to be part of
our stable network device driver KPI/ABI once it appears in a RELENG_X branch.
The background: in the last few FreeBSD versions (late 5.x, 6.x, 7.x), we've
attempted to offer network and storage device driver authors a stable KPI and
ABI across minor FreeBSD releases. The goal of this has been to allow authors
to produce a device driver module for a .0 release, and then have it continue
to function for .1, .2, and so on. We've not attempted to formalize the
details of this for network device drivers, but implicitly this includes
interface stability for things like mbuf and memory management routines, the
ifnet interface, locking interfaces and data structures, newbus, busdma, and
so on. If we had to, we would break the ABI in order to fix critical bugs
and similar serious problems, but we try hard to avoid it in order to improve
interface stability,
and, in general, we choose not to MFC features that would break existing
device drivers.
TOE comes with a series of defined interfaces in toedev.h (documentation
forthcoming) and tcp_offload.h (documentation now in comments). However, TOE
implementations must also interact directly with the TCP and other stack
internals, including directly accessing socket buffers, routing, the inpcb and
tcpcb data structures, TCP and inpcb locking protocols, and so on. This
happens for two reasons:
- First, TOE needs to interact with the contents of sockets and TCP in order
to implement the offload (i.e., extracting data from socket buffers to
transmit it, putting data into socket buffers on receive, accessing TCP
connection properties such as socket options, address bindings, listen
state, etc).
- Second, TOE hardware implementations often don't implement all of TCP: they
may implement the steady state but not TCP TIMEWAIT or connection setup, for
example.
To get a sense of the level of intimacy of one such driver, it's well worth
perusing src/sys/dev/cxgb/ulp/tom in HEAD. This is not a criticism, but I do
want people to be aware of what's there before getting involved in this
discussion: TOE takes to a whole new level the mantra that layering is good
for protocol design, but not good for implementation performance, and spans
pretty much all layers of the network stack in its scope.
There are serious ABI implications to this approach, as historically we've
made significant changes to the TCP and socket buffer internals during -stable
branches, such as optimizing performance, adding new TCP features, etc.
There's a fairly aggressive list of forthcoming TCP features for 8.0 with MFC
plans for several of them, such as congestion control selection and multiple
routing tables. I've not attempted to analyze these past or proposed changes
in detail to determine how disruptive they would be to a TOE implementation,
but my guess is that they might well break TOE drivers, especially historic
ones, had TOE been supported at the time.
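
To make the concern concrete, here is a minimal user-space sketch in C of how a
layout change can silently break a binary module. Everything in it is
hypothetical: toy_tcpcb_v1, toy_tcpcb_v2, the cc_algo field, and
old_driver_read_cwnd() are illustrative stand-ins, not the real struct tcpcb or
any real driver code. The "driver" has a field offset baked in at compile time,
as a module built against the .0 headers would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout shipped in a .0 release (NOT the real struct tcpcb). */
struct toy_tcpcb_v1 {
	uint32_t t_state;
	uint32_t snd_cwnd;
	uint32_t snd_ssthresh;
};

/* Hypothetical layout after a merged feature inserts a field in the middle. */
struct toy_tcpcb_v2 {
	uint32_t t_state;
	uint32_t cc_algo;	/* new field, assumed for illustration */
	uint32_t snd_cwnd;
	uint32_t snd_ssthresh;
};

/*
 * Stand-in for a binary driver module compiled against the v1 headers:
 * the offset of snd_cwnd is fixed at compile time and never re-evaluated.
 */
static uint32_t
old_driver_read_cwnd(const void *cb)
{
	uint32_t v;

	assert(cb != NULL);
	memcpy(&v, (const unsigned char *)cb +
	    offsetof(struct toy_tcpcb_v1, snd_cwnd), sizeof(v));
	return (v);
}
```

Handed a v1 object, the old driver reads the congestion window as intended.
Handed a v2 object, it reads whatever now occupies that offset (here, cc_algo)
without any error being reported.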
My proposal, and this is really a proposal to drive discussion as much as a
proposal for a policy, is that the internal TCP data structures exported via
the TOE interfaces and accessed by TOE device drivers *not* be considered
ABI/KPI-stable in -STABLE branches. While I think we shouldn't intentionally
change them to break TOE, it's unrealistic to expect that these network stack
internals won't change as part of normal maintenance and feature development
that take place in -STABLE branches.
For those who aren't involved in those day-to-day internals, a comparable
situation might be if a CAM SCSI storage driver were dependent not only on
there being no changes made to the on-disk layout of UFS (even backwards
compatible ones), but also the in-memory data structures of soft updates. Any
significant changes to soft updates internals would break such device drivers
due to a requirement for forward compatibility. In some ways this isn't a
perfect comparison, as soft updates isn't under active development, but from a
layering and abstraction perspective, it's quite similar.
We don't yet ship TOE in a -STABLE branch, but I believe Kip hopes to MFC TOE
support, and with other device driver vendors starting to take a look, I think
we want our thoughts on the table regarding this matter. I presume that we'll
see the TOE interfaces continue to evolve over the next 6-18 months, and we
should make sure that we know whether or not third party device driver authors
can expect ABI/KPI stability before, rather than after, it hits a -STABLE
branch. On a similar note, these necessary changes to network stack internals
will result in modifications to in-tree device drivers, so device driver
authors who implement TOE should expect to see the TOE parts of their drivers
being significantly modified as development occurs on those other parts of the
stack.
There's also the opportunity to think about whether it's possible to harden
things in such a way as to not give up our flexibility to keep maintaining
and improving TCP (and other related subsystems), while improving the quality
of life for a third party TOE driver maintainer. For example, might we provide
accessor routines for certain data structures, or attempt to structure things
to hide more of TCP locking from a TOE implementation? Should we suggest that
non-native TOE implementations rely less on our TCP code and provide their own
where the hardware doesn't provide a complete implementation, in order to
avoid building dependencies on things that we know will change?
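
As a rough illustration of the accessor idea, the following user-space C sketch
separates a "stack side" that owns the structure layout from a "driver side"
that only calls accessor functions. All names here (toy_tcpcb,
toy_tcpcb_get_cwnd(), toy_tcpcb_set_cwnd(), toy_toe_bytes_sendable()) are
hypothetical and do not correspond to any existing FreeBSD interface; locking
is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * "Stack side": the layout is private to the stack and may be reordered
 * or extended between releases. Hypothetical, not the real struct tcpcb.
 */
struct toy_tcpcb {
	uint32_t t_state;
	uint32_t cc_algo;
	uint32_t snd_cwnd;
};

/* Hypothetical accessor KPI: the only supported way for a driver to read
 * or update the congestion window. */
uint32_t
toy_tcpcb_get_cwnd(const struct toy_tcpcb *tp)
{
	assert(tp != NULL);
	return (tp->snd_cwnd);
}

void
toy_tcpcb_set_cwnd(struct toy_tcpcb *tp, uint32_t cwnd)
{
	assert(tp != NULL);
	tp->snd_cwnd = cwnd;
}

/*
 * "Driver side": touches no fields directly, so it depends only on the
 * accessor signatures, not on field offsets.
 */
uint32_t
toy_toe_bytes_sendable(const struct toy_tcpcb *tp, uint32_t queued)
{
	uint32_t cwnd = toy_tcpcb_get_cwnd(tp);

	return (queued < cwnd ? queued : cwnd);
}
```

The trade-off is a function call per access and a larger exported KPI to
maintain, in exchange for a driver binary that keeps working when fields are
added or moved, provided the accessor semantics stay the same.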
Robert N M Watson
Computer Laboratory
University of Cambridge