Deferring inp_freemoptions() to an asynchronous task
John Baldwin
jhb at freebsd.org
Mon Jan 9 16:30:19 UTC 2012
On Monday, January 09, 2012 10:23:48 am Bruce Simpson wrote:
> John,
>
> Sorry it's taken me so long to reply.
>
> No objections in principle to your change, but this seems to point at a
> more general issue with modern network controllers.
>
> You've also stumbled on the behaviour specific to how BSD has
> traditionally dealt with broadcast/multicast sockets. The pcbinfo
> structure can't really be disentangled from this.
>
> Of course, it doesn't help that we have historically required these
> sockets to be bound to INADDR_ANY. It might be useful to break reception
> out using a separate hash/tree, rather than walking all sockets as is
> currently done, but legacy usage needs to be supported.
>
> Interestingly enough, Microsoft has probably done something similar,
> judging from things which appear in MSDN.
>
> John Baldwin wrote:
> > I have a workload at work where a particular device driver can take a while to
> > update its MAC filter table when adding or removing multicast link-layer
> > addresses. One of the ways I've tackled fixing this is to change
> > inp_freemoptions() so that it does all of its actual work asynchronously in a
> > separate task. Currently it does its work synchronously; however, it can be
> > invoked while the associated protocol holds a write lock on its pcbinfo lock
> > (e.g. from in_pcbdetach() called from udp_detach()). This stalls all packet
> > reception for that protocol since received packets need a read lock on the
> > pcbinfo to look up the socket associated with a given (ip, port) tuple.
>
> There is often a delay between asking for the group and actually getting
> the hash filter entry set up in the MAC, so the operations are async.
>
> I can see many apps like to assume the operation is instantaneous rather
> than deferred; they are probably being naive...
>
> The same being true for taking down the hash filter entry is not surprising.
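
For reference, the deferral described in the quoted message can be sketched in
userspace C roughly as follows. This is only an illustration of the pattern,
not the actual kernel change: struct moptions, defer_free(), drain_pending()
and slow_release() are hypothetical names, and a pthread mutex stands in for
whatever lock protects the pending list. The point is that the caller holding
the pcbinfo write lock only links the object onto a list, and the slow work
happens later from a separate context.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the multicast options attached to a socket. */
struct moptions {
	struct moptions	*next;		/* link in the pending-free list */
	int		 ngroups;	/* joined groups still to be left */
};

static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static struct moptions *pending_head;	/* protected by pending_lock */
static int groups_left;			/* groups released so far (illustration only) */

/* Allocate a dummy options structure with 'ngroups' memberships. */
static struct moptions *
moptions_new(int ngroups)
{
	struct moptions *mo = malloc(sizeof(*mo));

	if (mo == NULL)
		abort();
	mo->next = NULL;
	mo->ngroups = ngroups;
	return (mo);
}

/* Stand-in for the slow part: leaving groups, updating the MAC filter. */
static void
slow_release(struct moptions *mo)
{
	groups_left += mo->ngroups;
	free(mo);
}

/*
 * Fast path: may be called with the (hypothetical) pcbinfo write lock held.
 * It only queues the object; a kernel version would also schedule the task.
 */
static void
defer_free(struct moptions *mo)
{
	pthread_mutex_lock(&pending_lock);
	mo->next = pending_head;
	pending_head = mo;
	pthread_mutex_unlock(&pending_lock);
}

/*
 * Task body: runs later, without the pcbinfo lock.  Detaches the whole
 * list under the list lock, then does the slow work with no lock held.
 * Returns the number of entries processed.
 */
static int
drain_pending(void)
{
	struct moptions *list, *mo;
	int n = 0;

	pthread_mutex_lock(&pending_lock);
	list = pending_head;
	pending_head = NULL;
	pthread_mutex_unlock(&pending_lock);

	while ((mo = list) != NULL) {
		list = mo->next;
		slow_release(mo);
		n++;
	}
	return (n);
}
```

In this sketch nothing slow happens inside defer_free(), so a caller that holds
a write lock is delayed only by a short list insertion.
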
The other fun part in this case is that if it is going to take a long time, a
driver should probably be enabling reception of all multicast (equivalent of
IFF_ALLMULTI) while it reprograms the table to avoid dropping packets for
already-joined groups. I'm not currently doing this as we are using a different
hack, but I think that is something drivers should probably be doing.
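
A rough sketch of what I mean, as a userspace model rather than real driver
code: struct nic, nic_accepts() and nic_set_filters() are made-up names, and
the "hardware" is just an array. The only point is the ordering: open the
filter up to all multicast first, rewrite the table, then restore the previous
setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILTERS	32

/* Hypothetical model of a NIC's multicast filter state. */
struct nic {
	bool		allmulti;		/* accept every multicast frame */
	unsigned char	filt[MAX_FILTERS][6];	/* exact-match filter table */
	size_t		nfilt;			/* valid entries in filt[] */
};

/* Would the modelled NIC accept a frame sent to this multicast address? */
static bool
nic_accepts(const struct nic *n, const unsigned char *mac)
{
	size_t i;

	if (n->allmulti)
		return (true);
	for (i = 0; i < n->nfilt; i++)
		if (memcmp(n->filt[i], mac, 6) == 0)
			return (true);
	return (false);
}

/*
 * Replace the filter table.  While the table is in a transitional state the
 * NIC accepts all multicast, so frames for already-joined groups are not
 * dropped.  Returns 0 on success, -1 if the table would overflow.
 */
static int
nic_set_filters(struct nic *n, const unsigned char (*macs)[6], size_t count)
{
	bool saved;
	size_t i;

	if (count > MAX_FILTERS)
		return (-1);

	saved = n->allmulti;
	n->allmulti = true;		/* open up before touching the table */

	n->nfilt = 0;			/* table is incomplete from here... */
	for (i = 0; i < count; i++) {
		memcpy(n->filt[i], macs[i], 6);
		n->nfilt++;
	}				/* ...until here */

	n->allmulti = saved;		/* restore the previous setting */
	return (0);
}
```

The cost is that the host sees extra multicast traffic during the update and
has to discard it in software.
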
--
John Baldwin