LACP kernel panics: /* unlocking is safe here */

Fri Mar 30 22:12:42 UTC 2012

While investigating a LACP issue, I turned on LACP_DEBUG on a debug kernel.  In this configuration it's easy to panic the kernel - just run 'ifconfig lagg0 laggproto lacp' on a lagg that's already in LACP mode and receiving LACP messages.

The problem is that lagg_lacp_detach() drops the lagg wlock (with the comment in the title), which allows incoming LACP messages to get through lagg_input() while the structure is being destroyed in lacp_detach().

There's a very simple fix, but I don't know if it's the best one.  Resetting sc_proto to LAGG_PROTO_NONE before calling sc_detach() causes any further incoming packets to be dropped until the lagg gets reconfigured.  Thoughts?
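
To make the ordering argument concrete, here is a rough single-threaded user-space model of it.  This is not the kernel code: every model_* name is made up for illustration, the "unlock window" is simulated by calling the input path from inside the detach routine, and the only behavior taken from the real driver is that the input path drops frames when no protocol is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for LAGG_PROTO_NONE / LAGG_PROTO_LACP. */
enum model_proto { MODEL_PROTO_NONE = 0, MODEL_PROTO_LACP = 1 };

/* Hypothetical, heavily reduced stand-in for the lagg softc. */
struct model_softc {
	enum model_proto proto;	/* plays the role of sc_proto */
	bool state_valid;	/* false once protocol state is torn down */
	int delivered;		/* frames handed to the protocol */
	int dropped;		/* frames rejected by the input path */
	int touched_dead_state;	/* deliveries that hit torn-down state */
};

/* Input path: drop the frame if no protocol is configured. */
void
model_input(struct model_softc *sc)
{
	assert(sc != NULL);
	if (sc->proto == MODEL_PROTO_NONE) {
		sc->dropped++;
		return;
	}
	sc->delivered++;
	if (!sc->state_valid)
		sc->touched_dead_state++;	/* where the kernel would panic */
}

/*
 * Detach: teardown starts, and one frame arrives during the window in
 * which the real code has dropped the lock.
 */
void
model_detach(struct model_softc *sc)
{
	sc->state_valid = false;
	model_input(sc);
}

/* Reconfigure, with or without the proposed reordering. */
void
model_reconfigure(struct model_softc *sc, bool reset_proto_first)
{
	if (reset_proto_first)
		sc->proto = MODEL_PROTO_NONE;	/* proposed order */
	model_detach(sc);
	sc->proto = MODEL_PROTO_NONE;		/* original order */
}

/* Run one scenario starting from a lagg that is in LACP mode. */
struct model_softc
model_run(bool reset_proto_first)
{
	struct model_softc sc = {
		.proto = MODEL_PROTO_LACP,
		.state_valid = true,
	};

	model_reconfigure(&sc, reset_proto_first);
	return (sc);
}
```

In this model, model_run(false) delivers the in-window frame to torn-down state (touched_dead_state ends up nonzero), while model_run(true) counts it as dropped.  It says nothing about whether the real detach path has other windows, which is part of what I'm asking.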

Is it safe to just hold on to the lagg wlock across the callout_drain() calls in lacp_detach()?  That's what OpenBSD does.

-Andrew

Index: sys/net/if_lagg.c
===================================================================
--- sys/net/if_lagg.c	(revision 233707)
+++ sys/net/if_lagg.c	(working copy)
@@ -952,9 +952,10 @@
 		}
 		if (sc->sc_proto != LAGG_PROTO_NONE) {
 			LAGG_WLOCK(sc);
+			/* Reset protocol */
+			sc->sc_proto = LAGG_PROTO_NONE;
 			error = sc->sc_detach(sc);
-			/* Reset protocol and pointers */
-			sc->sc_proto = LAGG_PROTO_NONE;
+			/* Reset pointers */
 			sc->sc_detach = NULL;
 			sc->sc_start = NULL;
 			sc->sc_input = NULL;

--------------------------------------------------
Andrew Boyer	aboyer at averesystems.com