svn commit: r278472 - in head/sys: netinet netinet6
Gleb Smirnoff
glebius at FreeBSD.org
Fri Feb 13 23:38:23 UTC 2015
Randall,
thanks a lot for investigating that.
On Fri, Feb 13, 2015 at 05:27:40PM -0500, Randall Stewart wrote:
R> Gleb:
R>
R> Ok here is the diff of the arp timer function that this changed made (238990):
R> ****************************
R> arptimer(void *arg)
R> {
R> + struct llentry *lle = (struct llentry *)arg;
R> struct ifnet *ifp;
R> - struct llentry *lle;
R> - int pkts_dropped;
R> + size_t pkts_dropped;
R>
R> - KASSERT(arg != NULL, ("%s: arg NULL", __func__));
R> - lle = (struct llentry *)arg;
R> + if (lle->la_flags & LLE_STATIC) {
R> + LLE_WUNLOCK(lle);
R> + return;
R> + }
R> +
R> ifp = lle->lle_tbl->llt_ifp;
R> CURVNET_SET(ifp->if_vnet);
R> +
R> + if (lle->la_flags != LLE_DELETED) {
R> + int evt;
R> +
R> + if (lle->la_flags & LLE_VALID)
R> + evt = LLENTRY_EXPIRED;
R> + else
R> + evt = LLENTRY_TIMEDOUT;
R> + EVENTHANDLER_INVOKE(lle_event, lle, evt);
R> + }
R> +
R> + callout_stop(&lle->la_timer);
R> +
R> + /* XXX: LOR avoidance. We still have ref on lle. */
R> + LLE_WUNLOCK(lle);
R> IF_AFDATA_LOCK(ifp);
R> LLE_WLOCK(lle);
R> - if (lle->la_flags & LLE_STATIC)
R> - LLE_WUNLOCK(lle);
R> - else {
R> - if (!callout_pending(&lle->la_timer) &&
R> - callout_active(&lle->la_timer)) {
R> - callout_stop(&lle->la_timer);
R> - LLE_REMREF(lle);
R>
R> - if (lle->la_flags != LLE_DELETED) {
R> - int evt;
R> -
R> - if (lle->la_flags & LLE_VALID)
R> - evt = LLENTRY_EXPIRED;
R> - else
R> - evt = LLENTRY_TIMEDOUT;
R> - EVENTHANDLER_INVOKE(lle_event, lle, evt);
R> - }
R> -
R> - pkts_dropped = llentry_free(lle);
R> - ARPSTAT_ADD(dropped, pkts_dropped);
R> - ARPSTAT_INC(timeouts);
R> - } else {
R> -#ifdef DIAGNOSTIC
R> - struct sockaddr *l3addr = L3_ADDR(lle);
R> - log(LOG_INFO,
R> - "arptimer issue: %p, IPv4 address: \"%s\"\n", lle,
R> - inet_ntoa(
R> - ((const struct sockaddr_in *)l3addr)->sin_addr));
R> -#endif
R> - LLE_WUNLOCK(lle);
R> - }
R> - }
R> + LLE_REMREF(lle);
R> + pkts_dropped = llentry_free(lle);
R> IF_AFDATA_UNLOCK(ifp);
R> + ARPSTAT_ADD(dropped, pkts_dropped);
R> + ARPSTAT_INC(timeouts);
R> CURVNET_RESTORE();
R> }
R> ******************************
R>
R> And I can see *what* the problem was.. If you look at the lines:
R> - if (!callout_pending(&lle->la_timer) &&
R> - callout_active(&lle->la_timer)) {
R>
R> This is the *incorrect* way to write this code it should have been:
R> if (callout_pending(&lle->la_timer) &&
R> !callout_active(&lle->la_timer)) {
R>
R> To properly do the MPSAFE thing.. take a look at the callout manual..
R> So the original author just mixed up the test..
R>
R> The idea is after you get the lock you check if its pending again, if
R> so someone has restarted it.. and if its not active, then someone has
R> stopped it.
R>
R> They check if it was *not* pending.. which is almost always the case
R> since the callout code removes the pending flag and thats what you want
R> to be there. So not pending would always be true..
R>
R> I don’t see that this old code did the callout_deactive().. but I think
R> the callout_stop() does that I guess..
R>
R> I think the problem was that the original code was wrong and that
R> the fix yes, avoided one race but put in another.
R>
R> I do think a callout_async_drain is the right thing, but that will take a while
R> to get right. Peter Holm (thank you so much) has been pounding on this, if you
R> could find another test to add.. that would be great. I think this will work
R> the way it is..
R>
R> R
R>
R>
R>
R>
R>
R>
R> On Feb 13, 2015, at 4:21 PM, Gleb Smirnoff <glebius at freebsd.org> wrote:
R>
R> > 165863
R>
R> --------
R> Randall Stewart
R> rrs at netflix.com
R> 803-317-4952
R>
R>
R>
R>
R>
--
Totus tuus, Glebius.
More information about the svn-src-all
mailing list