[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.
rrs (Randall Stewart)
phabric-noreply at FreeBSD.org
Thu Feb 5 17:11:48 UTC 2015
rrs added a comment.
Jhb/Others
So lets go through your scenario with code in arp:
a) softclock dequeues callout to run
-- Which calls softclock_call_cc
We make it to line:676 and see that "yes" the user (arp) init'd with a rw_mtx
and run the next line 677 (to get the lock).
b) other thread grabs lle_lock
-- Our other thread is a call in to flush the table in net/if_llatble.c the
lucky winner is line 181.
c) softclock blocks on lle_wlock above
-- from the call to line 677 right.
d) other thread tears down structure, unlocks lock, zeros memory, 0xdeadc0de, etc.
-- Now here is where its interesting, the other thread does
if (callout_stop(&lle->la_timer))
LLE_REMREF(lle);
lleentry_free
llentry_free is going to do:
- remove the entry from the lists
- and in the end call LLE_FREE_LOCKED()
- LLE_FREE_LOCKED() is a macro that checks
if (lle->lle_refcnt == 1)
call free function
else {
LLE_REMREF(lle)
LLE_WUNLOCK()
}
Since we are a "stoppable callout" and the callout has a lock, it will fall through (even though its not
safe) and return 1, yes the callout was stopped. So we hit the call free function which in this
case in_lltable_free
which does:
LLE_WUNLOCK(lle)
LLE_LOK_DESTROY(lle)
free(lle, M_LLTABLE)
e) softclock wakes up in mutex code and panics becuase the mutex is destroyed and it either triggers an assertion, follows a bad pointer trying to propagate priority or see if the "owner" is running, etc.
-- And we wake up and boom.
However, with the change from callout_init_rw(c,..) -> callout_init(c, 1) things change.
Instead you get a 0 return from callout_stop, since the callout can *not* be stopped at this point.
We don't do that first reference lower so we hit the else case in llentry_free which just reduces
the count to 1 and unlocks and returns.
Now our callout proceeds, getting the lock and it will then go through and check only
the pending flag (the active has been removed by the stop but we don't care). There
we now do the free and all is well.
That is how this fix avoids the issue.
Would it be better to have callout_async_drain()? Yes probably so, but then this
code would have to be restructured a lot more than this small change.
I hope that explains how it works here.. unless of course I am missing something???
R
REVISION DETAIL
https://reviews.freebsd.org/D1777
To: rrs, imp, sbruno, gnn, rwatson, lstewart, kostikbel, adrian, bz, jhb
Cc: bz, emaste, hiren, julian, hselasky, freebsd-net
More information about the freebsd-net
mailing list