Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]
Randall Stewart
rrs at netflix.com
Fri Dec 11 19:13:59 UTC 2015
Hans:
I don’t think you are getting a 1 back from the callout_reset()..
If the pending bit is set, you get a 1 back. But if you have a race where
the arp-timer is blocked on the lock (held by arp resolve) your going to
have the pending bit off.. since before calling the function the callout
code removes pending.
Callout-reset() is going to return 0 here since it is running and cannot
be stopped. That should make you *not* decrement your reference.
The only time you get a 1 back (where you will decrement your reference) is if
the PENDING flag is set or if you are migrating and the current one running. The ARP code
does not migrate that I can see. Which means that once you start running
and block a return of 0 will be done by the callout system.
Since you check the PENDING flag at the top of the callout, that would get you
to return if its been reset.
However I notice down in the arptimer code a unlock/lock/lock is done. Now again
the pending flag will be gone when that set of calls is made. So we should
be getting a zero return out of the callout_reset() code.
Hmm.
let me mull on this a bit.. any time there is a unlock/lock/lock that could be
a problem..
R
On Dec 11, 2015, at 4:12 AM, Hans Petter Selasky <hps at selasky.org> wrote:
> On 12/11/15 12:16, Hans Petter Selasky wrote:
>> On 12/11/15 11:12, Alexander V. Chernikov wrote:
>>> 11.12.2015, 12:15, "Hans Petter Selasky" <hps at selasky.org>:
>>>> Hi,
>>>>
>>>> Pulling the nail out of the haystack hopefully.
>>>>
>>>>>> Any ideas on where next to look?
>>>>
>>>> Adrian: In your dump aswell I see:
>>>>
>>>> la_flags = 1
>>>>
>>>> That means there was a race calling arptimer() and removing the "lle".
>>> Yes. The interesting part here is why lle is removed. There are quite
>>> a few reasons: either interface address deleted or interface going
>>> down, or explicit delete request.
>>> That's why I asked Adrian about interface stuff (and haven't got a
>>> reply).
>>>>
>>>> Alexander: Can you comment on the following patch:
>>>>
>>>> > Index: netinet/if_ether.c
>>>> > ===================================================================
>>>> > --- netinet/if_ether.c (revision 291256)
>>>> > +++ netinet/if_ether.c (working copy)
>>>> > @@ -185,7 +185,13 @@
>>>> > LLE_WUNLOCK(lle);
>>>> > return;
>>>> > }
>>>> > - ifp = lle->lle_tbl->llt_ifp;
>>>> > + if (lle->la_flags & LLE_LINKED) {
>>>> > + ifp = lle->lle_tbl->llt_ifp;
>>>> > + } else {
>>>> > + /* XXX RACE entry has been freed */
>>>> > + llentry_free(lle);
>>>> > + return;
>>>> > + }
>>>> > CURVNET_SET(ifp->if_vnet);
>>>> >
>>>> > if ((lle->la_flags & LLE_DELETED) == 0) {
>>>>
>>>> We need a check in arptimer() that the lle is still linked before
>>> Yes, I had exactly that approach in mind. (And nd6_llinfo_timer()
>>> needs the same fix).
>>> So, would you commit it or should I?
>>>> proceeding, in there from what I can see. Because the callback is not
>>>> protected by a mutex, it is not atomically stopped by callout_stop().
>>
>> Hi,
>>
>> Talking to Randall offlist, I see some more trouble. Let's get
>> everything straight before making a fix. There is one more race I see:
>>
>> The start of the arptimer() callback looks like this:
>>
>> > static void
>> > arptimer(void *arg)
>> > {
>> POINT0
>> > LLE_WLOCK(lle);
>> > if (callout_pending(&lle->lle_timer)) {
>> POINT1
>> > LLE_WUNLOCK(lle);
>> > return;
>> > }
>>
>> The code starting the callback looks like this:
>>
>> > LLE_ADDREF(la);
>> > la->la_expire = time_uptime;
>> > canceled = callout_reset(&la->lle_timer, hz *
>> V_arpt_down,
>> > arptimer, la);
>> > if (canceled)
>> > LLE_REMREF(la);
>>
>> Which can be written like this:
>>
>> > la->la_expire = time_uptime;
>> > canceled = callout_reset(&la->lle_timer, hz * V_arpt_down,
>> > arptimer, la);
>> > if (canceled == 0)
>> > LLE_ADDREF(la);
>>
>> In case we are at POINT0 in arptimer, callout_reset() will not be able
>> to cancel the callout and will return 0. We should also drop one ref at
>> POINT1, or rewrite the code a bit, which might need Randall's help in
>> the callout subsystem area.
>>
>
> Hi,
>
> Looking at the version history, I see Gleb Smirnoff and Randall, heavily involved with these code paths. Gleb and Randall, do you have any comments on the above findings?
>
> Gleb+Randall: Do you agree or disagree there is a race as pointed out above?
>
> Ways forward:
>
> 1) Revert r278472 (done by Randall) and fix r238990 (done by Gleb) to use the new callout_async_drain() instead of callout_stop() to avoid using the rm lock after free.
>
> 2) Use callout_stop() before callout_reset() and add a reference when the callout was not pending nor completing. We need to use callout_stop() in this case because callout_reset() only has two return values and cannot be used to distinguish the three callout states in use.
>
> 3) Modify callout_reset() to have three return values and fix the arptimer code to not leak references.
>
> --HPS
--------
Randall Stewart
rrs at netflix.com
803-317-4952
More information about the freebsd-net
mailing list