git: 35547df5c786 - main - Call wakeup() with the lock held to avoid missed wakeup races.

John Baldwin jhb at FreeBSD.org
Wed Aug 11 22:42:21 UTC 2021


On 8/11/21 3:31 PM, Luiz Otavio O Souza wrote:
> On Wed, Aug 11, 2021 at 7:03 PM John Baldwin <jhb at freebsd.org> wrote:
>>
>> On 8/11/21 2:38 PM, Luiz Otavio O Souza wrote:
>>> On Wed, Aug 11, 2021 at 3:49 PM John Baldwin <jhb at freebsd.org> wrote:
>>>>
>>>> On 8/10/21 3:41 PM, Scott Long wrote:
>>>>> The branch main has been updated by scottl:
>>>>>
>>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=35547df5c78653b2da030f920323c0357056099f
>>>>>
>>>>> commit 35547df5c78653b2da030f920323c0357056099f
>>>>> Author:     Scott Long <scottl at FreeBSD.org>
>>>>> AuthorDate: 2021-08-10 22:36:38 +0000
>>>>> Commit:     Scott Long <scottl at FreeBSD.org>
>>>>> CommitDate: 2021-08-10 22:36:38 +0000
>>>>>
>>>>>        Call wakeup() with the lock held to avoid missed wakeup races.
>>>>>
>>>>>        Submitted by: luiz
>>>>>        Sponsored by: Rubicon Communications, LLC ("Netgate")
>>>>> ---
>>>>>     sys/dev/sdhci/sdhci.c | 2 +-
>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/sys/dev/sdhci/sdhci.c b/sys/dev/sdhci/sdhci.c
>>>>> index d075c2e05000..573e6949b57e 100644
>>>>> --- a/sys/dev/sdhci/sdhci.c
>>>>> +++ b/sys/dev/sdhci/sdhci.c
>>>>> @@ -2078,8 +2078,8 @@ sdhci_generic_release_host(device_t brdev __unused, device_t reqdev)
>>>>>         /* Deactivate led. */
>>>>>         WR1(slot, SDHCI_HOST_CONTROL, slot->hostctrl &= ~SDHCI_CTRL_LED);
>>>>>         slot->bus_busy--;
>>>>> -     SDHCI_UNLOCK(slot);
>>>>>         wakeup(slot);
>>>>> +     SDHCI_UNLOCK(slot);
>>>>>         return (0);
>>>>>     }
>>>>
>>>> Hmm, how does this avoid a race?  The sleep is checking bus_busy under
>>>> the lock and should never see a stale value and go back to sleep after
>>>> the wakeup has occurred:
>>>>
>>>>           SDHCI_LOCK(slot);
>>>>           while (slot->bus_busy)
>>>>                   msleep(slot, &slot->mtx, 0, "sdhciah", 0);
>>>>           slot->bus_busy++;
>>>>           /* Activate led. */
>>>>           WR1(slot, SDHCI_HOST_CONTROL, slot->hostctrl |= SDHCI_CTRL_LED);
>>>>           SDHCI_UNLOCK(slot);
>>>>
>>>> Dropping the lock before wakeup() is a tiny optimization that avoids
>>>> having the second thread wakeup and immediately block on the lock before
>>>> it has been released by the first thread.
>>>>
>>>
>>> 'race' is probably wrong here.  this change will prevent a second
>>> thread from taking the bus before you call wakeup() - poking all other
>>> threads unnecessarily.
>>
>> This change does not prevent that.  The other thread and the thread that
>> are awakened will race with each other to acquire the lock.  wakeup()
>> doesn't do any sort of explicit lock handoff to the thread being awakened
>> and it's just as likely for a thread not yet asleep to acquire the lock as
>> for the thread being awakened to acquire the lock.  If you have observed
>> thundering herd problems with this wakeup() then you might want to change
>> it to wakeup_one().
> 
> correct, but to be more specific, on the first thread, after you free
> the bus and release the lock, a new thread can run, successfully
> acquire the lock and grab the bus.  at this point the first thread
> resumes and call wakeup().  when that happens, the new thread always
> wins, the threads being awakened won't have a chance.

Perhaps on a uniprocessor system this might be true, but otherwise
your new thread is likely spinning adaptively on the lock on another
CPU and grabs it as soon as it is released before the thread awakened by
wakeup() is even scheduled on a CPU and given a chance to run.  That is,
I suspect the new thread always wins even with this change on any
multiprocessor system, but it now has to wait a bit longer before it
wins.  Have you observed some specific behavior with traces that this
change seeks to address?

-- 
John Baldwin


More information about the dev-commits-src-main mailing list