Re: git: 03f868b163ad - main - x86: Add a required store-load barrier in cpu_idle()
- In reply to: Mark Johnston : "Re: git: 03f868b163ad - main - x86: Add a required store-load barrier in cpu_idle()"
Date: Sun, 17 Jul 2022 14:36:37 UTC
On Sun, 17 Jul 2022 06:16:24 -0400 Mark Johnston <markj@freebsd.org> wrote: > On Sun, Jul 17, 2022 at 11:31:43AM +0200, Tijl Coosemans wrote: > > On Thu, 14 Jul 2022 14:47:37 GMT Mark Johnston <markj@FreeBSD.org> wrote: > > > The branch main has been updated by markj: > > > > > > URL: https://cgit.FreeBSD.org/src/commit/?id=03f868b163ad46d6f7cb03dc46fb83ca01fb8f69 > > > > > > commit 03f868b163ad46d6f7cb03dc46fb83ca01fb8f69 > > > Author: Mark Johnston <markj@FreeBSD.org> > > > AuthorDate: 2022-07-14 14:24:25 +0000 > > > Commit: Mark Johnston <markj@FreeBSD.org> > > > CommitDate: 2022-07-14 14:28:01 +0000 > > > > > > x86: Add a required store-load barrier in cpu_idle() > > > > > > ULE's tdq_notify() tries to avoid delivering IPIs to the idle thread. > > > In particular, it tries to detect whether the idle thread is running. > > > There are two mechanisms for this: > > > - tdq_cpu_idle, an MI flag which is set prior to calling cpu_idle(). If > > > tdq_cpu_idle == 0, then no IPI is needed; > > > - idle_state, an x86-specific state flag which is updated after > > > cpu_idleclock() is called. > > > > > > The implementation of the second mechanism is racy; the race can cause a > > > CPU to go to sleep with pending work. Specifically, cpu_idle_*() set > > > idle_state = STATE_SLEEPING, then check for pending work by loading the > > > tdq_load field of the CPU's runqueue. These operations can be reordered > > > so that the idle thread observes tdq_load == 0, and tdq_notify() > > > observes idle_state == STATE_RUNNING. > > > > > > Some counters indicate that the idle_state check in tdq_notify() > > > frequently elides an IPI. So, fix the problem by inserting a fence > > > after the store to idle_state, immediately before idling the CPU. > > > > > > PR: 264867 > > > Reviewed by: mav, kib, jhb > > > MFC after: 1 month > > > Sponsored by: The FreeBSD Foundation > > > Differential Revision: https://reviews.freebsd.org/D35777 > > > --- > > > sys/x86/x86/cpu_machdep.c | 103 ++++++++++++++++++++++++++++------------------ > > > 1 file changed, 62 insertions(+), 41 deletions(-) > > > > > > diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c > > > index fa11f64e2779..040438043c73 100644 > > > --- a/sys/x86/x86/cpu_machdep.c > > > +++ b/sys/x86/x86/cpu_machdep.c > > > @@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$"); > > > #include "opt_maxmem.h" > > > #include "opt_mp_watchdog.h" > > > #include "opt_platform.h" > > > +#include "opt_sched.h" > > > #ifdef __i386__ > > > #include "opt_apic.h" > > > #endif > > > @@ -532,32 +533,25 @@ static int idle_mwait = 1; /* Use MONITOR/MWAIT for short idle. */ > > > SYSCTL_INT(_machdep, OID_AUTO, idle_mwait, CTLFLAG_RWTUN, &idle_mwait, > > > 0, "Use MONITOR/MWAIT for short idle"); > > > > > > -static void > > > -cpu_idle_acpi(sbintime_t sbt) > > > +static bool > > > +cpu_idle_enter(int *statep, int newstate) > > > { > > > - int *state; > > > + KASSERT(atomic_load_int(statep) == STATE_RUNNING, > > > + ("%s: state %d", __func__, atomic_load_int(statep))); > > > > > > - state = &PCPU_PTR(monitorbuf)->idle_state; > > > - atomic_store_int(state, STATE_SLEEPING); > > > - > > > - /* See comments in cpu_idle_hlt(). 
*/ > > > - disable_intr(); > > > - if (sched_runnable()) > > > - enable_intr(); > > > - else if (cpu_idle_hook) > > > - cpu_idle_hook(sbt); > > > - else > > > - acpi_cpu_c1(); > > > - atomic_store_int(state, STATE_RUNNING); > > > -} > > > - > > > -static void > > > -cpu_idle_hlt(sbintime_t sbt) > > > -{ > > > - int *state; > > > - > > > - state = &PCPU_PTR(monitorbuf)->idle_state; > > > - atomic_store_int(state, STATE_SLEEPING); > > > + /* > > > + * A fence is needed to prevent reordering of the load in > > > + * sched_runnable() with this store to the idle state word. Without it, > > > + * cpu_idle_wakeup() can observe the state as STATE_RUNNING after having > > > + * added load to the queue, and elide an IPI. Then, sched_runnable() > > > + * can observe tdq_load == 0, so the CPU ends up idling with pending > > > + * work. tdq_notify() similarly ensures that a prior update to tdq_load > > > + * is visible before calling cpu_idle_wakeup(). > > > + */ > > > + atomic_store_int(statep, newstate); > > > +#if defined(SCHED_ULE) && defined(SMP) > > > + atomic_thread_fence_seq_cst(); > > > +#endif > > > > Can't you combine the store and the fence with > > atomic_store_explicit(memory_order_seq_cst) or, since this is x86 > > specific code, atomic_swap_int? > > > > #if defined(SCHED_ULE) && defined(SMP) > > atomic_swap_int(statep, newstate); > > #else > > atomic_store_int(statep, newstate); > > #endif > > Yes, I believe so. I wrote it that way initially, but decided it was > nicer to use the same fence as the MI code with which this synchronizes. > What's the advantage of using xchg instead, given that the CPU is about > to idle? ok, I was just wondering.
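
For anyone following the thread, the handshake under discussion reduces to a
Dekker-style store-load pattern.  Below is a minimal, self-contained C11
sketch of it (illustrative only, not the FreeBSD code: idle_state and
tdq_load mirror the names in the commit message, while the function names
idle_side() and notify_side() are made up for the example):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define	STATE_RUNNING	0
#define	STATE_SLEEPING	1

static _Atomic int idle_state = STATE_RUNNING;
static _Atomic int tdq_load = 0;

/*
 * Idle side: roughly what the store in cpu_idle_enter() plus the
 * sched_runnable() check amount to.  Returns true if it is safe to halt.
 */
static bool
idle_side(void)
{

	atomic_store_explicit(&idle_state, STATE_SLEEPING,
	    memory_order_relaxed);
	/* Store-load barrier; dropping it reintroduces the race. */
	atomic_thread_fence(memory_order_seq_cst);
	return (atomic_load_explicit(&tdq_load, memory_order_relaxed) == 0);
}

/*
 * Notifier side: roughly what tdq_notify() does when deciding whether an
 * IPI is required.  Returns true if an IPI must be sent.
 */
static bool
notify_side(void)
{

	atomic_fetch_add_explicit(&tdq_load, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return (atomic_load_explicit(&idle_state, memory_order_relaxed) ==
	    STATE_SLEEPING);
}

int
main(void)
{

	/* Single-threaded smoke test; the interesting guarantee is concurrent. */
	if (idle_side())
		printf("idle thread sees no work and would halt\n");
	if (notify_side())
		printf("notifier sees the CPU sleeping and would send an IPI\n");
	return (0);
}

With both seq_cst fences in place, idle_side() cannot return true while
notify_side() returns false for the same wakeup, which is the invariant the
commit restores; dropping either fence readmits the reordering described in
the commit message.  On x86 both the standalone fence and an xchg-based
store (atomic_swap_int) act as full barriers, which is why the two variants
discussed above should be equivalent there.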