Lag after resume culprit found
Konstantin Belousov
kostikbel at gmail.com
Thu May 17 09:20:09 UTC 2018
On Thu, May 17, 2018 at 11:06:42AM +0300, Andriy Gapon wrote:
> On 17/05/2018 10:56, Johannes Lundberg wrote:
> >
> >
> > On Thu, May 17, 2018 at 8:46 AM, Johannes Lundberg <johalun0 at gmail.com> wrote:
> >
> >
> >
> > On Thu, May 17, 2018 at 7:43 AM, Andriy Gapon <avg at freebsd.org> wrote:
> >
> > On 17/05/2018 02:07, Johannes Lundberg wrote:
> > > https://github.com/freebsd/freebsd/commit/66f063557f257baa9c8aeab9f933171eaa6e1cfa
> > > x86 cpususpend_handler: call wbinvd after setting suspend state bits
> >
> > That's very interesting and surprising.
> > That commit changes something that happens before suspend, it should not
> > have any effect on the system state after resume.
> >
> > Does anyone have a theory of what could be wrong?
> >
> >
> > Nope but moving
> > CPU_CLR_ATOMIC(cpu, &suspended_cpus);
> > back to the end of that scope fixes it.
> >
> >
> >
> > I did some further testing.
> > Calling
> > CPU_CLR_ATOMIC(cpu, &suspended_cpus);
> > before
> > pmap_init_pat();
> > is what "breaks" resume.
> >
> > Is this Intel only, or does it happen on AMD as well (which this patch was
> > intended for)?
>
> Not sure about the PAT part, but fpuresume/npxresume would affect all platforms.
> It's a bit puzzling that doing PAT manipulations on one AP while another AP is
> being brought up is problematic. Probably there is something that I am missing.
Manipulating PAT might affect cache consistency, since conflicting
caching attributes would then apply to the cache line holding the
suspended_cpus variable, which is already cached. It might not be the
variable itself that causes the eventual misbehavior, but some other
data sharing the line.
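
To illustrate what "other data sharing the line" means, here is a minimal
userland sketch, not kernel code. It assumes 64-byte cache lines, and the
struct and field names (shared_layout, padded_layout, suspended_mask,
neighbor) are hypothetical stand-ins, not the actual layout around
suspended_cpus. It only shows how two fields can land in the same line and
how padding separates them.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, typical for current x86 CPUs. */
#define CACHE_LINE_SIZE 64

/* Hypothetical layout: a CPU mask followed directly by unrelated data. */
struct shared_layout {
	alignas(CACHE_LINE_SIZE) uint64_t suspended_mask; /* stand-in for suspended_cpus */
	uint64_t neighbor;                                /* unrelated data at offset 8 */
};

/* Same fields, each forced onto its own cache line. */
struct padded_layout {
	alignas(CACHE_LINE_SIZE) uint64_t suspended_mask;
	alignas(CACHE_LINE_SIZE) uint64_t neighbor;
};

/* Both structs start on a line boundary, so member offsets decide the line. */
static_assert(alignof(struct shared_layout) == CACHE_LINE_SIZE, "line aligned");
static_assert(alignof(struct padded_layout) == CACHE_LINE_SIZE, "line aligned");

/* Index of the cache line that contains a byte offset in a line-aligned struct. */
static size_t
line_index(size_t offset)
{
	return (offset / CACHE_LINE_SIZE);
}

/* True when two member offsets fall into the same cache line. */
static bool
shares_line(size_t off_a, size_t off_b)
{
	return (line_index(off_a) == line_index(off_b));
}
```

In the first layout any access to suspended_mask pulls neighbor into the
cache along with it, so a caching-attribute conflict on that line could
also hit the neighbor. Whether this is what actually happens on resume is
not established here.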