svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include
Kurt Lidl
lidl at pix.net
Mon Feb 6 18:38:29 UTC 2017
On 2/5/17 1:59 PM, Jason Harmening wrote:
> Actually attaching the patch this time (**** gmail client)
>
> On Sun, Feb 5, 2017 at 10:58 AM, Jason Harmening
> <jason.harmening at gmail.com <mailto:jason.harmening at gmail.com>> wrote:
>
> Hmm, it's a good idea to consider the possibility of a barrier
> issue. It wouldn't be the first time we've had such a problem on a
> weakly-ordered architecture. That said, I don't see a problem in
> this case. smp_rendezvous_cpus() takes a spinlock and then issues
> atomic_store_rel_int() to ensure the rendezvous params are visible
> to other cpus. The latter corresponds to lwsync on powerpc, which
> AFAIK should be sufficient to ensure visibility of prior stores.
>
> For now I'm going with the simpler explanation that I made a bad
> assumption in the powerpc get_pcpu() and there is some context in
> which the read of sprg0 doesn't return a consistent pointer value.
> Unfortunately I don't see where that might be right now.
>
> On the mips side, Kurt/Alexander can you test the attached patch?
> It contains a simple fix to ensure get_pcpu() returns the consistent
> per-cpu pointer.
I applied this patch on top of r313347 (which I had verified that a
kernel built from that revisions to boot from successfully).
The kernel from r313347+(this patch) least gets to multi-user on my ERL.
So, that's a big improvement.
I'll start a native buildworld/buildkernel on the ERL, and that ought
to give it a reasonable workout.
-Kurt
>
> On Sat, Feb 4, 2017 at 1:34 PM, Svatopluk Kraus <onwahe at gmail.com
> <mailto:onwahe at gmail.com>> wrote:
>
> Probably not related. But when I took short look to the patch to see
> what could go wrong, I walked into the following comment in
> _rm_wlock(): "Assumes rm->rm_writecpus update is visible on
> other CPUs
> before rm_cleanIPI is called." There is no explicit barrier to
> ensure
> it. However, there might be some barriers inside of
> smp_rendezvous_cpus(). I have no idea what could happened if this
> assumption is not met. Note that rm_cleanIPI() is affected by the
> patch.
>
>
>
> On Sat, Feb 4, 2017 at 9:39 PM, Jason Harmening
> <jason.harmening at gmail.com <mailto:jason.harmening at gmail.com>>
> wrote:
> > Can you post an example of such panic? Only 2 MI pieces were
> changed,
> > netisr and rmlock. I haven't seen problems on my own
> amd64/i386/arm testing
> > of this, so a backtrace might help to narrow down the cause.
> >
> > On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler
> <andreast at freebsd.org <mailto:andreast at freebsd.org>>
> > wrote:
> >>
> >> On 04.02.17 20:54, Jason Harmening wrote:
> >>>
> >>> I suspect this broke rmlocks for mips because the rmlock
> implementation
> >>> takes the address of the per-CPU pc_rm_queue when building
> tracker
> >>> lists. That address may be later accessed from another CPU
> and will
> >>> then translate to the wrong physical region if the address
> was taken
> >>> relative to the globally-constant pcpup VA used on mips.
> >>>
> >>> Regardless, for mips get_pcpup() should be implemented as
> >>> pcpu_find(curcpu) since returning an address that may mean
> something
> >>> different depending on the CPU seems like a big POLA
> violation if
> >>> nothing else.
> >>>
> >>> I'm more concerned about the report of powerpc breakage.
> For powerpc we
> >>> simply take each pcpu pointer from the pc_allcpu list (which
> is the same
> >>> value stored in the cpuid_to_pcpu array) and pass it through
> the ap_pcpu
> >>> global to each AP's startup code, which then stores it in
> sprg0. It
> >>> should be globally unique and won't have the
> variable-translation issues
> >>> seen on mips. Andreas, are you certain this change was
> responsible the
> >>> breakage you saw, and was it the same sort of hang observed
> on mips?
> >>
> >>
> >> I'm really sure. 313036 booted fine, allowed me to execute heavy
> >> compilation jobs, np. 313037 on the other side gave me
> various patterns of
> >> panics. During startup, but I also succeeded to get into
> multiuser and then
> >> the panic happend during port building.
> >>
> >> I have no deeper inside where pcpu data is used. Justin
> mentioned netisr?
> >>
> >> Andreas
> >>
> >
>
>
>
More information about the svn-src-all
mailing list