svn commit: r238907 - projects/calloutng/sys/kern
John Baldwin
jhb at freebsd.org
Tue Sep 18 12:34:54 UTC 2012
On Tuesday, September 18, 2012 6:29:31 am David Chisnall wrote:
> On 18 Sep 2012, at 11:15, Dimitry Andric wrote:
>
> > Please use gcc's __sync_synchronize() builtin[1] instead, which is
> > specifically for this purpose. Clang also supports it.
> >
> > The builtin will emit actual memory barrier instructions, if the target
> > architecture supports it, otherwise it will emit the same asm statement
> > you show above. See contrib/gcc/builtins.c, around line 5584, function
> > expand_builtin_synchronize().
>
> From Attilio's description of the problem in IRC, I believe that
> atomic_signal_fence() is the correct thing to use here. He stated that he
> cares about reordering of memory access with regard to the current CPU, but
> not with regard to other CPUs / threads. He also said that he only cares
> about the compiler performing the reordering, not about the CPU, but I suspect
> that is incorrect as there are numerous subtle bugs that can creep in on
> weakly-ordered architectures (e.g. Alpha, ARMv8) if you only have a compiler
> barrier.

Not true. Barriers only affect the order in which writes are posted to
external viewers (e.g. other processors and devices that perform DMA). On a
single CPU barriers are completely meaningless. The types of barriers Attilio
is worried about are things like RAW and WAR hazards. CPUs generally handle
these things internally (except for ia64 and its stop bits on bundles, but
the compiler is required to insert those to ensure that things still complete
in "program order").

Specifically, CPUs will only reorder instructions in a way that does not
violate a RAW or WAR hazard. Compilers have similar constraints (they can
move constants out of a loop because doing so does not violate a WAR hazard).
Given that, I think your last comment is largely bollocks. :)

> That said, this is likely to be incorrect, because it's very unusual for
> that to actually be the requirement, especially in multithreaded code (where
> the atomic.h stuff is actually important). In most of the cases where
> __compiler_membar() is being used, you actually want at least a partial
> barrier.

The specific cases where Attilio wants to use a pure compiler barrier
without a full atomic op (which would already include the appropriate
barriers) are attempts to force the compiler to safely order actions that
are sensitive to preemption. For example, td_pinned and td_critnest must be
properly set before a critical section is entered, so that any interrupt
that occurs is guaranteed to "see" the result of sched_pin() or
critical_enter() before any protected accesses are performed. Similarly,
any protected accesses must be completed before the weak "lock" is released
via sched_unpin() or critical_exit(). You can think of these as WAR or RAW
hazards that the compiler has no way of knowing about. However, assuming
the compiler is correct, there are no WAR or RAW hazards that are not
visible to the CPU.
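
To make the ordering requirement concrete, here is a rough sketch of a pin
count guarded by a compiler-only barrier. This is not the kernel code:
struct fake_thread, fake_pin(), fake_unpin(), bump_protected() and the
compiler_barrier() macro are all made-up names for illustration, and the
macro assumes GCC/Clang-style inline asm.

```c
/*
 * Compiler-only barrier (assumes GCC/Clang inline asm). It emits no
 * instruction; the "memory" clobber only stops the compiler from moving
 * memory accesses across this point.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for per-thread state; not FreeBSD's struct thread. */
struct fake_thread {
    int pinned;          /* plays the role of td_pinned */
    int protected_data;  /* must only be touched while pinned */
};

static void fake_pin(struct fake_thread *td)
{
    td->pinned++;
    /* The increment must be emitted before any protected access. */
    compiler_barrier();
}

static void fake_unpin(struct fake_thread *td)
{
    /* All protected accesses must be emitted before the decrement. */
    compiler_barrier();
    td->pinned--;
}

/* Example of a protected access bracketed by the weak "lock". */
static int bump_protected(struct fake_thread *td)
{
    int v;

    fake_pin(td);
    v = ++td->protected_data;
    fake_unpin(td);
    return v;
}
```

Without the barriers the compiler would be free to sink the increment of
pinned below the protected access, or hoist the decrement above it, since
nothing in the source tells it that an interrupt handler may look at pinned
in between.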

This is actually very similar to signal handling in userland (signals are
basically interrupts for userland), so it may be that atomic_signal_fence()
is in fact correct.
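
For comparison, a minimal userland sketch of the same pattern with C11
atomic_signal_fence(). The names in_section, seen_in_section, guarded,
on_signal() and run_guarded() are invented for this example, and raise() is
only used to deliver the "interrupt" at a known point.

```c
#include <signal.h>
#include <stdatomic.h>

static volatile sig_atomic_t in_section;       /* weak "lock" flag */
static volatile sig_atomic_t seen_in_section;  /* what the handler observed */
static int guarded;                            /* ordinary, non-volatile data */

/* The handler plays the role of the interrupt: it records the flag value. */
static void on_signal(int sig)
{
    (void)sig;
    seen_in_section = in_section;
}

static int run_guarded(void)
{
    signal(SIGINT, on_signal);

    in_section = 1;
    /* Compiler must not move the guarded access above the flag store. */
    atomic_signal_fence(memory_order_seq_cst);

    guarded++;
    raise(SIGINT);  /* handler runs on this thread before raise() returns */

    /* Compiler must not move the guarded access below the flag clear. */
    atomic_signal_fence(memory_order_seq_cst);
    in_section = 0;

    return seen_in_section;
}
```

atomic_signal_fence() only constrains the compiler with respect to a signal
handler running on the same thread, which is the userland analogue of an
interrupt preempting the current CPU.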
--
John Baldwin