svn commit: r238907 - projects/calloutng/sys/kern

Tue Sep 18 12:34:54 UTC 2012

On Tuesday, September 18, 2012 6:29:31 am David Chisnall wrote:
> On 18 Sep 2012, at 11:15, Dimitry Andric wrote:
> 
> > Please use gcc's __sync_synchronize() builtin[1] instead, which is
> > specifically for this purpose.  Clang also supports it.
> > 
> > The builtin will emit actual memory barrier instructions, if the target
> > architecture supports it, otherwise it will emit the same asm statement
> > you show above.  See contrib/gcc/builtins.c, around line 5584, function
> > expand_builtin_synchronize().
> 
> From Attilio's description of the problem in IRC, I believe that 
atomic_signal_fence() is the correct thing to use here.  He stated that he 
cares about reordering of memory access with regard to the current CPU, but 
not with regard to other CPUs / threads.  He also said that he only cares 
about the compiler performing the reordering, not about the CPU, but I suspect 
that is incorrect as there are numerous subtle bugs that can creep in on 
weakly-ordered architectures (e.g. Alpha, ARMv8) if you only have a compiler 
barrier.

Not true.  Barriers only affect the order that writes are posted to external
viewers (e.g. other processors and devices that perform DMA).  On a single
CPU barriers are completely meaningless.  The types of barriers Attilio is
worried about are things like RAW and WAR hazards.  CPUs generally handle
these things internally (except for ia64 and it's stop bits on bundles, but
the compiler is required to insert those to ensure that things still complete
in "program order").

Specifically, CPUs will only reorder instructions in a way that does not 
violate a RAW or WAR hazard.  Compilers have similar constraints (they can 
move constants out of a loop because it is not a WAR).  Given that I think 
your last comment is largely bollocks. :)

> That said, this is likely to be incorrect, because it's very unusual for 
that to actually be the requirement, especially in multithreaded code (where 
the atomic.h stuff is actually important).  In most of the cases where 
__compiler_membar() is being used, you actually want at least a partial 
barrier.  

The specific cases where Attilio wants to use a pure compiler barrier
without a full atomic op (that will include appropriate barriers already)
are attempting to force the compiler to safely order actions that are
sensitive to preemption (e.g. ensuring that td_pinned and td_critnest are
properly set before a critical section is entered so that any interrupt
that occurs is guaranteed to "see" the result of sched_pin() or 
critical_enter() before any protected accesses are performed, and similarly
to ensure that any protected accesses are completed before the weak "lock"
is released via sched_unpin() or critical_exit()).  You can think of these
as WAR or RAW hazards that the compiler simply has no way of knowing about
(and can't).  However, assuming the compiler is correct, there are no
WAR or RAW hazards that are not visible to the CPU.

This is actually very similar to signal handling in userland (signals are
basically interrupts for userland), so it may be that atomic_signal_fence() is 
in fact be correct.

-- 
John Baldwin