Possible instruction pipelining problem between HTs on the same die?
Matthew Dillon
dillon at apollo.backplane.com
Fri Jun 3 22:47:27 GMT 2005
:This is normal behaviour.
:Take a look at IA-32 Intel Developers ... Vol 3,
:Section: 7.2.2 for details + solutions.
:
:Stephan
OK. That section seems to indicate that speculative reads
can pass writes, but it also says that the pipeline snoops the address
within the processor and ensures proper ordering. The latter part
makes sense within the context of a single cpu, but the big question is:
is that also supposed to hold for interactions with HT cpus (which share
the pipeline)? Or not? It seems not.
Speculative reads creating out-of-order situations seem to be the
biggest issue. The AMD manual (Programmer's Manual, Volume 3, page
186, MFENCE instruction) says this:
"The MFENCE instruction is weakly-ordered with respect to data and
instruction prefetches. Speculative loads initiated by the processor,
or specified explicitly using cache-prefetch instructions, can be
reordered around an MFENCE".
This seems to be different from what the Intel manual says, and doesn't
make much sense. What's the point of having a fence instruction if it
can't guarantee read/write ordering? Is the AMD manual simply wrong?
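The ordering a full fence is supposed to provide can be stated as a store-buffer litmus test: each cpu stores to one variable and then loads the other. Without a fence, x86 may let each load complete before the other cpu's store becomes visible, so both loads return 0. The sketch below is written with C11 atomics rather than raw MFENCE; the names x, y, thread0, thread1 and count_forbidden are hypothetical. At the language level, the two seq_cst fences forbid the r0 == 0 && r1 == 0 outcome. Whether a given compiler emits MFENCE or a locked instruction for the fence on x86 is implementation-dependent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical litmus-test variables. */
static atomic_int x, y;
static int r0, r1;   /* read by the main thread only after pthread_join */

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);   /* store x */
    atomic_thread_fence(memory_order_seq_cst);            /* full fence */
    r0 = atomic_load_explicit(&y, memory_order_relaxed);  /* then load y */
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);   /* store y */
    atomic_thread_fence(memory_order_seq_cst);            /* full fence */
    r1 = atomic_load_explicit(&x, memory_order_relaxed);  /* then load x */
    return NULL;
}

/* Runs the test `iterations` times and counts how often both loads saw 0,
 * which the seq_cst fences are required to prevent. */
static int count_forbidden(int iterations)
{
    int forbidden = 0;
    for (int i = 0; i < iterations; i++) {
        pthread_t t0, t1;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&t0, NULL, thread0, NULL);
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r0 == 0 && r1 == 0)
            forbidden++;
    }
    return forbidden;
}
```

Removing the two fences makes the forbidden outcome legal, though how often it actually shows up depends on the hardware and on timing.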
Other than that, the Intel manual does indicate that speculative reads
will not pass locked bus cycle instructions (the AMD manual says nothing
about that, as far as I can see). So, presumably, doing a dummy locked bus
cycle operation on, e.g., the top of the stack, as Linux does, would
be sufficient to ensure read ordering. Would you concur with that
assessment?
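The dummy locked operation described above can be written as a one-instruction barrier. This is a sketch in the style of the old Linux i386 mb() ("lock; addl $0,0(%esp)"), not code taken from either kernel, and the function name locked_op_barrier is hypothetical. Adding zero leaves the stack word unchanged, while the LOCK prefix makes the instruction a locked bus cycle. The "memory" clobber also stops the compiler from moving loads and stores across it. Non-x86 builds fall back to a C11 full fence, which is an assumption made only so that the sketch compiles everywhere.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical full barrier built from a locked no-op on the stack top. */
static inline void locked_op_barrier(void)
{
#if defined(__x86_64__)
    /* Locked add of 0 to the word at the top of the stack. */
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
    /* Non-x86 fallback (assumption): let the compiler pick a full fence. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}
```

The choice of the stack top is just a convenient, always-mapped, usually cache-hot location. Any locked read-modify-write on a private word would serve the same purpose.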
What's really horrible here is that the 'old' value of the data being
used is modified at location A something like 30 instructions prior to
the instruction that updates the index (B). I think this is a
situation that can only occur in an HT configuration, and then only if
the speculative read issued by the HT cpu is held across
30 instructions executed by the primary cpu before the HT cpu issues the
read of B.
    cpu #0                  cpu #1 (HT cpu on same die as cpu #0)

                            speculatively read A
    write A                 (stalled)
    [30 instructions]       (stalled x 30)
    write B                 (stalled)
                            read B
                            see that B has been updated
                            read A (get old value for A instead of new)
Is that even possible? Not only the 30-instruction latency, but also
the fact that a speculative read on the HT cpu survives 30 instructions
running on cpu #0 (but only one or two on the HT cpu), even though the
two share the same pipeline.
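The timeline above is the classic message-passing pattern: cpu #0 writes A and later publishes B, and cpu #1 waits for B and then reads A. Below is a minimal sketch of that pattern in C11, with a release store on B and an acquire load of B. It assumes pthreads stand in for the two cpus, and the names data_a, index_b, writer, reader and run_message_passing are hypothetical. The language then guarantees that the reader sees the new A. On x86, GCC and Clang typically compile both operations to plain MOVs, so the sketch relies on the hardware not letting the read of A appear to complete before the read of B, which is the guarantee in question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int data_a;          /* "A": payload written first (hypothetical) */
static atomic_int index_b;  /* "B": index published afterwards (hypothetical) */

static void *writer(void *arg)
{
    (void)arg;
    data_a = 42;                                              /* write A */
    /* ... the ~30 unrelated instructions would go here ... */
    atomic_store_explicit(&index_b, 1, memory_order_release); /* write B */
    return NULL;
}

static void *reader(void *arg)
{
    int *out = arg;
    /* read B, spinning until it has been updated */
    while (atomic_load_explicit(&index_b, memory_order_acquire) == 0)
        ;
    *out = data_a;  /* read A: must observe 42, never the old 0 */
    return NULL;
}

/* Runs one writer/reader pair and returns the value of A the reader saw. */
static int run_message_passing(void)
{
    pthread_t w, r;
    int seen = -1;
    data_a = 0;
    atomic_store(&index_b, 0);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

If the reader could ever return 0 here on real hardware with a correct compiler, that would be exactly the stale read of A shown in the timeline.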
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the freebsd-hackers mailing list