Possible instruction pipelining problem between HT's on the same die?
Matthew Dillon
dillon at apollo.backplane.com
Fri Jun 3 20:57:25 GMT 2005
I've been tracking down a crash one of our users gets occasionally.
He has a quad Intel(R) XEON(TM) CPU 2.00GHz (1996.61-MHz 686-class CPU)
system.
After getting a few of these crashes he pulled three of the four cpus
out. But with just one physical cpu, with HTT turned on (so two
logical cpus), he is still getting these crashes.
This is the sequence that causes the bad data:
    cpu #0          write A
                    write B
    (HT)cpu #1      read B
                    if (B)
                        read A  <---- gets OLD data in A, not new data
Now I was depending on the presumed write ordering, so if a foreign
cpu sees that B is updated it can assume that A has also been updated.
But I'm beginning to think that it isn't working as advertised. I've
read the manuals over and over again and they seem to only guarantee
write ordering between physical cpus, not between logical HT cpus, and
even then it appears that a cpu can do a speculative read and
thus get an old value for A even after getting a new value for B.
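For illustration only, here is a minimal sketch of the same write A / write B, read B / read A pattern in present-day C11 atomics (which were not available when this was written). The names mailbox, payload, ready, mailbox_publish and mailbox_try_consume are hypothetical, not taken from the code that is crashing. Under the C11 memory model, a reader whose acquire load sees the flag set is guaranteed to also see the earlier write to the payload; the sketch says nothing about what the hardware in question actually does.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names: "payload" plays the role of A, "ready" of B. */
struct mailbox {
    int payload;        /* A: plain data, written first */
    atomic_int ready;   /* B: flag, written second */
};

/* Writer side (cpu #0): write A, then write B with release ordering. */
static void mailbox_publish(struct mailbox *mb, int value)
{
    mb->payload = value;
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Reader side (cpu #1): read B with acquire ordering; read A only if B is set. */
static bool mailbox_try_consume(struct mailbox *mb, int *out)
{
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
        return false;   /* B not set yet, so A must not be trusted */
    *out = mb->payload; /* ordered after the acquire load of B */
    return true;
}
```

The release/acquire pair states the assumption being relied on explicitly: seeing the new B implies seeing the new A.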
I looked at the various SFENCE/LFENCE/MFENCE instructions and they
do not seem to guarantee ordering for speculative accesses at all.
They all say that they do not protect against speculative reads.
Bus-locked instructions don't seem to avoid speculative reads either.
I'm even more confused because this bug is occurring between two logical
cpus on the same physical die. Is write ordering not guaranteed with
respect to the other logical cpu? Can one logical cpu prefetch data
early that then becomes obsolete by the time the instruction is actually
run? Or perhaps it's a pipeline bug... I just don't know. But it's
damn annoying.
The only solution I see is to use an actual serializing instruction
like cpuid. I really do not want to have to use cpuid :-(.
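As a rough sketch of what that workaround could look like on the reader side, assuming GCC/Clang-style inline assembly: cpuid is executed between the read of B and the read of A. The names A, B, cpu_serialize and read_a_if_b are hypothetical, and whether this would actually cure the crash is not established here. On non-x86 targets the sketch falls back to a full C11 fence so that it still compiles.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the A and B of the sequence above. */
static volatile int A, B;

/* Execute a serializing instruction (cpuid on x86).
 * On other architectures, fall back to a full memory fence. */
static inline void cpu_serialize(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory"); /* also a compiler barrier */
    (void)eax; (void)ebx; (void)ecx; (void)edx;
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Reader side: read B, serialize, then read A.
 * Returns 1 and stores A in *out if B was set, 0 otherwise. */
static int read_a_if_b(int *out)
{
    if (B == 0)
        return 0;
    cpu_serialize();
    *out = A;
    return 1;
}
```

cpuid is expensive compared with an ordinary load, which is the reason for not wanting it in a hot path.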
So, has anyone seen anything similar?
-Matt
More information about the freebsd-hackers
mailing list