svn commit: r238755 - head/sys/x86/x86
Jung-uk Kim
jkim at FreeBSD.org
Wed Jul 25 17:27:50 UTC 2012
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 2012-07-25 10:44:04 -0400, Bruce Evans wrote:
> On Wed, 25 Jul 2012, Andriy Gapon wrote:
>
>> on 25/07/2012 13:21 Konstantin Belousov said the following:
>>> ... diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c index
>>> 085c339..229b351 100644 --- a/sys/x86/x86/tsc.c +++
>>> b/sys/x86/x86/tsc.c @@ -594,6 +594,7 @@ static u_int
>>> tsc_get_timecount(struct timecounter *tc __unused) {
>>>
>>> + rmb(); return (rdtsc32()); }
>>
>> This makes sense to me. We probably want correctness over
>> performance here. [BTW, I originally thought that the change was
>> here; brain malfunction]
>
> And I liked the original change because it wasn't here :-).
>
>>> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter
>>> *tc) { uint32_t rv;
>>>
>>> + rmb(); __asm __volatile("rdtsc; shrd %%cl, %%edx, %0" -
>>> : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); +
>>> : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); return
>>> (rv); }
>>>
>>
>> It would correct here too, but not sure if it would make any
>> difference given that some lower bits are discarded anyway.
>> Probably depends on exact CPU.
>
> It is needed to pessimize this too. :-)
>
> As I have complained before, the loss of resolution from the shift
> is easy to see by reading the time from userland, even with syscall
> overhead taking 10-20 times longer than the read. On core2 with
> TSC-low, a clock- checking utility gives:
>
> % min 481, max 12031, mean 530.589452, std 51.633626 % 1th: 550
> (1296487 observations) % 2th: 481 (448425 observations) % 3th: 482
> (142650 observations) % 4th: 549 (61945 observations) % 5th: 551
> (47619 observations)
>
> The numbers are diffences in nanoseconds measured by
> clock_gettime(). The jump from 481 to 549 is 68. From this I can
> tell that the clock frequency is 1.86 Ghz and the shift is 128, or
> the clock frequency is 3.72 Ghz and the shift is 256.
>
> On AthlonXP with TSC:
>
> % min 273, max 29075, mean 274.412811, std 80.425963 % 1th: 273
> (853962 observations) % 2th: 274 (745606 observations) % 3th: 275
> (400212 observations) % 4th: 276 (20 observations) % 5th: 280 (10
> observations)
>
> Now the numbers cluster about the mean. Although syscalls take
> much longer than the loss of resolution with TSC-low, and even the
> core2 TSC takes almost as long to read as the loss, it is still
> possible to see things happening at the limits of the resolution
> (~0.5 nsec).
>
>> And, oh hmm, I read AMD Software Optimization Guide for AMD
>> Family 10h Processors and they suggest using cpuid (with a note
>> that it may be intercepted in virtualized environments) or
>> _mfence_ in the discussed role (Appendix F of the document).
>> Googling for 'rdtsc mfence lfence' yields some interesting
>> results.
>
> The second hit was for the shrd pessimization/loss of resolution
> and a memory access hack in lkml in 2011. I now seem to remember
> jkim mentioning the memory access hack. rmb() on i386 has a
> related memory access hack, but now with a lock prefix that defeats
> the point of the 2011 hack (it wanted to save 5 nsec by removing
> fences). rmb() on amd64 uses lfence.
I believe I mentioned this thread at the time:
https://patchwork.kernel.org/patch/691712/
FYI, r238755 is essentially this commit for Linux:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=93ce99e849433ede4ce8b410b749dc0cad1100b2
> Some of the other hits are a bit old. The 8th one was by me in
> the thread about kib@ implementing gettimeofday() in userland.
Since we have gettimeofday() in userland, the above Linux thread is
more relevant now, I guess.
Jung-uk Kim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAlAQLJQACgkQmlay1b9qnVMR8ACglzKrNWGeYJeqRhHQmna5stQQ
qM4AoKn4xdey8nglvdVm7UiQ1NZRr81E
=15v+
-----END PGP SIGNATURE-----
More information about the svn-src-head
mailing list