svn commit: r222866 - head/sys/x86/x86
Jung-uk Kim
jkim at FreeBSD.org
Mon Jun 20 23:41:16 UTC 2011
On Saturday 18 June 2011 08:05 am, Bruce Evans wrote:
> Long ago, On Wed, 8 Jun 2011, Jung-uk Kim wrote:
> > On Wednesday 08 June 2011 04:55 pm, Bruce Evans wrote:
> >> On Wed, 8 Jun 2011, Jung-uk Kim wrote:
> >>> Log:
> >>> Introduce low-resolution TSC timecounter "TSC-low". It
> >>> replaces the normal TSC timecounter if TSC frequency is higher
> >>> than ~4.29 GHz (or 2^32-1 Hz) or
> >>
> >> It should be a separate timecounter so that the user can choose
> >> it independently, at least in the SMP case where it is very low
> >> (at most ~4.29 GHz >> 8 ~= 17 MHz).
> >
> > As I noted in the log, it is still higher than the previous
> > default ACPI-fast, which is ~3.58 MHz and I've never heard of any
> > complaint about ACPI-fast being too low. ;-)
>
> That's because it is too low to measure itself being low :-).
>
> > Nothing prevents us from making a separate timecounter, though.
> > In fact, we can do the same for ACPI-fast/ACPI-safe. However,
> > that'll only confuse users, IMHO.
>
> TSC/TSC-low sort of corresponds to ACPI-fast/ACPI-safe. Users can
> switch between the latter.
How do we do that?
	if (j == 10) {
		acpi_timer_timecounter.tc_name = "ACPI-fast";
		acpi_timer_timecounter.tc_get_timecount =
		    acpi_timer_get_timecount;
		acpi_timer_timecounter.tc_quality = 900;
	} else {
		acpi_timer_timecounter.tc_name = "ACPI-safe";
		acpi_timer_timecounter.tc_get_timecount =
		    acpi_timer_get_timecount_safe;
		acpi_timer_timecounter.tc_quality = 850;
	}
As far as I can remember, we have never had any code to influence
this selection.
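For readers unfamiliar with the difference between the two ACPI variants: as I understand it, the "safe" reader guards against chipsets that occasionally return a bogus PM-timer value by re-reading until it sees a monotonic run, at the cost of extra slow I/O reads. The following is a hypothetical userland sketch of that idea, not the acpi_timer.c code; the reader callback, `safe_read()` and the `replay` test double are assumptions made so the logic can run outside the kernel. Counter wraparound here simply costs one extra retry.

```c
#include <stddef.h>

/* Hypothetical reader callback: returns the next raw counter sample. */
typedef unsigned int (*counter_read_fn)(void *ctx);

/*
 * Hypothetical sketch of a "safe" read (not acpi_timer.c): keep sampling
 * until three consecutive reads are non-decreasing, then return the
 * middle one.  A single glitched read breaks the monotonic run and
 * forces another iteration.
 */
static unsigned int
safe_read(counter_read_fn rd, void *ctx)
{
	unsigned int u1, u2, u3;

	u2 = rd(ctx);
	u3 = rd(ctx);
	do {
		u1 = u2;
		u2 = u3;
		u3 = rd(ctx);
	} while (u1 > u2 || u2 > u3);
	return (u2);
}

/* Hypothetical test double: replays fixed samples, then repeats the last. */
struct replay {
	const unsigned int *samples;
	size_t n;
	size_t pos;
};

static unsigned int
replay_read(void *ctx)
{
	struct replay *r = ctx;
	unsigned int v = r->samples[r->pos];

	if (r->pos + 1 < r->n)
		r->pos++;
	return (v);
}
```

With the samples {100, 5, 101, 102, 103}, the glitched 5 breaks the run, and `safe_read()` returns 101 once it sees 5, 101, 102 in order... strictly, it returns the middle of the first non-decreasing triple, which is 101.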
> What they can't do is run both concurrently, either to compare them
> or use the best one that works in the current context. That would
> be more useful for developers and is not implemented mainly because it
> adds complexity (only a tiny amount of extra overhead, I think, provided
> you don't try to keep the 2 times coherent -- just an extra windup
> for each active timecounter).
>
> >>> static void tsc_levels_changed(void *arg, int unit);
> >>>
> >>> static struct timecounter tsc_timecounter = {
> >>> @@ -392,11 +393,19 @@ test_smp_tsc(void)
> >>> static void
> >>> init_TSC_tc(void)
> >>
> >> This seems to only be called once at boot time. So the lowness
> >> may be much lower than necessary if the levels are reduced
> >> significantly later.
> >
> > It'll only happen when the CPU is started at the highest
> > frequency and TSC is not invariant. In this case, its quality
> > will be set to 800 and HPET or ACPI timecounter will be selected
> > by default. I don't see much problem with the default choice
> > here.
>
> Can the CPU be started at a low frequency and throttled up later?
Yes, Intel mobile parts may do that.
> I agree that the non-invariant case is not very important.
Exactly.
> >>> {
> >>> + uint64_t max_freq;
> >>> + int shift;
> >>>
> >>> if ((cpu_feature & CPUID_TSC) == 0 || tsc_disabled)
> >>> return;
> >>>
> >>> /*
> >>> + * Limit timecounter frequency to fit in an int and prevent it from
> >>> + * overflowing too fast.
> >>> + */
> >>> + max_freq = UINT_MAX;
> >>> +
> >>> + /*
> >>> * We can not use the TSC if we support APM. Precise timekeeping
> >>> * on an APM'ed machine is at best a fools pursuit, since
> >>> * any and all of the time spent in various SMM code can't
> >>> @@ -418,13 +427,27 @@ init_TSC_tc(void)
> >>> * We can not use the TSC in SMP mode unless the TSCs on all CPUs are
> >>> * synchronized. If the user is sure that the system has synchronized
> >>> * TSCs, set kern.timecounter.smp_tsc tunable to a non-zero value.
> >>> + * We also limit the frequency even lower to avoid "temporal anomalies"
> >>> + * as much as possible.
> >>> */
> >>> - if (smp_cpus > 1)
> >>> + if (smp_cpus > 1) {
> >>> tsc_timecounter.tc_quality = test_smp_tsc();
> >>> + max_freq >>= 8;
> >>> + }
> >>
> >> This gives especially low lowness if the levels are reduced
> >> significantly. Maybe as low as 100 MHz >> 8 = ~390 kHz, lower
> >> than an i8254 (~1.19 MHz).
> >
> > I don't remember any SMP-capable x86 ever running at 100 MHz
> > unless it was seriously under-clocked. Even if one existed, it
> > wouldn't be available today. :-P
>
> Doesn't throttling give underclocking?
T-states *usually* do not change the CPU frequency itself. Only P-states
can change the TSC frequency (and only for a non-invariant TSC). However,
some broken implementations *may* simply stop incrementing the TSC in very
low T-states (or C-states). AMD does not have this problem with invariant
TSCs. It seems Intel has also fixed it in recent processors (Nehalem or
Sandy Bridge; I am not sure which, though).
> Maybe not as low as 100 MHz, but quite low. Only a possible problem
> for the non-invariant case anyway.
Agreed.
> >> OTOH, maybe the temporal anomalies scale with the TSC frequency,
> >> so you need to right shift by a few bits irrespective of the TSC
> >> frequency. A shift count of 8 seems too much, but if the initial
> >> TSC frequency is already < 2**32 shifted by 8, then the final
> >> shift is 0.
>
> This is my main point. How can it be right to reduce the extra
> shift for SMP (if this shift is needed at all) just because the
> initial TSC frequency is low? All instructions are clocked, so
> non-temporalness within a core scales with the current frequency.
> Oops, this leads back to my previous point that the scaling should
> depend on the current frequency and not just on the initial
> frequency. Across cores, it isn't so clear what the
> non-temporalness scales with. The non-temporalness is FUD so its
> scaling could be anything :-).
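To make the shift arithmetic concrete, here is a hypothetical sketch of choosing a shift under the cap used in the diff (UINT_MAX Hz, or UINT_MAX >> 8 Hz with more than one CPU): take the smallest right shift that brings the TSC frequency under the cap. The function name and the loop bound of 31 are assumptions, not code copied from the commit. It shows the effect described above: the extra SMP shift shrinks as the initial frequency drops.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest right shift that brings freq (Hz) down
 * to at most UINT_MAX Hz, or UINT_MAX >> 8 Hz when smp is non-zero.
 */
static int
tsc_low_shift(uint64_t freq, int smp)
{
	uint64_t max_freq = UINT_MAX;
	int shift = 0;

	if (smp)
		max_freq >>= 8;
	while (shift < 31 && (freq >> shift) > max_freq)
		shift++;
	return (shift);
}
```

On SMP, a 3 GHz TSC gets the full shift of 8, a 1 GHz TSC only 6 and a 100 MHz TSC only 3; on UP, a 3 GHz TSC needs no shift at all and a 5 GHz one needs 1.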
My questions to you:
a) Why do we care about the TSC timecounter when it is not invariant,
where we *know* it is unusable and its quality is set negative?
b) Why do we complicate the code when invariant frequency == current
frequency == initial frequency?
> >> ...
> >> Perhaps the levels can also be increased significantly later.
> >> Then the timecounter frequency may exceed 4.29 GHz despite its
> >> scaling.
> >
> > Again, it can only happen when the CPU was started at low
> > frequency and the TSC is not invariant. For that case, TSC won't
> > be selected by default unless both HPET and ACPI timers are
> > disabled/unavailable.
>
> But users can select it, and since users can't control the scaling
> or even select between TSC/TSC-low, TSC-low must be scaled properly
> initially to have the best chance of working later.
Maybe we should not allow users to select a negative-quality timecounter
in the first place. Or maybe we should print scary warning messages
if they try to shoot themselves in the foot. Sigh...
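A hypothetical sketch of the default-selection rule discussed above, as I understand FreeBSD's tc_init() in kern_tc.c to apply it: the highest quality wins, ties go to the higher frequency, and a negative-quality counter is never picked automatically (it can still be chosen by hand through the kern.timecounter.hardware sysctl). The struct and function below are illustrative stand-ins, not the kernel's types; the quality values in the usage are the ones quoted in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timecounter (only the fields used). */
struct tc_info {
	const char *name;
	uint64_t frequency;
	int quality;
};

/*
 * Hypothetical default pick: highest quality wins, ties go to the
 * higher frequency, negative quality is skipped.  Returns NULL if
 * nothing qualifies.
 */
static const struct tc_info *
pick_default_tc(const struct tc_info *tcs, size_t n)
{
	const struct tc_info *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct tc_info *tc = &tcs[i];

		if (tc->quality < 0)
			continue;
		if (best == NULL || tc->quality > best->quality ||
		    (tc->quality == best->quality &&
		    tc->frequency > best->frequency))
			best = tc;
	}
	return (best);
}
```

With a non-invariant TSC at quality 800 and ACPI-fast at 900, ACPI-fast is chosen; once the TSC's quality goes negative, it can only ever be selected by hand.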
> >>> @@ -520,8 +545,15 @@ SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq
> >>> 0, 0, sysctl_machdep_tsc_freq, "QU", "Time Stamp Counter
> >>> frequency");
> >>>
> >>> static u_int
> >>> -tsc_get_timecount(struct timecounter *tc)
> >>> +tsc_get_timecount(struct timecounter *tc __unused)
> >>> {
> >>>
> >>> return (rdtsc32());
> >>> }
> >>> +
> >>> +static u_int
> >>> +tsc_get_timecount_lowres(struct timecounter *tc)
> >>> +{
> >>> +
> >>> + return (rdtsc() >> (int)(intptr_t)tc->tc_priv);
> >>
> >> This forces a slow 64-bit shift (shrdl; shrl) in all cases.
> >
> > Yes, it does, unfortunately.
> >
> > I have no clue why AMD didn't implement native 64-bit RDTSC (and
> > RDMSR/WRMSR) in the first place. :-(
>
> I didn't notice before that it still goes to a register pair on
> amd64.
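RDTSC delivers its result in the EDX:EAX register pair even in 64-bit mode, so shifting the full count right needs bits from both halves. The sketch below uses hypothetical helper names and plain C instead of inline assembly so it runs anywhere. It shows the two-register shift that an instruction such as SHRD performs for the low half of the result and checks it against an ordinary 64-bit shift.

```c
#include <stdint.h>

/* Hypothetical helper: combine EDX (hi) and EAX (lo) into one count. */
static uint64_t
tsc_from_pair(uint32_t hi, uint32_t lo)
{
	return (((uint64_t)hi << 32) | lo);
}

/*
 * Hypothetical helper: low 32 bits of (hi:lo) >> shift using only
 * 32-bit operations, the way a double-precision shift such as SHRD
 * does.  Valid for 0 < shift < 32 (a shift by 32 is undefined in C).
 */
static uint32_t
shift_pair_low32(uint32_t hi, uint32_t lo, int shift)
{
	return ((lo >> shift) | (hi << (32 - shift)));
}
```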
>
> >> rdtsc32() with a scaled tc_counter_mask should work OK
> >> (essentially the same as the non-low timecounter except for
> >> reduced accuracy; the only loss is a decrease in the time until
> >> counter overflow to the same as for the non-low timecounter).
> >
> > I thought about that but I didn't like that idea, i.e., losing
> > resolution and accuracy at the same time.
>
> But it doesn't lose any more resolution or accuracy than any shift
> necessarily uses. It only loses wrap time, which is of no interest
> for a small reduction. See another reply.
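The wrap-time trade-off can be quantified: a 32-bit count advancing at f Hz wraps every 2^32 / f seconds. A hypothetical helper (name and signature are assumptions) comparing an unshifted counter with a shifted one:

```c
#include <stdint.h>

/*
 * Hypothetical helper: seconds until a 32-bit count that advances at
 * (freq_hz >> shift) ticks per second wraps around.
 */
static double
wrap_seconds(uint64_t freq_hz, int shift)
{
	return (4294967296.0 / (double)(freq_hz >> shift));
}
```

At 4 GHz, an unshifted 32-bit count wraps after about 1.07 s, while a shift of 8 stretches that to about 275 s.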
>
> The shift of 8 for SMP still seems far too much. clock_gettime()
> with a TSC timecounter on an old 2GHz system takes about 250 nS. I
> hope it takes only 1/2 that on a newer system. nanouptime() in the
> kernel takes more like 30 nS on the old system. It should at least
> try to have enough resulution for sequential calls to it to never
> return the same time (even ACPI-fast has this property -- about
> 1000 nS per call and a resolution of about 250 nS). rdtsc on old
> Athlons takes only 12 (9?) cycles so you could almost use it to
> time individual instructions (modulo out of order execution). The
> invariant versions have to be much slower for synchronization :-(.
> They take at least 42 cycles AFAIR. A shift count of 5 would lose
> less resolution than an invariant TSC really has so it would be
> good if it is enough to hide the nontemporalness. A shift count of
> 6 would be OK too. But a shift count of 8 lets you execute about 4
> nanouptime()'s for every change in the time returned. OTOH, 256
> cycles at 4 GHz is about 64 nS and clock_gettime() unfortunately
> takes longer (except on Linux? :-(), so a shift count of 8 is OK
> for it.
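The resolution figures above follow from 2^shift / f seconds per timecounter tick. A small hypothetical helper using integer nanoseconds:

```c
#include <stdint.h>

/*
 * Hypothetical helper: nanoseconds per timecounter tick for a TSC at
 * freq_hz shifted right by shift.  Integer division truncates; keep
 * shift small (1e9 << shift must fit in 64 bits, so shift <= 33).
 */
static uint64_t
tick_ns(uint64_t freq_hz, int shift)
{
	return ((UINT64_C(1000000000) << shift) / freq_hz);
}
```

At 4 GHz, shifts of 5, 6 and 8 give 8, 16 and 64 ns per tick; at 2 GHz, a shift of 8 gives 128 ns.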
>
> My clock measurement program (mostly an old program by Wollman)
> shows the following histogram of times for a non-invariant TSC
> timecounter on a 2GHz UP system:
>
> % min 273, max 265102, mean 273.998217, std 79.069534
> % 1th: 273 (1727219 observations)
> % 2th: 274 (265607 observations)
> % 3th: 275 (6984 observations)
> % 4th: 280 (11 observations)
> % 5th: 290 (8 observations)
>
> The variance is small, and differences of a single nS can be seen
> clearly. With the SMP shift of 8 on a 4GHz system, the minimum
> difference would be 64 nS so it would be impossible to see the
> details of the distribution about the mean of 273.998 nS.
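A rough, hypothetical sketch of this kind of measurement in portable POSIX C (not Wollman's program): time consecutive clock_gettime() calls on CLOCK_MONOTONIC and summarize the deltas. Each delta approximates the cost of one call, quantized by the clock's resolution, so a coarse timecounter shows up as deltas that move in steps of the tick size. The function and struct names are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static int64_t
ts_to_ns(const struct timespec *ts)
{
	return ((int64_t)ts->tv_sec * 1000000000 + ts->tv_nsec);
}

/*
 * Hypothetical measurement loop: fill deltas[0..n-1] with the time
 * between consecutive clock_gettime() calls.  Returns 0 on success,
 * -1 if clock_gettime() fails.
 */
static int
sample_deltas(clockid_t clk, int64_t *deltas, size_t n)
{
	struct timespec prev, cur;
	size_t i;

	if (clock_gettime(clk, &prev) != 0)
		return (-1);
	for (i = 0; i < n; i++) {
		if (clock_gettime(clk, &cur) != 0)
			return (-1);
		deltas[i] = ts_to_ns(&cur) - ts_to_ns(&prev);
		prev = cur;
	}
	return (0);
}

struct delta_stats {
	int64_t min;
	int64_t max;
	double mean;
};

/* Hypothetical summary: min, max and mean of n > 0 samples. */
static struct delta_stats
summarize(const int64_t *d, size_t n)
{
	struct delta_stats s = { d[0], d[0], 0.0 };
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (d[i] < s.min)
			s.min = d[i];
		if (d[i] > s.max)
			s.max = d[i];
		sum += (double)d[i];
	}
	s.mean = sum / (double)n;
	return (s);
}
```

A histogram like the one above then comes from counting how many deltas share each value.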
Thanks for the info,
Jung-uk Kim
More information about the svn-src-all mailing list