powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patch failed]
Bruce Evans
brde at optusnet.com.au
Fri Mar 1 20:38:33 UTC 2019
On Fri, 1 Mar 2019, Konstantin Belousov wrote:
> On Sat, Mar 02, 2019 at 05:40:58AM +1100, Bruce Evans wrote:
>> On Fri, 1 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Thu, Feb 28, 2019 at 11:39:06PM -0800, Mark Millard wrote:
>>>> [The new, trial code also has truncation occurring.]
>>> In fact no, I do not think it does.
>>>
>>>> An example observation of diff_scaled having an overflowed
>>>> value was:
>>>>
>>>> scale_factor == 0x80da2067ac
>>>> scale_factor*freq overflows the unsigned 64-bit representation.
>>>> tim_offset == 0x3da0eaeb
>>>> tim_cnt == 0x42dea3c4
>>>> tim_diff == 0x53db8d9
>>>> For reference: 0x1fc9d43 == 0xffffffffffffffffull/scale_factor
>>>> scaled_diff == 0xA353A5BF3FF780CC (truncated to 64 bits)
>>>>
>>>> But for the new, trial code:
>>>>
>>>> 0x80da2067ac is 40 bits
>>>> 0x53db8d9 is 27 bits
>>>> So 67 bits, more than 63. Then:
>>>>
>>>> x
>>>> == (0x80da2067ac>>32) * 0x53db8d9
>>>> == 0x80 * 0x53db8d9
>>>> == 0x29EDC6C80
>>>>
>>>> x>>32
>>>> == 0x2
>>>>
>>>> x<<32
>>>> == 0x9EDC6C8000000000 (limited to 64 bits)
>>>> Note the truncation of: 0x29EDC6C8000000000.
>>> Right, this is how the patch is supposed to work. Note that the overflow
>>> bits 'lost' due to overflow of the left shift are the same bits that are
>>> used to increment bt->sec:
>>> bt->sec += x >> 32;
>>> So the 2 seconds are accounted for.
A 67-bit product is 4-8 seconds (the units are 2**-64 seconds), and it is
an error for the adjustment to be >= 1 second.
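For concreteness, the arithmetic above can be reproduced standalone.  A
minimal sketch using the values from Mark's report (not the libc code;
th_scale is in units of 2**-64 seconds per counter tick):

	#include <stdint.h>
	#include <stdio.h>

	int
	main(void)
	{
		uint64_t scale = 0x80da2067acULL;	/* th_scale, 40 bits */
		uint32_t delta = 0x53db8d9;		/* tim_diff, 27 bits */
		uint64_t x;

		x = (scale >> 32) * delta;		/* 0x80 * 0x53db8d9 */
		printf("x       = %#jx\n", (uintmax_t)x);	   /* 0x29edc6c80 */
		printf("x >> 32 = %#jx\n", (uintmax_t)(x >> 32)); /* 0x2 */
		printf("x << 32 = %#jx\n", (uintmax_t)(x << 32)); /* truncated */
		return (0);
	}

The 0x2 shifted out at the top is exactly what the patch adds to bt->sec;
the problem is that a delta worth multiple seconds occurred at all.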
>>>
>>>>
>>>> Thus the "bintime_addx(bt, x << 32)" is still
>>>> based on a truncated value.
>>>
>>> I must admit that 2 seconds of interval where the timehands were
>>> not updated is too much. This might be the real cause of all ppc
>>> troubles. I tried to see if the overflow case is possible on amd64,
>>> and did not get a single case of the '> 63' branch executed during the
>>> /usr/tests/lib/libc run.
>>
>> The algorithm requires the update interval to be less than 1 second.
>> th_scale is 2**64 / tc_frequency, so whatever tc_frequency is, after
>> 1 second the value of the multiplication is approximately 2**64 so
>> it overflows about then (depending on rounding).
>>
>> The most useful timecounters are TSC's, and these give another overflow
>> in tc_delta() after 1 second when their frequency is 4 GHz (except the
>> bogus TSC-low timecounter reduces the frequency to below 2 binary GHz,
>> so the usual case is overflow after 2 seconds).
> As I said, I was unable to trigger the overflow on amd64.
Yes, amd64 doesn't have the bug. It gets tested more with fast TSCs that
overflow after about 1 second for other reasons.
Try amd64 with an ACPI or HPET timecounter. I sometimes do this to avoid
the overflow in tc_delta() while stopped in ddb. This seemed to work.
Actually, it doesn't work since the overflow under discussion still occurs.
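To put numbers on both overflow bounds (a sketch; the ~33.3 MHz timebase
follows from Mark's scale factor, and the 4 GHz TSC is a nominal figure):

	#include <stdint.h>
	#include <stdio.h>

	int
	main(void)
	{
		uint64_t scale = 0x80da2067acULL;	/* 2**64 / tc_frequency */

		/* th_scale * delta reaches 2**64 at delta ~= tc_frequency: */
		printf("multiplication overflows at delta = %#jx ticks (~1 s)\n",
		    (uintmax_t)(UINT64_MAX / scale));	/* 0x1fc9d43 ~= 33.3e6 */

		/* tc_delta()'s 32-bit counter wraps much later at 33 MHz: */
		printf("tc_delta() wraps after %.2f s at 4 GHz\n",
		    (double)UINT32_MAX / 4e9);		/* ~1.07 */
		printf("tc_delta() wraps after %.2f s at 2 GHz (TSC-low)\n",
		    (double)UINT32_MAX / 2e9);		/* ~2.15 */
		return (0);
	}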
> ...
>>> @@ -78,7 +81,14 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
>>> continue;
>>> if (error != 0)
>>> return (error);
>>> - bintime_addx(bt, th->th_scale * delta);
>>> + scale = th->th_scale;
>>> + if (__predict_false(fls(scale) + fls(delta) > 63)) {
>>
>> This is unnecessarily pessimal. Updates must be frequent enough to prevent
>> tc_delta() overflowing, and it is even easier to arrange that this
>> multiplication doesn't overflow (since the necessary update interval for
>> the latter is constant).
>>
>> `scale' is 64 bits, so fls(scale) is broken on 32-bit arches, and
>> flsll(scale) is an especially large pessimization. I saw this with my
>> calcru1() fixes -- the flsll()s take almost as long as long long
>> divisions when they use the pessimal C versions.
> Ok, fixed.
It is still very slow.
> Updated patch.
>
> diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
> index 3749e0473af..fdefda08e39 100644
> --- a/lib/libc/sys/__vdso_gettimeofday.c
> +++ b/lib/libc/sys/__vdso_gettimeofday.c
> ...
> @@ -78,7 +81,19 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
> continue;
> if (error != 0)
> return (error);
> - bintime_addx(bt, th->th_scale * delta);
> + scale = th->th_scale;
> +#ifdef _LP64
> + scale_bits = ffsl(scale);
> +#else
> + scale_bits = ffsll(scale);
> +#endif
> + if (__predict_false(scale_bits + fls(delta) > 63)) {
Userland is still fully pessimized, except when the compiler auto-inlines
ffs*().
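An aside on the test itself: the bound needs the highest set bit (the
fls family), not the lowest (ffs).  A hypothetical helper, assuming
FreeBSD's <strings.h> fls()/flsll() (this is not the patch's code):

	#include <stdint.h>
	#include <strings.h>	/* fls(), flsll() */

	/*
	 * True when scale * delta might not fit in 64 bits: a product
	 * of an a-bit and a b-bit value needs at most a + b bits, so
	 * a + b <= 64 can never overflow.
	 */
	static inline int
	scaled_delta_may_overflow(uint64_t scale, uint32_t delta)
	{
		return (flsll(scale) + fls(delta) > 64);
	}

Even inlined, this costs two bit scans per call, which motivates the
precomputed threshold below.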
> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..eedea5183c0 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> @@ -355,13 +356,22 @@ void
> binuptime(struct bintime *bt)
> {
> struct timehands *th;
> - u_int gen;
> + uint64_t scale, x;
> + u_int delta, gen;
>
> do {
> th = timehands;
> gen = atomic_load_acq_int(&th->th_generation);
> *bt = th->th_offset;
> - bintime_addx(bt, th->th_scale * tc_delta(th));
> + scale = th->th_scale;
> + delta = tc_delta(th);
> + if (__predict_false(th->th_scale_bits + fls(delta) > 63)) {
Better, but this shouldn't be changed (the bug that causes the large
intervals remains unlocated); if it is changed, then it should use:
if (delta >= th->th_large_delta)
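Concretely, the suggested shape would be something like this (a sketch
slotting into binuptime() above; th_large_delta is the hypothetical new
timehands field, and the slow path is the split multiplication discussed
above):

	scale = th->th_scale;
	delta = tc_delta(th);
	if (__predict_false(delta >= th->th_large_delta)) {
		/* 96-bit product: high 32 bits of scale first. */
		x = (scale >> 32) * delta;
		bt->sec += x >> 32;		/* whole seconds */
		bintime_addx(bt, x << 32);	/* their fraction */
		/* Low 32 bits of scale cannot overflow 64 bits: */
		bintime_addx(bt, (scale & 0xffffffff) * delta);
	} else
		bintime_addx(bt, scale * delta);

The fast path then costs one comparison against a precomputed constant
instead of two bit-length computations on every call.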
> @@ -1464,6 +1483,11 @@ tc_windup(struct bintime *new_boottimebin)
> scale += (th->th_adjustment / 1024) * 2199;
> scale /= th->th_counter->tc_frequency;
> th->th_scale = scale * 2;
> +#ifdef _LP64
> + th->th_scale_bits = ffsl(th->th_scale);
> +#else
> + th->th_scale_bits = ffsll(th->th_scale);
> +#endif
th->th_large_delta = ((uint64_t)1 << 63) / scale;
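Here `scale' is the local variable, i.e. th_scale / 2, so the division
is effectively 2**64 / th_scale (the point where th_scale * delta can
first wrap), computed without needing an unrepresentable 1 << 64.  In
context (a sketch of the tc_windup() side):

	scale /= th->th_counter->tc_frequency;
	th->th_scale = scale * 2;
	/* Largest safe delta: th_scale * delta < 2**64 below this. */
	th->th_large_delta = ((uint64_t)1 << 63) / scale;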
Bruce