powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Mark Millard
marklmi at yahoo.com
Thu Mar 14 21:06:08 UTC 2019
On 2019-Mar-14, at 12:39, Konstantin Belousov <kostikbel at gmail.com> wrote:
> On Thu, Mar 07, 2019 at 05:29:51PM -0800, Mark Millard wrote:
>> A basic question and a small note.
>>
>> Context for the question, regarding tc->tc_get_timecount(tc) values:
>>
>> In the powerpc64 context tc->tc_get_timecount(tc) is the lower
>> 32 bits of the tbr, in my context having a 33,333,333 Hz or so
>> increment rate for a machine with a 2.5 GHz or so clock rate.
>> The truncated 32 bit tbr value wraps every 128 seconds or so.
>> 2 sockets, 2 cores per socket, so 4 separate tbr values.
>>
>> The question is . . .
>>
>> In tc_delta's:
>>
>> tc->tc_get_timecount(tc) - th->th_offset_count
>>
>> is observing tc->tc_get_timecount(tc) < th->th_offset_count
>> ever supposed to be possible in correct operation, other than
>> tc->tc_get_timecount(tc) having wrapped around (and so being
>> newly 0 or "near" 0, with no evidence of it having been
>> near 128 seconds or more in my context)?
> I think yes, there is no reason for current get_timecount() value
> to have any arithmetic relation to th_offset_count. Look at tc_windup()
> on how the th_offset_count is calculated. The final value is clamped
> by the tc_counter_mask, so only lower bits are important (higher bits
> are evacuated to th_offset or lost due to overflow if tc_windup()
> was not called soon enough).
>
Okay. Thanks.
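For illustration, here is a minimal user-space sketch (not the kernel
source) of the masked unsigned subtraction being discussed. The names
masked_delta and masked_demo are hypothetical stand-ins, and the sample
counter values are made up. It shows that a genuine wrap yields a small
delta, while a count that is only 0x11 behind the offset yields a delta
close to the full 32-bit range:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the tc_delta() expression quoted above:
 * unsigned difference of the current count and the recorded offset
 * count, clamped by the counter mask.
 */
static uint32_t
masked_delta(uint32_t count, uint32_t offset_count, uint32_t mask)
{
	return ((count - offset_count) & mask);
}

/* The expected values below are checked by the assertions. */
static void
masked_demo(void)
{
	const uint32_t mask = 0xffffffffu;	/* 32-bit counter (assumed) */

	/* Counter wrapped past zero: delta is the small true distance. */
	assert(masked_delta(0x00000010u, 0xfffffff0u, mask) == 0x20u);

	/* Count 0x11 behind the offset: delta looks like almost a full wrap. */
	assert(masked_delta(0x1000u - 0x11u, 0x1000u, mask) == 0xffffffefu);
}
```

The subtraction by itself cannot distinguish "slightly behind" from
"almost one full wrap ahead"; both produce the same bit pattern.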
Just FYI:
I asked because in my powerpc64 context I was seeing
(in sleepq_timeout) td->td_sleeptimo > sbinuptime() in:

    if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
        /*
         * The thread does not want a timeout (yet).
         */

and without such sleeps being rescheduled in that case, those sleeps
hang up. My hack to temporarily enable useful operation was to
have binuptime avoid tc->tc_get_timecount(tc) < th->th_offset_count
for small enough differences, as shown below:
. . .
    do {
        do { // HACK!!!
            th = timehands;
            tc = th->th_counter;
            gen = atomic_load_acq_int(&th->th_generation);
            tim_cnt = tc->tc_get_timecount(tc);
            tim_offset = th->th_offset_count;
            tim_wrong_order_diff = tim_offset - tim_cnt;
        } while (tim_cnt < tim_offset &&
            tim_wrong_order_diff < wrong_order_diff_proper_upper_bound); // HACK!!!
        *bt = th->th_offset;
. . .
where I experimentally came up with the following for the specific PowerMac G5 context:
    u_int const wrong_order_diff_proper_upper_bound = 0x14u; // 0x11 is max observed diff so far HACK!!!
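As a rough user-space sketch of the retry condition only (not the
kernel code): the fake_counter type and the functions fake_read,
read_not_slightly_behind and retry_demo are hypothetical, and the fake
counter simply advances by one tick per read. The loop re-reads while
the count is behind the offset by less than the bound, and accepts any
other reading immediately:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fake counter: advances by one tick on every read. */
struct fake_counter {
	uint32_t next;
};

static uint32_t
fake_read(struct fake_counter *fc)
{
	return (fc->next++);
}

/*
 * Re-read while the count is behind offset_count by less than bound.
 * A count that is at or ahead of the offset, or behind by bound or
 * more, is accepted as-is.
 */
static uint32_t
read_not_slightly_behind(struct fake_counter *fc, uint32_t offset_count,
    uint32_t bound)
{
	uint32_t count;

	do {
		count = fake_read(fc);
	} while (count < offset_count && offset_count - count < bound);
	return (count);
}

static void
retry_demo(void)
{
	struct fake_counter close_fc = { 0x1000u - 0x11u };
	struct fake_counter distant_fc = { 0x1000u - 0x20u };

	/* 0x11 behind with bound 0x14: retried until the count catches up. */
	assert(read_not_slightly_behind(&close_fc, 0x1000u, 0x14u) == 0x1000u);

	/* 0x20 behind: outside the bound, so accepted on the first read. */
	assert(read_not_slightly_behind(&distant_fc, 0x1000u, 0x14u) ==
	    0x1000u - 0x20u);
}
```

The bound keeps the retry from spinning when the difference is large.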
I've not had any hung-up sleeps after that change. Despite being a hack,
this gives evidence that tc->tc_get_timecount(tc) < th->th_offset_count
for small enough differences (in binuptime) is involved in the hangups
in some essential way for the PowerMac G5 context.
I look forward to removing this hack at some point, when things just
work for this 2 socket, 2 cores per socket powerpc64 context. But
for now the hack is locally useful.
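To connect this with the subject line, here is a hedged sketch of why
th->th_scale * tc_delta(th) can overflow 64 bits. It assumes th_scale
is roughly 2^64 divided by the counter frequency (any adjustment term
is ignored) and assumes a frequency of 33,333,333 Hz, consistent with
the 128 second or so wrap time. The names approx_scale,
scale_mul_overflows and overflow_demo are hypothetical. The value
0xffffffef is what a 32-bit masked subtraction yields when the count is
0x11 behind the offset:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed approximation of th_scale: about 2^64 / freq_hz. */
static uint64_t
approx_scale(uint64_t freq_hz)
{
	assert(freq_hz != 0);
	return ((((uint64_t)1 << 63) / freq_hz) * 2);
}

/* Returns 1 when scale * delta does not fit in an unsigned 64-bit value. */
static int
scale_mul_overflows(uint64_t scale, uint32_t delta)
{
	return (delta != 0 && scale > UINT64_MAX / delta);
}

static void
overflow_demo(void)
{
	uint64_t scale = approx_scale(33333333);

	/* A small forward delta fits easily. */
	assert(!scale_mul_overflows(scale, 0x20u));

	/* About one second's worth of ticks still fits. */
	assert(!scale_mul_overflows(scale, 33333333u));

	/* Count 0x11 behind the offset: the huge delta overflows. */
	assert(scale_mul_overflows(scale, 0xffffffefu));
}
```

Under these assumptions the product only fits for deltas of up to
about one second of ticks.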
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)