head -r344018 based powerpc64 pmac_thermal hangup (stuck sleeping): some preliminary evidence [not as uniform as I initially saw]
Mark Millard
marklmi at yahoo.com
Mon Feb 25 23:50:03 UTC 2019
[I've now seen examples of sbt==0xfffffed8 that did not lead to
a hangup.]
On 2019-Feb-25, at 14:46, Mark Millard <marklmi at yahoo.com> wrote:
> I adjusted what KTR_PROC does to just show just some of its pid 26
> (pmac_thermal) messages and adding some extra output as well. I'll
> list some of that output later. I'll note that beyond pmac_thermal
> the buf*daemon* threads also seem to be subject to being stuck
> sleeping in (offsets are for a specific build of mine):
>
> mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c
>
>
> So far for pmac_thermal I've seen that until the failing case:
>
> sleepq_set_timeout_sbt was being given: sbt==0xcccccbe0 pr==0x0 flags==0x100
> and in turn was using: prec==0xcccccbe flags==0x501 (of course the used
> td_sleeptimo varies).
>
> [I note that 16*0xcccccbe == 0xcccccbe0, the original sbt value,
> not that I know yet if this matters.]
>
> But the sequence leading to failures is different:
I've now seen examples of sbt==0xfffffed8 that did not lead to
a hangup. So it is not a reliable predictor of the hang-up in
sleep. I'm trying to see if I can observe a failure with
different value.
> sleepq_set_timeout_sbt was given: sbt==0xfffffed8 pr==0x0 flags==0x100
> and in turn was using: prec==0xfffffed flags==0x501
>
> [I note that 16*0xfffffed == 0xfffffed0, so less than the original
> sbt value, not that I know this matters at this point.]
>
> For sbt==0xfffffed8, the callout to sleepq_timeout ends up with values
> like (a particular example):
>
> td_sleeptimo=0x470d360fe5 sbinuptime=0x46c869f6aa
>
> where the reporting code looks like:
>
> static void
> sleepq_timeout(void *arg)
> {
> struct sleepqueue_chain *sc __unused;
> struct sleepqueue *sq;
> struct thread *td;
> void *wchan;
> int wakeup_swapper;
> sbintime_t sbut; // HACK!!!
>
> td = arg;
> wakeup_swapper = 0;
> if (26 == td->td_proc->p_pid) // HACK!!!
> CTR3(KTR_PROC, "sleepq_timeout: thread %p (pid %ld, %s)",
> (void *)td, (long)td->td_proc->p_pid, (void *)td->td_name);
>
> thread_lock(td);
>
> sbut= sbinuptime(); // HACK!!!
> if (td->td_sleeptimo > sbut || td->td_sleeptimo == 0) {
> /*
> * The thread does not want a timeout (yet).
> */
> if (26 == td->td_proc->p_pid) // HACK!!!
> CTR5(KTR_PROC, "sleepq_timeout thread not want timeout yet: thread %p (pid %ld, %s) td_sleeptimo=%jx sbinuptime=%jx",
> (void *)td, (long)td->td_proc->p_pid, (void *)td->td_name, (uintmax_t)td->td_sleeptimo, (uintmax_t)sbut);
> . . .
>
> So far sleepq_set_timeout_sbt being given sbt==0xfffffed8 instead of
> sbt==0xcccccbe0 seems to be an accurate indicator of if the problem will
> happen in sleepq_timeout. (But I've only a few examples so far.)
>
I've now seen examples of sbt==0xfffffed8 that did not lead to
a hangup. So it is not a reliable predictor of the hang-up in
sleep.
It looks like the values are sometimes more varied than I'd
seen before as well.
> I'll note that the sleepq_timeout code for this case does not set up
> another callout to itself for later and the sleep then continues
> indefinitely.
>
> I've not yet gotten into finding evidence for why the callout to
> sleepq_timeout itself happens. Hopefully I can find some.
>
>
> An example of some modified KTR_PROC output is:
>
> . . .
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
More information about the freebsd-ppc
mailing list