head -r344018 based powerpc64 pmac_thermal hangup (stuck sleeping): some preliminary evidence
Mark Millard
marklmi at yahoo.com
Mon Feb 25 22:47:01 UTC 2019
I adjusted what KTR_PROC does to just show just some of its pid 26
(pmac_thermal) messages and adding some extra output as well. I'll
list some of that output later. I'll note that beyond pmac_thermal
the buf*daemon* threads also seem to be subject to being stuck
sleeping in (offsets are for a specific build of mine):
mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c
So far for pmac_thermal I've seen that until the failing case:
sleepq_set_timeout_sbt was being given: sbt==0xcccccbe0 pr==0x0 flags==0x100
and in turn was using: prec==0xcccccbe flags==0x501 (of course the used
td_sleeptimo varies).
[I note that 16*0xcccccbe == 0xcccccbe0, the original sbt value,
not that I know yet if this matters.]
But the sequence leading to failures is different:
sleepq_set_timeout_sbt was given: sbt==0xfffffed8 pr==0x0 flags==0x100
and in turn was using: prec==0xfffffed flags==0x501
[I note that 16*0xfffffed == 0xfffffed0, so less than the original
sbt value, not that I know this matters at this point.]
For sbt==0xfffffed8, the callout to sleepq_timeout ends up with values
like (a particular example):
td_sleeptimo=0x470d360fe5 sbinuptime=0x46c869f6aa
where the reporting code looks like:
static void
sleepq_timeout(void *arg)
{
struct sleepqueue_chain *sc __unused;
struct sleepqueue *sq;
struct thread *td;
void *wchan;
int wakeup_swapper;
sbintime_t sbut; // HACK!!!
td = arg;
wakeup_swapper = 0;
if (26 == td->td_proc->p_pid) // HACK!!!
CTR3(KTR_PROC, "sleepq_timeout: thread %p (pid %ld, %s)",
(void *)td, (long)td->td_proc->p_pid, (void *)td->td_name);
thread_lock(td);
sbut= sbinuptime(); // HACK!!!
if (td->td_sleeptimo > sbut || td->td_sleeptimo == 0) {
/*
* The thread does not want a timeout (yet).
*/
if (26 == td->td_proc->p_pid) // HACK!!!
CTR5(KTR_PROC, "sleepq_timeout thread not want timeout yet: thread %p (pid %ld, %s) td_sleeptimo=%jx sbinuptime=%jx",
(void *)td, (long)td->td_proc->p_pid, (void *)td->td_name, (uintmax_t)td->td_sleeptimo, (uintmax_t)sbut);
. . .
So far sleepq_set_timeout_sbt being given sbt==0xfffffed8 instead of
sbt==0xcccccbe0 seems to be an accurate indicator of if the problem will
happen in sleepq_timeout. (But I've only a few examples so far.)
I'll note that the sleepq_timeout code for this case does not set up
another callout to itself for later and the sleep then continues
indefinitely.
I've not yet gotten into finding evidence for why the callout to
sleepq_timeout itself happens. Hopefully I can find some.
An example of some modified KTR_PROC output is:
493 (0xc000000001eb45a0:cpu3) 3278339934 /usr/src/sys/kern/subr_sleepqueue.c.1026: sleepq_timeout thread not want timeout yet: thread 0xc00000000a8315a0 (pid 26, pmac_thermal) td_sleeptimo=470d360fe5 sbinuptime=46c869f6aa
492 (0xc000000001eb45a0:cpu3) 3278339919 /usr/src/sys/kern/subr_sleepqueue.c.1015: sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
491 (0xc00000000a8315a0:cpu3) 3253979006 /usr/src/sys/kern/kern_synch.c.435: mi_switch: old thread 100078 (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
490 (0xc00000000a8315a0:cpu3) 3253978972 /usr/src/sys/kern/subr_sleepqueue.c.408: sleepq_set_timeout_sbt: sbt=fffffed8 pr=0 flags=100 td_sleeptimo=470d360fe5 prec=fffffed flags=501
489 (0xc00000000a8315a0:cpu3) 3253978965 /usr/src/sys/kern/subr_sleepqueue.c.406: sleepq_set_timeout_sbt installing sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
488 (0xc00000000a8315a0:cpu3) 3253978940 /usr/src/sys/kern/kern_synch.c.180: sleep: thread 100078 (pid 26, pmac_thermal) on pmac_therm (0xc0000000015082c3)
487 (0xc00000000a8315a0:cpu3) 3253978890 /usr/src/sys/kern/subr_sleepqueue.c.630: sleepq resume: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
486 (0xc00000000a8315a0:cpu3) 3253978883 /usr/src/sys/kern/kern_synch.c.445: mi_switch: new thread 100078 (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
485 (0xc000000001f6e000:cpu2) 3253978598 /usr/src/sys/kern/subr_sleepqueue.c.836: sleepq_wakeup: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
484 (0xc000000001f6e000:cpu2) 3253978580 /usr/src/sys/kern/subr_sleepqueue.c.988: sleepq_remove_matching calling sleepq_resume_thread: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
483 (0xc00000000a8315a0:cpu3) 3252269011 /usr/src/sys/kern/kern_synch.c.435: mi_switch: old thread 100078 (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
482 (0xc00000000a8315a0:cpu3) 3252268968 /usr/src/sys/kern/subr_sleepqueue.c.408: sleepq_set_timeout_sbt: sbt=cccccbe0 pr=0 flags=100 td_sleeptimo=46ccf486f8 prec=cccccbe flags=501
481 (0xc00000000a8315a0:cpu3) 3252268962 /usr/src/sys/kern/subr_sleepqueue.c.406: sleepq_set_timeout_sbt installing sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
480 (0xc00000000a8315a0:cpu3) 3252268935 /usr/src/sys/kern/kern_synch.c.180: sleep: thread 100078 (pid 26, pmac_thermal) on smu (0xe000000087fd1670)
479 (0xc00000000a8315a0:cpu3) 3252268793 /usr/src/sys/kern/subr_sleepqueue.c.630: sleepq resume: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
478 (0xc00000000a8315a0:cpu3) 3252268778 /usr/src/sys/kern/kern_synch.c.445: mi_switch: new thread 100078 (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
477 (0xc000000001f6e000:cpu2) 3252268391 /usr/src/sys/kern/subr_sleepqueue.c.836: sleepq_wakeup: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
476 (0xc000000001f6e000:cpu2) 3252268385 /usr/src/sys/kern/subr_sleepqueue.c.988: sleepq_remove_matching calling sleepq_resume_thread: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
475 (0xc00000000a8315a0:cpu2) 3250546514 /usr/src/sys/kern/kern_synch.c.435: mi_switch: old thread 100078 (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
474 (0xc00000000a8315a0:cpu2) 3250546468 /usr/src/sys/kern/subr_sleepqueue.c.408: sleepq_set_timeout_sbt: sbt=cccccbe0 pr=0 flags=100 td_sleeptimo=46bfa4a7cc prec=cccccbe flags=501
473 (0xc00000000a8315a0:cpu2) 3250546462 /usr/src/sys/kern/subr_sleepqueue.c.406: sleepq_set_timeout_sbt installing sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
472 (0xc00000000a8315a0:cpu2) 3250546436 /usr/src/sys/kern/kern_synch.c.180: sleep: thread 100078 (pid 26, pmac_thermal) on smu (0xe000000087fd1670)
471 (0xc00000000a8315a0:cpu2) 3250546295 /usr/src/sys/kern/subr_sleepqueue.c.630: sleepq resume: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
470 (0xc00000000a8315a0:cpu2) 3250546286 /usr/src/sys/kern/kern_synch.c.445: mi_switch: new thread 100078 (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
469 (0xc000000001f6e000:cpu0) 3250545941 /usr/src/sys/kern/subr_sleepqueue.c.836: sleepq_wakeup: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
468 (0xc000000001f6e000:cpu0) 3250545934 /usr/src/sys/kern/subr_sleepqueue.c.988: sleepq_remove_matching calling sleepq_resume_thread: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
. . .
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
More information about the freebsd-hackers
mailing list