A question on possible A64 (Pine64+ 2GB) aarch64 blocked_lock misuse. . .
Mark Millard
markmi at dsl-only.net
Thu Sep 14 03:45:39 UTC 2017
I've been trying to gather evidence for why head
sometimes hangs or panics on Pine64+ 2GB's (and other
A64's?) during:
taskqgroup_adjust_softirq(0)...
in the following contexts:
A) non-debug kernel build (no witness, no invariants): hangs,
   possibly always (I've never seen a boot get past that
   point).
B) debug kernel build (witness and invariants): sometimes gets:
   panic: acquiring blockable sleep lock with spinlock or critical
   section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
   (the witness_checkorder() excerpt just after this list shows
   where that panic comes from)
C) debug kernel build (invariants but no witness): sometimes gets a
   KASSERT failure.
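For reference, the panic text in (B) is produced by a check near the
top of witness_checkorder() in sys/kern/subr_witness.c. From my
reading of that era's code it is approximately (a trimmed excerpt,
so treat it as a sketch):

	if (class->lc_flags & LC_SLEEPLOCK) {
		/*
		 * Since spin locks include a critical section, this check
		 * implicitly enforces a lock order of all sleep locks before
		 * all spin locks.
		 */
		if (td->td_critnest != 0 && !kdb_active)
			kassert_panic("acquiring blockable sleep lock with "
			    "spinlock or critical section held (%s) %s @ %s:%d",
			    class->lc_name, lock->lo_name,
			    fixup_filename(file), line);
		/* . . . */
	}

So any mtx_lock() of a sleep mutex while td_critnest != 0 (and every
held spinlock implies a critical section) reports exactly the message
seen in (B).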
Exploring this, it appears that in all cases of explicitly
reported failure there is a backtrace like the following
(witness example):
. . .
kassert_panic() at witness_checkorder+0x160
pc = 0xffff0000003174e4 lr = 0xffff000000374990
sp = 0xffff0000698503f0 fp = 0xffff000069850470
witness_checkorder() at __mtx_lock_flags+0xa8
pc = 0xffff000000374990 lr = 0xffff0000002f8b7c
sp = 0xffff000069850480 fp = 0xffff0000698504b0
__mtx_lock_flags() at pmap_fault+0x40
pc = 0xffff0000002f8b7c lr = 0xffff000000606994
sp = 0xffff0000698504c0 fp = 0xffff0000698504e0
pmap_fault() at data_abort+0xb8
pc = 0xffff000000606994 lr = 0xffff000000608a9c
sp = 0xffff0000698504f0 fp = 0xffff0000698505a0
data_abort() at do_el1h_sync+0xfc
pc = 0xffff000000608a9c lr = 0xffff0000006088f0
sp = 0xffff0000698505b0 fp = 0xffff0000698505e0
. . .
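(The __mtx_lock_flags() frame is pmap_fault() taking the pmap lock,
which is an ordinary blockable sleep mutex. From
sys/arm64/include/pmap.h, approximately:

#define	PMAP_LOCK(pmap)		mtx_lock(&(pmap)->pm_mtx)
#define	PMAP_UNLOCK(pmap)	mtx_unlock(&(pmap)->pm_mtx)

That matches the "(sleep mutex) pmap" naming in the panic text.)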
with the thread in question having the status of
"blocked lock" (so blocked_lock in use):
db> show thread 100058
Thread 100058 at 0xfffffd0001415a80:
proc (pid 0): 0xffff000000c5db88
name: softirq_1
stack: 0xffff00006984d000-0xffff000069850fff
flags: 0x4010004 pflags: 0x200000
state: RUNQ
priority: 24
container lock: blocked lock (0xffff000000c73e30)
last voluntary switch: 245 ms ago
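(The "blocked lock" container-lock status means td_lock currently
points at the global blocked_lock, as arranged by thread_lock_block()
in sys/kern/kern_mutex.c, which is approximately:

struct mtx *
thread_lock_block(struct thread *td)
{
	struct mtx *lock;

	THREAD_LOCK_ASSERT(td, MA_OWNED);
	lock = td->td_lock;
	td->td_lock = &blocked_lock;	/* thread_lock() spins while set */
	mtx_unlock_spin(lock);

	return (lock);
}

blocked_lock itself is initialized so that it can never be acquired,
so anyone trying to thread_lock() this thread spins until td_lock is
pointed back at a real lock.)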
The Question:
Should pmap_fault's lock activity be possible
while blocked_lock is in use for the thread's
container lock?
FYI:
The call chain leading to that status shows:
do_el1h_sync() at handle_el1h_sync+0x74
pc = 0xffff0000006088f0 lr = 0xffff0000005f1874
sp = 0xffff0000698505f0 fp = 0xffff000069850700
handle_el1h_sync() at sched_switch+0x2a8
pc = 0xffff0000005f1874 lr = 0xffff00000033f0c8
sp = 0xffff000069850710 fp = 0xffff0000698507f0
sched_switch() at mi_switch+0x1b8
pc = 0xffff00000033f0c8 lr = 0xffff00000032161c
sp = 0xffff000069850800 fp = 0xffff000069850820
mi_switch() at taskqgroup_binder+0x7c
pc = 0xffff00000032161c lr = 0xffff00000035510c
sp = 0xffff000069850830 fp = 0xffff000069850860
taskqgroup_binder() at gtaskqueue_run_locked+0x104
pc = 0xffff00000035510c lr = 0xffff000000354f74
sp = 0xffff000069850870 fp = 0xffff0000698508e0
gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
pc = 0xffff000000354f74 lr = 0xffff000000354d10
sp = 0xffff0000698508f0 fp = 0xffff000069850910
gtaskqueue_thread_loop() at fork_exit+0x7c
pc = 0xffff000000354d10 lr = 0xffff0000002dbd3c
sp = 0xffff000069850920 fp = 0xffff000069850950
fork_exit() at fork_trampoline+0x10
pc = 0xffff0000002dbd3c lr = 0xffff000000608664
sp = 0xffff000069850960 fp = 0x0000000000000000
Apparently sched_switch took one of the last two cases of:
	if (TD_IS_IDLETHREAD(td)) {
		. . .
	} else if (TD_IS_RUNNING(td)) {
		MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
		srqflag = preempted ?
		    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
		    SRQ_OURSELF|SRQ_YIELDING;
#ifdef SMP
		if (THREAD_CAN_MIGRATE(td) &&
		    !THREAD_CAN_SCHED(td, ts->ts_cpu))
			ts->ts_cpu = sched_pickcpu(td, 0);
#endif
		if (ts->ts_cpu == cpuid)
			tdq_runq_add(tdq, td, srqflag);
		else {
			KASSERT(THREAD_CAN_MIGRATE(td) ||
			    (ts->ts_flags & TSF_BOUND) != 0,
			    ("Thread %p shouldn't migrate", td));
			mtx = sched_switch_migrate(tdq, td, srqflag);
		}
	} else {
		/* This thread must be going to sleep. */
		TDQ_LOCK(tdq);
		mtx = thread_lock_block(td);
		tdq_load_rem(tdq, td);
	}
where sched_switch_migrate also does thread_lock_block():
static struct mtx *
sched_switch_migrate(struct tdq *tdq, struct thread *td, int flags)
{
	struct tdq *tdn;

	tdn = TDQ_CPU(td_get_sched(td)->ts_cpu);
#ifdef SMP
	tdq_load_rem(tdq, td);
	/*
	 * Do the lock dance required to avoid LOR. We grab an extra
	 * spinlock nesting to prevent preemption while we're
	 * not holding either run-queue lock.
	 */
	spinlock_enter();
	thread_lock_block(td);	/* This releases the lock on tdq. */
	/*
	 * Acquire both run-queue locks before placing the thread on the new
	 * run-queue to avoid deadlocks created by placing a thread with a
	 * blocked lock on the run-queue of a remote processor. The deadlock
	 * occurs when a third processor attempts to lock the two queues in
	 * question while the target processor is spinning with its own
	 * run-queue lock held while waiting for the blocked lock to clear.
	 */
	tdq_lock_pair(tdn, tdq);
	tdq_add(tdn, td, flags);
	tdq_notify(tdn, td);
	TDQ_UNLOCK(tdn);
	spinlock_exit();
#endif
	return (TDQ_LOCKPTR(tdn));
}
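Note that the "lock dance" leaves the thread inside spinlock_enter()
while td_lock points at blocked_lock. On arm64, spinlock_enter()
disables interrupts and enters a critical section (raising
td_critnest, which is exactly what the witness check behind the
panic in (B) tests); from sys/arm64/arm64/machdep.c, approximately:

void
spinlock_enter(void)
{
	struct thread *td;
	register_t daif;

	td = curthread;
	if (td->td_md.md_spinlock_count == 0) {
		daif = intr_disable();
		td->td_md.md_spinlock_count = 1;
		td->td_md.md_saved_daif = daif;
		critical_enter();	/* bumps td_critnest */
	} else
		td->td_md.md_spinlock_count++;
}

Masking interrupts does not mask synchronous exceptions, so a data
abort can still reach do_el1h_sync() in this window.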
(I have not checked for inlining so I allow for it
above.)
There have been past discussions, such as:
https://lists.freebsd.org/pipermail/freebsd-arm/2016-January/013120.html
with notes like the following (from before an inappropriate
indirection involving blocked_lock was fixed):
> > cpu_switch() already does what you describe though in a slightly different
> > way. The thread_lock() of a thread being switched out is set to blocked_lock.
> > cpu_switch() on the new CPU will always spin until cpu_switch updates
> > thread_lock of the old thread to point to the proper runq lock after saving
> > its state in the pcb. arm64 does this here:
> >
> >         /*
> >          * Release the old thread. This doesn't need to be a store-release
> >          * as the above dsb instruction will provide release semantics.
> >          */
> >         str     x2, [x0, #TD_LOCK]
> > #if defined(SCHED_ULE) && defined(SMP)
> >         /* Read the value in blocked_lock */
> >         ldr     x0, =_C_LABEL(blocked_lock)
> >         ldr     x2, [x0]
> > 1:
> >         ldar    x3, [x1, #TD_LOCK]
> >         cmp     x3, x2
> >         b.eq    1b
> > #endif
> >
> > Note the thread_lock_block() call just above the block you noted from
> > sched_switch_migrate() to see where td_lock is set to &blocked_lock.
> >
> > If the comment about 'dsb' above is wrong that might explain why you see
> > stale state in the PCB after seeing the new value of td_lock.
> >
> > --
> > John Baldwin
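In C terms, the quoted spin at the tail of cpu_switch() is roughly
the following sketch (illustrative only: "oldtd"/"newtd" stand in
for the threads in x0/x1, and I show a comparison against
blocked_lock's address, i.e. the post-fix semantics rather than the
quoted extra indirection):

	/* Release the old thread; the dsb above gives release semantics. */
	oldtd->td_lock = mtx;
#if defined(SCHED_ULE) && defined(SMP)
	/* Spin until newtd's td_lock stops pointing at blocked_lock. */
	while (atomic_load_acq_ptr((volatile uintptr_t *)&newtd->td_lock) ==
	    (uintptr_t)&blocked_lock)
		;	/* tight spin, matching the ldar/cmp/b.eq loop */
#endif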
Unfortunately I have no hint as to what causes the race condition
that debug kernel builds (invariants, possibly with witness) run
into, the one that leads to the variable behavior.
===
Mark Millard
markmi at dsl-only.net