A question on possible A64 (Pine64+ 2GB) aarch64 blocked_lock misuse. . .
Mark Millard
markmi at dsl-only.net
Thu Sep 14 03:45:39 UTC 2017
I've been trying to gather evidence for why head
sometimes hangs or panics on Pine64+ 2GB's (and other
A64's?) during:
taskqgroup_adjust_softirq(0)...
in the following contexts:
A) non-debug kernel build (no witness, no invariants): hangs,
   possibly always (I've never seen a boot get past that
   point).
B) debug kernel build (witness and invariants): sometimes gets:
   panic: acquiring blockable sleep lock with spinlock or critical
   section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
   (the witness_checkorder() excerpt just after this list shows
   where that panic comes from)
C) debug kernel build (invariants but no witness): sometimes gets a
   KASSERT failure.
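For reference, the panic text in (B) is produced by a check near the
top of witness_checkorder() in sys/kern/subr_witness.c. From my
reading of that era's code it is approximately (a trimmed excerpt,
so treat it as a sketch):

	if (class->lc_flags & LC_SLEEPLOCK) {
		/*
		 * Since spin locks include a critical section, this check
		 * implicitly enforces a lock order of all sleep locks before
		 * all spin locks.
		 */
		if (td->td_critnest != 0 && !kdb_active)
			kassert_panic("acquiring blockable sleep lock with "
			    "spinlock or critical section held (%s) %s @ %s:%d",
			    class->lc_name, lock->lo_name,
			    fixup_filename(file), line);
		/* . . . */
	}

So any mtx_lock() of a sleep mutex while td_critnest != 0 (and every
held spinlock implies a critical section) reports exactly the message
seen in (B).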
Exploring this, it appears that in all cases of explicitly
reported failure there is a backtrace like the following
(witness example):
. . .
kassert_panic() at witness_checkorder+0x160
pc = 0xffff0000003174e4 lr = 0xffff000000374990
sp = 0xffff0000698503f0 fp = 0xffff000069850470
witness_checkorder() at __mtx_lock_flags+0xa8
pc = 0xffff000000374990 lr = 0xffff0000002f8b7c
sp = 0xffff000069850480 fp = 0xffff0000698504b0
__mtx_lock_flags() at pmap_fault+0x40
pc = 0xffff0000002f8b7c lr = 0xffff000000606994
sp = 0xffff0000698504c0 fp = 0xffff0000698504e0
pmap_fault() at data_abort+0xb8
pc = 0xffff000000606994 lr = 0xffff000000608a9c
sp = 0xffff0000698504f0 fp = 0xffff0000698505a0
data_abort() at do_el1h_sync+0xfc
pc = 0xffff000000608a9c lr = 0xffff0000006088f0
sp = 0xffff0000698505b0 fp = 0xffff0000698505e0
. . .
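(The __mtx_lock_flags() frame is pmap_fault() taking the pmap lock,
which is an ordinary blockable sleep mutex. From
sys/arm64/include/pmap.h, approximately:

#define	PMAP_LOCK(pmap)		mtx_lock(&(pmap)->pm_mtx)
#define	PMAP_UNLOCK(pmap)	mtx_unlock(&(pmap)->pm_mtx)

That matches the "(sleep mutex) pmap" naming in the panic text.)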
with the thread in question having the status of
"blocked lock" (so blocked_lock in use):
db> show thread 100058
Thread 100058 at 0xfffffd0001415a80:
proc (pid 0): 0xffff000000c5db88
name: softirq_1
stack: 0xffff00006984d000-0xffff000069850fff
flags: 0x4010004 pflags: 0x200000
state: RUNQ
priority: 24
container lock: blocked lock (0xffff000000c73e30)
last voluntary switch: 245 ms ago
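(The "blocked lock" container-lock status means td_lock currently
points at the global blocked_lock, as arranged by thread_lock_block()
in sys/kern/kern_mutex.c, which is approximately:

struct mtx *
thread_lock_block(struct thread *td)
{
	struct mtx *lock;

	THREAD_LOCK_ASSERT(td, MA_OWNED);
	lock = td->td_lock;
	td->td_lock = &blocked_lock;	/* thread_lock() spins while set */
	mtx_unlock_spin(lock);

	return (lock);
}

blocked_lock itself is initialized so that it can never be acquired,
so anyone trying to thread_lock() this thread spins until td_lock is
pointed back at a real lock.)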
The Question:
Should pmap_fault's lock activity be possible
while blocked_lock is in use for the thread's
container lock?
FYI:
The call chain leading to that status shows:
do_el1h_sync() at handle_el1h_sync+0x74
pc = 0xffff0000006088f0 lr = 0xffff0000005f1874
sp = 0xffff0000698505f0 fp = 0xffff000069850700
handle_el1h_sync() at sched_switch+0x2a8
pc = 0xffff0000005f1874 lr = 0xffff00000033f0c8
sp = 0xffff000069850710 fp = 0xffff0000698507f0
sched_switch() at mi_switch+0x1b8
pc = 0xffff00000033f0c8 lr = 0xffff00000032161c
sp = 0xffff000069850800 fp = 0xffff000069850820
mi_switch() at taskqgroup_binder+0x7c
pc = 0xffff00000032161c lr = 0xffff00000035510c
sp = 0xffff000069850830 fp = 0xffff000069850860
taskqgroup_binder() at gtaskqueue_run_locked+0x104
pc = 0xffff00000035510c lr = 0xffff000000354f74
sp = 0xffff000069850870 fp = 0xffff0000698508e0
gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
pc = 0xffff000000354f74 lr = 0xffff000000354d10
sp = 0xffff0000698508f0 fp = 0xffff000069850910
gtaskqueue_thread_loop() at fork_exit+0x7c
pc = 0xffff000000354d10 lr = 0xffff0000002dbd3c
sp = 0xffff000069850920 fp = 0xffff000069850950
fork_exit() at fork_trampoline+0x10
pc = 0xffff0000002dbd3c lr = 0xffff000000608664
sp = 0xffff000069850960 fp = 0x0000000000000000
Apparently sched_switch took one of the last two cases of:
	if (TD_IS_IDLETHREAD(td)) {
		. . .
	} else if (TD_IS_RUNNING(td)) {
		MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
		srqflag = preempted ?
		    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
		    SRQ_OURSELF|SRQ_YIELDING;
#ifdef SMP
		if (THREAD_CAN_MIGRATE(td) &&
		    !THREAD_CAN_SCHED(td, ts->ts_cpu))
			ts->ts_cpu = sched_pickcpu(td, 0);
#endif
		if (ts->ts_cpu == cpuid)
			tdq_runq_add(tdq, td, srqflag);
		else {
			KASSERT(THREAD_CAN_MIGRATE(td) ||
			    (ts->ts_flags & TSF_BOUND) != 0,
			    ("Thread %p shouldn't migrate", td));
			mtx = sched_switch_migrate(tdq, td, srqflag);
		}
	} else {
		/* This thread must be going to sleep. */
		TDQ_LOCK(tdq);
		mtx = thread_lock_block(td);
		tdq_load_rem(tdq, td);
	}
where sched_switch_migrate also does thread_lock_block():
static struct mtx *
sched_switch_migrate(struct tdq *tdq, struct thread *td, int flags)
{
	struct tdq *tdn;

	tdn = TDQ_CPU(td_get_sched(td)->ts_cpu);
#ifdef SMP
	tdq_load_rem(tdq, td);
	/*
	 * Do the lock dance required to avoid LOR. We grab an extra
	 * spinlock nesting to prevent preemption while we're
	 * not holding either run-queue lock.
	 */
	spinlock_enter();
	thread_lock_block(td);	/* This releases the lock on tdq. */
	/*
	 * Acquire both run-queue locks before placing the thread on the new
	 * run-queue to avoid deadlocks created by placing a thread with a
	 * blocked lock on the run-queue of a remote processor. The deadlock
	 * occurs when a third processor attempts to lock the two queues in
	 * question while the target processor is spinning with its own
	 * run-queue lock held while waiting for the blocked lock to clear.
	 */
	tdq_lock_pair(tdn, tdq);
	tdq_add(tdn, td, flags);
	tdq_notify(tdn, td);
	TDQ_UNLOCK(tdn);
	spinlock_exit();
#endif
	return (TDQ_LOCKPTR(tdn));
}
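Note that the "lock dance" leaves the thread inside spinlock_enter()
while td_lock points at blocked_lock. On arm64, spinlock_enter()
disables interrupts and enters a critical section (raising
td_critnest, which is exactly what the witness check behind the
panic in (B) tests); from sys/arm64/arm64/machdep.c, approximately:

void
spinlock_enter(void)
{
	struct thread *td;
	register_t daif;

	td = curthread;
	if (td->td_md.md_spinlock_count == 0) {
		daif = intr_disable();
		td->td_md.md_spinlock_count = 1;
		td->td_md.md_saved_daif = daif;
		critical_enter();	/* bumps td_critnest */
	} else
		td->td_md.md_spinlock_count++;
}

Masking interrupts does not mask synchronous exceptions, so a data
abort can still reach do_el1h_sync() in this window.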
(I have not checked for inlining so I allow for it
above.)
There have been past discussions, such as:
https://lists.freebsd.org/pipermail/freebsd-arm/2016-January/013120.html
with notes like the following (from before an inappropriate
indirection involving blocked_lock was fixed):
> > cpu_switch() already does what you describe though in a slightly different
> > way. The thread_lock() of a thread being switched out is set to blocked_lock.
> > cpu_switch() on the new CPU will always spin until cpu_switch updates
> > thread_lock of the old thread to point to the proper runq lock after saving
> > its state in the pcb. arm64 does this here:
> >
> >         /*
> >          * Release the old thread. This doesn't need to be a store-release
> >          * as the above dsb instruction will provide release semantics.
> >          */
> >         str     x2, [x0, #TD_LOCK]
> > #if defined(SCHED_ULE) && defined(SMP)
> >         /* Read the value in blocked_lock */
> >         ldr     x0, =_C_LABEL(blocked_lock)
> >         ldr     x2, [x0]
> > 1:
> >         ldar    x3, [x1, #TD_LOCK]
> >         cmp     x3, x2
> >         b.eq    1b
> > #endif
> >
> > Note the thread_lock_block() call just above the block you noted from
> > sched_switch_migrate() to see where td_lock is set to &blocked_lock.
> >
> > If the comment about 'dsb' above is wrong that might explain why you see
> > stale state in the PCB after seeing the new value of td_lock.
> >
> > --
> > John Baldwin
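In C terms, the quoted spin at the tail of cpu_switch() is roughly
the following sketch (illustrative only: "oldtd"/"newtd" stand in
for the threads in x0/x1, and I show a comparison against
blocked_lock's address, i.e. the post-fix semantics rather than the
quoted extra indirection):

	/* Release the old thread; the dsb above gives release semantics. */
	oldtd->td_lock = mtx;
#if defined(SCHED_ULE) && defined(SMP)
	/* Spin until newtd's td_lock stops pointing at blocked_lock. */
	while (atomic_load_acq_ptr((volatile uintptr_t *)&newtd->td_lock) ==
	    (uintptr_t)&blocked_lock)
		;	/* tight spin, matching the ldar/cmp/b.eq loop */
#endif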
Unfortunately I have no hint as to what causes the race condition
that debug kernel builds (invariants, possibly with witness) run
into, the one that leads to the variable behavior.
===
Mark Millard
markmi at dsl-only.net