970MP PowerMac G5s: What printf's show about cpu_mp_unleash hangups on the test variant of head -r347003 (found cpu_mp_unleash counterexample)
Mark Millard
marklmi at yahoo.com
Fri May 3 21:52:13 UTC 2019
[I've tried a different moea64_bootstrap_slb_prefault
usage variation. Also, I reverted all the Rx,Rx,Rx
no-op changes. I put in the change to protect the
mttb(...) implementation from interrupts
messing up the assignment to the TBR, just so that
would not be a worry. I still have my isync
additions and my libc string compare code changes
in what I'm working with for head -r347003 .]
On 2019-May-3, at 11:09, Mark Millard <marklmi at yahoo.com> wrote:
> [Experiment with disabling the code in each of
> the nop_prio_ routines instead of commenting out specific
> calls. But mostly: more test runs. It does not support
> what I thought yesterday's cpu_mp_unleash evidence suggested.]
>
> On 2019-May-2, at 22:23, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [Note: I still have your requested loop change, my
>> isync additions, and my libc string compare code
>> change in what I'm working with for head -r347003 .]
>>
>> I started using printf to help identify more about what
>> code managed to execute vs. what code did not when
>> hang-ups occur.
>>
>> This note is just about cpu_mp_unleash observations and
>> experiments related to what printf's showed.
>>
>> I did:
>>
>> static void
>> cpu_mp_unleash(void *dummy)
>> {
>> . . . (omitted as all earlier printf's printed) . . .
>> printf("cpu_mp_unleash: before DELAY\n");
>> /* Let the APs get into the scheduler */
>> DELAY(10000);
>> printf("cpu_mp_unleash: after DELAY\n");
>>
>> }
>>
>> What I saw was that only the first of the two DELAY
>> printf's shown above printed when cpu_mp_unleash hung up,
>> such a hang-up being the normal case when vt_upgrade
>> did not hang up first.
>>
>> So I looked at /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> and its DELAY routine and came up with only one thing
>> that looked like a useful experiment. Note what I
>> then commented out:
>>
>> # svnlite diff /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> Index: /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> ===================================================================
>> --- /mnt/usr/src/sys/powerpc/powerpc/clock.c (revision 347003)
>> +++ /mnt/usr/src/sys/powerpc/powerpc/clock.c (working copy)
>> @@ -309,10 +309,10 @@
>> TSENTER();
>> tb = mftb();
>> ttb = tb + howmany((uint64_t)n * 1000000, ps_per_tick);
>> - nop_prio_vlow();
>> + //nop_prio_vlow();
>> while (tb < ttb)
>> tb = mftb();
>> - nop_prio_medium();
>> + //nop_prio_medium();
>> TSEXIT();
>> }
>>
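>> (For reference: ps_per_tick is the number of picoseconds per
>> time base tick, so howmany((uint64_t)n * 1000000, ps_per_tick)
>> converts the requested n microseconds, i.e. n * 10^6 picoseconds,
>> into a tick count, rounding up; the loop then just spins on
>> mftb() until the time base passes that target.)
>>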
>> After this change I've not (yet?) seen another cpu_mp_unleash
>> hangup in my test context.
>>
>> Even if not documented to do so, it appears to me that
>> the "or Rx,Rx,Rx" code behind nop_prio_vlow() does
>> something specific on the 970MP's in the 2-socket/2-core-each
>> G5 PowerMac11,2's --and what it does interferes with making
>> progress in DELAY, in at least that specific use of it and/or
>> any others on the APs during cpu_mp_unleash.
>>
>> Of course, this testing process is probabilistic and I
>> do not have hundreds or more examples of any specific
>> condition at this point. But, so far, the change in behavior
>> seems clear: I went from always-hanging-up-so-far to
>> always-booting-so-far (when a vt_upgrade hang-up did not
>> prevent the test in each context).
>
> So I uncommented the 2 calls and instead commented out the
> contents of the nop_prio_ routines, summarized here via:
>
>
> # egrep -r 'or.*(31,31,31|1,1,1|6,6,6|2,2,2|5,5,5|3,3,3)' /mnt/usr/src/sys/powerpc/ | more
> /mnt/usr/src/sys/powerpc/include/cpufunc.h: //__asm __volatile("or 31,31,31");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h: //__asm __volatile("or 1,1,1");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h: //__asm __volatile("or 6,6,6");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h: //__asm __volatile("or 2,2,2");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h: //__asm __volatile("or 5,5,5");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h: //__asm __volatile("or 3,3,3");
[I had forgotten to list cpu.h's use of the 27,27,27 form in
cpu_spinwait.]
I've reverted all such "or Rx,Rx,Rx" changes, as they made
no overall change in the observed behavior for any of the 3
hang-up areas.
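For reference, here is a sketch of what the nop_prio_ inlines in
cpufunc.h contain when not disabled, reconstructed from the
commented-out asm listed above. The name-to-encoding mapping is my
assumption based on the usual Power ISA thread-priority hints
(31 = very low, 2 = medium); these "or Rx,Rx,Rx" forms are defined
as priority hints, not plain no-ops, on processors that implement
them:

static __inline void
nop_prio_vlow(void)
{
	/* "or 31,31,31": very-low thread-priority hint (assumed mapping) */
	__asm __volatile("or 31,31,31");
}

static __inline void
nop_prio_medium(void)
{
	/* "or 2,2,2": medium thread-priority hint (assumed mapping) */
	__asm __volatile("or 2,2,2");
}

These are the two that clock.c's DELAY calls: nop_prio_vlow() before
the spin loop and nop_prio_medium() after it.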
The new experiment . . .
I've tried changing the loop so that moea64_bootstrap_slb_prefault
fills only about 1/2 of the kernel SLB slots, leaving the others to
be filled before mftb()%n_slbs based "random" replacements would
start:
/*
- * Map the entire KVA range into the SLB. We must not fault there.
+ * Map about 1/2 the kernel slb slots, leaving the others available without
+ * mftb()%n_slbs replacement being involved.
*/
#ifdef __powerpc64__
- for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+ i = 0;
+ for (va = virtual_avail; va < virtual_end && i<(n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
moea64_bootstrap_slb_prefault(va, 0);
#endif
This change still gets hang-ups in the same 3 places but,
so far, has booted more often. (Not that I have a
large sampling at this point.) Definitely still
probabilistic.
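For reference, the mftb()%n_slbs "random" replacement I mean is the
victim selection in slb_insert_kernel in
/mnt/usr/src/sys/powerpc/aim/slb.c. Paraphrasing from memory (a
sketch, not an exact quote of the code), once no invalid kernel
slot remains it does roughly:

i = mftb() % n_slbs;	/* low time base bits as a cheap RNG */
if (i == USER_SLB_SLOT)	/* never evict the user segment slot */
	i = (i+1) % n_slbs;
/* ... slbcache[i] is then overwritten and slbmte'd ... */

So with only (n_slbs-1)/2 slots prefaulted, the remaining slots get
filled by ordinary first-use faults before any such random eviction
starts.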
Other status . . .
I've still not figured out a way to discover what happens
at each of the 3 hang-up points in the areas identified.
I'm also running out of ideas for what to do for possibly
gaining other types of evidence.
One hypothesis about why I've not seen these 3 hang-up
places in my normal environment (currently based on
head -r345758) is that I normally run non-debug
kernels in that context. Here it has all been debug
kernels, to better match what artifacts.ci provides,
among other reasons.
For reference, here are the current -r347003 patches that I
was last exploring, including the extra printf's for
the 3 hang-up areas:
# svnlite diff /mnt/usr/src/sys/ | more
Index: /mnt/usr/src/sys/dev/vt/vt_core.c
===================================================================
--- /mnt/usr/src/sys/dev/vt/vt_core.c (revision 347003)
+++ /mnt/usr/src/sys/dev/vt/vt_core.c (working copy)
@@ -2720,6 +2720,7 @@
/* Init 25 Hz timer. */
callout_init_mtx(&vd->vd_timer, &vd->vd_lock, 0);
+printf("vt_upgrade after callout_init_mtx / before sequence leading to callout_reset\n");
/*
* Start timer when everything ready.
@@ -2732,6 +2733,7 @@
atomic_add_acq_int(&vd->vd_timer_armed, 1);
vd->vd_flags |= VDF_ASYNC;
callout_reset(&vd->vd_timer, hz / VT_TIMERFREQ, vt_timer, vd);
+printf("vt_upgrade after callout_reset\n");
register_handlers = 1;
}
Index: /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c
===================================================================
--- /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c (revision 347003)
+++ /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c (working copy)
@@ -93,9 +93,11 @@
* Initialize the sysctl context list for the fha module.
*/
sysctl_ctx_init(&softc->sysctl_ctx);
+printf("fhanew_init: after sysctl_ctx_init / before SYSCTL_ADD_NODE\n");
softc->sysctl_tree = SYSCTL_ADD_NODE(&softc->sysctl_ctx,
SYSCTL_STATIC_CHILDREN(_vfs_nfsd), OID_AUTO, "fha", CTLFLAG_RD,
0, "NFS File Handle Affinity (FHA)");
+printf("fhanew_init: after SYSCTL_ADD_NODE\n");
if (softc->sysctl_tree == NULL) {
printf("%s: unable to allocate sysctl tree\n", __func__);
return;
Index: /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c (working copy)
@@ -956,10 +956,12 @@
virtual_end = VM_MAX_SAFE_KERNEL_ADDRESS;
/*
- * Map the entire KVA range into the SLB. We must not fault there.
+ * Map about 1/2 the kernel slb slots, leaving the others available without
+ * mftb()%n_slbs replacement being involved.
*/
#ifdef __powerpc64__
- for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+ i = 0;
+ for (va = virtual_avail; va < virtual_end && i<(n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
moea64_bootstrap_slb_prefault(va, 0);
#endif
@@ -1090,8 +1092,8 @@
CPU_SET(PCPU_GET(cpuid), &pm->pm_active);
#ifdef __powerpc64__
- PCPU_SET(aim.userslb, pm->pm_slb);
- __asm __volatile("slbmte %0, %1; isync" ::
+ PCPU_SET(aim.userslb, pm->pm_slb); // no slbie needed?
+ __asm __volatile("isync; slbmte %0, %1; isync" ::
"r"(td->td_pcb->pcb_cpu.aim.usr_vsid), "r"(USER_SLB_SLBE));
#else
PCPU_SET(curpmap, pm->pmap_phys);
@@ -1104,7 +1106,7 @@
{
pmap_t pm;
- __asm __volatile("isync; slbie %0" :: "r"(USER_ADDR));
+ __asm __volatile("isync; slbie %0; isync" :: "r"(USER_ADDR));
pm = &td->td_proc->p_vmspace->vm_pmap;
CPU_CLR(PCPU_GET(cpuid), &pm->pm_active);
@@ -1956,7 +1958,7 @@
(uintptr_t)uaddr >> ADDR_SR_SHFT;
curthread->td_pcb->pcb_cpu.aim.usr_vsid = slbv;
#ifdef __powerpc64__
- __asm __volatile ("slbie %0; slbmte %1, %2; isync" ::
+ __asm __volatile ("isync; slbie %0; slbmte %1, %2; isync" ::
"r"(USER_ADDR), "r"(slbv), "r"(USER_SLB_SLBE));
#else
__asm __volatile("mtsr %0,%1; isync" :: "n"(USER_SR), "r"(slbv));
Index: /mnt/usr/src/sys/powerpc/aim/moea64_native.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/moea64_native.c (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/moea64_native.c (working copy)
@@ -406,7 +406,7 @@
*/
#ifdef __powerpc64__
- __asm __volatile ("slbia");
+ __asm __volatile ("isync; slbia");
__asm __volatile ("slbmfee %0,%1; slbie %0;" : "=r"(seg0) :
"r"(0));
@@ -417,6 +417,7 @@
__asm __volatile ("slbmte %0, %1" ::
"r"(slb[i].slbv), "r"(slb[i].slbe));
}
+ __asm __volatile ("isync");
#else
for (i = 0; i < 16; i++)
mtsrin(i << ADDR_SR_SHFT, kernel_pmap->pm_sr[i]);
Index: /mnt/usr/src/sys/powerpc/aim/slb.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/slb.c (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/slb.c (working copy)
@@ -457,7 +457,7 @@
/* If it is for this CPU, put it in the SLB right away */
if (pmap_bootstrapped) {
/* slbie not required */
- __asm __volatile ("slbmte %0, %1" ::
+ __asm __volatile ("isync; slbmte %0, %1; isync" ::
"r"(slbcache[i].slbv), "r"(slbcache[i].slbe));
}
Index: /mnt/usr/src/sys/powerpc/aim/trap_subr64.S
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/trap_subr64.S (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/trap_subr64.S (working copy)
@@ -65,6 +65,7 @@
li %r29, 0 /* Set the counter to zero */
+ isync
slbia
slbmfee %r31,%r29
clrrdi %r31,%r31,28
@@ -71,6 +72,7 @@
slbie %r31
1: ld %r31, 0(%r28) /* Load SLB entry pointer */
cmpdi %r31, 0 /* If NULL, stop */
+ isync
beqlr
ld %r30, 0(%r31) /* Load SLBV */
@@ -96,6 +98,7 @@
/* Otherwise, set up SLBs */
li %r29, 0 /* Set the counter to zero */
+ isync
slbia
slbmfee %r31,%r29
clrrdi %r31,%r31,28
@@ -105,6 +108,7 @@
ld %r31, 8(%r28) /* Load SLBE */
cmpdi %r31, 0 /* If SLBE is not valid, stop */
+ isync
beqlr
ld %r30, 0(%r28) /* Load SLBV */
slbmte %r30, %r31 /* Install SLB entry */
@@ -113,6 +117,7 @@
addi %r29, %r29, 1
cmpdi %r29, 64 /* Repeat if we are not at the end */
blt 1b
+ isync
blr
/*
Index: /mnt/usr/src/sys/powerpc/include/cpufunc.h
===================================================================
--- /mnt/usr/src/sys/powerpc/include/cpufunc.h (revision 347003)
+++ /mnt/usr/src/sys/powerpc/include/cpufunc.h (working copy)
@@ -155,15 +155,8 @@
return (tb);
}
-static __inline void
-mttb(u_quad_t time)
-{
+// mttb moved to after intr_restore
- mtspr(TBR_TBWL, 0);
- mtspr(TBR_TBWU, (uint32_t)(time >> 32));
- mtspr(TBR_TBWL, (uint32_t)(time & 0xffffffff));
-}
-
static __inline void
eieio(void)
{
@@ -202,6 +195,19 @@
mtmsr(msr);
}
+static __inline void
+mttb(u_quad_t time)
+{
+ const uint32_t high= time>>32;
+ const uint32_t low= time&0xffffffffu;
+
+ const register_t predisable_msr= intr_disable();
+ mtspr(TBR_TBWL, 0);
+ mtspr(TBR_TBWU, high);
+ mtspr(TBR_TBWL, low);
+ intr_restore(predisable_msr);
+}
+
static __inline struct pcpu *
get_pcpu(void)
{
Index: /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c
===================================================================
--- /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c (revision 347003)
+++ /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c (working copy)
@@ -286,8 +286,10 @@
if (smp_cpus > 1)
atomic_store_rel_int(&smp_started, 1);
+printf("cpu_mp_unleash: before DELAY\n");
/* Let the APs get into the scheduler */
DELAY(10000);
+printf("cpu_mp_unleash: after DELAY\n");
}
Index: /mnt/usr/src/sys/powerpc/powerpc/trap.c
===================================================================
--- /mnt/usr/src/sys/powerpc/powerpc/trap.c (revision 347003)
+++ /mnt/usr/src/sys/powerpc/powerpc/trap.c (working copy)
@@ -453,8 +453,8 @@
#if defined(__powerpc64__) && defined(AIM)
case EXC_DSE:
if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0 &&
- (frame->dar & SEGMENT_MASK) == USER_ADDR) {
- __asm __volatile ("slbmte %0, %1" ::
+ (frame->dar & SEGMENT_MASK) == USER_ADDR) { // no slbie needed?
+ __asm __volatile ("isync; slbmte %0, %1; isync" ::
"r"(td->td_pcb->pcb_cpu.aim.usr_vsid),
"r"(USER_SLB_SLBE));
return;
@@ -712,8 +712,8 @@
* Speculatively restore last user SLB segment, which we know is
* invalid already, since we are likely to do copyin()/copyout().
*/
- if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0)
- __asm __volatile ("slbmte %0, %1; isync" ::
+ if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0) // no slbie needed?
+ __asm __volatile ("isync; slbmte %0, %1; isync" ::
"r"(td->td_pcb->pcb_cpu.aim.usr_vsid), "r"(USER_SLB_SLBE));
#endif
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)