970MP PowerMac G5s: What printf's show about cpu_mp_unleash hangups on the test variant of head -r347003 (found cpu_mp_unleash counterexample)

Mark Millard marklmi at yahoo.com
Fri May 3 21:52:13 UTC 2019


[I've tried a different moea64_bootstrap_slb_prefault
usage variation. Also, I reverted all the Rx,Rx,Rx
no-op changes. I put in the change to protect the
mttb(...) implementation from interrupts
messing up the assignment to the TBR, just so that
would not be a worry. I still have my isync
additions and my libc string compare code changes
in what I'm working with for head -r347003 .]

On 2019-May-3, at 11:09, Mark Millard <marklmi at yahoo.com> wrote:

> [Experiment with disabling the code in each of
> the nop_prio_ routines instead of commenting out specific
> calls. But mostly: more test runs. It does not support
> what I thought yesterday's cpu_mp_unleash behavior suggested.]
> 
> On 2019-May-2, at 22:23, Mark Millard <marklmi at yahoo.com> wrote:
> 
>> [Note: I still have your requested loop change, my
>> isync additions, and my libc string compare code
>> change in what I'm working with for head -r347003 .]
>> 
>> I started using printf to help identify more about what
>> code managed to execute vs what code did not for
>> hang-ups.
>> 
>> This note is just about cpu_mp_unleash observations and
>> experiments related to what printf's showed.
>> 
>> I did:
>> 
>> static void
>> cpu_mp_unleash(void *dummy)
>> {
>> . . . (omitted as all earlier printf's printed) . . .
>> printf("cpu_mp_unleash: before DELAY\n");
>>       /* Let the APs get into the scheduler */
>>       DELAY(10000);
>> printf("cpu_mp_unleash: after DELAY\n");
>> 
>> }
>> 
>> What I saw was that only the first of the two DELAY printf's
>> shown above printed when cpu_mp_unleash hung up,
>> such a hangup being the normal case when vt_upgrade
>> did not hang up first.
>> 
>> So I looked at /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> and its DELAY routine and came up with only one thing
>> that looked like a useful experiment. Note what I
>> then commented out:
>> 
>> # svnlite diff /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> Index: /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> ===================================================================
>> --- /mnt/usr/src/sys/powerpc/powerpc/clock.c	(revision 347003)
>> +++ /mnt/usr/src/sys/powerpc/powerpc/clock.c	(working copy)
>> @@ -309,10 +309,10 @@
>> 	TSENTER();
>> 	tb = mftb();
>> 	ttb = tb + howmany((uint64_t)n * 1000000, ps_per_tick);
>> -	nop_prio_vlow();
>> +	//nop_prio_vlow();
>> 	while (tb < ttb)
>> 		tb = mftb();
>> -	nop_prio_medium();
>> +	//nop_prio_medium();
>> 	TSEXIT();
>> }
>> 
>> After this change I've not (yet?) seen another cpu_mp_unleash
>> hangup in my test context.
>> 
>> Even if not documented to do so, it appears to me that the
>> "or Rx,Rx,Rx" code behind nop_prio_vlow() does
>> something specific on the 970MPs in the 2-socket/2-core-each
>> G5 PowerMac11,2 --and what it does interferes with making
>> progress in DELAY, at least in that specific use of it and/or
>> any others on the APs during cpu_mp_unleash.
>> 
>> Of course, this testing process is probabilistic
>> and I do not have hundreds or more examples of any specific
>> condition at this point. But, so far, the change in behavior
>> seems clear: I went from always-hanging-up-so-far to
>> always-booting-so-far (when vt_upgrade did not prevent the
>> test in each context).
> 
> So I uncommented the 2 calls and instead commented out the contents
> of the nop_prio_ routines, summarized here via:
> 
> 
> # egrep -r 'or.*(31,31,31|1,1,1|6,6,6|2,2,2|5,5,5|3,3,3)' /mnt/usr/src/sys/powerpc/ | more
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 31,31,31");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 1,1,1");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 6,6,6");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 2,2,2");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 5,5,5");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:     //__asm __volatile("or 3,3,3");

[I had forgotten to list cpu.h's use of the 27,27,27 form in
cpu_spinwait.]

I've reverted all such "or Rx,Rx,Rx" changes since they made no
overall difference in observed behavior for any of the 3
hang-up areas.
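
For reference, my understanding of the wrappers behind those two
calls (matching the grep output above; a sketch, not copied verbatim
from cpufunc.h): each is a single "or Rx,Rx,Rx", which is
architecturally a no-op but is defined as an SMT thread-priority hint
on later POWER processors. The 970MP is not SMT, so I had expected
them to be plain no-ops there.

static __inline void
nop_prio_vlow(void)
{
	/* "or 31,31,31": the usual priority-hint reading is "very low" */
	__asm __volatile("or 31,31,31");
}

static __inline void
nop_prio_medium(void)
{
	/* "or 2,2,2": the usual priority-hint reading is "medium/normal" */
	__asm __volatile("or 2,2,2");
}

DELAY brackets its spin loop with nop_prio_vlow() before and
nop_prio_medium() after, which is what the clock.c change quoted
above disables.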


The new experiment . . .

I've tried changing the loop so that moea64_bootstrap_slb_prefault
only covers about 1/2 of the kernel SLB slots, leaving the others free
to be filled before mftb()%n_slbs based "random" replacements would start:

 	/*
-	 * Map the entire KVA range into the SLB. We must not fault there.
+	 * Map about 1/2 the kernel slb slots, leaving the others available without
+	 * mftb()%n_slbs replacement being involved.
 	 */
 	#ifdef __powerpc64__
-	for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+	i = 0;
+	for (va = virtual_avail; va < virtual_end && i<(n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
 		moea64_bootstrap_slb_prefault(va, 0);
 	#endif

This change still gets hang-ups in the same 3 places but,
so far, has booted more often. (Not that I have a
large sampling at this point.) Definitely still
probabilistic.
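
For context, what I mean by the mftb()%n_slbs based replacement: when
slb_insert_kernel() in sys/powerpc/aim/slb.c finds no free kernel SLB
slot, it picks a victim slot pseudo-randomly from the time base.
Roughly (paraphrased from memory, so details may differ):

	/* Paraphrase of the kernel SLB victim-slot selection; not verbatim. */
	i = mftb() % n_slbs;		/* time-base value as a cheap "random" index */
	if (i == USER_SLB_SLOT)		/* never evict the reserved user slot */
		i = (i + 1) % n_slbs;
	/* ... slot i is then overwritten with the new kernel segment
	 *     mapping and, on the current CPU, installed via slbmte ... */

Prefaulting only about half the slots leaves free slots for the early
faults to fill, so this replacement path is reached later (or less
often) while the APs are being brought up.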


Other status . . .

I've still not figured out a way to discover what happens
at each of the 3 hang-up points in the areas identified.
I'm also running out of ideas for gaining other types of
evidence.

One hypothesis about why I've not seen these 3 hang-up
places in my normal environment (currently based on
head -r345758) is that I normally run non-debug
kernels in that context. Here it has all been debug
kernels, to better match what artifacts.ci provides,
among other reasons.


For reference, here are the current -r347003 patches that I was
last exploring, including the extra printf's for
the 3 hang-up areas:

# svnlite diff /mnt/usr/src/sys/ | more
Index: /mnt/usr/src/sys/dev/vt/vt_core.c
===================================================================
--- /mnt/usr/src/sys/dev/vt/vt_core.c   (revision 347003)
+++ /mnt/usr/src/sys/dev/vt/vt_core.c   (working copy)
@@ -2720,6 +2720,7 @@
 
                /* Init 25 Hz timer. */
                callout_init_mtx(&vd->vd_timer, &vd->vd_lock, 0);
+printf("vt_upgrade after callout_init_mtx / before sequence leading to callout_reset\n");
 
                /*
                 * Start timer when everything ready.
@@ -2732,6 +2733,7 @@
                atomic_add_acq_int(&vd->vd_timer_armed, 1);
                vd->vd_flags |= VDF_ASYNC;
                callout_reset(&vd->vd_timer, hz / VT_TIMERFREQ, vt_timer, vd);
+printf("vt_upgrade after callout_reset\n");
                register_handlers = 1;
        }
 
Index: /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c
===================================================================
--- /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c (revision 347003)
+++ /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c (working copy)
@@ -93,9 +93,11 @@
         * Initialize the sysctl context list for the fha module.
         */
        sysctl_ctx_init(&softc->sysctl_ctx);
+printf("fhanew_init: after sysctl_ctx_init / before SYSCTL_ADD_NODE\n");
        softc->sysctl_tree = SYSCTL_ADD_NODE(&softc->sysctl_ctx,
            SYSCTL_STATIC_CHILDREN(_vfs_nfsd), OID_AUTO, "fha", CTLFLAG_RD,
            0, "NFS File Handle Affinity (FHA)");
+printf("fhanew_init: after SYSCTL_ADD_NODE\n");
        if (softc->sysctl_tree == NULL) {
                printf("%s: unable to allocate sysctl tree\n", __func__);
                return;
Index: /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c    (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c    (working copy)
@@ -956,10 +956,12 @@
        virtual_end = VM_MAX_SAFE_KERNEL_ADDRESS; 
 
        /*
-        * Map the entire KVA range into the SLB. We must not fault there.
+        * Map about 1/2 the kernel slb slots, leaving the others available without
+        * mftb()%n_slbs replacement being involved.
         */
        #ifdef __powerpc64__
-       for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+       i = 0;
+       for (va = virtual_avail; va < virtual_end && i<(n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
                moea64_bootstrap_slb_prefault(va, 0);
        #endif
 
@@ -1090,8 +1092,8 @@
        CPU_SET(PCPU_GET(cpuid), &pm->pm_active);
 
        #ifdef __powerpc64__
-       PCPU_SET(aim.userslb, pm->pm_slb);
-       __asm __volatile("slbmte %0, %1; isync" ::
+       PCPU_SET(aim.userslb, pm->pm_slb); // no slbie needed?
+       __asm __volatile("isync; slbmte %0, %1; isync" ::
            "r"(td->td_pcb->pcb_cpu.aim.usr_vsid), "r"(USER_SLB_SLBE));
        #else
        PCPU_SET(curpmap, pm->pmap_phys);
@@ -1104,7 +1106,7 @@
 {
        pmap_t  pm;
 
-       __asm __volatile("isync; slbie %0" :: "r"(USER_ADDR));
+       __asm __volatile("isync; slbie %0; isync" :: "r"(USER_ADDR));
 
        pm = &td->td_proc->p_vmspace->vm_pmap;
        CPU_CLR(PCPU_GET(cpuid), &pm->pm_active);
@@ -1956,7 +1958,7 @@
            (uintptr_t)uaddr >> ADDR_SR_SHFT;
        curthread->td_pcb->pcb_cpu.aim.usr_vsid = slbv;
 #ifdef __powerpc64__
-       __asm __volatile ("slbie %0; slbmte %1, %2; isync" ::
+       __asm __volatile ("isync; slbie %0; slbmte %1, %2; isync" ::
            "r"(USER_ADDR), "r"(slbv), "r"(USER_SLB_SLBE));
 #else
        __asm __volatile("mtsr %0,%1; isync" :: "n"(USER_SR), "r"(slbv));
Index: /mnt/usr/src/sys/powerpc/aim/moea64_native.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/moea64_native.c        (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/moea64_native.c        (working copy)
@@ -406,7 +406,7 @@
         */
 
        #ifdef __powerpc64__
-               __asm __volatile ("slbia");
+               __asm __volatile ("isync; slbia");
                __asm __volatile ("slbmfee %0,%1; slbie %0;" : "=r"(seg0) :
                    "r"(0));
 
@@ -417,6 +417,7 @@
                        __asm __volatile ("slbmte %0, %1" :: 
                            "r"(slb[i].slbv), "r"(slb[i].slbe)); 
                }
+               __asm __volatile ("isync");
        #else
                for (i = 0; i < 16; i++)
                        mtsrin(i << ADDR_SR_SHFT, kernel_pmap->pm_sr[i]);
Index: /mnt/usr/src/sys/powerpc/aim/slb.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/slb.c  (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/slb.c  (working copy)
@@ -457,7 +457,7 @@
        /* If it is for this CPU, put it in the SLB right away */
        if (pmap_bootstrapped) {
                /* slbie not required */
-               __asm __volatile ("slbmte %0, %1" :: 
+               __asm __volatile ("isync; slbmte %0, %1; isync" :: 
                    "r"(slbcache[i].slbv), "r"(slbcache[i].slbe)); 
        }
 
Index: /mnt/usr/src/sys/powerpc/aim/trap_subr64.S
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/trap_subr64.S  (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/trap_subr64.S  (working copy)
@@ -65,6 +65,7 @@
 
        li      %r29, 0                 /* Set the counter to zero */
 
+       isync
        slbia
        slbmfee %r31,%r29               
        clrrdi  %r31,%r31,28
@@ -71,6 +72,7 @@
        slbie   %r31
 1:     ld      %r31, 0(%r28)           /* Load SLB entry pointer */
        cmpdi   %r31, 0                 /* If NULL, stop */
+       isync
        beqlr
 
        ld      %r30, 0(%r31)           /* Load SLBV */
@@ -96,6 +98,7 @@
        /* Otherwise, set up SLBs */
        li      %r29, 0                 /* Set the counter to zero */
 
+       isync
        slbia
        slbmfee %r31,%r29               
        clrrdi  %r31,%r31,28
@@ -105,6 +108,7 @@
 
        ld      %r31, 8(%r28)           /* Load SLBE */
        cmpdi   %r31, 0                 /* If SLBE is not valid, stop */
+       isync
        beqlr
        ld      %r30, 0(%r28)           /* Load SLBV  */
        slbmte  %r30, %r31              /* Install SLB entry */
@@ -113,6 +117,7 @@
        addi    %r29, %r29, 1
        cmpdi   %r29, 64                /* Repeat if we are not at the end */
        blt     1b 
+       isync
        blr
 
 /*
Index: /mnt/usr/src/sys/powerpc/include/cpufunc.h
===================================================================
--- /mnt/usr/src/sys/powerpc/include/cpufunc.h  (revision 347003)
+++ /mnt/usr/src/sys/powerpc/include/cpufunc.h  (working copy)
@@ -155,15 +155,8 @@
        return (tb);
 }
 
-static __inline void
-mttb(u_quad_t time)
-{
+// mttb moved to after intr_restore
 
-       mtspr(TBR_TBWL, 0);
-       mtspr(TBR_TBWU, (uint32_t)(time >> 32));
-       mtspr(TBR_TBWL, (uint32_t)(time & 0xffffffff));
-}
-
 static __inline void
 eieio(void)
 {
@@ -202,6 +195,19 @@
        mtmsr(msr);
 }
 
+static __inline void
+mttb(u_quad_t time)
+{
+       const uint32_t   high= time>>32;
+       const uint32_t   low=  time&0xffffffffu;
+
+       const register_t predisable_msr= intr_disable();
+       mtspr(TBR_TBWL, 0);
+       mtspr(TBR_TBWU, high);
+       mtspr(TBR_TBWL, low);
+       intr_restore(predisable_msr);
+}
+
 static __inline struct pcpu *
 get_pcpu(void)
 {
Index: /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c
===================================================================
--- /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c       (revision 347003)
+++ /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c       (working copy)
@@ -286,8 +286,10 @@
        if (smp_cpus > 1)
                atomic_store_rel_int(&smp_started, 1);
 
+printf("cpu_mp_unleash: before DELAY\n");
        /* Let the APs get into the scheduler */
        DELAY(10000);
+printf("cpu_mp_unleash: after DELAY\n");
 
 }
 
Index: /mnt/usr/src/sys/powerpc/powerpc/trap.c
===================================================================
--- /mnt/usr/src/sys/powerpc/powerpc/trap.c     (revision 347003)
+++ /mnt/usr/src/sys/powerpc/powerpc/trap.c     (working copy)
@@ -453,8 +453,8 @@
 #if defined(__powerpc64__) && defined(AIM)
                case EXC_DSE:
                        if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0 &&
-                           (frame->dar & SEGMENT_MASK) == USER_ADDR) {
-                               __asm __volatile ("slbmte %0, %1" ::
+                           (frame->dar & SEGMENT_MASK) == USER_ADDR) { // no slbie needed?
+                               __asm __volatile ("isync; slbmte %0, %1; isync" ::
                                        "r"(td->td_pcb->pcb_cpu.aim.usr_vsid),
                                        "r"(USER_SLB_SLBE));
                                return;
@@ -712,8 +712,8 @@
         * Speculatively restore last user SLB segment, which we know is
         * invalid already, since we are likely to do copyin()/copyout().
         */
-       if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0)
-               __asm __volatile ("slbmte %0, %1; isync" ::
+       if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0) // no slbie needed?
+               __asm __volatile ("isync; slbmte %0, %1; isync" ::
                    "r"(td->td_pcb->pcb_cpu.aim.usr_vsid), "r"(USER_SLB_SLBE));
 #endif
 


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


