head -r344018 powerpc64 variant on Powermac G5 (2 sockets, 2 cores each): [*buffer arena] shows up more . . .? [mpc85xx_smp_timebase_sync problem too]
Mark Millard
marklmi at yahoo.com
Tue Apr 16 21:26:47 UTC 2019
[Looks to me like the use and content of mpc85xx_smp_timebase_sync
have the same type of problems I noted for the proposed powermac
patch.]
On 2019-Mar-6, at 20:36, Justin Hibbits <chmeeedalf at gmail.com> wrote:
> On Wed, 6 Mar 2019 18:35:42 -0800
> Mark Millard <marklmi at yahoo.com> wrote:
>
>> [The patch is definitely wrong because of a 3rd use of
>> platform_smp_timebase_sync that I'd not noted before. Details at the
>> end.]
>>
>> On 2019-Mar-6, at 16:39, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>
>>
>>> On 2019-Mar-6, at 13:19, Justin Hibbits <chmeeedalf at gmail.com>
>>> wrote:
>>>> On Mon, 4 Mar 2019 19:43:09 -0800
>>>> Mark Millard via freebsd-ppc <freebsd-ppc at freebsd.org> wrote:
>>>>
>>>>> [It is possible that the following is tied to my hack to
>>>>> avoid threads ending up stuck-sleeping. But I ask about
>>>>> an alternative that I see in the code.]
>>>>>
>>>>> Context: using the modern powerpc64 VM_MAX_KERNEL_ADDRESS
>>>>> and using usefdt=1 on an old Powermac G5 (2 sockets, 2 cores
>>>>> each). Hacks are in use to provide fairly reliable booting
>>>>> and to avoid threads getting stuck sleeping.
>>>>>
>>>>> Before the modern VM_MAX_KERNEL_ADDRESS figure there were only
>>>>> 2 or 3 bufspacedaemon-* threads as I remember. Now there are 8
>>>>> (plus bufdaemon and its worker), for example:
>>>>>
>>>>> root   23  0.0  0.0    0  288  -  DL  15:48  0:00.39 [bufdaemon/bufdaemon]
>>>>> root   23  0.0  0.0    0  288  -  DL  15:48  0:00.05 [bufdaemon/bufspaced]
>>>>> root   23  0.0  0.0    0  288  -  DL  15:48  0:00.05 [bufdaemon/bufspaced]
>>>>> root   23  0.0  0.0    0  288  -  DL  15:48  0:00.05 [bufdaemon/bufspaced]
>>>>> root   23  0.0  0.0    0  288  -  DL  15:48  0:00.05 [bufdaemon/bufspaced]
>>>>> root   23  0.0  0.0    0  288  -  DL  15:48  0:00.05 [bufdaemon/bufspaced]
>>>>> root   23  0.0  0.0    0  288  -  DL  15:48  0:00.07 [bufdaemon/bufspaced]
>>>>> root   23  0.0  0.0    0  288  -  DL  15:48  0:00.05 [bufdaemon/bufspaced]
>>>>> root   23  0.0  0.0    0  288  -  DL  15:48  0:00.56 [bufdaemon// worker]
>>>>>
>>>>> I'm sometimes seeing processes showing [*buffer arena] that
>>>>> seem to wait for a fairly long time with that status, which is
>>>>> not something I'd seen historically for those same types of
>>>>> processes under a similar overall load (not much). During such
>>>>> times, trying to create processes to look around at what is
>>>>> going on seems to also wait. (Probably with the same status?)
>>>>>
>>>>
>>>> Hi Mark,
>>>>
>>>> Can you try the attached patch? It might be overkill in the
>>>> synchronization, and I might be using the wrong barriers to be
>>>> considered correct, but I think this should narrow the race down
>>>> and synchronize the timebases to within a very small margin. The
>>>> real correct fix would be to suspend the timebase on all cores,
>>>> which is feasible (there's a GPIO for the G4s, and i2c for G5s),
>>>> but that's non-trivial extra work.
>>>>
>>>> Be warned, I haven't tested it, I've only compiled it (I don't
>>>> have a G5 to test with anymore).
>>>>
>>>
>>> Sure, I'll try it when the G5 is again available: it is doing
>>> a time-consuming build.
>>>
>>> I do see one possible oddity: tracing another
>>> platform_smp_timebase_sync use in the code . . .
>>>
>>> DEVMETHOD(cpufreq_drv_set, pmufreq_set)
>>>
>>> static int
>>> pmufreq_set(device_t dev, const struct cf_setting *set)
>>> {
>>>         . . .
>>>         error = pmu_set_speed(speed_sel);
>>>         . . .
>>> }
>>>
>>> int
>>> pmu_set_speed(int low_speed)
>>> {
>>>         . . .
>>>         platform_sleep();
>>>         . . .
>>> }
>>>
>>> PLATFORMMETHOD(platform_sleep, powermac_sleep),
>>>
>>> void
>>> powermac_sleep(platform_t platform)
>>> {
>>>
>>>         *(unsigned long *)0x80 = 0x100;
>>>         cpu_sleep();
>>> }
>>>
>>> void
>>> cpu_sleep()
>>> {
>>>         . . .
>>>         platform_smp_timebase_sync(timebase, 0);
>>>         . . .
>>> }
>>>
>>> PLATFORMMETHOD(platform_smp_timebase_sync, powermac_smp_timebase_sync),
>>>
>>> The issue:
>>>
>>> I do not see any matching platform_smp_timebase_sync(timebase, 1)
>>> or other CPUs doing a powermac_smp_timebase_sync in this sequence.
>>>
>>> (If this makes testing the patch inappropriate, let me know.)
>>>
>>
>> More important: There is also a use of:
>>
>> /* The following is needed for restoring from sleep. */
>> platform_smp_timebase_sync(0, 1);
>>
>> in cpudep_ap_setup. That in turn happens during cpu_reset_handler
>> before machdep_ap_bootstrap is called (which does
>> platform_smp_timebase_sync as well):
>>
>> cpu_reset_handler:
>>         GET_TOCBASE(%r2)
>>
>>         ld      %r1,TOC_REF(tmpstk)(%r2)        /* get new SP */
>>         addi    %r1,%r1,(TMPSTKSZ-48)
>>
>>         bl      CNAME(cpudep_ap_early_bootstrap) /* Set PCPU */
>>         nop
>>         lis     %r3,1@l
>>         bl      CNAME(pmap_cpu_bootstrap)       /* Turn on virtual memory */
>>         nop
>>         bl      CNAME(cpudep_ap_bootstrap)      /* Set up PCPU and stack */
>>         nop
>>         mr      %r1,%r3                         /* Use new stack */
>>         bl      CNAME(cpudep_ap_setup)
>>         nop
>>         GET_CPUINFO(%r5)
>>         ld      %r3,(PC_RESTORE)(%r5)
>>         cmpldi  %cr0,%r3,0
>>         beq     %cr0,2f
>>         nop
>>         li      %r4,1
>>         bl      CNAME(longjmp)
>>         nop
>> 2:
>> #ifdef SMP
>>         bl      CNAME(machdep_ap_bootstrap)     /* And away! */
>>         nop
>> #endif
>>
>> Thus overall, for each ap there is the sequence:
>>
>> platform_smp_timebase_sync(0, 1);
>> . . .
>> while (ap_letgo == 0)
>>         __asm __volatile("or 31,31,31");
>> __asm __volatile("or 6,6,6");
>>
>> /*
>>  * Set timebase as soon as possible to meet an implicit rendezvous
>>  * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>>  * sets timebase.
>>  *
>>  * Note that this is intrinsically racy and is only relevant on
>>  * platforms that do not support better mechanisms.
>>  */
>> platform_smp_timebase_sync(ap_timebase, 1);
>>
>> So the (ap) case in powermac_smp_timebase_sync will wait with tb==0
>> (from cpudep_ap_setup), and the later calls from machdep_ap_bootstrap
>> will not wait and will be after the unleash, but the coordination is
>> not just local to powermac_smp_timebase_sync:
>>
>> static void
>> powermac_smp_timebase_sync(platform_t plat, u_long tb, int ap)
>> {
>>         static int cpus;
>>         static int unleash;
>>
>>         if (ap) {
>>                 atomic_add_int(&cpus, 1);
>>                 while (!atomic_load_acq_int(&unleash))
>>                         ;
>>         } else {
>>                 atomic_add_int(&cpus, 1);
>>                 while (atomic_load_int(&cpus) != mp_ncpus)
>>                         ;
>>                 atomic_store_rel_int(&unleash, 1);
>>         }
>>
>>         mttb(tb);
>> }
>>
>> In the end cpus will have double counts of the ap cpus instead
>> of matching mp_ncpus.
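>>
>> (Concretely, assuming mp_ncpus == 4 on this 2-socket, 2-core G5:
>> the BSP adds 1 and each of the 3 ap cpus adds 2, once from
>> cpudep_ap_setup with tb == 0 and once from machdep_ap_bootstrap,
>> so cpus ends up at 7. And since the statics are never reset, a
>> later call with ap == 0 would spin forever waiting for
>> cpus == mp_ncpus.)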
>>
>> cpufreq_drv_set activity is a separate, additional issue from this.
>>
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>> ( dsl-only.net went
>> away in early 2018-Mar)
>>
>
> As mentioned, I had only compiled it. Your examination of the code
> path demonstrates that the patch is insufficient, and would hang at
> unleash anyway. The sleep/wake logic probably needs to be updated
> regardless; it was written for a G4 powerbook primarily for the
> PMU-based cpufreq driver, so some bits might need to be moved
> around. Orthogonal to this issue, though.
It appears to me that the overall sequence:

platform_smp_timebase_sync(0, 1); // called from cpudep_ap_setup
. . .
// The following are from machdep_ap_bootstrap . . .
while (ap_letgo == 0)
        __asm __volatile("or 31,31,31");
__asm __volatile("or 6,6,6");
. . .
platform_smp_timebase_sync(ap_timebase, 1);

calls mpc85xx_smp_timebase_sync twice per ap, and that for each such
ap the 2nd call already sees tb_ready && mp_ncpus <= cpu_done, leading
to a lack of coordination of that 2nd call's activity:
static void
mpc85xx_smp_timebase_sync(platform_t plat, u_long tb, int ap)
{
        static volatile bool tb_ready;
        static volatile int cpu_done;

        if (ap) {
                /* APs.  Hold off until we get a stable timebase. */
                while (!tb_ready)
                        atomic_thread_fence_seq_cst();
                mttb(tb);
                atomic_add_int(&cpu_done, 1);
                while (cpu_done < mp_ncpus)
                        atomic_thread_fence_seq_cst();
        } else {
                /* BSP */
                freeze_timebase(rcpm_dev, true);
                tb_ready = true;
                mttb(tb);
                atomic_add_int(&cpu_done, 1);
                while (cpu_done < mp_ncpus)
                        atomic_thread_fence_seq_cst();
                freeze_timebase(rcpm_dev, false);
        }
}
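
To make the counting concrete, here is a small stand-alone sketch
(hypothetical user-space code, not kernel code, and only my reading of
the sequence) that walks tb_ready/cpu_done through the
two-calls-per-ap sequence above, assuming mp_ncpus == 4 as on this G5:

/*
 * Hypothetical stand-alone sketch (not kernel code): walk the
 * tb_ready/cpu_done state of mpc85xx_smp_timebase_sync for the
 * sequence above, where each ap calls it twice but the BSP only once.
 */
#include <stdbool.h>
#include <stdio.h>

int
main(void)
{
        const int mp_ncpus = 4;         /* e.g. 2 sockets, 2 cores each */
        const int n_aps = mp_ncpus - 1;
        bool tb_ready = false;
        int cpu_done = 0;

        /*
         * The 1st ap calls (from cpudep_ap_setup, tb == 0) spin on
         * !tb_ready until the BSP call (from cpu_mp_unleash) freezes
         * the timebase and publishes it.
         */
        tb_ready = true;                /* BSP */
        cpu_done++;                     /* BSP counts itself */
        cpu_done += n_aps;              /* released 1st ap calls count too */
        printf("after 1st round: cpu_done=%d == mp_ncpus, "
            "so the BSP unfreezes the timebase\n", cpu_done);

        /*
         * The 2nd ap calls (from machdep_ap_bootstrap, tb == ap_timebase)
         * already see tb_ready and cpu_done >= mp_ncpus, so neither wait
         * orders anything: mttb(ap_timebase) can happen after the
         * timebase is already unfrozen and running again.
         */
        printf("2nd ap calls see tb_ready=%d and cpu_done=%d >= mp_ncpus=%d,\n"
            "so neither wait orders anything\n", tb_ready, cpu_done, mp_ncpus);
        cpu_done += n_aps;
        printf("final cpu_done=%d, so any later round's waits are\n"
            "satisfied immediately as well\n", cpu_done);
        return (0);
}

If that reading is right, adjusting the counters alone would not help
much; the cpudep_ap_setup call (which is there for the wake-from-sleep
path) probably has to be kept out of the boot-time rendezvous, which
points back at the sleep/wake rework mentioned above.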
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)