Some evidence about the PowerMac G5 multiprocessor boot hang-ups with the modern VM_MAX_KERNEL_ADDRESS value

Sat Feb 16 04:32:44 UTC 2019

[I had to search for an address whose contents would not
get corrupted/replaced by something else, and I did not
find one. I have added the assignment you requested before
the PCPU_SET, but until I find an address that preserves
the values I assign, it likely does not matter.]

On 2019-Feb-15, at 16:04, Justin Hibbits <chmeeedalf at gmail.com> wrote:

> On Fri, 15 Feb 2019 15:26:09 -0800
> Mark Millard <marklmi at yahoo.com> wrote:
> 
>> On 2019-Feb-15, at 14:09, Justin Hibbits <chmeeedalf at gmail.com>
>> wrote:
>> 
>>> On Fri, 15 Feb 2019 14:01:18 -0800
>>> Mark Millard <marklmi at yahoo.com> wrote:
>>> 
>>>> . . .
>>>> 
>>>> Just to be sure, was the 0xc prefix a typo
>>>> (vs. 0xe as a prefix)?:
>>>> 
>>>> 0xc000000000000010
>>>> vs.
>>>> 0xe000000000000010  
>>> 
>>> No, 0xc is correct.  0xc... is the address of the DMAP, and it so
>>> happens that the upper bits are ignored in real mode, simply by the
>>> fact that they're not placed onto the address bus.  We take
>>> advantage of that elsewhere as well.  So writing to 0xc000....10
>>> actually writes to 0x0000...10, both in real mode and translated
>>> mode.  Writing to this at various points when the AP is starting
>>> up, we can see just how far into the boot it gets.
>>> 
>>>> . . .  
>> 
>> I got an odd result from a successful boot. But first,
>> notes on what I did to the code:
>> 
>> I used 0xc000000000000010 via:
>> 
>> +       *(unsigned long*)0xc000000000000010 = 0x10; // HACK!!!
>> +       powerpc_sync(); // HACK!!!
>> 
>> just before returning from cpudep_ap_early_bootstrap
>> 
>> +        *(unsigned long*)0xc000000000000010 = 0x20; // HACK!!!
>> +        powerpc_sync(); // HACK!!!
>> 
>> just before return from pmap_cpu_bootstrap
>> 
>> +        *(unsigned long*)0xc000000000000010 = 0x30; // HACK!!!
>> +        powerpc_sync(); // HACK!!!
>> 
>> just before return from cpudep_ap_bootstrap
>> 
>> +        *(unsigned long*)0xc000000000000010 = 0x40; // HACK!!!
>> +        powerpc_sync(); // HACK!!!
>> 
>> just before returning from cpudep_ap_setup
>> 
>> +        *(unsigned long*)0xc000000000000010 = 0x51; // HACK!!!
>> +        powerpc_sync(); // HACK!!!
>> 
>> just before the ap_letgo loop in machdep_ap_bootstrap [so just
>> after the PCPU_SET(awake,1)]
>> 
>> +        *(unsigned long*)0xc000000000000010 = 0x50; // HACK!!!
>> +        powerpc_sync(); // HACK!!!
>> 
>> just before sched_throw(NULL) in machdep_ap_bootstrap
>> 
>> 
>> For CPU 3 just after the two (void)*rstvec related
>> code sequences powermac_smp_start_cpu reported:
>> 
>> *(unsigned long*)0xc000000000000010=0xffa34878A
>> 
>> For CPU 2 just after the two (void)*rstvec related
>> code sequences powermac_smp_start_cpu reported:
>> 
>> *(unsigned long*)0xc000000000000010=0x51
>> 
>> For CPU 1 just after the two (void)*rstvec related
>> code sequences powermac_smp_start_cpu reported:
>> 
>> *(unsigned long*)0xc000000000000010=0x51
>> 
>> It looks to me like something is using the memory
>> that 0xc000000000000010 maps to.
>> 
>> None of them reported the 0x50 from just before
>> the sched_throw(NULL) .
>> 
>> 
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>> ( dsl-only.net went
>> away in early 2018-Mar)
>> 
> 
> Interesting.  That value looks like it could be an OpenFirmware
> phandle.  PowerISA does state that the first 256 bytes of memory is
> free for the OS (or firmware) to use as it sees fit, and we already
> know address 0x80 is special for OF.  Maybe pick another address if you
> wish to continue this experiment.  Can you write at the beginning of
> machdep_ap_bootstrap() some value, just before the PCPU_SET()? And then
> right after the sync?

Using 0xc000000000000020 resulted in the CPU 3 case
showing:

*(unsigned long*)0xc000000000000020=0x0

CPU 2 and CPU 1 again showed 0x51, as expected.

The same happened for 0xc000000000000030.

After that I added the 0x5F hack shown below
(showing the 0xc0...40 address attempt):

void
machdep_ap_bootstrap(void)
{

        *(unsigned long*)0xc000000000000040 = 0x5F; // HACK!!!
        powerpc_sync(); // HACK!!!

        PCPU_SET(awake, 1);
        __asm __volatile("msync; isync");

        *(unsigned long*)0xc000000000000040 = 0x51; // HACK!!!
        powerpc_sync(); // HACK!!!

        while (ap_letgo == 0)
                __asm __volatile("or 31,31,31");
        __asm __volatile("or 6,6,6");
. . .

Then I continued my search for an address where my assigned
values would survive over the duration required.

The same happened for 0xc000000000000040,
0xc000000000000050, 0xc000000000000060, and
0xc000000000000070.

Is there another reasonable address range to try?
(I've not tried any 0xc0000000000000?8 addresses.)

As a reminder: machdep_ap_bootstrap for CPU 3 does print
its own messages even when the hang-up happens, which shows
that it gets past the PCPU_SET(awake,1) and the ap_letgo
loop.

Maybe whatever clobbers the 0xc0000000000000?0 contents
sometimes also clobbers something that matters for getting
pc_awake set in the right place for CPU 3, and for the
handling of CPU 2?
