An experimental hack that appears to allow old PowerMacG5 4-core (system total) system to boot reliably (head -r343884 based context)
Mark Millard
marklmi at yahoo.com
Tue Feb 26 21:11:57 UTC 2019
[I explicitly note that my hack is racy. It appears that
I've finally had an example.]
On 2019-Feb-24, at 13:50, Mark Millard <marklmi at yahoo.com> wrote:
> On 2019-Feb-24, at 13:07, Justin Hibbits <chmeeedalf at gmail.com> wrote:
>
>> On Sat, Feb 23, 2019 at 1:36 PM Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>> For sys/powerpc/aim/mp_cpudep.c 's cpudep_ap_bootstrap I added as shown below:
>>>
>>> +extern void hack_into_slb_if_needed(void* vap); // HACK!!!
>>> +
>>> uintptr_t
>>> cpudep_ap_bootstrap(void)
>>> {
>>> . . .
>>> + hack_into_slb_if_needed(pcpup->pc_curpcb); // HACK!!!
>>> +
>>> sp = pcpup->pc_curpcb->pcb_sp;
In the above, after the implicit slb_insert_kernel, but before
the pcpup->pc_curpcb-> dereference, the slb entry could be replaced
again. There are, after all, other threads in operation before
SI_SUB_SMP starts:
SI_SUB_KTHREAD_INIT = 0xe000000, /* init process*/
SI_SUB_KTHREAD_PAGE = 0xe400000, /* pageout daemon*/
SI_SUB_KTHREAD_VM = 0xe800000, /* vm daemon*/
SI_SUB_KTHREAD_BUF = 0xea00000, /* buffer daemon*/
SI_SUB_KTHREAD_UPDATE = 0xec00000, /* update daemon*/
SI_SUB_KTHREAD_IDLE = 0xee00000, /* idle procs*/
#ifndef EARLY_AP_STARTUP
SI_SUB_SMP = 0xf000000, /* start the APs*/
#endif
I've finally had one boot hang-up, apparently from this happening.
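To illustrate the window, here is a minimal user-space sketch in C
(not kernel code). All of the names (race_cache, race_insert,
race_present, RACE_SLOTS) are hypothetical, and the caller-supplied
victim index only stands in for the random slb replacement choice:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny on purpose, so an eviction is easy to provoke (hypothetical). */
#define RACE_SLOTS 4

/* Hypothetical model of a small, fully software-managed segment cache. */
struct race_cache {
	uint64_t esid[RACE_SLOTS];
	bool     valid[RACE_SLOTS];
};

/* Return true if esid currently has an entry in the cache. */
static bool
race_present(const struct race_cache *c, uint64_t esid)
{
	size_t i;

	assert(c != NULL);
	for (i = 0; i < RACE_SLOTS; i++) {
		if (c->valid[i] && c->esid[i] == esid)
			return (true);
	}
	return (false);
}

/*
 * Insert esid, preferring a free slot. If none is free, overwrite
 * the slot named by victim (a stand-in for a random choice).
 */
static void
race_insert(struct race_cache *c, uint64_t esid, size_t victim)
{
	size_t i;

	assert(c != NULL && victim < RACE_SLOTS);
	for (i = 0; i < RACE_SLOTS; i++) {
		if (!c->valid[i]) {
			victim = i;
			break;
		}
	}
	c->esid[victim] = esid;
	c->valid[victim] = true;
}
```

In this model, an entry that was just inserted can be overwritten by
any later insert that picks its slot, so a check-then-insert done
ahead of the access does not guarantee the entry is still there when
the access happens.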
>>> and in src/sys/powerpc/aim/slb.c I added an implementation:
>>>
>>> +void hack_into_slb_if_needed(void* vap); // HACK!!!
>>> +void hack_into_slb_if_needed(void* vap) // HACK!!!
>>> +{ // HACK!!!
>>> +    struct slb *cache= PCPU_GET(aim.slb);
>>> +    vm_offset_t va= (vm_offset_t)vap;
>>> +    uint64_t slbv= kernel_va_to_slbv(va);
>>> +    uint64_t esid= va>>ADDR_SR_SHFT;
>>> +    uint64_t slbe= (esid<<SLBE_ESID_SHIFT) | SLBE_VALID;
>>> +    int i;
>>> +
>>> +    for (i = 0; i < n_slbs; i++) {
>>> +        if (i == USER_SLB_SLOT)
>>> +            continue;
>>> +        if (cache[i].slbe == (slbe | i))
>>> +            break;
>>> +    }
>>> +
>>> +    if (i==n_slbs)
>>> +        slb_insert_kernel(slbe,slbv);
>>> +} // HACK!!!
>>> +
>>>
>>> So far I've not had any boot hang-ups after this.
>>>
>>> Given the random nature of the hang-ups it will be a
>>> while before I conclude for sure how reliable this
>>> change makes booting, but so far so good.
>>>
>>> (I recognize that the "break" could be "return"
>>> and then the "if (i==n_slbs)" would not be
>>> needed.)
>>>
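A user-space sketch of the lookup with the early "return" mentioned
above, split out as a predicate. This is only a model: the MODEL_*
constants and the model_* names are hypothetical stand-ins chosen for
illustration, not taken from the kernel headers, and real code would
use the kernel's own definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in values for this model only (assumptions, not kernel values). */
#define MODEL_N_SLBS        64
#define MODEL_USER_SLB_SLOT 0
#define MODEL_ESID_SHIFT    28
#define MODEL_SLBE_VALID    0x08000000ULL

/* Hypothetical mirror of one cached slb entry. */
struct model_slb {
	uint64_t slbv;
	uint64_t slbe;
};

/* Build the slbe value to compare against (without the slot index). */
static uint64_t
model_va_to_slbe(uint64_t va)
{
	uint64_t esid = va >> MODEL_ESID_SHIFT;

	return ((esid << MODEL_ESID_SHIFT) | MODEL_SLBE_VALID);
}

/* Return true as soon as a non-user slot is found that covers va. */
static bool
model_slb_has_va(const struct model_slb *cache, uint64_t va)
{
	uint64_t slbe = model_va_to_slbe(va);
	int i;

	assert(cache != NULL);
	for (i = 0; i < MODEL_N_SLBS; i++) {
		if (i == MODEL_USER_SLB_SLOT)
			continue;
		if (cache[i].slbe == (slbe | (uint64_t)i))
			return (true);
	}
	return (false);
}
```

With the lookup shaped this way, the caller would do the insert only
when the predicate returns false, and no post-loop index test is
needed.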
>>>
>>> Other issues not fixed by this:
>>>
>>> This does not change the buf*daemon* randomly getting
>>> hung up (and so timing out on shutdown). This appears
>>> to be the same issue that leads to the fans sometimes
>>> starting to run full-rate because of pmac_thermal
>>> being hung-up.
>>>
>>> For buf*daemon* "top -SHIopid" before shutdown shows
>>> just the ones that will not hang-up. The same goes for
>>> seeing before hand for pmac_thermal vs. the fans.
>>>
>>> ===
>>> Mark Millard
>>
>> Hi Mark,
>>
>> Fantastic work tracking this down! So the problem is we now can fault
>> when accessing KVA space. I think we should allow this, otherwise we
>> can hamper performance with reduced KVA size. I'll have to think
>> about how best to do this. Would you be willing to test patches I
>> come up with?
>
> I'll try to test whatever updates you want but there may be some
> issues with timeliness.
>
>
>
> The reason for the "sometimes" boot-failure is that the entry in the
> slb for the PCB/stack for the CPU being added has sometimes been
> replaced already before the CPU the pcb is for has been sufficiently
> configured to allow automatic handling --and other times has not
> yet been replaced: the random slb replacement mechanism.
>
> There already is code to handle slb entry replacements but it does
> not work for a CPU still being set up (at the stage of the
> sometimes failure). At least that is what I expect for:
>
> # grep -r "handle_kernel_slb_spill" /usr/src/sys/powerpc/
> /usr/src/sys/powerpc/aim/trap_subr64.S: bl handle_kernel_slb_spill
> /usr/src/sys/powerpc/powerpc/trap.c: void handle_kernel_slb_spill(int, register_t, register_t);
> /usr/src/sys/powerpc/powerpc/trap.c:handle_kernel_slb_spill(int type, register_t dar, register_t srr0)
>
> So my hack was to separately do the potential replacement in that
> early time frame to allow the configuration for the CPU to get
> far enough along for the existing mechanism to work. (At least
> that is what I expect that I did.)
>
> So far I've had no boot failures of any kind with the hack.
> I've removed the hacks for reporting information and things
> still work.
>
> But I've not tried anything extensive after booting because
> things like buf*daemon* threads and pmac_thermal are randomly
> hanging up in/at:
>
> mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c
> (mi_switch seems to have called sched_switch based on the
> "+0x134" and the code in that area --but sched_switch is not
> listed)
>
> I've no clue what is safe when one or more buf*daemon* threads
> make no progress.
>
> For shutdown that frequently leads to timeouts for stopping some
> buf*daemon* threads (when all 8 time out it takes about 8 minutes).
> The buf*daemon* that fail are the ones that "top -SHIopid" no
> longer shows.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)