lr=u_trap+0x10 and srr0=k_trap+0x28 for "stopped at 0 illegal instruction 0" before-copyright hang on PowerMac G5's
Mark Millard
markmi at dsl-only.net
Sat Sep 27 10:51:42 UTC 2014
I found the backtrace for the OF_peer call that leads to the "before copyright"/ofwcall-for-peer hang/crash in ofwcall. This happens to be the first ofwcall with pmap_bootstrapped!=0, which may be the biggest issue involved (for what it implies).
.OF_peer+0x8c
.powermac_smp_first_cpu+0x3c (OF_peer(0) below)
.platform_smp_first_cpu+0x78
.cpu_mp_setmaxid+0x2c (via .mpt_fc_els_reply_handler+0x2e68 that is not explicitly listed)
.mp_setmaxid+0x14
.mi_startup+0x10c
btext+0xbc
The source code involved is:
static int
powermac_smp_first_cpu(platform_t plat, struct cpuref *cpuref)
{
	char buf[8];
	phandle_t cpu, dev, root;
	int res;
	root = OF_peer(0);
	dev = OF_child(root);
	while (dev != 0) {
		res = OF_getprop(dev, "name", buf, sizeof(buf));
		if (res > 0 && strcmp(buf, "cpus") == 0)
			break;
		dev = OF_peer(dev);
	}
	if (dev == 0) {
		/*
		 * psim doesn't have a name property on the /cpus node,
		 * but it can be found directly
		 */
		dev = OF_finddevice("/cpus");
		if (dev == -1)
			return (ENOENT);
	}
	cpu = OF_child(dev);
	while (cpu != 0) {
		res = OF_getprop(cpu, "device_type", buf, sizeof(buf));
		if (res > 0 && strcmp(buf, "cpu") == 0)
			break;
		cpu = OF_peer(cpu);
	}
	if (cpu == 0)
		return (ENOENT);
	return (powermac_smp_fill_cpuref(cpuref, cpu));
}
To check whether the peer use is special, I temporarily made OF_peer cache the node 0 result so that only the first such call uses ofwcall. (The call above is not the first such call.) The expectation was that the OF_child should then fail instead. And it does. So peer is not special: whichever kind of ofwcall request happens to be the first after pmap_bootstrapped!=0 gets the problem.
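A minimal userland sketch of that caching experiment follows. It is not the kernel change I actually made: fw_peer, fw_calls, peer_cached and the root handle value are all hypothetical stand-ins, with fw_peer playing the role of the real ofwcall so that the number of firmware entries can be counted.

```c
#include <stdint.h>

typedef uint32_t phandle_t;

/* Hypothetical counter of how often the "firmware" is entered. */
static int fw_calls;

/* Hypothetical stand-in for the firmware peer service. */
static phandle_t
fw_peer(phandle_t node)
{
	fw_calls++;
	/* Made-up root handle for node 0; no sibling for anything else. */
	return (node == 0 ? 0xff800000u : 0);
}

/* peer with the node 0 (root) result cached after the first lookup. */
static phandle_t
peer_cached(phandle_t node)
{
	static phandle_t root;
	static int have_root;

	if (node != 0)
		return (fw_peer(node));
	if (!have_root) {
		root = fw_peer(0);
		have_root = 1;
	}
	return (root);
}
```

With this in place a second peer_cached(0) never reaches fw_peer, so whichever request comes next (OF_child in the real code) becomes the first one to enter the firmware.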
===
Mark Millard
markmi at dsl-only.net
On Sep 26, 2014, at 11:55 PM, Mark Millard <markmi at dsl-only.net> wrote:
According to my adjusted dumping: At the "before Copyright"/ofwcall-for-peer crash ofw_real_mode==0.
And that does turn off exception vector save/restore:
__inline void
ofw_save_trap_vec(char *save_trap_vec)
{
	if (!ofw_real_mode)
		return;
	bcopy((void *)EXC_RST, save_trap_vec, EXC_LAST - EXC_RST);
}
static __inline void
ofw_restore_trap_vec(char *restore_trap_vec)
{
	if (!ofw_real_mode)
		return;
	bcopy(restore_trap_vec, (void *)EXC_RST, EXC_LAST - EXC_RST);
	__syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD);
}
So now it is clear to me how FreeBSD's exception vectors could be involved in a context that does not have FreeBSD's environment in place. (Finally!)
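To make the consequence concrete, here is a small userland model of that gating. It is only a sketch: VEC_SIZE, real_mode, vectors and vec_during_call are hypothetical names, and a plain array stands in for the exception vector area between EXC_RST and EXC_LAST.

```c
#include <string.h>

#define VEC_SIZE 16		/* hypothetical stand-in for EXC_LAST - EXC_RST */

static int real_mode;		/* models ofw_real_mode */
static char vectors[VEC_SIZE];	/* models the live exception vector area */

static void
save_vec(char *save)
{
	if (!real_mode)
		return;
	memcpy(save, vectors, VEC_SIZE);
}

static void
restore_vec(const char *saved)
{
	if (!real_mode)
		return;
	memcpy(vectors, saved, VEC_SIZE);
}

/*
 * Follows the order used around ofwcall and returns the first byte of
 * the vector area that would be live while the firmware runs.
 */
static char
vec_during_call(const char *fw_vectors)
{
	char kernel_copy[VEC_SIZE];
	char seen;

	save_vec(kernel_copy);		/* save the kernel's vectors (maybe) */
	restore_vec(fw_vectors);	/* install the firmware's (maybe) */
	seen = vectors[0];		/* what the call would run with */
	restore_vec(kernel_copy);	/* put the kernel's back (maybe) */
	return (seen);
}
```

With real_mode == 0 both helpers return early, so the model "firmware" runs with the kernel's vectors still installed, which is the situation described above.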
For powerpc64/GENERIC64 it should also then establish OFW_STD_32BIT:
boolean_t
OF_bootstrap()
{
	boolean_t status = FALSE;
	if (openfirmware_entry != NULL) {
		if (ofw_real_mode) {
			status = OF_install(OFW_STD_REAL, 0);
		} else {
#ifdef __powerpc64__
			status = OF_install(OFW_STD_32BIT, 0);
#else
			status = OF_install(OFW_STD_DIRECT, 0);
#endif
		}
This seems to be like OFW_STD_REAL in what it sets up: ofw_real_methods.
static ofw_def_t ofw_real = {
	OFW_STD_REAL,
	ofw_real_methods,
	0
};
OFW_DEF(ofw_real);
static ofw_def_t ofw_32bit = {
	OFW_STD_32BIT,
	ofw_real_methods,
	0
};
OFW_DEF(ofw_32bit);
From what I can tell so far, ofw_real_mode is what is used to figure out the context where it matters.
To be sure, I temporarily hacked ofw_save_trap_vec and ofw_restore_trap_vec to ignore ofw_real_mode so that they would always swap the exception vectors.
As I guessed it still hangs before the copyright notice. (Without getting to DDB so no dump information is displayed.)
===
Mark Millard
markmi at dsl-only.net
On Sep 26, 2014, at 10:18 PM, Mark Millard <markmi at dsl-only.net> wrote:
The first send of this was big enough for the moderator to be involved. So I canceled and am sending with less history included.
[I'll note that I seem to have trouble typing 0xdbb290 vs. 0xbdd290. The actual value is 0xdbb290. The references to the incorrect typing should say 0xbdd290, which is the wrong value. But I've had both types of references listing the wrong text... in various notes.]
===
Mark Millard
markmi at dsl-only.net
On Sep 26, 2014, at 10:11 PM, Mark Millard <markmi at dsl-only.net> wrote:
The openfirmware peer crash (i.e., the before Copyright notice crash) happens during/just after the MMU setup, and the peer ofwcall is the first ofwcall where pmap_bootstrapped is non-zero at the time. In other words: the very first ofwcall in the new context fails.
And this failure involves some of the same code area that I got a backtrace for and reported as a separate crash (with the trace listed). As a reminder, here is that backtrace, which has a different failure point:
.pvo_vaddr_compare+0x14, instruction ld r0, r4, 0x58 [or ld r0,88(r4) in an alternate notation]
.pvo_tree_RB_FIND+0x38
.moea64_dev_direct_mapped+0x90
.pmap_dev_direct_mapped+0x84 ("_dev" was missing in earlier note)
.bs_remap_earlyboot+0x6c
.moea64_late_bootstrap+0x178
.moea64_bootstrap_native+0x120
.pmap_bootstrap+0xac
.powerpc_init+0x514
btext+0xa8
As for the sequence of ofwcall's that I reported: starting at the last OF_finddevice before the OF_instance_to_package that I reported in the sequence of ofwcall's from quiesce until the crash...
moea64_late_bootstrap does
chosen = OF_finddevice("/chosen");
if (chosen != -1 && OF_getprop(chosen, "mmu", &mmui, 4) != -1) {
	mmu = OF_instance_to_package(mmui);
	if (mmu == -1 || (sz = OF_getproplen(mmu, "translations")) == -1)
		sz = 0;
	if (sz > 6144 /* tmpstksz - 2 KB headroom */)
		panic("moea64_bootstrap: too many ofw translations");
	if (sz > 0)
		moea64_add_ofw_mappings(mmup, mmu, sz);
}
with moea64_add_ofw_mappings called. Then...
moea64_add_ofw_mappings does...
bzero(translations, sz);
OF_getprop(OF_finddevice("/"), "#address-cells", &acells,
    sizeof(acells));
if (OF_getprop(mmu, "translations", trans_cells, sz) == -1)
	panic("moea64_bootstrap: can't get ofw translations");
And it is the next ofwcall after that last OF_getprop that fails. (It happens to be a peer request.) Adding a dump of the pmap_bootstrapped value with the ofwcall name in my hack for reporting things about the crash confirmed that peer ofwcall as the first with pmap_bootstrapped non-zero.
I will note here that it is somewhat later than the above code that pvo_vaddr_compare ends up executing via bs_remap_earlyboot. That earlier moea64_late_bootstrap code continues after the } from the first if above with:
/*
 * Calculate the last available physical address.
 */
for (i = 0; phys_avail[i + 2] != 0; i += 2)
	;
Maxmem = powerpc_btop(phys_avail[i + 1]);
/*
 * Initialize MMU and remap early physical mappings
 */
MMU_CPU_BOOTSTRAP(mmup,0);
mtmsr(mfmsr() | PSL_DR | PSL_IR);
pmap_bootstrapped++;
bs_remap_earlyboot();
(and more). I've not found the peer call yet but it may well be after the pvo_vaddr_compare shown above as far as execution order goes.
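As an aside on the phys_avail loop quoted above: it walks (start, end) pairs until the start of the next pair is zero, so i is left on the last real pair. A standalone sketch of that idiom, using made-up region values and assuming 4 KB pages (last_phys_end, btop_sketch and PAGE_SHIFT_SKETCH are hypothetical names, not kernel ones):

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assumption: 4 KB pages */

/*
 * avail holds (start, end) pairs and is terminated by a pair of zeros.
 * Returns the end address of the last real pair.
 */
static uint64_t
last_phys_end(const uint64_t *avail)
{
	int i;

	/* Stop when the *next* pair's start is 0, leaving i on the last pair. */
	for (i = 0; avail[i + 2] != 0; i += 2)
		;
	return (avail[i + 1]);
}

/* Bytes-to-pages conversion in the spirit of powerpc_btop. */
static uint64_t
btop_sketch(uint64_t addr)
{
	return (addr >> PAGE_SHIFT_SKETCH);
}
```

For a list such as { 0x3000, 0x100000, 0x200000, 0x80000000, 0, 0 } the loop stops with i == 2, so the end address used is 0x80000000.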
===
Mark Millard
markmi at dsl-only.net
On Sep 25, 2014, at 2:41 PM, Mark Millard <markmi at dsl-only.net> wrote:
The first boot after make -8 kernel without quiesce also died during peer, I'd guess the same one.
Looks like quiesce does not matter for the issue. (But it is handy for identifying which peer fails.)
===
Mark Millard
markmi at dsl-only.net
On Sep 25, 2014, at 2:08 PM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:
Can you comment out the call to quiesce? It may not be necessary on your system.
-Nathan
On 09/25/14 13:17, Mark Millard wrote:
> The "before copyright" hang/exception is during the first openfirmware "peer" after "quiesce". The ofw_restore_trap_vec(save_trap_init) completes fine, the ofwcall(args) is made but it does not return normally.
>
> Ignoring the ofwcall's from before quiesce, the sequence of ofwcall's is:
>
> quiesce
> finddevice
> parent
> getprop
> getprop
> getprop
> finddevice
> getprop
> instance-to-package
> getproplen
> finddevice
> getprop
> getprop
> peer
>
> And when the boot fails before the copyright that ofwcall for peer ends up resulting in the register dump with no register pointing to the kernel's normal stack area.
>
> I still have no clue what is happening during peer. ofw_restore_trap_vec(save_trap_init) is being called and is returning before ofwcall is used. For all I know some uses of peer could require not being quiesce'd in order for peer to be reliable.
>
> In my display of what executed, the reported text ends in:
>
> <peer>^
>
> where the ^ indicates the stage that last completed in the call sequence inside openfirmware_core. This information is displayed by the
>
> x/s ofw_name_history
>
> in the automatically created default script for DDB. I read the sequence backwards from the end marker (here ^), following the wraparound if there is that much text and if I care to go back that far.
>
> FreeBSD FBSDG5M1 10.1-BETA2 FreeBSD 10.1-BETA2 #11 r271944M: Thu Sep 25 12:14:05 PDT 2014 root at FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64 powerpc
>
> My current hacks to get this information are:
>
> Index: /usr/src/sys/ddb/db_script.c
> ===================================================================
> --- /usr/src/sys/ddb/db_script.c (revision 271944)
> +++ /usr/src/sys/ddb/db_script.c (working copy)
> @@ -319,10 +319,25 @@
> {
> char scriptname[DB_MAXSCRIPTNAME];
>
> + /* HACK!!! : Additional lines to force a basic default script to exist.
> + * Will dump information even if ddb input is not available for early crash.
> + * Used to get more information about PowerMac G5 "before Copyright" hangs.
> + */
> + struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
> + if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; bt; x/s ofw_name_history");
> +
> snprintf(scriptname, sizeof(scriptname), "%s.%s",
> DB_SCRIPT_KDBENTER_PREFIX, eventname);
> if (db_script_exec(scriptname, 0) == ENOENT)
> (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
> +
> + /* HACK!!! : Additional lines to always use the default script,
> + * even if scriptname existed and was executed.
> + * Will dump information even if ddb input is not available for early crash.
> + * Used to get more information about PowerMac G5 "before Copyright" hangs.
> + */
> + else
> + (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
> }
>
> /*-
> Index: /usr/src/sys/powerpc/conf/GENERIC64
> ===================================================================
> --- /usr/src/sys/powerpc/conf/GENERIC64 (revision 271944)
> +++ /usr/src/sys/powerpc/conf/GENERIC64 (working copy)
> @@ -76,6 +76,8 @@
> # Debugging support. Always need this:
> options KDB # Enable kernel debugger support.
> options KDB_TRACE # Print a stack trace for a panic.
> +options DDB
> +options GDB
>
> # Make an SMP-capable kernel by default
> options SMP # Symmetric MultiProcessor Kernel
> Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c
> ===================================================================
> --- /usr/src/sys/powerpc/ofw/ofw_machdep.c (revision 271944)
> +++ /usr/src/sys/powerpc/ofw/ofw_machdep.c (working copy)
> @@ -324,6 +324,12 @@
> openfirmware(&args);
> }
>
> +/* Part of HACK to have record of ofw call names */
> +#define ofw_name_history_record_size 256
> +char ofw_name_history[ofw_name_history_record_size+1] = {}; /* Initially: automatically '\0' filled */
> +char * ofw_name_history_pos = ofw_name_history;
> +/* End Part of HACK */
> +
> static int
> openfirmware_core(void *args)
> {
> @@ -330,6 +336,42 @@
> int result;
> register_t oldmsr;
>
> + { /* HACK to have record of ofw call names */
> + struct argtype_prefix {
> + cell_t name;
> + };
> +
> + char *name = (char*) (uintptr_t) (((struct argtype_prefix*)args)->name);
> +
> + int i;
> +
> + *ofw_name_history_pos = '<';
> +
> + for(i=0; (*name) && i!=20; i++) {
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos == &ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos = ofw_name_history;
> + }
> + *ofw_name_history_pos = *name;
> +
> + name++;
> + }
> +
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos == &ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos = ofw_name_history;
> + }
> + *ofw_name_history_pos = '>';
> +
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos == &ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos = ofw_name_history;
> + }
> + *ofw_name_history_pos = '@';
> +
> + ofw_name_history[ofw_name_history_record_size] = '\0'; /* Paranoia */
> + } /* HACK end */
> +
> /*
> * Turn off exceptions - we really don't want to end up
> * anywhere unexpected with PCPU set to something strange
> @@ -337,14 +379,22 @@
> */
> oldmsr = intr_disable();
>
> + *ofw_name_history_pos = '#'; /* HACK */
> +
> ofw_sprg_prepare();
>
> + *ofw_name_history_pos = '$'; /* HACK */
> +
> /* Save trap vectors */
> ofw_save_trap_vec(save_trap_of);
>
> + *ofw_name_history_pos = '%'; /* HACK */
> +
> /* Restore initially saved trap vectors */
> ofw_restore_trap_vec(save_trap_init);
>
> + *ofw_name_history_pos = '^'; /* HACK */
> +
> #if defined(AIM) && !defined(__powerpc64__)
> /*
> * Clear battable[] translations
> @@ -357,13 +407,21 @@
>
> result = ofwcall(args);
>
> + *ofw_name_history_pos = '&'; /* HACK */
> +
> /* Restore trap vecotrs */
> ofw_restore_trap_vec(save_trap_of);
>
> + *ofw_name_history_pos = '*'; /* HACK */
> +
> ofw_sprg_restore();
>
> + *ofw_name_history_pos = '~'; /* HACK */
> +
> intr_restore(oldmsr);
>
> + *ofw_name_history_pos = '!'; /* HACK */
> +
> return (result);
> }
>
>
>
>
>
> ===
> Mark Millard
> markmi at dsl-only.net
>