Strange panic on ppc64
Super Bisquit
superbisquit at gmail.com
Tue Jun 11 03:33:11 UTC 2013
1. Is the Open Firmware located on the CPU itself or is it part of the
logic board?
2. Would he be able to flash a newer version from a remote machine?
The answers to these two questions may help me and others in the near
future.
Thanks,
Desmond
On Mon, Jun 10, 2013 at 9:31 AM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:
> On 06/10/13 08:20, Nathan Whitehorn wrote:
> > On 06/09/13 16:21, Justin Hibbits wrote:
> >> On Sun, Jun 9, 2013 at 8:47 AM, Nathan Whitehorn
> >> <nwhitehorn at freebsd.org> wrote:
> >>
> >> On 06/08/13 17:33, Justin Hibbits wrote:
> >>>
> >>>
> >>> On Sat, Jun 8, 2013 at 7:54 AM, Nathan Whitehorn
> >>> <nwhitehorn at freebsd.org> wrote:
> >>>
> >>> On 06/08/13 09:21, Justin Hibbits wrote:
> >>>>
> >>>>
> >>>> On Wed, Jun 5, 2013 at 9:47 AM, Justin Hibbits
> >>>> <jhibbits at freebsd.org> wrote:
> >>>>
> >>>> Will do, when I get it panicking again.
> >>>>
> >>>> - Justin
> >>>>
> >>>> On Jun 5, 2013 9:46 AM, "Nathan Whitehorn"
> >>>> <nwhitehorn at freebsd.org> wrote:
> >>>>
> >>>> On 06/04/13 22:35, Justin Hibbits wrote:
> >>>>
> >>>> After a string of seemingly random hangs, I
> >>>> added invariants (but not
> >>>> witness) to my custom kernel config, and I get
> >>>> the following panic,
> >>>> recreated from a fuzzy cell phone picture:
> >>>>
> >>>>
> >>>> [thread pid -1 tid 1006665719 ]
> >>>> Stopped at 0: illegal instruction 0
> >>>> db> panic: mutex ohci1 owned at /usr/home/chmeee/freebsd/head/sys/dev/usb/usb_transfer.c:2280
> >>>> cpuid = 0
> >>>> Uptime: 9h8m1s
> >>>> <my dump code>
> >>>> ...
> >>>> panic: msleep1
> >>>> cpu = 0
> >>>> KDB: enter: panic
> >>>> [ thread pid -1 tid 100665719 ]
> >>>> ....
> >>>>
> >>>> The first question I have is how the hell it got
> >>>> such a strange PID/TID. My guess is memory corruption:
> >>>> something is stomping on the pcpu data or similar.
> >>>> I think these hangs have only happened since
> >>>> I added a lot more memory
> >>>> (up to 12G from 4G; Andreas Tobler was seeing
> >>>> hangs as well), so it might
> >>>> be something in the moea64 pmap code, but that's
> >>>> pure speculation on my
> >>>> part. Then there are the other panic messages: the owned
> >>>> mutex and the panic in msleep1. I
> >>>> enabled more trace code, so hopefully the next
> >>>> time it panics I can collect
> >>>> better data.
> >>>>
> >>>> - Justin
> >>>> _______________________________________________
> >>>> freebsd-ppc at freebsd.org mailing list
> >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-ppc
> >>>> To unsubscribe, send any mail to
> >>>> "freebsd-ppc-unsubscribe at freebsd.org"
> >>>>
> >>>>
> >>>> Could you post the output from show reg? It looks
> >>>> like it tried to jump to a null pointer there.
> >>>> -Nathan
> >>>>
> >>>>
> >>>> Well, it's hard to get that output, because I just hit
> >>>> that 'mutex owned' panic, and here's the backtrace:
> >>>
> >>> The mutex thing is spurious -- it was already panicking and
> >>> then panicked again trying to panic. Can you get the backtrace
> >>> for the original panic (it should be different) and the
> >>> values of the registers?
> >>> -Nathan
> >>>
> >>>
> >>> Here you go:
> >>>
> >>> [ thread pid -1 tid 1006665719 ]
> >>> Stopped at 0: illegal instruction 0
> >>> db:0:kdb.enter.default> show reg
> >>> r0 0
> >>> r1 0
> >>> r2 0xab63d0 M_MACTEMP
> >>> r3 0xbb12e0
> >>> r4 0x741f18 .ofwcall+0xa8
> >>> r5 0
> >>> r6 0xa4f1a8
> >>> r7 0x1
> >>> r8 0x1
> >>> r9 0xc10500 __pcpu
> >>> r10 0x1c35ec0
> >>> r11 0
> >>> r12 0x2000d032
> >>> r13 0x342eb000
> >>> r14 0x10014200
> >>> r15 0xffffffffffffcb58
> >>> r16 0x2
> >>> r17 0x2
> >>> r18 0xffffffffffffcb50
> >>> r19 0
> >>> r20 0xc000000013231478
> >>> r21 0xc00000014c0ce200
> >>> r22 0
> >>> r23 0x64 dbsize+0x10
> >>> r24 0xc00000014c0cdf70
> >>> r25 0xb62cb8 smp_no_rendevous_barrier
> >>> r26 0
> >>> r27 0x741f18 .ofwcall+0xa8
> >>> r28 0x741f18 .ofwcall+0xa8
> >>> r29 0x2000d032
> >>> r30 0x9000000000001032
> >>> r31 0xc0cad8 mac_labeled
> >>> srr0 0x102ca4 k_trap+0x28
> >>> srr1 0x9000000000001032
> >>> lr 0x102c74 u_trap+0x10
> >>> ctr 0xff846d78
> >>> cr 0x2000f1b0
> >>> xer 0
> >>> dar 0xfffffffffffffd60
> >>> dsisr 0x42000000
> >>> 0: illegal instruction 0
> >>> db:0:kdb.enter.default> bt
> >>> Tracing pid -1 tid 1006665719 td 0
> >>> (nothing)
> >> Well, that is all kinds of messed up. It appears to have halted
> >> while handling a userland trap due to an implicit branch caused by
> >> bad translations when it restores the kernel SRs. Could you see
> >> what 'show pcpu' does? Does that information look valid at all? I
> >> suspect it has become corrupted somehow.
> >> -Nathan
> >>
> >>
> >> Here's the full log from dconschat, from bootup to panic.
> >> Unfortunately, not everything I wanted to print would print, and I
> >> can't type anything once it panics, because it panics when reading the
> >> keyboard, so I have to add everything as a ddb enter script. Here's
> >> what I've added so far (doesn't do everything as you can see from the
> >> transcript):
> >>
> >> script kdb.enter.default=show reg; bt; show pcpu; ps; run
> >> lockinfo; alltrace; show all procs; show files; show malloc; show
> >> allchains
> >>
> >> - Justin
> > This is now getting interesting. Reading the tea leaves, what has
> > happened is that the kernel has called into Open Firmware. Open Firmware
> > has then crashed early on, before setting up its own trap handlers,
> > which has then flung you back into FreeBSD's handlers with a totally
> > bogus environment, causing a second panic, which then causes a *third*
> > panic when trying to acquire a lock. It would be interesting to know
> > what the OF environment looked like and what commands it was trying to
> > execute (in r3), but that may be tricky to get...
> > -Nathan
> > _______________________________________________
>
> One other point: you can trace this pretty easily by just putting
> something like:
>
> if (pmap_bootstrapped) printf("Open Firmware call %p\n", args);
>
> in the top of openfirmware(). If I understood the debugger output
> correctly, something should be making a firmware call immediately before
> the crash.
>
> As a random guess about what is happening, it is possible OF is trying
> to allocate memory for itself. We just ignore the possibility that it
> might want to do that at present, but that is not necessarily a good
> assumption.
> -Nathan
>
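Nathan's suggestion above, logging every firmware call from the top of openfirmware() once pmap_bootstrapped is set, can be sketched in userland roughly as follows. This is only an illustration, not the kernel code: mock_openfirmware() and of_calls_traced are hypothetical names invented for the example, and pmap_bootstrapped here is a plain local variable standing in for the kernel flag of the same name.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the kernel flag; set to 1 once the pmap layer is up. */
static int pmap_bootstrapped = 0;

/* Hypothetical counter, only here so the tracing can be checked. */
static unsigned long of_calls_traced = 0;

/*
 * Hypothetical mock of openfirmware(): log the argument pointer before
 * the point where the real function would enter the firmware.
 */
static int
mock_openfirmware(void *args)
{
	assert(args != NULL);

	/* Stay quiet during early boot, as in the suggested one-liner. */
	if (pmap_bootstrapped) {
		printf("Open Firmware call %p\n", args);
		of_calls_traced++;
	}

	/* The real function would call into Open Firmware here. */
	return (0);
}
```

Calling mock_openfirmware() while pmap_bootstrapped is 0 prints nothing; after setting it to 1, each call prints one line with the argument pointer. In the real kernel, the last such line before the crash would point at the firmware call of interest.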