Strange panic on ppc64

Tue Jun 11 03:33:11 UTC 2013

1. Is the Open Firmware located on the CPU itself or is it part of the
logic board?
2. Would he be able to flash a newer version from a remote machine?

The answers to these two questions may help me and others in the near
future.

Thanks,
Desmond

On Mon, Jun 10, 2013 at 9:31 AM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:

> On 06/10/13 08:20, Nathan Whitehorn wrote:
> > On 06/09/13 16:21, Justin Hibbits wrote:
> >> On Sun, Jun 9, 2013 at 8:47 AM, Nathan Whitehorn
> >> <nwhitehorn at freebsd.org> wrote:
> >>
> >>     On 06/08/13 17:33, Justin Hibbits wrote:
> >>>
> >>>
> >>>     On Sat, Jun 8, 2013 at 7:54 AM, Nathan Whitehorn
> >>>     <nwhitehorn at freebsd.org> wrote:
> >>>
> >>>         On 06/08/13 09:21, Justin Hibbits wrote:
> >>>>
> >>>>
> >>>>         On Wed, Jun 5, 2013 at 9:47 AM, Justin Hibbits
> >>>>         <jhibbits at freebsd.org> wrote:
> >>>>
> >>>>             Will do, when I get it panicking again.
> >>>>
> >>>>             - Justin
> >>>>
> >>>>             On Jun 5, 2013 9:46 AM, "Nathan Whitehorn"
> >>>>             <nwhitehorn at freebsd.org> wrote:
> >>>>
> >>>>                 On 06/04/13 22:35, Justin Hibbits wrote:
> >>>>
> >>>>                     After a string of seemingly random hangs, I
> >>>>                     added invariants (but not
> >>>>                     witness) to my custom kernel config, and I get
> >>>>                     the following panic,
> >>>>                     recreated from a fuzzy cell phone picture:
> >>>>
> >>>>
> >>>>                     [thread pid -1 tid 1006665719 ]
> >>>>                     Stopped at 0: illegal instruction 0
> >>>>                     db> panic: mutex ohci1 owned at /usr/home/chmeee/freebsd/head/sys/dev/usb/usb_transfer.c:2280
> >>>>                     cpuid = 0
> >>>>                     Uptime: 9h8m1s
> >>>>                     <my dump code>
> >>>>                     ...
> >>>>                     panic: msleep1
> >>>>                     cpu = 0
> >>>>                     KDB: enter: panic
> >>>>                     [ thread pid -1 tid 100665719 ]
> >>>>                     ....
> >>>>
> >>>>                     The first question I have is how it got such a
> >>>>                     strange PID/TID. My guess is memory corruption:
> >>>>                     something is stomping on the pcpu data. I think
> >>>>                     these hangs have only happened since I added a
> >>>>                     lot more memory (up to 12G from 4G; Andreas
> >>>>                     Tobler was seeing hangs as well), so it might be
> >>>>                     something in the moea64 pmap code, but that's
> >>>>                     pure speculation on my part.  Then there are the
> >>>>                     other panic messages: the owned mutex and the
> >>>>                     panic in msleep1.  I enabled more trace code, so
> >>>>                     hopefully the next time it panics I can collect
> >>>>                     better data.
> >>>>
> >>>>                     - Justin
> >>>>                     _______________________________________________
> >>>>                     freebsd-ppc at freebsd.org
> >>>>                     <mailto:freebsd-ppc at freebsd.org> mailing list
> >>>>                     http://lists.freebsd.org/mailman/listinfo/freebsd-ppc
> >>>>                     To unsubscribe, send any mail to
> >>>>                     "freebsd-ppc-unsubscribe at freebsd.org
> >>>>                     <mailto:freebsd-ppc-unsubscribe at freebsd.org>"
> >>>>
> >>>>
> >>>>                 Could you post the output from show reg? It looks
> >>>>                 like it tried to jump to a null pointer there.
> >>>>                 -Nathan
> >>>>
> >>>>
> >>>>         Well, it's hard to get that output, because I just hit
> >>>>         that 'mutex owned' panic, and here's the backtrace:
> >>>
> >>>         The mutex thing is spurious -- it was already panicking and
> >>>         then panicked again trying to panic. Can you get the backtrace
> >>>         for the original panic (it should be different) and the
> >>>         values of the registers?
> >>>         -Nathan
> >>>
> >>>
> >>>     Here you go:
> >>>
> >>>     [ thread pid -1 tid 1006665719 ]
> >>>     Stopped at      0:      illegal instruction 0
> >>>     db:0:kdb.enter.default> show reg
> >>>     r0                   0
> >>>     r1                   0
> >>>     r2            0xab63d0  M_MACTEMP
> >>>     r3            0xbb12e0
> >>>     r4            0x741f18  .ofwcall+0xa8
> >>>     r5                   0
> >>>     r6            0xa4f1a8
> >>>     r7                 0x1
> >>>     r8                 0x1
> >>>     r9            0xc10500  __pcpu
> >>>     r10          0x1c35ec0
> >>>     r11                  0
> >>>     r12         0x2000d032
> >>>     r13         0x342eb000
> >>>     r14         0x10014200
> >>>     r15         0xffffffffffffcb58
> >>>     r16                0x2
> >>>     r17                0x2
> >>>     r18         0xffffffffffffcb50
> >>>     r19                  0
> >>>     r20         0xc000000013231478
> >>>     r21         0xc00000014c0ce200
> >>>     r22                  0
> >>>     r23               0x64  dbsize+0x10
> >>>     r24         0xc00000014c0cdf70
> >>>     r25           0xb62cb8  smp_no_rendevous_barrier
> >>>     r26                  0
> >>>     r27           0x741f18  .ofwcall+0xa8
> >>>     r28           0x741f18  .ofwcall+0xa8
> >>>     r29         0x2000d032
> >>>     r30         0x9000000000001032
> >>>     r31           0xc0cad8  mac_labeled
> >>>     srr0          0x102ca4  k_trap+0x28
> >>>     srr1        0x9000000000001032
> >>>     lr            0x102c74  u_trap+0x10
> >>>     ctr         0xff846d78
> >>>     cr          0x2000f1b0
> >>>     xer                  0
> >>>     dar         0xfffffffffffffd60
> >>>     dsisr       0x42000000
> >>>     0:      illegal instruction 0
> >>>     db:0:kdb.enter.default>  bt
> >>>     Tracing pid -1 tid 1006665719 td 0
> >>>     (nothing)
> >>     Well, that is all kinds of messed up. It appears to have halted
> >>     while handling a userland trap due to an implicit branch caused by
> >>     bad translations when it restores the kernel SRs. Could you see
> >>     what 'show pcpu' does? Does that information look valid at all? I
> >>     suspect it has become corrupted somehow.
> >>     -Nathan
> >>
> >>
> >> Here's the full log from dconschat, from bootup to panic.
> >> Unfortunately, not everything I wanted to print would print, and I
> >> can't type anything once it panics, because it panics when reading the
> >> keyboard, so I have to add everything as a ddb enter script.  Here's
> >> what I've added so far (doesn't do everything as you can see from the
> >> transcript):
> >>
> >>     script kdb.enter.default=show reg; bt; show pcpu; ps; run lockinfo; alltrace; show all procs; show files; show malloc; show allchains
> >>
> >> - Justin
> > This is now getting interesting. Reading the tea leaves, what has
> > happened is that the kernel has called into Open Firmware. Open Firmware
> > has then crashed early on, before setting up its own trap handlers,
> > which has then flung you back into FreeBSD's handlers with a totally
> > bogus environment, causing a second panic, which then causes a *third*
> > panic when trying to acquire a lock. It would be interesting to know
> > what the OF environment looked like and what commands it was trying to
> > execute (in r3), but that may be tricky to get...
> > -Nathan
>
> One other point: you can trace this pretty easily by just putting
> something like:
>
> if (pmap_bootstrapped) printf("Open Firmware call %p\n", args);
>
> at the top of openfirmware(). If I understood the debugger output
> correctly, something should be making a firmware call immediately before
> the crash.
>
> As a random guess about what is happening, it is possible OF is trying
> to allocate memory for itself. We just ignore the possibility that it
> might want to do that at present, but that is not necessarily a good
> assumption.
> -Nathan