AVILA getting close!
John-Mark Gurney
jmg at funkthat.com
Sat Jun 21 07:08:32 UTC 2014
John-Mark Gurney wrote this message on Fri, Jun 20, 2014 at 18:08 -0700:
> Andrew Turner wrote this message on Fri, Jun 20, 2014 at 20:08 +0100:
> > On Fri, 20 Jun 2014 08:10:24 -0700
> > John-Mark Gurney <jmg at funkthat.com> wrote:
> >
> > > John-Mark Gurney wrote this message on Wed, Jun 18, 2014 at 15:58
> > > -0700:
> > > > So, w/ the recent couple of patches that alc has provided, I no
> > > > longer receive kernel panics on my AVILA board!
> > > >
> > > > $ uname -a
> > > > FreeBSD avila.funkthat.com 11.0-CURRENT FreeBSD 11.0-CURRENT #27
> > > > r267333:267349M: Wed Jun 11 09:57:58 PDT 2014
> > > > jmg at carbon.funkthat.com:/usr/obj/arm.armeb/usr/src.avila/sys/AVILA
> > > > arm $ uptime 12:15AM up 1 day, 15 mins, 2 users, load averages:
> > > > 0.13, 0.11, 0.08
> > > >
> > > > This survived a portsnap extract... This is all over NFS...
> > > >
> > > > Though the issue that I'm now having is that some binaries
> > > > (newsyslog) and sometimes other binaries (awk, grep) core dump...
> > > >
> > > > I believe this is an issue w/ rtld, or related... If I compile
> > > > newsyslog -static, it works fine... Otherwise I get a SIGILL, and
> > > > that is because it jumps off into the weeds.. Though gdb on arm
> > > > isn't very useful..
> > >
> > > ok, so the SIGILL only occures under gdb, and this is because single
> > > stepping into a RAS sequence doesn't work very well... If you set a
> > > break point on the return (after the RAS sequence), you can get past
> > > this...
> > >
> > > I got to the point in rtld.c code:
> > > if (obj->pltrel)
> > > rel = (const Elf_Rel *) ((caddr_t) obj->pltrel + reloff);
> > > else
> > > rel = (const Elf_Rel *) ((caddr_t) obj->pltrela + reloffand
> > > was seeing gdb try to execute the pltrela line, but: i;
> > >
> > > and was seeing gdb try to execute the pltrela line, but:
> > > (gdb) print * (const Elf_Rel *) ((caddr_t) obj->pltrela + reloff)
> > > Error accessing memory address 0x118: Bad address.
> > > (gdb) print/x obj->pltrela
> > > $4 = 0x0
> > > (gdb) print /x reloff
> > > $5 = 0x118
> > > (gdb) print obj->pltrel
> > > $6 = (const Elf_Rel *) 0x94e8
> > Based on my copy of newsyslog I built for armeb this looks correct. To
> > verify it could you dump the .dynamic section from the binary?
> > Something like 'objdump -s newsyslog' will get it.
>
> ok, available at:
> https://www.funkthat.com/~jmg/20140619/objdump.newsyslog
>
> > > Hun? obj->pltrel is non-zero, so it should have executed the other
> > > line...
> > >
> > > I recompiled rtld w/ -O0, and sure enough, newsyslog runs fine... If
> > > I compile w/o -O, or w/ -O1, it fails...
> > >
> > > Comments or suggestions?
> >
> > What is the value of rel after the if statement? In the -O/-O1 case the
> > asm looks like:
> >
> > ldr r2, [sp, #20] ; Load obj to r2
> > ldr r3, [r2, #124] ; Load obj->pltrel to r3
> > cmp r3, #0 ; 0x0 ; if obj->pltrel:
> > ldrne r2, [sp, #16] ; != NULL: Load reloff to r2
> > addne r4, r3, r2 ; != NULL: Add obj->pltrel + reloff to r4
> > ldreq r2, [sp, #20] ; == NULL: Load obj to r2
> > ldreq r3, [r2, #132] ; == NULL: Load obj->pltrela to r3
> > ldreq r2, [sp, #16] ; == NULL: Load reloff to r2
> > addeq r4, r2, r3 ; == NULL: Add obj->pltrela + reloff to r4
> >
> > Given this I could see how gdb gets confused.
> >
> > It may also pay to get the registers from gdb at this point.
>
> Arg! This is frustrating, I'm getting such different behaviors from
> time to time.. now it isn't having that fault.. but it's getting
> farther, but... I this is because our in tree gdb is messed up..
>
> But, I am getting farther... now the last break at rtld.c:3651
> looks like it's returning a bogus pointer:
> (gdb) print *req
> $12 = {name = 0x9190 "__aeabi_read_tp", hash = 0xf008a80,
> hash_gnu = 0x52432dd3, ventry = 0x2003b1d0, flags = 0x1,
> defobj_out = 0x2003c400, sym_out = 0x20062454, lockstate = 0xbfffeda0}
>
> defobj_out looks bogus to me... We don't have any object mapped
> there:
> (gdb) info shared
> >From To Syms Read Shared Object Library
> 0x200427c8 0x20048814 Yes /lib/libgcc_s.so.1
> 0x2007a4e8 0x2017f320 Yes /lib/libc.so.7
> 0x20018f14 0x2002c99c Yes /libexec/ld-elf.so.1
>
> the data at 0x2003c400 doesn't look like code:
> (gdb) x/32x 0x2003c400
> 0x2003c400: 0xd550b87a 0x00000001 0x00000000 0x2003a080
> 0x2003c410: 0x00000000 0x00000001 0x00000000 0x20051000
> 0x2003c420: 0x0016a000 0x00143000 0x00000000 0x20051000
> 0x2003c430: 0x2019dfd0 0x2007a4e8 0x20051034 0x000000a0
> 0x2003c440: 0x00000000 0x00000007 0x00000002 0x2019bcf0
> 0x2003c450: 0x00000004 0x00000058 0x00000000 0x00000008
> 0x2003c460: 0x20051000 0x00000000 0x2019e0b8 0x200713e0
> 0x2003c470: 0x000040c0 0x00000000 0x00000000 0x200754a0
>
> and then as I stepi out of symlook_global:
> (gdb) x/6i $pc
> 0x2001f0b4 <symlook_global+348>: cmp r0, #0 ; 0x0
> 0x2001f0b8 <symlook_global+352>: moveq r0, #3 ; 0x3
> 0x2001f0bc <symlook_global+356>: movne r0, #0 ; 0x0
> 0x2001f0c0 <symlook_global+360>: add sp, sp, #36 ; 0x24
> 0x2001f0c4 <symlook_global+364>: pop {r4, r5, r6, r7, lr}
> 0x2001f0c8 <symlook_global+368>: bx lr
> (gdb) info registers
> r0 0x20062454 0x20062454
> r1 0x933b 0x933b
> r2 0x0 0x0
> r3 0xa4 0xa4
> r4 0x0 0x0
> r5 0xbfffed3c 0xbfffed3c
> r6 0xbfffed08 0xbfffed08
> r7 0x20037af4 0x20037af4
> r8 0x0 0x0
> r9 0x1 0x1
> r10 0x8a2c 0x8a2c
> r11 0xbfffed30 0xbfffed30
> r12 0x23de 0x23de
> sp 0xbfffec94 0xbfffec94
> lr 0x2001efb0 0x2001efb0
> pc 0x2001f0b4 0x2001f0b4
> fps 0x0 0x0
> cpsr 0x60000010 0x60000010
>
> Then stepi till 0x2001f0c8:
> (gdb) info registers
> r0 0x0 0x0
> r1 0x933b 0x933b
> r2 0x0 0x0
> r3 0xa4 0xa4
> r4 0x2003c000 0x2003c000
> r5 0xbfffed3c 0xbfffed3c
> r6 0x20037af4 0x20037af4
> r7 0xbfffece8 0xbfffece8
> r8 0x0 0x0
> r9 0x1 0x1
> r10 0x8a2c 0x8a2c
> r11 0xbfffed30 0xbfffed30
> r12 0x23de 0x23de
> sp 0xbfffeccc 0xbfffeccc
> lr 0x2003c000 0x2003c000
> pc 0x2001f0c8 0x2001f0c8
> fps 0x0 0x0
> cpsr 0x20000010 0x20000010
>
> and now the lr is bogus... it transfers control to 0x2003c000 which
> is before the fault at 0x2003c0f4... And again, this looks like data,
> not code:
> (gdb) x/64x 0x2003c000
> 0x2003c000: 0xd550b87a 0x00000001 0x2003c200 0xbfffffb8
> 0x2003c010: 0x00000000 0x00000001 0x00000000 0x00008000
> 0x2003c020: 0x00012000 0x00009000 0x00008000 0x00000000
> 0x2003c030: 0x00018724 0x00009ca8 0x00008034 0x000000e0
> 0x2003c040: 0x00008114 0x00000007 0x00000000 0x00000000
> 0x2003c050: 0x00000000 0x00000000 0x00000000 0x00000000
> 0x2003c060: 0x00000000 0x00000000 0x0001881c 0x000094a0
> 0x2003c070: 0x00000048 0x00000000 0x00000000 0x000094e8
> 0x2003c080: 0x00000308 0x00000000 0x00000000 0x0000888c
> 0x2003c090: 0x00008f9c 0x000003b1 0x00009430 0x00000002
> 0x2003c0a0: 0x00000000 0x00000000 0x0000934e 0x00008180
> 0x2003c0b0: 0x00000061 0x00008304 0x00000071 0x00000061
> 0x2003c0c0: 0x00000005 0x0000001f 0x0000000a 0x00000071
> 0x2003c0d0: 0x000084d8 0x00008558 0x000086c8 0x00000000
> 0x2003c0e0: 0x00000000 0x2003d000 0x00000000 0x00000000
> 0x2003c0f0: 0x00000000 0x2003c0f0 0x2003b180 0x00000007
>
> If I continue to stepi from here, it will fault at f4...
>
> This looks like a stack smash issue as the lr we pop off the stack
> is incorrect..
So, forgot I could set a watchpoint on the stack address, so I did:
(gdb) c
Continuing.
Watchpoint 9: *0xbfffecc8
Old value = 0x2001f584
New value = 0x2003c000
0x20019960 in donelist_check (dlp=0xbfffed08, obj=0x2003c000)
at /usr/src.avila/libexec/rtld-elf/rtld.c:1380
1380 dlp->objs[dlp->num_used++] = obj;
I tracked dlp, and it's the donelist pointer, and it is allocated
from symlook_default via donelist_init at the top of the function:
#define donelist_init(dlp) \
((dlp)->objs = alloca(obj_count * sizeof (dlp)->objs[0]), \
assert((dlp)->objs != NULL), \
(dlp)->num_alloc = obj_count, \
(dlp)->num_used = 0)
Hmm... since obj_count is a global, and we are in a possibly
threaded environment, we should probably move assigning num_alloc
before the alloca, and then use num_alloc instead of obj_count just
to be safe... but taht won't be the issue here, as num_used is 0
in my case...
I've looked at the assembly for donelist_init for both the working
and non-working case, and besides an extra store in the good version,
which I think is just because of the unoptimized code, things look
the same to me...
I guess the next step is to make sure that the Donelist pointer is
the same...
--
John-Mark Gurney Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."
More information about the freebsd-arm
mailing list