re(4) unaligned panic on -current

Fri Dec 30 05:36:29 PST 2005

On Friday 30 December 2005 05:00 am, Ruslan Ermilov wrote:
> On Thu, Dec 29, 2005 at 11:40:17AM -0500, John Baldwin wrote:
> > On Wednesday 28 December 2005 11:49 pm, Bernd Walter wrote:
> > > On Wed, Dec 28, 2005 at 11:01:47PM -0500, John Baldwin wrote:
> > > > On Dec 28, 2005, at 11:35 AM, Bernd Walter wrote:
> > > > >The same card works fine on an AS4100 running 5.4-STABLE.
> > > > >
> > > > >Booting [/boot/kernel/kernel]...
> > > > >Entering /boot/kernel/kernel at 0xfffffc000033bf00...
> > > > > ...
> > > > > >re0: <RealTek 8169S Single-chip Gigabit Ethernet> port
> > > > > >0x11000-0x110ff mem 0x80320000-0x803200ff irq 0 at device 11.0 on pci0
> > > > > >miibus1: <MII bus> on re0
> > > > >rgephy0: <RTL8169S/8110S media interface> on miibus1
> > > > >rgephy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX,
> > > > >1000baseTX, 1000baseTX-FDX, auto
> > > > >re0: Ethernet address: 00:40:f4:d0:8d:eb
> > > > >
> > > > >fatal kernel trap:
> > > > >
> > > > >    trap entry     = 0x4 (unaligned access fault)
> > > > >    cpuid          = 0
> > > > >    faulting va    = 0xfffffc00008a472b
> > > > >    opcode         = 0x28
> > > > >    register       = 0x12
> > > > >    pc             = 0xfffffc00003b0608
> > > > >    ra             = 0xfffffc00003b05cc
> > > > >    sp             = 0xfffffc00007339d0
> > > > >    usp            = 0x0
> > > > >    curthread      = 0xfffffc000068b008
> > > > >        pid = 0, comm = swapper
> > > > >
> > > > >[thread pid 0 tid 0 ]
> > > > > >Stopped at      re_init_locked+0xd8:    jsr     ra,(pv),re_init_locked+0xdc
> > > > > ><ra=0xfffffc00003b05cc,pv=0xfffffc00005d2dd0>
> > > > >db> bt
> > > > >Tracing pid 0 tid 0 td 0xfffffc000068b008
> > > > >re_init_locked() at re_init_locked+0xd8
> > > > >re_diag() at re_diag+0x178
> > > >
> > > > My first guess would be Ruslan's IF_LLADDR changes.  If so, you can
> > > > try doing a bcopy to a char array as a workaround similar to the
> > > > recent changes to de(4) and dc(4) to fix similar panics on Alpha.  It
> > > > might be something else though.  If you could pull up gdb on your
> > > > kernel.debug and do 'l *re_init_locked+0xd8' to see what file/line
> > > > that corresponds to that would be helpful.
> > >
> > > Your guess looks right - will try your bcopy suggestion.
> > >
> > > [54]cicely12# gdb kernel.debug
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and
> > > you are welcome to change it and/or distribute copies of it under
> > > certain conditions. Type "show copying" to see the conditions.
> > > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > > This GDB was configured as "alpha-marcel-freebsd"...
> > > (gdb) l *re_init_locked+0xd8
> > > 0xfffffc00003b0608 is in re_init_locked (../../../dev/re/if_re.c:2127).
> > > 2122             * Init our MAC address.  Even though the chipset
> > > 2123             * documentation doesn't mention it, we need to enter
> > > "Config 2124             * register write enable" mode to modify the ID
> > > registers. 2125             */
> > > 2126            CSR_WRITE_1(sc, RL_EECMD, RL_EEMODE_WRITECFG);
> > > 2127            CSR_WRITE_STREAM_4(sc, RL_IDR0,
> > > 2128                *(u_int32_t *)(&IF_LLADDR(sc->rl_ifp)[0]));
> > > 2129            CSR_WRITE_STREAM_4(sc, RL_IDR4,
> > > 2130                *(u_int32_t *)(&IF_LLADDR(sc->rl_ifp)[4]));
> > > 2131            CSR_WRITE_1(sc, RL_EECMD, RL_EEMODE_OFF);
> > > (gdb)
> >
> > Hmm, even worse is that IF_LLADDR() might not be valid yet since from the
> > trace it looked like re_diag() was being called from re_attach() and thus
> > likely before ether_ifattach().  You'll have to somehow get the copy of
> > the MAC address via the softc if this function is called too early like I
> > did for de(4).  Try the bcopy first though.
>
> What do you mean?
>
>    1236         /*
>    1237          * Call MI attach routine.
>    1238          */
>    1239         ether_ifattach(ifp, eaddr);
>    1240
>    1241         /* Perform hardware diagnostic. */
>    1242         error = re_diag(sc);

Ah, most drivers call ether_ifattach() last.  Otherwise you open yourself
up to race conditions: e.g. a user thread could do an 'ifconfig up' after
ether_ifattach() has made the interface visible but before this thread gets
around to calling re_diag().
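
For reference, a minimal sketch of the bcopy idea suggested earlier in the
thread.  This is not the actual if_re.c change: mac_to_words() is a
hypothetical helper, and memcpy() stands in for the kernel's bcopy() so the
fragment builds in userland.  The point is only that the 6-byte address is
copied bytewise into storage that is known to be 32-bit aligned, so no
4-byte load is ever issued against an odd address like the faulting va
above (0x...472b).

```c
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6	/* length of an Ethernet MAC address in bytes */

/*
 * Hypothetical helper (assumption, not taken from if_re.c): pack a MAC
 * address into two 32-bit words without ever dereferencing lladdr as a
 * u_int32_t pointer.  lladdr may have any alignment.
 */
static void
mac_to_words(const uint8_t *lladdr, uint32_t words[2])
{
	/* Clear both words so the two bytes following the MAC are defined. */
	words[0] = 0;
	words[1] = 0;

	/*
	 * Bytewise copy into the aligned destination.  In the kernel this
	 * would be bcopy(lladdr, words, ETHER_ADDR_LEN).
	 */
	memcpy(words, lladdr, ETHER_ADDR_LEN);
}
```

In the driver, words[0] and words[1] would presumably then be handed to
CSR_WRITE_STREAM_4() for RL_IDR0 and RL_IDR4 in place of the casts of
IF_LLADDR(); that wiring is an assumption here, not a tested patch.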

-- 
John Baldwin <jhb at FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org