Problems with BCE network adapter (Dell PE2950)
Tom Judge
tom at tomjudge.com
Thu Jun 28 14:29:34 UTC 2007
Dave,
Sorry for the top post, but I have just managed to repeat this exact crash
twice on a new PE 1950 system. I have core files available.
It seems that after a couple of reboots the problem goes away. The
system actually crashed 4 times, but 2 of the cores were corrupt.
It also seems that the system will be stable if the following message is
not produced shortly after /etc/rc.d/netif start:
bce0: /usr/src/sys/dev/bce/if_bce.c(3489): Too many free rx_bd (0xFFF9 >
0x01FE)!
I have attached the chip information below.
Any help with this would be appreciated, as we now have 21 PE[12]950
systems which randomly crash due to the original error:
bce0: discard frame w/o leading ethernet header (len 4294967292 pkt len
4294967292)
Tom
PE 2950 Chips:
bce0 at pci9:0:0: class=0x020000 card=0x01b21028 chip=0x164c14e4 rev=0x11
hdr=0x00
vendor = 'Broadcom Corporation'
class = network
subclass = ethernet
--
bce1 at pci5:0:0: class=0x020000 card=0x01b21028 chip=0x164c14e4 rev=0x11
hdr=0x00
vendor = 'Broadcom Corporation'
class = network
subclass = ethernet
PE1950 Chips:
bce0 at pci9:0:0: class=0x020000 card=0x01b31028 chip=0x164c14e4 rev=0x12
hdr=0x00
vendor = 'Broadcom Corporation'
class = network
subclass = ethernet
--
bce1 at pci5:0:0: class=0x020000 card=0x01b31028 chip=0x164c14e4 rev=0x12
hdr=0x00
vendor = 'Broadcom Corporation'
class = network
subclass = ethernet
Tom Judge wrote:
> David Christensen wrote:
>> Tom,
>>
>> There's already some debug code to watch for unusual size packets.
>> If you can recompile the driver from HEAD with the attached diffs
>> we can print out the first 128 bytes of any unusual sized packets.
>>
>> This does enable other debugging code so performance will drop
>> but that should be OK since this doesn't present as a performance
>> problem.
>>
>> Dave
>>
> <SNIP/>
> I am currently running the driver from RELENG_6 (With the MSI code
> backed out and your patch applied by hand) on a 6.2-p5 amd64 system
> (Dell PE2950) and have managed to get the following crash.
>
> The crash was caused by "cat * >/dev/null" in an NFS mounted directory.
>
> I'm not sure if this is the same crash, but some other boxes (identical
> to this one) have crashed the first time they were rebooted with the new
> driver. Unfortunately I have not managed to get a dump from one of these
> crashes yet.
>
> Also I am seeing a lot of these messages on boxes running this driver:
>
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0xD0F5!
>
> It seems to be caused by NFS traffic.
>
> I still have the core file if you need any more information.
>
> Tom
>
> kgdb /usr/obj/usr/src/sys/PE2950/kernel.debug vmcore.0
> [GDB will not be able to debug user-mode threads:
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd".
>
> Unread portion of the kernel message buffer:
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0xD0F5!
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0x2F0A!
> <SNIP LOTS OF THESE ERRORS>
> bce0: /usr/src/sys/dev/bce/if_bce.c(3489): Too many free rx_bd (0xFFFB >
> 0x01FE)!
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0xF043!
> bce0: /usr/src/sys/dev/bce/if_bce.c(3489): Too many free rx_bd (0xFFF9 >
> 0x01FE)!
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0x8C5F!
> bce0: /usr/src/sys/dev/bce/if_bce.c(3973): Unexpected mbuf found in
> rx_bd[0x005A]!
> bce0: ---------------------------- Driver State
> ----------------------------
> bce0: 0xFFFFFFFF:8B92A000 - (sc) driver softc structure virtual address
> bce0: 0xFFFFFF00:F4000000 - (sc->bce_vhandle) PCI BAR virtual address
> bce0: 0xFFFFFF00:009E3680 - (sc->status_block) status block virtual address
> bce0: 0xFFFFFF00:009D6400 - (sc->stats_block) statistics block virtual
> address
> bce0: 0xFFFFFFFF:8B92A1B0 - (sc->tx_bd_chain) tx_bd chain virtual adddress
> bce0: 0xFFFFFFFF:8B92A1E8 - (sc->rx_bd_chain) rx_bd chain virtual address
> bce0: 0xFFFFFFFF:8B92B260 - (sc->tx_mbuf_ptr) tx mbuf chain virtual address
> bce0: 0xFFFFFFFF:8B92D260 - (sc->rx_mbuf_ptr) rx mbuf chain virtual address
> bce0: 0x0000357F - (sc->interrupts_generated) h/w intrs
> bce0: 0x00002981 - (sc->rx_interrupts) rx interrupts handled
> bce0: 0x0000212A - (sc->tx_interrupts) tx interrupts handled
> bce0: 0x0000706B - (sc->last_status_idx) status block index
> bce0: 0x0000675E - (sc->tx_prod) tx producer index
> bce0: 0x00006707 - (sc->tx_cons) tx consumer index
> bce0: 0x001B39EA - (sc->tx_prod_bseq) tx producer bseq index
> bce0: 0x0000F25C - (sc->rx_prod) rx producer index
> bce0: 0x0000F059 - (sc->rx_cons) rx consumer index
> bce0: 0x0B850C00 - (sc->rx_prod_bseq) rx producer bseq index
> bce0: 0x000000AB - (sc->rx_mbuf_alloc) rx mbufs allocated
> bce0: 0x0000FFF8 - (sc->free_rx_bd) free rx_bd's
> bce0: 0x00000000/000001FE - (sc->rx_low_watermark) rx low watermark
> bce0: 0x0000001D - (sc->txmbuf_alloc) tx mbufs allocated
> bce0: 0x000000AB - (sc->rx_mbuf_alloc) rx mbufs allocated
> bce0: 0x00000057 - (sc->used_tx_bd) used tx_bd's
> bce0: 0x000001FE/000001FE - (sc->tx_hi_watermark) tx hi watermark
> bce0: 0x00000000 - (sc->mbuf_alloc_failed) failed mbuf alloc
> bce0:
> ------------------------------------------------------------------------
> bce0: ---------------------------- Status Block
> ----------------------------
> bce0: attn_bits = 0x00000001, attn_bits_ack = 0x00000001, index = 0x70BF
> bce0: rx_cons0 = 0x0000F061, tx_cons0 = 0x0000675E
> bce0: status_idx = 0x70BF
> bce0:
> ------------------------------------------------------------------------
>
>
> Fatal trap 3: breakpoint instruction fault while in kernel mode
> cpuid = 4; apic id = 04
> instruction pointer = 0x8:0xffffffff801ee956
> stack pointer = 0x10:0xffffffffb6d60b40
> frame pointer = 0x10:0x5a
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, IOPL = 0
> current process = 27 (irq16: bce0 bce1)
> trap number = 3
> panic: breakpoint instruction fault
> cpuid = 4
> Uptime: 3m10s
> Dumping 8191 MB (3 chunks)
> chunk 0: 1MB (156 pages) ... ok
> chunk 1: 3327MB (851624 pages) 3311 3295 3279 3263 3247 3231 3215 3199
> 3183 31
> <SNIP>
> #0 doadump () at pcpu.h:172
> 172 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) bt
> #0 doadump () at pcpu.h:172
> #1 0x0000000000000004 in ?? ()
> #2 0xffffffff8029e0e7 in boot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:409
> #3 0xffffffff8029e781 in panic (fmt=0xffffff021ef0a4c0
> "?\206?\036\002?????\036\002???") at /usr/src/sys/kern/kern_shutdown.c:565
> #4 0xffffffff803f9e3f in trap_fatal (frame=0xffffff021ef0a4c0,
> eva=18446742983307069104) at /usr/src/sys/amd64/amd64/trap.c:660
> #5 0xffffffff803fa2e2 in trap (frame=
> {tf_rdi = 0, tf_rsi = -2139025408, tf_rdx = 1, tf_rcx = 1915683,
> tf_r8 = 1048064, tf_r9 = 10, tf_rax = 79, tf_rbx = -1953325056, tf_rbp =
> 90, tf_r10 = -1227486624, tf_r11 = 4294967208, tf_r12 = -1953325056,
> tf_r13 = 90, tf_r14 = 61537, tf_r15 = 61530, tf_trapno = 3, tf_addr = 0,
> tf_flags = -1099501259136, tf_err = 0, tf_rip = -2145457834, tf_cs = 8,
> tf_rflags = 642, tf_rsp = -1227486384, tf_ss = 16}) at
> /usr/src/sys/amd64/amd64/trap.c:469
> #6 0xffffffff803e55fb in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:168
> #7 0xffffffff801ee956 in bce_breakpoint (sc=0xffffffff8b92a000) at
> cpufunc.h:63
> #8 0xffffffff801ef0f6 in bce_intr (xsc=0x0) at
> /usr/src/sys/dev/bce/if_bce.c:3970
> #9 0xffffffff80284919 in ithread_loop (arg=0xffffff00009e4000) at
> /usr/src/sys/kern/kern_intr.c:682
> #10 0xffffffff802830b7 in fork_exit (callout=0xffffffff802847d0
> <ithread_loop>, arg=0xffffff00009e4000, frame=0xffffffffb6d60c50) at
> /usr/src/sys/kern/kern_fork.c:821
> #11 0xffffffff803e595e in fork_trampoline () at
> /usr/src/sys/amd64/amd64/exception.S:394
> #12 0x0000000000000000 in ?? ()
> #13 0x0000000000000000 in ?? ()
> #14 0x0000000000000001 in ?? ()
> #15 0x0000000000000000 in ?? ()
> #16 0x0000000000000000 in ?? ()
> #17 0x0000000000000000 in ?? ()
> #18 0x0000000000000000 in ?? ()
> #19 0x0000000000000000 in ?? ()
> <SNIP LOTS OF 0 FRAMES>
> #44 0x00000000007f3000 in ?? ()
> #45 0xffffff021ef286b0 in ?? ()
> #46 0x0000000000000104 in ?? ()
> #47 0x0000000000000000 in ?? ()
> #48 0xffffff021ef286b0 in ?? ()
> #49 0xffffff021ef68000 in ?? ()
> #50 0xffffffffb6d60848 in ?? ()
> #51 0xffffff021ef0a4c0 in ?? ()
> #52 0xffffffff802b4856 in sched_switch (td=0xffffff00009e4000,
> newtd=0x0, flags=0) at /usr/src/sys/kern/sched_4bsd.c:973
> <SNIP LOTS OF 0 FRAMES>
> #124 0x0000000000000000 in ?? ()
> Cannot access memory at address 0xffffffffb6d61000
> (kgdb) frame 8
> #8 0xffffffff801ef0f6 in bce_intr (xsc=0x0) at
> /usr/src/sys/dev/bce/if_bce.c:3970
> 3970 DBRUNIF((!(rxbd->rx_bd_flags &
> RX_BD_FLAGS_END)),
> (kgdb) list
> 3965
> 3966 /* The mbuf is stored with the last rx_bd entry
> of a packet. */
> 3967 if (sc->rx_mbuf_ptr[sw_chain_cons] != NULL) {
> 3968
> 3969 /* Validate that this is the last rx_bd. */
> 3970 DBRUNIF((!(rxbd->rx_bd_flags &
> RX_BD_FLAGS_END)),
> 3971 BCE_PRINTF("%s(%d): Unexpected
> mbuf found in rx_bd[0x%04X]!\n",
> 3972 __FILE__, __LINE__, sw_chain_cons);
> 3973 bce_breakpoint(sc));
> 3974
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"