regression: msk0 watchdog timeout and interrupt storm
Curtis Villamizar
curtis at ipv6.occnc.com
Wed Jan 8 03:11:36 UTC 2014
In message <20140107084938.GA1361 at michelle.cdnetworks.com>
Yonghyeon PYUN writes:
> On Mon, Jan 06, 2014 at 10:20:40AM -0500, Curtis Villamizar wrote:
>
> [...]
>
> > Here are some relevant parts of dmesg. Is there anything else you want?
> >
> > real memory = 2147483648 (2048 MB)
> > avail memory = 2061438976 (1965 MB)
> > Event timer "LAPIC" quality 400
> > ACPI APIC Table: <LENOVO TC-9I >
> > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> > FreeBSD/SMP: 1 package(s) x 2 core(s)
> > cpu0 (BSP): APIC ID: 0
> > cpu1 (AP): APIC ID: 1
> >
> > pcib2: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
> > pci2: <ACPI PCI bus> on pcib2
> > on pci1
> > pcib2: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
> > pci2: <ACPI PCI bus> on pcib2
> > mskc0: <Marvell Yukon 88E8057 Gigabit Ethernet> port 0xe800-0xe8ff mem
> > 0xfebfc000-0xfebfffff irq 19 at device 0.0 on pci2
> > msk0: <Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00>
> > on mskc0
> > msk0: Ethernet address: c8:9c:dc:56:38:ef
> > miibus0: <MII bus> on msk0
> > e1000phy0: <Marvell 88E1149 Gigabit PHY> PHY 0 on miibus0
> > e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX,
> > 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master,
> > auto, auto-flow
> >
>
> Thank you for the info.
>
> > The computer is a Lenovo ThinkCentre (small tower) and not an uncommon
> > machine so others are likely to run into this.
> >
> > > > Please let me know what I could do to help debug this.
> > > >
> > >
> > > If you have more than 4GB memory, try reducing the amount of
> > > memory(e.g. 3G) in /boot/loader.conf and let me know whether that
> > > makes any difference for you.
> > > Note, in order to test this you have to back out your local
> > > changes.
> >
> > Only have 2 GB memory.
> >
>
> Ok, that means my wild guess was not right. :-(
>
>
> [...]
>
> > > I'm under the impression that the controller may have additional
> > > DMA addressing limitation where TX/RX and status LEs should have
> > > the same high DMA address. Due to the lack of documentation I'm
> > > not sure about that. If the issue does not happen with 3GB memory,
> > > we have to use 32bit DMA addressing.
> >
> > We have 2 GB memory so the problem with the original code does happen
> > with less than 4 GB memory. Everything has the same high address of
> > zero.
> >
>
> Right.
>
> > Is there anything else you want me to try?
>
> msk(4) uses 4KB alignment for status/TX/RX rings. Your local change
> will reduce the number of status LEs to be 1024. Stock msk(4) will
> use 2048 entries for status LEs and you said the cons variable is
> stuck with 1024 in this case. I have no idea how this can happen at
> this moment.
> Did msk(4) ever work on your box? If the answer is yes, would you
> back out both r258780 and your local change?

This host worked for a few years under FreeBSD 8.x and FreeBSD 9.x,
most recently 9.2. I have other machines running stable_10 at about
the 10.0.beta3 vintage. I had mostly good luck building the ports I
use (except openoffice never seems to build).

I transferred a bunch of small stuff over after upgrading to
10.0.beta3 on this machine, but then with the big move of a tar backup
through the GbE, it locked up consistently.

I tried my patch and the symptom was gone.

> I have a small local diff which was made after seeing r258780. But
> I'm not sure whether it makes any difference.

So it seems what you want me to do is:

1. Verify whether just backing out r258780 on if_mskreg.h fixes this.
2. If so, then put back r258780 and try your patch below and see if
   that fixes it.

I think I can find some time to do this maybe immediately or at least
very soon. After doing that I will report back. Please stand by.

> > Curtis
> >
> > btw - I added someone from Marvell on the Bcc in case he wants to join
> > in on the conversation or give us a hint in private email.
>
> --ikeVEW9yuYc//A+q
> Content-Type: text/x-diff; charset=us-ascii
> Content-Disposition: attachment; filename="msk.type.diff"
>
> Index: sys/dev/msk/if_msk.c
> ===================================================================
> --- sys/dev/msk/if_msk.c (revision 260362)
> +++ sys/dev/msk/if_msk.c (working copy)
> @@ -3600,7 +3600,8 @@ msk_handle_events(struct msk_softc *sc)
> int rxput[2];
> struct msk_stat_desc *sd;
> uint32_t control, status;
> - int cons, len, port, rxprog;
> + int len, port, rxprog;
> + uint16_t cons;
>
> if (sc->msk_stat_cons == CSR_READ_2(sc, STAT_PUT_IDX))
> return (0);
> Index: sys/dev/msk/if_mskreg.h
> ===================================================================
> --- sys/dev/msk/if_mskreg.h (revision 260362)
> +++ sys/dev/msk/if_mskreg.h (working copy)
> @@ -2539,8 +2539,8 @@ struct msk_softc {
> bus_addr_t msk_stat_ring_paddr;
> int msk_int_holdoff;
> int msk_process_limit;
> - int msk_stat_cons;
> - int msk_stat_count;
> + uint16_t msk_stat_cons;
> + uint16_t msk_stat_count;
> struct mtx msk_mtx;
> };
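
For readers following the diff above: it narrows the status-ring indices
to uint16_t, the same width as the value returned by the 16-bit register
read of STAT_PUT_IDX. The block below is a minimal sketch of that kind of
consumer loop, not the driver's code. The names RING_COUNT, ring_inc and
drain are hypothetical, and the ring size of 2048 is only the figure
mentioned in the thread for stock msk(4). It assumes both indices are
always smaller than the ring size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring size; 2048 is the stock status LE count cited in the thread. */
#define RING_COUNT 2048u

/* Advance a ring index by one entry, wrapping back to 0 at the ring size. */
static uint16_t ring_inc(uint16_t idx, uint16_t count)
{
    return (uint16_t)((idx + 1u) % count);
}

/*
 * Walk from the consumer index to the producer ("put") index and return
 * how many entries were visited. Both indices must be below count,
 * otherwise the loop could never meet the producer index.
 */
static unsigned drain(uint16_t cons, uint16_t put, uint16_t count)
{
    unsigned handled = 0;

    assert(cons < count && put < count);
    while (cons != put) {
        cons = ring_inc(cons, count);
        handled++;
    }
    return handled;
}
```

With these definitions, drain(2040, 8, RING_COUNT) visits 16 entries
because the index wraps from 2047 back to 0, and drain(1024, 1024,
RING_COUNT) visits none because the consumer has already caught up.
Whether a mismatch of this kind explains the stuck cons value on the
88E8057 is exactly what the tests described above are meant to find out.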