interrupt storm arge0, tplink 1043nd
Adrian Chadd
adrian at freebsd.org
Sat Jul 26 21:38:31 UTC 2014
So those interrupts are:
ar71xxreg.h:#define AR71XX_DMA_INTR 0x198
ar71xxreg.h:#define AR71XX_DMA_INTR_STATUS 0x19C
ar71xxreg.h:#define DMA_INTR_ALL ((1 << 8) - 1)
ar71xxreg.h:#define DMA_INTR_RX_BUS_ERROR (1 << 7)
ar71xxreg.h:#define DMA_INTR_RX_OVERFLOW (1 << 6)
ar71xxreg.h:#define DMA_INTR_RX_PKT_RCVD (1 << 4)
ar71xxreg.h:#define DMA_INTR_TX_BUS_ERROR (1 << 3)
ar71xxreg.h:#define DMA_INTR_TX_UNDERRUN (1 << 1)
ar71xxreg.h:#define DMA_INTR_TX_PKT_SENT (1 << 0)
.. so interrupt bit 4 is packet received.
So yeah, that counter going up is quite expected. But is it triggering
the storm? I'm not sure.
So the next thing is figuring out whether this is causing the storm
logic to fire or not.
I'll go digging. Thanks!
-a
On 26 July 2014 13:29, Harm Weites <harm at weites.com> wrote:
> Oops, of course it didn't work... After passing the correct argument
> (&sc->intr_status, instead of sc) I got answers.
>
> These are the results of three times sysctl, producing 4 lines per run
> (presumably 2 lines arge0 and 2 lines for the dumb arge1). First run
> took place after boot, second a while after that and third just after
> the storm.
>
> interrupt 1 count 135
> interrupt 1 count 135
> interrupt 1 count 0
> interrupt 1 count 0
> interrupt 1 count 4738
> interrupt 1 count 4738
> interrupt 1 count 0
> interrupt 1 count 0
> interrupt 1 count 5041
> interrupt 1 count 5041
> interrupt 1 count 0
> interrupt 1 count 0
>
> interrupt 4 count 108
> interrupt 4 count 108
> interrupt 4 count 0
> interrupt 4 count 0
> interrupt 4 count 15843
> interrupt 4 count 15844
> interrupt 4 count 0
> interrupt 4 count 0
> interrupt 4 count 35311
> interrupt 4 count 35311
> interrupt 4 count 0
> interrupt 4 count 0
>
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 4
> interrupt 6 count 4
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 11
> interrupt 6 count 11
> interrupt 6 count 0
> interrupt 6 count 0
>
> Interrupt 4 went up rather quickly, so that is likely the bad guy. Right?
>
> Regards,
> Harm
>
> On 22-07-14 21:26, Adrian Chadd wrote:
>> Hi!
>>
>> So, ignore the ath0 stuff for now. int2 should be arge0, right?
>>
>> What does vmstat -ia say?
>>
>> Assuming it's actually arge0, we need to add some debugging counters
>> to the interrupt path to count how many of each interrupt are
>> occurring. The stuff I stuck behind ARGEDEBUG() is useful for debugging
>> some silly bugs but not at the rate that you're getting interrupts.
>>
>> So I'd add something like this to the arge softc struct:
>>
>> uint32_t intr_status[32];
>>
>> .. then in the interrupt routine, something like this:
>>
>>     temp_status = status;
>>     for (i = 0; i < 32; i++) {
>>         if (temp_status & 1) {
>>             intr_status[i]++;
>>         }
>>         temp_status = temp_status >> 1;
>>     }
>>
>> That'll count the number of interrupts that are firing for each
>> interrupt status bit.
>>
>> Then, you'll want to write a sysctl for it. Have a look at
>> if_ath_sysctl.c for the SYSCTL_PROC() entries. Just write one that
>> when called will just printf() the intr_status array:
>>
>>     for (i = 0; i < 32; i++) {
>>         printf("interrupt %d count %u\n", i, intr_status[i]);
>>     }
>>
>> Just make sure you do a complete kernel recompile as changing the
>> headers doesn't always force the source files to recompile.
>>
>>
>> -a
>>
>>
>> On 22 July 2014 12:08, Harm Weites <harm at weites.com> wrote:
>>> Hi,
>>>
>>> My 1043nd is complaining about interrupt storms, presumably only when
>>> wifi is being used. When this occurs, networking is gone.
>>>
>>> The exact message that's flooding me:
>>> interrupt storm detected on "int2"; throttling interrupt source
>>>
>>> The device associated with int2 is arge0.
>>>
>>> Some possibly related logs, though these messages start at boot:
>>>
>>> # /sbin/dmesg | tail
>>> ath0: stuck beacon; resetting (bmiss count 4)
>>> ar5416StopDmaReceive: dma failed to stop in 10ms
>>> AR_CR=0x00000024
>>> AR_DIAG_SW=0x42000020
>>> MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>> ath0: stuck beacon; resetting (bmiss count 4)
>>> ar5416StopDmaReceive: dma failed to stop in 10ms
>>> AR_CR=0x00000024
>>> AR_DIAG_SW=0x42000020
>>> MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>>
>>> This unit is configured with (arge0) port0 bound to device vlan1, port4
>>> to vlan2 and ports 1,2,3 make up vlan3. There is wlan0, bound to ath0
>>> and a bridge device that connects wlan0 to vlan3. There is a dhcp server
>>> running in vlan3 to answer to wifi clients, internet is routed through
>>> vlan1. This initially works but after a little while the storm begins
>>> and the wifi client is left to die.
>>>
>>> Adrian@ suggested starting with reading which interrupt(s) occur(s), but
>>> that is perhaps a little too hard for me to code :) Looking at if_arge.c,
>>> it seems there is some debug code already in place (ARGEDEBUG()) though
>>> I'm not sure how to use that. Reading from the AR71XX_DMA_INTR
>>> register and mapping its content to AR71XX_DMA_INTR_STATUS would be
>>> something I'd like to do with a separate program (instead of boldly
>>> taking a deep dive into if_arge.c and recompiling/flashing until
>>> something works).
>>>
>>> One of my other units is configured with just a vlan device per switch
>>> port, no wifi and no bridge. A third unit is configured with a wlan0,
>>> vlan1 (port0) and vlan2 (ports 1,2,3,4). Neither has shown any issues in
>>> the past months. The only difference would be that this problem unit has
>>> a bridge.
>>>
>>> Any thoughts on how to approach or 'just' fix this?
>>>
>>> Regards,
>>> Harm
>>> _______________________________________________
>>> freebsd-mips at freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-mips
>>> To unsubscribe, send any mail to "freebsd-mips-unsubscribe at freebsd.org"
>