Problems with ath at kern.hz=100

Bengt Ahlgren bengta at sics.se
Tue Feb 10 09:40:19 PST 2009


Sam Leffler <sam at freebsd.org> writes:

> Ian Smith wrote:
>> On Mon, 9 Feb 2009, Bengt Ahlgren wrote:

[...]

>> Sam has talked of a problem with ath (and maybe other) interrupt
>> rates overwhelming slow CPUs to the extent that userland (ie powerd)
>> doesn't get a look-in for a while, as I understand it.  A fast CPU
>> clocked back to less than say 300MHz IS a slow CPU, until powerd
>> gets to crank it up.
>>   
>
> If powerd starvation were happening you'd see livelock not slow
> traffic.  The problem was that MIB overflow interrupts from the card
> were pounding the host but since the cpu clock was lowered they were
> not serviced fast enough.
>
> FWIW I believe the MIB interrupt storm issue is fixed in head.  I
> posted a patch to stable@ to enable backmerge of the necessary code
> but received ~zero feedback in >1 month so RELENG_7 will continue to
> rot w/ old code.

The system was not livelocked - it was fully working except for ath
not sending out packets.  I can however try the patch if you think it
is worthwhile.

[...]

>>  >  > Any idea on what the problem can be, or how do debug this
>> further?
>>
>>   
>
> You can use tcpdump to monitor traffic at multiple levels in the hierarchy:
>
> tcpdump -i ath0 -n   # tap at 802.3 level
> tcpdump -i ath0 -n -y IEEE802_11   # tap at 802.11 level
> tcpdump -i ath0 -n -y IEEE802_11_RADIO   # tap at driver level
>
> and/or you can check statistics using athstats and wlanstats or you
> can turn on debugging msgs in net80211 or the ath driver using
> wlandebug and athdebug.  Some of these may require you to build your
> kernel w/ debug options (e.g. ATH_DEBUG).

Does the following mean anything to you?  I did athdebug
watchdog+xmit+calibrate+xmit_proc - more or less random choice, since
I don't know what it really means.  I see a lot of those
ath_calibrate:s with low "x" in "next +x" when it is not sending.
When it is operating normally, it mostly says "next +30" with "tries
0".  But there is no strict correlation, it can sometimes send packets
even when it says "!iqCalDone".

ath_tx_dmasetup: m 0xc37d4a00 len 178
TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
 0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 f007 aaaa 0300 0000 0800 4500 0092 0128 4000 4006 3426 c10a 42fd c10a 4006 f506 008f 3433 0198 3326 9ce6 5018 ffff c1f2 0000 1703 0100 200d 2e0f 8857 2590 18d0 745b ce33 3874 4089 c851 5aab e5de ef81 8a43 ab98 ad4b 8b17 0301 0040 6b34 58a9 a0e4 b41b 9868 fa90 c69c 95a3 dc6c 99f9 c942 1bf4 219d 5b5f 5711 174d a810 c6d8 2dff 0c88 bb42 c566 6eae 8b7c acff bba3 847b 09d0 a77a e597 fc87 a6c8
ath_tx_handoff: 0: 00000000 014cea3c c12200b6 0000a0b2 0004006c 01e0000c
ath_tx_handoff: link[1](0xdecd8900)=0x14b1bcc0 (0xdecd8cc0) depth 2
ath_tx_processq: tx queue 1 head 0x14b1bcc0 link 0xdecd8cc0
ath_tx_dmasetup: m 0xc37d7800 len 162
TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
 0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 0008 aaaa 0300 0000 0800 4500 0082 0129 4000 4006 3435 c10a 42fd c10a 4006 f506 008f 3433 0202 3326 9d4b 5018 ffff 24fa 0000 1703 0100 206c 06b7 7a5e 1427 c691 772f b923 07be 4c94 4fa0 80e8 4130 5192 e751 970e 8092 c617 0301 0030 0124 d75d c26b 0a82 13ce 8044 73e8 2cb1 74ab d1f9 ce01 5aca 116d f95c 79db 058e 6f58 c86d 51d2 31b9 345d 02b4 836e fb26
ath_tx_handoff: 0: 00000000 014d183c c12200a6 0000a0a2 0004006c 01e0000c
ath_tx_handoff: link[1](0xdecd8cc0)=0x14b1c080 (0xdecd9080) depth 2
ath_tx_processq: tx queue 1 head 0x14b1c080 link 0xdecd9080
ath_tx_dmasetup: m 0xc37d5200 len 162
TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
 0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 1008 aaaa 0300 0000 0800 4500 0082 012a 4000 4006 3434 c10a 42fd c10a 4006 f506 008f 3433 025c 3326 9db0 5018 ffff 0df1 0000 1703 0100 20bf 5e4c fea5 c3d7 37af 29f1 3071 9791 6e7d 5cad 559b 00aa 204b e704 eea9 58b4 8017 0301 0030 f92c e0f3 3511 5618 19db fb48 5408 86d2 365b 3a0a ca5a 86c0 cd67 3b33 1333 3a73 ed6a 4410 491a e454 5901 e1b3 4e33 d3f8
ath_tx_handoff: 0: 00000000 014cf23c c12200a6 0000a0a2 0004006c 01e0000c
ath_tx_handoff: link[1](0xdecd9080)=0x14b1c440 (0xdecd9440) depth 2
ath_tx_processq: tx queue 1 head 0x14b1c440 link 0xdecd9440
ath_tx_dmasetup: m 0xc37d4200 len 109
TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
 0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 2008 aaaa 0300 0000 0800 4500 004d 012b 4000 4006 3468 c10a 42fd c10a 4006 f506 008f 3433 02b6 3326 9e05 5018 ffff a6cb 0000 1503 0100 20c4 db33 8cb3 dbfd 8f42 a0da 812c 1415 7ccd 8031 ada1 d5b3 48f8 45d4 ad00 c2eb 47
ath_tx_handoff: 0: 00000000 014ce23c c1220071 0000a06d 00040064 01e0000c
ath_tx_handoff: link[1](0xdecd9440)=0x14b1c300 (0xdecd9300) depth 2
ath_tx_dmasetup: m 0xc37d3800 len 72
TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
 0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 3008 aaaa 0300 0000 0800 4500 0028 012c 4000 4006 348c c10a 42fd c10a 4006 f506 008f 3433 02db 3326 9e05 5011 ffff aceb 0000
ath_tx_handoff: 0: 00000000 014cd83c c122004c 0000a048 0004005c 01e0000c
ath_tx_handoff: link[1](0xdecd9300)=0x14b1ca80 (0xdecd9a80) depth 3
ath_tx_processq: tx queue 1 head 0x14b1ca80 link 0xdecd9a80
ath_calibrate: next +2 (!iqCalDone tries 2)
ath_calibrate: next +2 (!iqCalDone tries 3)
ath_calibrate: next +2 (!iqCalDone tries 4)
ath_calibrate: next +4 (!iqCalDone tries 0)
ath_calibrate: next +4 (!iqCalDone tries 1)
ath_calibrate: next +4 (!iqCalDone tries 2)
ath_calibrate: next +4 (!iqCalDone tries 3)

After a while I also saw some:

ath0: [00:13:1a:47:7a:33] pwr save q overflow, drops 19 (size 50)

athstats says:

[root at P142 ~]# athstats
306      data frames received
335      data frames transmit
336      long on-chip tx retries
48       tx failed 'cuz too many retries
48M      current transmit rate
9        beacon miss interrupts
107      tx management frames
199      tx frames discarded prior to association
78       tx frames with no ack marked
407      tx frames with short preamble
259      tx frames with 11g protection
11791    rx failed 'cuz of bad CRC
201147   rx failed 'cuz of PHY err
    116925   OFDM timing
    1        OFDM restart
    84126    CCK timing
    95       CCK restart
243      periodic calibrations
31       avg recv rssi
-97      rx noise floor
4997     switched default/rx antenna
Antenna profile:
[1] tx      363 rx    53933
[2] tx       14 rx    18762

and ifconfig -v:

ath0: flags=8847<UP,BROADCAST,DEBUG,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:05:4e:4e:1f:c7
        inet 193.10.66.253 netmask 0xfffffc00 broadcast 193.10.67.255
        media: IEEE 802.11 Wireless Ethernet autoselect (OFDM/48Mbps)
        status: associated
        ssid SICS channel 11 (2462 Mhz 11g) bssid 00:13:1a:47:7a:33
        authmode OPEN privacy OFF deftxkey UNDEF powersavemode OFF
        powersavesleep 100 txpower 31.5 txpowmax 17.0 rtsthreshold 2346
        fragthreshold 2346 bmiss 7 scanvalid 60 bgscan bgscanintvl 300
        bgscanidle 250 roam:rssi11a 7 roam:rate11a 12 roam:rssi11b 7
        roam:rate11b 1 roam:rssi11g 7 roam:rate11g 5 -pureg protmode CTS -ht
        -htcompat -ampdu ampdulimit 8k ampdudensity - -amsdu -shortgi
        htprotmode RTSCTS -puren -wme burst -ff -dturbo roaming AUTO
        bintval 100

Bengt


More information about the freebsd-mobile mailing list