Problems with ath at kern.hz=100

Sam Leffler sam at freebsd.org
Tue Feb 10 08:45:29 PST 2009


Ian Smith wrote:
> On Mon, 9 Feb 2009, Bengt Ahlgren wrote:
>
>  > I was changing kern.hz to 100 on my IBM X40 (Pentium-M) laptop to see
>  > whether that saves some battery, but ran into problems with ath not
>  > working properly.  At first I thought that my hardware was failing,
>  > but changing hz back to 1000 solved the problem.  I am running
>  > 7.1-RELEASE-p1 and have the following Atheros chip:
>  > 
>  > Feb  9 11:13:10 P142 kernel: ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
>  > Feb  9 11:13:10 P142 kernel: ath0: <Atheros 5212> mem 0xd0200000-0xd020ffff irq 11 at device 2.0 on pci2
>  > Feb  9 11:13:10 P142 kernel: ath0: [ITHREAD]
>  > Feb  9 11:13:10 P142 kernel: ath0: WARNING: using obsoleted if_watchdog interface
>  > Feb  9 11:13:10 P142 kernel: ath0: Ethernet address: 00:05:4e:4e:1f:c7
>  > Feb  9 11:13:10 P142 kernel: ath0: mac 5.6 phy 4.1 5ghz radio 1.7 2ghz radio 2.3
>
> There's an order of magnitude room to move between 100 and 1000 Hz.  
> You could also try with maybe 250 or 500?
>
> What speed is your Pentium-M?  Are you using powerd?  If so, what 
> minimum CPU freq does it drop back to (sysctl debug.cpufreq.lowest)?
>
> Sam has talked of a problem with ath (and maybe other) interrupt rates 
> overwhelming slow CPUs to the extent that userland (ie powerd) doesn't 
> get a look-in for a while, as I understand it.  A fast CPU clocked back 
> to less than say 300MHz IS a slow CPU, until powerd gets to crank it up.
>   

If powerd starvation were happening you'd see livelock not slow 
traffic.  The problem was that MIB overflow interrupts from the card 
were pounding the host but since the cpu clock was lowered they were not 
serviced fast enough.

FWIW I believe the MIB interrupt storm issue is fixed in head.  I posted 
a patch to stable@ to enable backmerge of the necessary code but 
received ~zero feedback in >1 month so RELENG_7 will continue to rot w/ 
old code.

> You might find that rc.conf setting economy_cx_lowest="LOW" can save you 
> some battery, though this benefit seems somewhat processor-dependent.
>
> HTH, Ian
>
>
>  > The symptom is that ath stops sending packets out.  They seem to be
>  > buffered somewhere.  After a while it gets going again, albeit some
>  > packets are lost.  "A while" can be from 5 secs to a minute or so when
>  > continously running a ping.  In the below logs, a ping is started when
>  > ath has stopped sending.
>   
>  > 
>  > "netstat -I ath0 1" reports that no packets are sent, but the byte
>  > count is still incremented.  When it gets going again, 18 packets are
>  > sent, but the byte count does not reflect that (which perpaps is
>  > logical):
>  > 
>  >          1     0         56          0     0          0     0
>  >          0     0          0          0     0          0     0
>  >          0     0          0          0     0         98     0
>  >          1     0         56          0     0         98     0
>  >          1     0         56          0     0         98     0
>  >             input         (ath0)           output
>  >    packets  errs      bytes    packets  errs      bytes colls
>  >          0     0          0          0     0         98     0
>  >          1     0         56          0     0         98     0
>  >          1     0         56          0     0         98     0
>  >          1     0         56          0     0         98     0
>  >          0     0          0          0     0         98     0
>  >          0     0          0          0     0         98     0
>  >          0     0          0          0     0         98     0
>  >          0     0          0          0     0         98     0
>  >          0     0          0          0     0         98     0
>  >          1     0         56          0     0         98     0
>  >          1     0         56          0     0         98     0
>  >          2     0        112          0     0         98     0
>  >          0     0          0          0     0         98     0
>  >          4     0        350         18     0        140     0
>  >          2     0        112          0     0          0     0
>  >          1     1         56          0     0          0     0
>  >          1     0         56          0     0          0     0
>  >          0     0          0          0     0          0     0
>  > ^C
>  > 
>  > Here is a tcpdump of the same event:
>  > 
>  > 18:40:59.025364 arp who-has 10.1.1.72 tell 10.1.7.250
>  > 18:41:00.049385 arp who-has 10.1.1.72 tell 10.1.7.250
>  > 18:41:00.868608 arp who-has 10.1.6.253 tell 10.1.7.250
>  > 18:41:01.073387 arp who-has 10.1.1.72 tell 10.1.7.250
>  > 18:41:02.761445 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 0, length 64
>  > 18:41:02.761456 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 1, length 64
>  > 18:41:02.761461 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 2, length 64
>  > 18:41:02.761466 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 3, length 64
>  > 18:41:02.761470 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 4, length 64
>  > 18:41:02.761474 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 5, length 64
>  > 18:41:02.761479 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 6, length 64
>  > 18:41:02.761483 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 7, length 64
>  > 18:41:02.761487 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 8, length 64
>  > 18:41:02.761492 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 9, length 64
>  > 18:41:02.761497 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 10, length 64
>  > 18:41:02.761502 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 11, length 64
>  > 18:41:02.761506 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 12, length 64
>  > 18:41:02.761510 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 13, length 64
>  > 18:41:02.761515 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 14, length 64
>  > 18:41:02.761519 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 15, length 64
>  > 18:41:02.761523 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 16, length 64
>  > 18:41:02.916556 arp who-has 10.1.1.107 tell 10.1.7.250
>  > 18:41:02.916579 arp reply 10.1.1.107 is-at 00:05:4e:4e:1f:c7
>  > 18:41:02.917707 IP 193.10.65.193 > 10.1.1.107: ICMP echo reply, id 63239, seq 14, length 64
>  > 18:41:02.917906 IP 193.10.65.193 > 10.1.1.107: ICMP echo reply, id 63239, seq 15, length 64
>  > 18:41:02.918187 IP 193.10.65.193 > 10.1.1.107: ICMP echo reply, id 63239, seq 16, length 64
>  > 18:41:03.531108 arp who-has 10.1.1.72 tell 10.1.7.250
>  > 18:41:04.350185 arp who-has 10.1.6.166 tell 10.1.7.250
>  > 18:41:04.554984 arp who-has 10.1.1.72 tell 10.1.7.250
>  > 18:41:05.578972 arp who-has 10.1.1.72 tell 10.1.7.250
>  > ^C
>  > 289 packets captured
>  > 289 packets received by filter
>  > 0 packets dropped by kernel
>  > 
>  > Any idea on what the problem can be, or how do debug this further?
>
>   

You can use tcpdump to monitor traffic at multiple levels in the hierarchy:

tcpdump -i ath0 -n   # tap at 802.3 level
tcpdump -i ath0 -n -y IEEE802_11   # tap at 802.11 level
tcpdump -i ath0 -n -y IEEE802_11_RADIO   # tap at driver level

and/or you can check statistics using athstats and wlanstats or you can 
turn on debugging msgs in net80211 or the ath driver using wlandebug and 
athdebug.  Some of these may require you to build your kernel w/ debug 
options (e.g. ATH_DEBUG).

    Sam



More information about the freebsd-mobile mailing list