Ongoing problems with the "ath" interface - is any relief in sight??
Sam Leffler
sam at errno.com
Sat Jul 29 16:11:05 UTC 2006
Sam Leffler wrote:
> Ross Finlayson wrote:
>> For several months now, the "ath" interface has been spazzing out at
>> random times (in systems that are acting as wireless base stations). For
>> example:
>>
>> Jul 28 21:44:47 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)
>> Jul 28 21:44:47 ns kernel: ath0: ath_reset: unable to reset hardware; hal status 3
>> Jul 28 21:45:08 ns kernel: ath0: device timeout
>> Jul 28 21:45:08 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)
>> Jul 28 21:45:08 ns kernel: ath0: ath_reset: unable to reset hardware; hal status 3
>> [and then the interface stops working]
>>
>>
>> %cat /etc/motd
>> FreeBSD 6.1-STABLE (GENERIC) #6: Thu Jul 27 20:55:43 PDT 2006
>>
>> The error isn't always the same, however. Often it is
>> ath0: device timeout
>> or
>> ath0: discard frame w/o packet header
>> or even
>> arp: unknown hardware address format (0x4500)
>>
>> In each case, however, the "ath" interface stops working immediately
>> after the error report, so I don't believe that the latter two error
>> reports are legitimate. I'm wondering if perhaps there's a memory smash
>> somewhere that's corrupting some driver data structures (thereby causing
>> bogus error reports in addition to stopping the interface from working)?
>>
>> The last time I asked about this, someone speculated that 'power save
>> mode' was the culprit. Unfortunately, the system is running in a coffee
>> shop that provides public WiFi, so it's not possible to stop clients
>> from using power save mode.
>>
>> On my system, these errors are often happening several times a day. Has
>> anyone else run into frequent problems like this, and is anyone looking
>> into a solution?
>
> "stuck beacon" means the tx dma of the beacon frame failed to complete
> in a full beacon interval. Diagnosing such a problem requires
> understanding why dma failed to complete. This usually involves
> checking the dma descriptor for clues and/or looking at other
> h/w-related state. If you have a "memory smash" then you will see it in
> the descriptor contents, but I doubt it. In my experience this problem
> is usually caused by feeding bogus data to the dma engine, which causes it
> to lock up, but the problem in general is very complicated and not
> something I can diagnose remotely.
BTW, the fact that the subsequent reset failed with error 3 (HAL_EIO in ah.h)
indicates you've got something more going on. But since you didn't
provide any details on what you're doing it's hard to say if you've got
a hardware problem. Presumably you've done basic things like swap out
parts and/or try to reproduce the problem in a controlled environment.
Sam
More information about the freebsd-mobile mailing list