iwn firmware instability with an up-to-date stable kernel
Garrett Cooper
yanefbsd at gmail.com
Sat Apr 24 06:27:33 UTC 2010
On Fri, Apr 23, 2010 at 10:08 PM, Brandon Gooch
<jamesbrandongooch at gmail.com> wrote:
> On Sat, Apr 24, 2010 at 4:59 AM, Garrett Cooper <yanefbsd at gmail.com> wrote:
>> On Fri, Apr 23, 2010 at 9:42 PM, Garrett Cooper <yanefbsd at gmail.com> wrote:
>>> On Fri, Apr 23, 2010 at 8:05 PM, Brandon Gooch
>>> <jamesbrandongooch at gmail.com> wrote:
>>>> 2010/4/23 Garrett Cooper <yanefbsd at gmail.com>:
>>>>> 2010/4/23 Garrett Cooper <yanefbsd at gmail.com>:
>>>>>> 2010/4/18 Olivier Cochard-Labbé <olivier at cochard.me>:
>>>>>>> 2010/4/18 Bernhard Schmidt <bschmidt at techwires.net>:
>>>>>>>> Are you able to reproduce this on demand? As in type a few commands and
>>>>>>>> the firmware error occurs?
>>>>>>>>
>>>>>>>
>>>>>>> No, I'm not able to reproduce on demand this problem.
>>>>>>
>>>>>> I'm seeing similar issues on occasion with my Lenovo as well:
>>>>>>
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: firmware error log:
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error type =
>>>>>> "NMI_INTERRUPT_WDG" (0x00000004)
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: program counter = 0x0000046C
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: source line = 0x000000D0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error data = 0x0000000207030000
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: branch link = 0x00008370000004C2
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: interrupt link = 0x000006DA000018B8
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: time = 4287402440
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: driver status:
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 0: qid=0 cur=1 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 1: qid=1 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 2: qid=2 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 3: qid=3 cur=36 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 4: qid=4 cur=123 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 5: qid=5 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 6: qid=6 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 7: qid=7 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 8: qid=8 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 9: qid=9 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 10: qid=10 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 11: qid=11 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 12: qid=12 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 13: qid=13 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 14: qid=14 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 15: qid=15 cur=0 queued=0
>>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: rx ring: cur=8
>>>>>>
>>>>>> This may be because the system was under load (I was installing a port
>>>>>> shortly before the connection dropped). I'll try poking at this
>>>>>> further because it's going to be an annoying productivity loss :/.
>>>>>
>>>>> Sorry... should have included more helpful details.
>>>>> Thanks,
>>>>> -Garrett
>>>>>
>>>>> dmesg:
>>>>>
>>>>> iwn0: <Intel(R) PRO/Wireless 4965BGN> mem 0xdf2fe000-0xdf2fffff irq 17
>>>>> at device 0.0 on pci3
>>>>> iwn0: MIMO 2T3R, MoW1, address 00:1d:e0:7d:9f:c7
>>>>> iwn0: [ITHREAD]
>>>>> iwn0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
>>>>> iwn0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
>>>>> iwn0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps
>>>>> 24Mbps 36Mbps 48Mbps 54Mbps
>>>>>
>>>>> pciconf -lv snippet:
>>>>>
>>>>> iwn0 at pci0:3:0:0: class=0x028000 card=0x11108086 chip=0x42308086
>>>>> rev=0x61 hdr=0x00
>>>>> vendor = 'Intel Corporation'
>>>>> device = 'Intel Wireless WiFi Link 4965AGN (Intel 4965AGN)'
>>>>> class = network
>>>>> cbb0 at pci0:21:0:0: class=0x060700 card=0x20c617aa chip=0x04761180
>>>>> rev=0xba hdr=0x02
>>>>>
>>>>> uname -a:
>>>>>
>>>>> $ uname -a
>>>>> FreeBSD garrcoop-fbsd.cisco.com 8.0-STABLE FreeBSD 8.0-STABLE #0
>>>>> r207006: Wed Apr 21 13:18:44 PDT 2010
>>>>> root at garrcoop-fbsd.cisco.com:/usr/obj/usr/src/sys/LAPPY_X86 i386
>>>>
>>>> I'm actually looking at this right now. For me, it's actually
>>>> happening when my machine stays on overnight (or for long periods of
>>>> time, idle).
>>>>
>>>> Also, it seems to be causing the kernel to panic, although I'm now
>>>> wondering if the Machine Check Architecture is somehow catching this
>>>> device error and causing an exception (hw.mca.enabled=1)(?) -- not
>>>> possible, right ???
>>>>
>>>> Whatever the case, I can't seem to get the firmware error to occur
>>>> with iwn(4) debugging or wlandebug options enabled, so who knows
>>>> exactly what leads to this.
>>>>
>>>> I know Bernhard has worked hard on this driver, it's a shame that this
>>>> freaky bug has bit us all now, without leaving many clues :(
>>>>
>>>> I've attached a textdump for posterity if nothing else :)
>>>
>>> Connectivity appears to be shoddy in my neck of the woods (kind of
>>> ironic... but meh). Just running buildworld, buildkernel, then doing a
>>> tcpdump in parallel causes the pseudo device to go up and down a lot.
>>> I assume this isn't standard behavior?
>>> Just for reference buildworld was started shortly after 19:39:05,
>>> and it finished at 21:29. The interface has also gone up and down once
>>> since then while the system's been basically idle.
>>
>> Hmmm... I seem to be in an excellent position to reproduce this
>> issue. I've reproduced it twice by merely bringing the interface up
>> and down several times using:
>>
>> ifconfig_wlan0="WPA DHCP"
>>
>> instead of my usual:
>>
>> ifconfig_wlan0="WPA ssid <base-station-id1> DHCP"
>>
>> Maybe others who are experiencing the issue should try that? I'll
>> do more testing when I get home...
>
> My rc.conf is:
>
> ifconfig_wlan0="WPA DHCP"
>
> ...as well, although I haven't tried manually taking the interface
> down and bringing it back up.
Hmmm... that is interesting. I wish I could do that, but it seems to
be eluding my grasp right now. The driver just kind of freaks out
with a bunch of SSIDs, one being my target SSID, a bunch of NUL string
ones, and then finally it just croaks. I need to figure out whether or
not the SSIDs are valid when I boot it up at my desk again.
> Are you waiting for the device to associate and begin passing traffic
> before each up/down cycle?
I was, but I'm not sure whether or not the Ajax pieces in GMail were.
I'll try some more rudimentary tests when I get back to work on Monday
in that environment, but I need to try out other things at home as
well in the meantime.
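
For anyone who wants to try the up/down cycling in a controlled way, here
is a rough sketch of such a test in Python. It is only an illustration:
the function names, the cycle count, and the timeout are made up for this
example, and it assumes that "ifconfig wlan0" prints a "status: associated"
line once the link is up and that wpa_supplicant is already running (as it
would be with the rc.conf lines quoted above). It waits for association
before each new cycle, which addresses the question above.

```python
import subprocess
import time


def is_associated(ifconfig_output: str) -> bool:
    """Return True if the ifconfig output contains a 'status: associated' line."""
    for line in ifconfig_output.splitlines():
        if line.strip() == "status: associated":
            return True
    return False


def run_ifconfig(*args: str) -> str:
    """Run ifconfig with the given arguments and return stdout (up/down needs root)."""
    result = subprocess.run(
        ["ifconfig", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def cycle_interface(iface="wlan0", cycles=5, timeout=60.0, poll=1.0,
                    run=run_ifconfig, sleep=time.sleep) -> int:
    """Take iface down and up `cycles` times.

    After each 'up', poll until the interface reports association or
    `timeout` seconds of polling have passed. Returns the number of
    cycles that re-associated in time. `run` and `sleep` are injectable
    so the loop can be exercised without touching real hardware.
    """
    associated = 0
    for _ in range(cycles):
        run(iface, "down")
        run(iface, "up")
        waited = 0.0
        while waited < timeout:
            if is_associated(run(iface)):
                associated += 1
                break
            sleep(poll)
            waited += poll
    return associated
```

Running cycle_interface() as root while watching /var/log/messages for
"firmware error log:" should show whether the error lines up with a
particular cycle. A return value lower than the cycle count means at
least one cycle never re-associated within the timeout.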
Thanks,
-Garrett
More information about the freebsd-stable
mailing list