FreeBSD 8.0 - network stack crashes?
Weldon S Godfrey 3
weldon at excelsusphoto.com
Mon Nov 2 21:48:34 UTC 2009
If memory serves me right, sometime around 4:11pm, Weldon S Godfrey 3 told me:
>
>
> If memory serves me right, sometime around 10:52am, Weldon S Godfrey 3 told
> me:
>
>>
>> Up until yesterday, we had been running FreeBSD-CURRENT of 12/08. A couple
>> of months ago we started to see some very odd network behavior. Something
>> happens to the stack that causes processes accessing the network to just
>> hang. After the problem happens, you usually (but not always) can't ssh
>> in. You can never ssh or telnet out, and nothing can access the NFS
>> shares on the server. You can ping everything from the server. You can't
>> even do a route add, and you can't ssh even if you use just the IP address
>> (although pinging hostnames that are not cached or in the hosts table does
>> resolve them). When you try to ssh out or do a route add from the box, the
>> process just hangs. You can't Ctrl-C it at all; it hangs forever. There is
>> nothing in dmesg or messages to indicate an issue. I have tried bringing
>> the interfaces down and up. In CURRENT-12/08, that may allow things to work
>> for about 30s.
>>
>> We upgraded to 8.0-RC2 yesterday and, at first, the problem appeared to
>> happen a lot more often. We suspected that was related to the increase in
>> network performance. At least in 8.0-RC2, I did see a large number of input
>> errors with netstat -in on the heavily loaded interface before it started
>> locking up. I have replaced the Ethernet cable and moved ports.
>> The Catalyst 3650 never records any errors. The problem would recur
>> about 5 minutes after our load kicked in this morning.
>>
>>
>> One change in this upgrade: we switched from NFS v2 to v3. When we
>> downgraded to the previous OS, we stayed at v3. The problem was just about
>> as bad with v3 on the 12/08 OS.
>>
>> We went back to RC2 with NFS v2, and things appeared to stabilize to a
>> degree. It ran for about an hour and a half, and then the issue came up again.
>>
>> We are currently back on the 12/08 version using NFS v2 and watching things.
>>
>> We are using a Dell PowerEdge 2950-iii. The problem happens both with the
>> onboard NICs using the bce driver and with an Intel card using the em driver.
>>
>> I am hunting down any MTU/duplex/speed problems that could cause it (haven't
>> found any so far). Of course, problems on the network shouldn't
>> (ideally) freak out the network stack on the server. I don't know how to
>> troubleshoot this further on the server, since nothing shows up in the
>> logs and there are no panics, cores, etc.
>>
>> Any help is appreciated.
>>
>
>
> I have swapped out the computer, switch, Ethernet card, and 3ware card. We are
> running 8.0-CURRENT 12/08, which is what we were using with a lot fewer
> issues. No help.
>
> If it happens again, I am going to try a netif restart and a routing
> restart, although I believe I tried that at the beginning and it did not help.
>
BTW, doing a netif / routing restart doesn't help.