Network stack unstable after arp flapping

Brandon Gooch jamesbrandongooch at gmail.com
Fri Apr 1 16:00:37 UTC 2011


On Fri, Apr 1, 2011 at 9:50 AM, Steve Polyack <korvus at comcast.net> wrote:
> On 04/01/11 10:16, Frederique Rijsdijk wrote:
>>
>> Hi,
>>
>> We (a hosting provider) are in the process of implementing IPv6 in our
>> network (yay). Yesterday one of the final steps in configuring and updating
>> our core routers was taken, which did not go entirely as planned. As a
>> result, the default gateway MAC addresses for all our machines changed about
>> 800 times in a time span of about 4 minutes.
>>
>> Here's a small piece of the logging:
>>
>> Mar 31 18:36:12 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to
>> 00:00:0c:07:ac:3d on bge0
>> Mar 31 18:36:12 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:07:ac:3d to
>> 00:00:0c:9f:f0:3d on bge0
>> Mar 31 18:36:13 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to
>> 00:00:0c:07:ac:3d on bge0
>> Mar 31 18:36:14 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:07:ac:3d to
>> 00:00:0c:9f:f0:3d on bge0
>> Mar 31 18:36:14 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to
>> 00:00:0c:07:ac:3d on bge0
>> Mar 31 18:36:14 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:07:ac:3d to
>> 00:00:0c:9f:f0:3d on bge0
>> Mar 31 18:36:15 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to
>> 00:00:0c:07:ac:3d on bge0
>>
>> The x.x.x.1 is always the same IP, the gateway of the machine.
>>
>> As a result, loads of FreeBSD machines (6.x, 7.x and 8.x) developed
>> serious network issues, mainly no traffic or slow traffic to other
>> (FreeBSD) machines across different VLANs in our own network.
>>
>> First thing that comes to mind is the network itself, but all Linux
>> machines (Ubuntu, Red Hat and CentOS) had no issues at all. Only BSD.
>>
>> Running arp -ad on both machines where problems occurred didn't solve anything.
>> What worked better was /etc/rc.d/netif restart and a /etc/rc.d/routing
>> restart. Some machines even had to be rebooted in order to get networking
>> back to normal.
>>
>> This almost sounds like a bug in the network stack in BSD, but I cannot
>> imagine that I'm right. The BSD networking stack is considered to be one of
>> the best.
>>
>> Any ideas anyone?
>
> We experienced a similar issue here, but IIRC only on our 8.x systems (we
> don't have any 7.x). Disabling flowtable cleared everything up immediately.
> You can try that and see if it helps. It seems like the flowtable caches
> and associates the next-hop router MAC address with each flow, and
> unfortunately this doesn't get purged when the kernel senses and logs an ARP
> change. The only other solution I've seen was to stop all network traffic
> on the machine until the flows/cache entries expired.
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=155604 has more details of my
> run-in with this.  The title should be corrected though, as I found shortly
> after that all traffic is affected.
>
> - Steve

FYI, the FLOWTABLE option has been removed from the DEFAULT kernel
config on HEAD, a change which will be MFC'd in a couple of days to
8-STABLE...
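
For anyone trying to picture the failure Steve describes, here is a toy
model in Python. It is only a sketch of the idea (a per-flow cache that
copies the next-hop MAC once and is not purged on an ARP change); it is
not FreeBSD kernel code, and the names ToyHost, arp_update, send and
purge_flows are made up for illustration.

```python
# Toy model only: illustrates a stale per-flow next-hop cache.
# None of these names correspond to real FreeBSD kernel structures.

class ToyHost:
    def __init__(self):
        self.arp = {}         # next-hop IP -> current MAC
        self.flow_cache = {}  # (next-hop IP, flow id) -> MAC copied at first use

    def arp_update(self, ip, mac, purge_flows=False):
        """Record an ARP change; optionally drop cached flows via that hop."""
        self.arp[ip] = mac
        if purge_flows:
            self.flow_cache = {
                key: cached for key, cached in self.flow_cache.items()
                if key[0] != ip
            }

    def send(self, gateway, flow):
        """Return the destination MAC that would be used for this flow."""
        key = (gateway, flow)
        if key not in self.flow_cache:
            # First packet of the flow: copy the MAC from the ARP table.
            self.flow_cache[key] = self.arp[gateway]
        return self.flow_cache[key]


host = ToyHost()
gw = "x.x.x.1"
flow = ("tcp", 40000, 80)  # hypothetical flow identifier

host.arp_update(gw, "00:00:0c:9f:f0:3d")
first = host.send(gw, flow)

# The gateway "moves", but the cached flow entry is not purged.
host.arp_update(gw, "00:00:0c:07:ac:3d")
stale = host.send(gw, flow)

# With a purge on ARP change, the flow picks up the new MAC.
host.arp_update(gw, "00:00:0c:07:ac:3d", purge_flows=True)
fresh = host.send(gw, flow)

print(first, stale, fresh)
```

In this model the second send still uses the old MAC until the cache
entry is dropped, which matches the symptom of traffic only recovering
once the flows expire or flowtable is disabled.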

-Brandon

