A flood of Bacula traffic causes igb interface to go offline.
Sean Bruno
seanbru at yahoo-inc.com
Wed Feb 2 18:07:38 UTC 2011
On Tue, 2011-02-01 at 12:50 -0800, Mike Carlson wrote:
> Hey net@,
>
> I have a FreeBSD 8.2-RC2 system running on an HP DL180 G6, using the
> onboard Intel controller, and it is our primary Bacula storage node and
> director node.
>
> We have 96 clients that are scheduled to run at 8:30pm. After about 9-10
> minutes of activity (mrtg graphs show about 50-60MB/sec incoming
> traffic), the igb1 interface is no longer able to communicate with the
> Cisco switch.
>
> The interesting part is, the interface is still "up", there is nothing
> in the kernel message buffer, and nothing relevant in the log file (just
> syslogd and LDAP errors because they cannot reach their respective
> network servers). The system does not respond on the network again until I
> either reboot or run 'ifconfig igb1 down ; ifconfig igb1 up'. There is no
> firewall loaded or configured.
>
> Thankfully, I have a KVM over IP, so when this happens I can at least
> run script(1) and capture some useful information.
> ifconfig igb1
> igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
> 	ether 1c:c1:de:e9:fb:af
> 	inet 128.15.136.105 netmask 0xffffff00 broadcast 128.15.136.255
> 	inet 128.15.136.108 netmask 0xffffff00 broadcast 128.15.136.255
> 	inet 128.15.136.102 netmask 0xffffff00 broadcast 128.15.136.255
> 	media: Ethernet autoselect (1000baseT <full-duplex>)
> 	status: active
>
> I can ping the interface's own IP address (although I realize that is
> probably a useless test...)
> root@write /etc]> ping 128.15.136.105
> PING 128.15.136.105 (128.15.136.105): 56 data bytes
> 64 bytes from 128.15.136.105: icmp_seq=0 ttl=64 time=0.024 ms
> 64 bytes from 128.15.136.105: icmp_seq=1 ttl=64 time=0.015 ms
> ^C
> --- 128.15.136.105 ping statistics ---
> 2 packets transmitted, 2 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.015/0.019/0.024/0.005 ms
>
> Attempting to ping the router:
> root@write /etc]> ping 128.15.136.254
> PING 128.15.136.254 (128.15.136.254): 56 data bytes
> ping: sendto: Host is down
> ping: sendto: Host is down
> ping: sendto: Host is down
> ping: sendto: Host is down
> ^C
> --- 128.15.136.254 ping statistics ---
> 9 packets transmitted, 0 packets received, 100.0% packet loss
>
>
> The only thing that seems to solve this problem is to either reboot, or
> do an "ifconfig down/up":
>
> root@write /etc]> ifconfig igb1 down
> root@write /etc]> ifconfig igb1 up
> root@write /etc]> ping 128.15.136.254
> PING 128.15.136.254 (128.15.136.254): 56 data bytes
> 64 bytes from 128.15.136.254: icmp_seq=1 ttl=255 time=1.015 ms
> 64 bytes from 128.15.136.254: icmp_seq=2 ttl=255 time=0.217 ms
> 64 bytes from 128.15.136.254: icmp_seq=3 ttl=255 time=0.278 ms
> 64 bytes from 128.15.136.254: icmp_seq=4 ttl=255 time=0.238 ms
> ^C
> --- 128.15.136.254 ping statistics ---
> 5 packets transmitted, 4 packets received, 20.0% packet loss
> round-trip min/avg/max/stddev = 0.217/0.437/1.015/0.334 ms
>
> I was able to run tcpdump during all of this, and it showed *nothing*
> between the system and the switch until I ran 'ifconfig igb1 down/up';
> after that, you see the CDP and Spanning Tree traffic.
>
> The networking team here has told me there are no errors on the switch,
> or the port I am on, and they even moved me from one port to another,
> but this is still happening on a fairly regular basis now that I've
> added more backup clients.
>
> Is this possibly a bug in my hardware or in the Intel driver? I have a
> pcap file and more system information that might help, but I don't want
> to send that out to a mailing list.
You may want to follow the current discussions regarding the
Intel drivers (em and igb).
Can you post the output of lspci -vvv (or, with base-system tools, pciconf -lvc)?
Sean
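The only recovery step reported in this thread is a manual 'ifconfig igb1 down ; ifconfig igb1 up'. As a stopgap while the driver issue is investigated, that step could be automated. The following is a minimal sketch, not something proposed in the thread, assuming FreeBSD's ping(8) (where -t sets an overall timeout in seconds and a non-zero exit status means no reply was heard) and assuming the default gateway 128.15.136.254 always answers ICMP echo. The script name igb-watchdog.sh, the syslog tag igb-watchdog, and the "check" argument are hypothetical.

```sh
#!/bin/sh
# Hypothetical watchdog sketch (not from the original thread): if the gateway
# stops answering, repeat the manual "ifconfig down/up" workaround.
# IFACE and GATEWAY defaults are taken from the report; override via environment.
IFACE="${IFACE:-igb1}"
GATEWAY="${GATEWAY:-128.15.136.254}"

# Send three probes; FreeBSD ping(8) exits non-zero when no reply arrives.
# -t 10 (FreeBSD semantics) makes ping give up after 10 seconds in total.
probe_gateway() {
    ping -c 3 -t 10 "$GATEWAY" >/dev/null 2>&1
}

# The same recovery steps the report describes doing by hand.
bounce_interface() {
    ifconfig "$IFACE" down && ifconfig "$IFACE" up
}

# Record each bounce in syslog so the frequency can be correlated with load.
log_msg() {
    logger -t igb-watchdog "$*"
}

# One check: returns 0 if the gateway answered, 1 if it did not and a
# bounce of the interface was attempted.
watchdog_once() {
    if probe_gateway; then
        return 0
    fi
    log_msg "$GATEWAY unreachable via $IFACE; bouncing $IFACE"
    bounce_interface
    return 1
}

# Only act when invoked explicitly, e.g. "sh igb-watchdog.sh check".
case "${1:-}" in
    check) watchdog_once ;;
esac
```

For example, a hypothetical /etc/crontab entry such as '*/5 * * * * root /bin/sh /usr/local/sbin/igb-watchdog.sh check' would probe every five minutes. This only masks the symptom; it does not explain why the interface stops passing traffic under load.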
More information about the freebsd-net mailing list