Issue with igb and lagg (was Re: Problem with link aggregation + sshd)
Giulio Ferro
auryn at zirakzigil.org
Tue Sep 11 21:11:06 UTC 2012
Well, there definitely seems to be a problem with igb and lagg.
igb alone works as it should, but it doesn't seem to work properly as part
of a lagg. To be sure, I started from scratch with a 9.0 release and nothing but:
/etc/rc.conf
---------------------------------------------------
ifconfig_igb0="inet ..."
ifconfig_igb1="up"
ifconfig_igb2="up"
ifconfig_igb3="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport igb1 laggport igb2 laggport igb3 192.168.x.x/24"
sshd_enable="YES"
---------------------------------------------------
With this configuration sshd doesn't even manage to start; the machine just
hangs at boot. With the lagg configuration disabled, everything works correctly.
This installation has a ZFS root, but I don't think that has anything to
do with it.
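
The ifconfig_lagg0 value is a single string: the protocol, then one
"laggport" entry per member interface, then the address. A minimal sketch of
how such a value could be assembled in sh(1) follows. The helper name
lagg_args is hypothetical (it is not part of rc.subr or ifconfig), and the
address placeholder is kept elided as in the rc.conf above.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): assemble an ifconfig_lagg0 value.
# Usage: lagg_args PROTO ADDRESS PORT...
lagg_args() {
    proto=$1
    addr=$2
    shift 2
    args="laggproto $proto"
    # One "laggport <nic>" pair per member interface, in the order given.
    for port in "$@"; do
        args="$args laggport $port"
    done
    # The address goes last, as in the rc.conf above.
    printf '%s %s\n' "$args" "$addr"
}

# Print the rc.conf line for the setup described above (address left elided).
printf 'ifconfig_lagg0="%s"\n' "$(lagg_args lacp 192.168.x.x/24 igb1 igb2 igb3)"
```

Passing a different protocol name as the first argument would produce the
equivalent line for another laggproto, which may be handy when comparing
configurations.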
Yes, I think that the maintainer of igb and/or lagg driver should
absolutely look into this...
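
For anyone trying to reproduce this, here is a minimal sketch of what could
be collected while sshd is stuck. The function name hang_report is
hypothetical; ps(1), procstat(1) and ifconfig(8) are base-system tools on
FreeBSD, and the interface name lagg0 is the one from the rc.conf above. The
guards are only there so the function does not fail where a tool or the
interface is missing.

```sh
#!/bin/sh
# Hypothetical helper: print the state of one (possibly hung) process
# together with the current state of the lagg0 interface.
# Usage: hang_report PID
hang_report() {
    pid=$1
    # Process state and wait channel. An unkillable process typically
    # shows up in state "D" (uninterruptible wait).
    ps -o pid,state,wchan,command -p "$pid" || echo "ps failed for pid $pid"
    # Kernel stack of each thread, if procstat is available.
    if command -v procstat >/dev/null 2>&1; then
        procstat -kk "$pid"
    fi
    # Protocol and per-port flags of the aggregate, if it exists.
    if ifconfig lagg0 >/dev/null 2>&1; then
        ifconfig lagg0
    fi
    return 0
}
```

It could be run once per stuck process, for example with
`for pid in $(pgrep -x sshd); do hang_report "$pid"; done`, and the output
attached to a report for the driver maintainers.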
On 09/07/2012 12:01 PM, Simon Dick wrote:
> We've had similar problems with lagg at work. Each lagg is made up of
> one igb and one em port, and sometimes, for no apparent reason, they seem
> to stop passing traffic. The easiest way we've found to get it working
> again is to ifconfig one of the physical interfaces down and up.
> This is on 8.1.
>
> On 3 September 2012 19:25, Giulio Ferro <auryn at zirakzigil.org> wrote:
>> Does anybody have any idea why this bug happens? Any patches?
>>
>>
>>
>> On 08/29/2012 10:22 PM, Giulio Ferro wrote:
>>>
>>> On 08/28/2012 11:12 AM, Damien Fleuriot wrote:
>>>>
>>>> Hi Giulio,
>>>>
>>>>
>>>>
>>>> Just to clear things up:
>>>> igb0: 192.168.9.60/24
>>>> lagg0: 192.168.12.21/24
>>>>
>>>
>>> Yes.
>>> Actually, I notice now that the lagg0 address is different from what
>>> I wrote below in my rc.conf (192.168.12.7). I've run many tests with
>>> different configurations, but no matter what, it just doesn't work...
>>>
>>>
>>>>
>>>> What's the IP of the host you're trying ssh connections from?
>>>
>>>
>>> I'm just trying to connect to and from the management interface igb0
>>> (192.168.9.60).
>>> From an external PC I run: ssh myuser@192.168.9.60
>>> From the server I run: ssh myuser@pcaddress
>>>
>>> Just to be more precise, the consequences are:
>>> 1) the sshd daemon on the server gets stuck and becomes unkillable
>>> 2) the first connection may work, but then the ssh client on the
>>> server becomes unresponsive and unkillable
>>>
>>> If I don't create a lagg0 interface and just connect (say) igb1 to
>>> the data switch, I've no problem and everything works.
>>>
>>> Just to answer others' question, I connect igb1, igb2 and igb3 to the
>>> same data switch in ports configured for aggregation.
>>> I connect igb0 to another management switch (of course not configured
>>> for aggregation)
>>>
>>>
>>>>
>>>> Also, just in case, did you enable any firewall (PF, ipfw)?
>>>
>>>
>>> As I already said, no. Nothing else is running on this server, just
>>> sshd.
>>>
>>> Thank you.
>>>
>>>
>>>>
>>>>
>>>>
>>>> On 27 August 2012 21:22, Giulio Ferro <auryn at zirakzigil.org> wrote:
>>>>>
>>>>> Hi, thanks for the answer
>>>>>
>>>>> Here is what you asked for:
>>>>>
>>>>> # ifconfig igb0
>>>>> igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>> options=4401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>>> ether ...
>>>>> inet 192.168.9.60 netmask 0xffffff00 broadcast 192.168.9.255
>>>>> inet6 .... prefixlen 64 scopeid 0x1
>>>>> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>>> media: Ethernet autoselect (1000baseT <full-duplex>)
>>>>> status: active
>>>>>
>>>>>
>>>>>
>>>>> # netstat -rn
>>>>> Routing tables
>>>>>
>>>>> Internet:
>>>>> Destination        Gateway            Flags  Refs  Use  Netif  Expire
>>>>> default            192.168.9.1        UGS       0    0  igb0
>>>>> 127.0.0.1          link#12            UH        0    0  lo0
>>>>> 192.168.9.0/24     link#1             U         0   14  igb0
>>>>> 192.168.9.60       link#1             UHS       0    0  lo0
>>>>> 192.168.12.0/24    link#13            U         0  109  lagg0
>>>>> 192.168.12.21      link#13            UHS       0    0  lo0
>>>>>
>>>>> Internet6:
>>>>> Destination                      Gateway                          Flags Netif Expire
>>>>> ::/96                            ::1                              UGRS  lo0
>>>>> ::1                              link#12                          UH    lo0
>>>>> ::ffff:0.0.0.0/96                ::1                              UGRS  lo0
>>>>> fe80::/10                        ::1                              UGRS  lo0
>>>>> fe80::%igb0/64                   link#1                           U     igb0
>>>>> fe80::ea39:35ff:feb6:a0d4%igb0   link#1                           UHS   lo0
>>>>> fe80::%igb1/64                   link#2                           U     igb1
>>>>> fe80::ea39:35ff:feb6:a0d5%igb1   link#2                           UHS   lo0
>>>>> fe80::%igb2/64                   link#3                           U     igb2
>>>>> fe80::ea39:35ff:feb6:a0d6%igb2   link#3                           UHS   lo0
>>>>> fe80::%igb3/64                   link#4                           U     igb3
>>>>> fe80::ea39:35ff:feb6:a0d7%igb3   link#4                           UHS   lo0
>>>>> fe80::%lo0/64                    link#12                          U     lo0
>>>>> fe80::1%lo0                      link#12                          UHS   lo0
>>>>> fe80::%lagg0/64                  link#13                          U     lagg0
>>>>> fe80::ea39:35ff:feb6:a0d5%lagg0  link#13                          UHS   lo0
>>>>> ff01::%igb0/32                   fe80::ea39:35ff:feb6:a0d4%igb0   U     igb0
>>>>> ff01::%igb1/32                   fe80::ea39:35ff:feb6:a0d5%igb1   U     igb1
>>>>> ff01::%igb2/32                   fe80::ea39:35ff:feb6:a0d6%igb2   U     igb2
>>>>> ff01::%igb3/32                   fe80::ea39:35ff:feb6:a0d7%igb3   U     igb3
>>>>> ff01::%lo0/32                    ::1                              U     lo0
>>>>> ff01::%lagg0/32                  fe80::ea39:35ff:feb6:a0d5%lagg0  U     lagg0
>>>>> ff02::/16                        ::1                              UGRS  lo0
>>>>> ff02::%igb0/32                   fe80::ea39:35ff:feb6:a0d4%igb0   U     igb0
>>>>> ff02::%igb1/32                   fe80::ea39:35ff:feb6:a0d5%igb1   U     igb1
>>>>> ff02::%igb2/32                   fe80::ea39:35ff:feb6:a0d6%igb2   U     igb2
>>>>> ff02::%igb3/32                   fe80::ea39:35ff:feb6:a0d7%igb3   U     igb3
>>>>> ff02::%lo0/32                    ::1                              U     lo0
>>>>> ff02::%lagg0/32                  fe80::ea39:35ff:feb6:a0d5%lagg0  U     lagg0
>>>>>
>>>>>
>>>>>
>>>>> # netstat -aln | grep 22
>>>>> tcp4 0 0 *.22 *.* LISTEN
>>>>> tcp6 0 0 *.22 *.* LISTEN
>>>>>
>>>>> Note that I already tried listening only on the igb0 address
>>>>> (192.168.9.60) in sshd_config, but the results are exactly the same
>>>>> as described below.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 08/25/2012 01:22 PM, Damien Fleuriot wrote:
>>>>>>
>>>>>>
>>>>>> In the meantime kindly post:
>>>>>>
>>>>>>
>>>>>> ifconfig for your igb0
>>>>>> netstat -rn
>>>>>> netstat -aln | grep 22
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 25 Aug 2012, at 13:18, Damien Fleuriot <ml at my.gd> wrote:
>>>>>>
>>>>>>> I'll get back to you regarding link aggregation when I'm done with
>>>>>>> groceries.
>>>>>>>
>>>>>>> We use it here in production and it works flawlessly.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 25 Aug 2012, at 09:54, Giulio Ferro <auryn at zirakzigil.org> wrote:
>>>>>>>
>>>>>>>> No answer, so it seems that link aggregation doesn't really work in
>>>>>>>> FreeBSD. This may help others with the same problem...
>>>>>>>>
>>>>>>>> I reverted to one link for management and one for service, and ssh
>>>>>>>> works as it should...
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/21/2012 11:18 PM, Giulio Ferro wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Scenario: FreeBSD 9-STABLE (as of yesterday), amd64, on an HP server
>>>>>>>>> with 4 NICs (igb).
>>>>>>>>>
>>>>>>>>> One NIC is connected standalone to the management switch; the other
>>>>>>>>> three NICs are connected to a switch configured for aggregation.
>>>>>>>>>
>>>>>>>>> If I configure the first NIC (igb0) there is no problem: I can
>>>>>>>>> operate as I normally do and sshd functions normally.
>>>>>>>>>
>>>>>>>>> The problems start when I configure the other three NICs for
>>>>>>>>> aggregation:
>>>>>>>>>
>>>>>>>>> in /etc/rc.conf
>>>>>>>>> ...
>>>>>>>>> ifconfig_igb1="up"
>>>>>>>>> ifconfig_igb2="up"
>>>>>>>>> ifconfig_igb3="up"
>>>>>>>>>
>>>>>>>>> cloned_interfaces=lagg0
>>>>>>>>> ifconfig_lagg0="laggproto lacp laggport igb1 laggport igb2 laggport igb3 192.168.12.7/24"
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> I restart the server and the aggregation seems to work correctly:
>>>>>>>>> ifconfig shows the lagg0 interface with the aggregated links, the
>>>>>>>>> correct protocol (lacp), the correct IP address, and an active
>>>>>>>>> status. I can ping other IPs on the aggregated link.
>>>>>>>>>
>>>>>>>>> Also the other (standalone) link seems to work correctly. I can ping
>>>>>>>>> that address from other machines, and I can ping other IPs from that
>>>>>>>>> server.
>>>>>>>>>
>>>>>>>>> DNS lookups work too, and I can use telnet to connect to POP3
>>>>>>>>> servers, so there seems to be no problem with the network stack.
>>>>>>>>>
>>>>>>>>> But if I try to connect to the sshd service on that server, it hangs
>>>>>>>>> indefinitely. On the server I find two sshd processes:
>>>>>>>>> /usr/sbin/sshd
>>>>>>>>> /usr/sbin/sshd -R
>>>>>>>>>
>>>>>>>>> There is no message in the logs.
>>>>>>>>>
>>>>>>>>> If I try to stop sshd (/etc/rc.d/sshd stop) I can't: it just sits
>>>>>>>>> there forever waiting for the pid to die (it never does).
>>>>>>>>>
>>>>>>>>> Even the ssh client doesn't seem to work. If I try to connect to
>>>>>>>>> another server, the ssh client may start out working correctly,
>>>>>>>>> but sooner or later it hangs forever, and I can't kill it with
>>>>>>>>> ctrl-c.
>>>>>>>>>
>>>>>>>>> No firewall is configured, and nothing else is running on this
>>>>>>>>> server.
>>>>>>>>>
>>>>>>>>> Thanks for any suggestions...
>>>>>>>>> _______________________________________________
>>>>>>>>> freebsd-stable at freebsd.org mailing list
>>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>>>>>>>> To unsubscribe, send any mail to
>>>>>>>>> "freebsd-stable-unsubscribe at freebsd.org"
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> freebsd-net at freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>>>
>>>
>>
>>