sfxge, lagg, cannot flush Tx/Rx queue and disconnects
sashk
b at sashk.xyz
Mon Aug 10 18:26:01 UTC 2020
Hi,
Apologies, the first email went out as HTML. Re-sending as plain text.
I have a FreeBSD 12.1 system with a Solarflare SFN8522 network
controller. Everything works perfectly fine until, at some point, I lose
connectivity to the server: it stops responding to pings for some time,
then starts responding again and keeps working for a long time.
lagg0 is configured like this in /etc/rc.conf:
ifconfig_sfxge0="up mtu 9000"
ifconfig_sfxge1="up mtu 9000"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport sfxge0 laggport sfxge1 xxx.xxx.xxx.xxx/24"
Output of pciconf -lv:
sfxge0@pci0:133:0:0: class=0x020000 card=0x80171924 chip=0x0a031924 rev=0x02 hdr=0x00
    vendor = 'Solarflare Communications'
    device = 'SFC9220 10/40G Ethernet Controller'
    class = network
    subclass = ethernet
sfxge1@pci0:133:0:1: class=0x020000 card=0x80171924 chip=0x0a031924 rev=0x02 hdr=0x00
    vendor = 'Solarflare Communications'
    device = 'SFC9220 10/40G Ethernet Controller'
    class = network
    subclass = ethernet
The simplest fix is to reboot the server, after which everything works
as before, but this isn't the best option. When I tried to restart
networking during one of the troubleshooting sessions
(/etc/rc.d/netif restart), the process got stuck and I saw several
messages in the logs:
kernel: sfxge0: Cannot flush Tx queue 23
kernel: sfxge0: Cannot flush Tx queue 15
kernel: sfxge0: Cannot flush Rx queue 23
kernel: sfxge0: Cannot flush Rx queue 15
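To see how often this happens and on which port, the messages can be
tallied from the log. The following is only a rough sketch: the function
name count_flush_errors is made up for illustration, and it assumes the
log lines keep the "<interface>: Cannot flush Tx/Rx queue N" wording
shown above.

```shell
#!/bin/sh
# Tally "Cannot flush" messages per interface and direction (Tx/Rx).
# Reads kernel log lines on stdin; any syslog prefix before the
# interface name (timestamp, hostname) is ignored.
count_flush_errors() {
    awk '/Cannot flush (Tx|Rx) queue/ {
        for (i = 3; i < NF; i++) {
            if ($i == "flush") {
                ifname = $(i - 2)        # e.g. "sfxge0:"
                sub(/:$/, "", ifname)    # strip the trailing colon
                n[ifname " " $(i + 1)]++ # key: "<ifname> <Tx|Rx>"
            }
        }
    }
    END { for (k in n) print k, n[k] }' | sort
}
```

It could be run as "count_flush_errors < /var/log/messages", assuming
kernel messages are logged there, and prints one line per interface and
direction with the number of failed flushes.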
I don't have access to the switch to see what's going on, but from what
I hear they don't see anything suspicious, which would rule out a switch
issue.
The latest troubleshooting step was to disable TSO4, TSO6 and LRO by
running:
ifconfig sfxge0 -tso4 -tso6 -lro
Not sure if that helped yet.
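If disabling the offloads does turn out to help, the setting would have
to survive a reboot. A minimal sketch of the corresponding /etc/rc.conf
lines, assuming the same options should apply to both ports (the command
above only changed sfxge0) and that the rest of the configuration stays
as shown earlier:

```shell
# /etc/rc.conf (sketch): existing port entries with TSO and LRO disabled.
# Applying this to sfxge1 as well is an assumption, not something tested.
ifconfig_sfxge0="up mtu 9000 -tso4 -tso6 -lro"
ifconfig_sfxge1="up mtu 9000 -tso4 -tso6 -lro"
```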
Any help would be appreciated.
Thanks!