[Bug 221919] ixl: TX queue hang when using TSO and having a high and mixed network load

Sat Dec 15 14:19:05 UTC 2018

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221919

--- Comment #20 from Peter Eriksson <peter.x.eriksson at liu.se> ---
Just a quick note that we're still seeing the same problem on our production
servers if we enable "tso" on the 10G interfaces. FreeBSD 11.2-RELEASE-p6.
Haven't been able to reproduce it so far on the test servers (identical
hardware) running 11.2-RELEASE-p5 and 12.0-RELEASE, but they don't see any
traffic...

Driver version:
> dev.ixl.0.%desc: Intel(R) Ethernet Connection 700 Series PF Driver, Version - 1.9.9-k

Firmware:
> dev.ixl.0.fw_version: fw 6.80.48812 api 1.7 nvm 6.00 etid 80003751 oem 18.4608.17

Watchdog events in the output from sysctl -a:
> dev.ixl.0.watchdog_events: 4

Dmesg errors:
> ixl0: WARNING: queue 3 appears to be hung!
> ixl0: WARNING: queue 2 appears to be hung!
> ixl2: WARNING: queue 2 appears to be hung!
> ixl2: WARNING: queue 4 appears to be hung!
> ixl2: WARNING: queue 7 appears to be hung!
> ixl2: WARNING: queue 3 appears to be hung!
> ixl0: WARNING: queue 7 appears to be hung!
> ixl2: WARNING: queue 3 appears to be hung!
> ixl0: WARNING: queue 4 appears to be hung!
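
For a quick overview of which queues are affected, the warnings can be tallied
per interface and queue. This is only a rough sketch: the function name
tally_hung_queues and the regular expression are my own, the pattern is derived
solely from the dmesg lines above, and the sample is just those lines pasted in.

```python
import re
from collections import Counter

# Assumption: the warning format is exactly as in the dmesg lines quoted above,
# e.g. "ixl0: WARNING: queue 3 appears to be hung!"
HUNG_RE = re.compile(r"(ixl\d+): WARNING: queue (\d+) appears to be hung!")


def tally_hung_queues(lines):
    """Count hung-queue warnings per (interface, queue) pair."""
    counts = Counter()
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


# The nine warnings from the dmesg output above.
sample = """\
ixl0: WARNING: queue 3 appears to be hung!
ixl0: WARNING: queue 2 appears to be hung!
ixl2: WARNING: queue 2 appears to be hung!
ixl2: WARNING: queue 4 appears to be hung!
ixl2: WARNING: queue 7 appears to be hung!
ixl2: WARNING: queue 3 appears to be hung!
ixl0: WARNING: queue 7 appears to be hung!
ixl2: WARNING: queue 3 appears to be hung!
ixl0: WARNING: queue 4 appears to be hung!
""".splitlines()

if __name__ == "__main__":
    for (ifname, queue), n in sorted(tally_hung_queues(sample).items()):
        print(f"{ifname} queue {queue}: {n}")
```

On the lines above this gives four warnings for ixl0 and five for ixl2, with
ixl2 queue 3 showing up twice.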

(Output from ifconfig with TSO disabled)
> # ifconfig lagg0
> lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=6404bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
> 	ether 3c:fd:fe:25:47:a0
> 	inet6 fe80::3efd:feff:fe25:47a0%lagg0 prefixlen 64 scopeid 0xa
> 	inet6 2001:6b0:17:2400::8:43 prefixlen 64
> 	inet 130.236.8.43 netmask 0xffffffe0 broadcast 130.236.8.63
> 	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> 	media: Ethernet autoselect
> 	status: active
> 	groups: lagg
> 	laggproto lacp lagghash l2,l3,l4
> 	laggport: ixl0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
> 	laggport: ixl2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

iperf3 output with TSO disabled:
> # iperf3 -c filur00 -t4
> Connecting to host filur00, port 5201
> [  5] local 2001:6b0:17:2400::8:43 port 51226 connected to 2001:6b0:17:2400::8:40 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec   318 MBytes  2.66 Gbits/sec    0    561 KBytes
> [  5]   1.00-2.00   sec   350 MBytes  2.94 Gbits/sec    0   1.11 MBytes
> [  5]   2.00-3.00   sec   392 MBytes  3.28 Gbits/sec    0   1.67 MBytes
> [  5]   3.00-4.00   sec   351 MBytes  2.94 Gbits/sec    0   1.77 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-4.00   sec  1.38 GBytes  2.95 Gbits/sec    0             sender
> [  5]   0.00-4.00   sec  1.38 GBytes  2.95 Gbits/sec                  receiver
> 
> iperf Done.

With TSO enabled (when things work):

> # ifconfig lagg0 tso ; iperf3 -c filur00 -t4
> Connecting to host filur00, port 5201
> [  5] local 2001:6b0:17:2400::8:43 port 51237 connected to 2001:6b0:17:2400::8:40 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec   976 MBytes  8.19 Gbits/sec    0    492 KBytes
> [  5]   1.00-2.00   sec  1.08 GBytes  9.29 Gbits/sec    0   1021 KBytes
> [  5]   2.00-3.00   sec  1.08 GBytes  9.29 Gbits/sec    0   1.50 MBytes
> [  5]   3.00-4.00   sec  1.08 GBytes  9.28 Gbits/sec    0   1.75 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-4.00   sec  4.20 GBytes  9.01 Gbits/sec    0             sender
> [  5]   0.00-4.00   sec  4.19 GBytes  9.01 Gbits/sec                  receiver
> 
> iperf Done.
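
To put a number on the difference between the two runs, the sender summary
lines can be compared. A minimal sketch, assuming the bitrate is always
reported in Gbits/sec as in the output above; the helper name sender_gbits is
made up for this example.

```python
import re

# Assumption: the summary line ends in "sender" and reports Gbits/sec,
# as in the two iperf3 runs quoted above. Other units are not handled.
SENDER_RE = re.compile(r"([\d.]+)\s+Gbits/sec.*\bsender\b")


def sender_gbits(output):
    """Return the sender-side bitrate in Gbits/sec, or None if not found."""
    for line in output.splitlines():
        m = SENDER_RE.search(line)
        if m:
            return float(m.group(1))
    return None


# Sender summary lines copied from the two runs above.
tso_off = "[  5]   0.00-4.00   sec  1.38 GBytes  2.95 Gbits/sec    0             sender"
tso_on = "[  5]   0.00-4.00   sec  4.20 GBytes  9.01 Gbits/sec    0             sender"

if __name__ == "__main__":
    off = sender_gbits(tso_off)
    on = sender_gbits(tso_on)
    print(f"TSO off: {off} Gbits/sec, TSO on: {on} Gbits/sec, ratio {on / off:.2f}x")
```

For these two runs that is 9.01 against 2.95 Gbits/sec, roughly a threefold
difference.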

But often the queues get stuck and freeze. Hmm.. I just noticed that it was
IPv6 that stopped working when I tried to enable TSO on a production server and
ran iperf3 on it - IPv4 traffic was still passing through.

Could it be that there are still IPv6 (TSO6)-related bugs while the IPv4 ones
are solved? Too bad I can't find a way to force it to happen on the test
servers...
