Regression? VLAN packet drop after upgrading from r281235
Zé Claudio Pastore
zclaudio at bsd.com.br
Wed Apr 27 18:41:43 UTC 2016
Hello,
On a BGP border router I help manage, we run FreeBSD 10.1-STABLE,
revision r281235, and it has worked fine for several years now.
We route around 4Gbit/s and 1.8Mpps at peak, while each port interface
peaks at 300Kpps.
Our quality metrics are measured with:
ping -s 1472 -i 0.1 <our-other-ibgp-router>
as well as bidirectional iperf.
This metric is similar to how the Speedy Test and SIMET tests are done,
which is what our customers use as their reference.
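For reference, the packet sizes implied by "-s 1472" work out as below. This is only a sketch of the arithmetic, assuming IPv4 without options and standard Ethernet framing with a single 802.1Q tag; the variable names are illustrative.

```sh
#!/bin/sh
# Size arithmetic for `ping -s 1472` (assumes IPv4 without options).
icmp_payload=1472   # value passed to ping -s
icmp_header=8       # ICMP echo header
ipv4_header=20      # IPv4 header without options (assumption)
eth_header=14       # destination MAC + source MAC + EtherType
vlan_tag=4          # 802.1Q tag, present only on tagged frames
fcs=4               # Ethernet frame check sequence

ip_packet=$((icmp_payload + icmp_header + ipv4_header))
untagged_frame=$((ip_packet + eth_header + fcs))
tagged_frame=$((untagged_frame + vlan_tag))

echo "ip_packet=$ip_packet untagged_frame=$untagged_frame tagged_frame=$tagged_frame"
```

So the reference ping fills a 1500-byte IP packet exactly, and on a tagged port the frame on the wire is 4 bytes longer than on an untagged one.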
Systems working w/o problem:
- 10.1-STABLE / r281235
Systems tested with drops:
- 10.2-STABLE / r292035M
- 10.3-STABLE / r298705
- 11.0-CURRENT / r295683 (downloaded snapshot from ftp.freebsd.org)
- 11.0-CURRENT Melifaro Routing Branch / r297731M
While testing, when the loss happens I can see output errors (errs) on the
vlan port in the output of "netstat -w1 -I vlan6":
            input          vlan6           output
   packets  errs idrops      bytes    packets  errs      bytes colls
         1     0      0         66      30557     2   33310968     0
         1     0      0        105      31458     3   33912219     0
         2     0      0       2954      32001     8   34983986     0
         1     0      0       1512      33150     6   35942558     0
         1     0      0       1512      33654     4   37311862     0
         1     0      0       1512      34825     3   38213793     0
         3     0      0       1683      35376     4   39488912     0
         5     0      0       7280      32423     3   35551869     0
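
A rough way to total the output columns from samples like these is sketched below. It assumes the samples were saved to a file (samples.txt is a hypothetical name) and that the eight-column layout shown here is unchanged; the function name is illustrative.

```sh
#!/bin/sh
# Sum output packets and output errors from `netstat -w1 -I <ifname>` samples
# read on stdin. Header lines are skipped because their first field is not
# numeric; columns 5 and 6 are the output packets and output errs.
oerr_summary() {
    awk '$1 ~ /^[0-9]+$/ && NF == 8 { pkts += $5; errs += $6 }
         END { printf "opackets=%d oerrs=%d\n", pkts, errs }'
}

# Usage: oerr_summary < samples.txt
```

Fed the eight samples listed here, it prints opackets=263444 oerrs=33.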
Problems may happen under high load (~200Kpps) or low load (~30Kpps) on a
vlan port. The observed frame loss never happens on untagged ports, only on
vlan interfaces. The loss affects packets of 900 bytes and above, and the
loss rate is noticeably higher with packets close to 1400 bytes (1472 is my
reference size).
On every listed system other than r281235 the loss rate is 9-19% with
ping(1) and iperf, while it is 0% on r281235.
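To compare revisions using the same number, the percentage can be pulled out of the ping(1) summary line. This is a minimal sketch, assuming the summary contains "N% packet loss" as FreeBSD ping prints it; the function name and the count of 1000 packets are illustrative.

```sh
#!/bin/sh
# Print only the loss percentage from ping(1) output read on stdin.
ping_loss() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Example (an interval below 1 second typically needs root on FreeBSD):
#   ping -c 1000 -s 1472 -i 0.1 <our-other-ibgp-router> | ping_loss
```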
At first I believed it to be an Intel driver error on systems newer than 10.1.
My reference cards are dual port 82599EB 10-Gigabit SFI/SFP+ Network
Connection (2x2 on x8 PCIe bus, total 4x10G). But yesterday I replaced the
Intel cards with Chelsio T5 and the problem is still exactly the same, so it
is not related to the card vendor.
I always test on the very same hardware. I have two SSD drives in this router:
one with the 10.1 install, which runs just fine, and the other to test the
various versions of FreeBSD.
Only minor loader and sysctl confs are tweaked:
kern.hz=2000
net.inet.ip.redirect=1 # do not send IP redirects
net.inet.ip.accept_sourceroute=0 # drop source routed packets since they ca
net.inet.ip.sourceroute=0 # if source routed packets are accepted th
net.inet.tcp.drop_synfin=1 # SYN/FIN packets get dropped on initial c
net.inet.udp.blackhole=1 # drop udp packets destined for closed soc
net.inet.tcp.blackhole=2 # drop tcp packets destined for closed por
security.bsd.see_other_uids=0
Can anyone suggest what might be a fix or tuning for this behavior? Was there
any relevant change in the vlan code, in revisions after the one I run on
10.1, that would lead to such a big difference?