[Bug 209351] VLAN TX errors, possible performance regression after 10.1-STABLE (r281235)
bugzilla-noreply at freebsd.org
Sat May 7 00:08:50 UTC 2016
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209351
Bug ID: 209351
Summary: VLAN TX errors, possible performance regression after
10.1-STABLE (r281235)
Product: Base System
Version: 11.0-CURRENT
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: zclaudio at bsd.com.br
CC: freebsd-amd64 at FreeBSD.org
On a BGP router running FreeBSD 10.1-STABLE (r281235), everything has worked
fine for several years. After upgrading to any newer version, I start seeing
vlan TX errors on the exact same hardware, simply by booting from an SSD with
the newer system installed.
Details:
We route around 4 Gbit/s and 1.8 Mpps at peak, while each individual port
peaks at 300 Kpps.
Our quality metrics are measured with:
ping -s 1472 -i 0.1 <our-other-ibgp-router>
As well as bidirectional iperf.
Systems working w/o problem:
- 10.1-STABLE / r281235
Systems tested with drops:
- 10.2-STABLE / r292035M
- 10.3-STABLE / r298705
- 11.0-CURRENT / r295683 (downloaded snapshot from ftp.freebsd.org)
- 11.0-CURRENT Melifaro Routing Branch / r297731M
While testing, when errors happen I can see output errs on the vlan port in
the output of "netstat -w1 -I vlan6":
            input          vlan6           output
   packets  errs idrops      bytes    packets  errs      bytes colls
         1     0      0         66      30557     2   33310968     0
         1     0      0        105      31458     3   33912219     0
         2     0      0       2954      32001     8   34983986     0
         1     0      0       1512      33150     6   35942558     0
         1     0      0       1512      33654     4   37311862     0
         1     0      0       1512      34825     3   38213793     0
         3     0      0       1683      35376     4   39488912     0
         5     0      0       7280      32423     3   35551869     0
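As a rough aid for comparing runs between revisions, the per-second rows above can be totalled with a short script. This is only a sketch: it assumes the eight-column layout shown above (input packets, errs, idrops, bytes; output packets, errs, bytes; colls), and the names SAMPLE, parse_rows and summarize are made up for illustration.

```python
# Rows copied from the "netstat -w1 -I vlan6" output in this report.
SAMPLE = """\
input vlan6 output
packets errs idrops bytes packets errs bytes colls
1 0 0 66 30557 2 33310968 0
1 0 0 105 31458 3 33912219 0
2 0 0 2954 32001 8 34983986 0
1 0 0 1512 33150 6 35942558 0
1 0 0 1512 33654 4 37311862 0
1 0 0 1512 34825 3 38213793 0
3 0 0 1683 35376 4 39488912 0
5 0 0 7280 32423 3 35551869 0
"""


def parse_rows(text):
    """Return the output-side counters of every numeric 8-column row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header lines are not purely numeric, so they are skipped here.
        if len(fields) != 8 or not all(f.isdigit() for f in fields):
            continue
        _ipkts, _ierrs, _idrops, _ibytes, opkts, oerrs, obytes, _colls = map(int, fields)
        rows.append({"opkts": opkts, "oerrs": oerrs, "obytes": obytes})
    return rows


def summarize(rows):
    """Total the output counters and derive an error ratio and mean size."""
    pkts = sum(r["opkts"] for r in rows)
    errs = sum(r["oerrs"] for r in rows)
    nbytes = sum(r["obytes"] for r in rows)
    return {
        "opkts": pkts,
        "oerrs": errs,
        "obytes": nbytes,
        "err_ratio": errs / pkts if pkts else 0.0,
        "avg_out_bytes": nbytes / pkts if pkts else 0.0,
    }


if __name__ == "__main__":
    print(summarize(parse_rows(SAMPLE)))
```

Feeding it the same number of seconds captured on r281235 and on a newer revision would give directly comparable totals.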
Problems may happen under high load (~200Kpps) or low load (~30Kpps) on a vlan
port.
The observed frame loss never happens on untagged ports, only on vlan
interfaces.
The observed loss happens with packets of 900 bytes and above, and the loss
rate is noticeably higher with packets close to 1400 bytes (1472 is my
reference size).
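For reference, the arithmetic behind the 1472-byte ping payload can be sketched as follows. It assumes IPv4 without options, a standard 1500-byte MTU and a single 802.1Q tag; none of these is stated explicitly in this report, and the function name is made up for illustration.

```python
ICMP_HEADER = 8   # ICMP echo header
IPV4_HEADER = 20  # IPv4 header, assuming no options
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
ETH_FCS = 4       # frame check sequence
VLAN_TAG = 4      # one 802.1Q tag


def frame_sizes(icmp_payload):
    """Return (IP packet, untagged frame, tagged frame) sizes in bytes."""
    ip_packet = icmp_payload + ICMP_HEADER + IPV4_HEADER
    untagged = ip_packet + ETH_HEADER + ETH_FCS
    tagged = untagged + VLAN_TAG
    return ip_packet, untagged, tagged


if __name__ == "__main__":
    print(frame_sizes(1472))  # prints (1500, 1518, 1522)
```

Under those assumptions a 1472-byte payload fills a 1500-byte MTU exactly, so the tagged frame on the wire is 4 bytes longer than the largest untagged frame.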
The loss rate on every listed system other than r281235 is 9-19% with ping(1)
and iperf, while it is 0% (no loss, or negligible loss) on r281235.
Hardware tried:
- Intel 82599EB 10-Gigabit SFI/SFP+ Network Connection (2x2 on x8 PCIe bus,
total 4x10G).
- Chelsio T520, 2x2 on x8 PCIe bus, total 4x10G
Exactly the same behavior, so it's not Intel related/exclusive.
Same hardware:
I always test on the very same hardware. I have two SSD drives in this router:
one with 10.1, which runs fine, and the other for testing the various versions
of FreeBSD.
Sysctl/loader:
Only minor loader and sysctl confs are tweaked:
kern.hz=2000
net.inet.ip.redirect=1 # do not send IP redirects
net.inet.ip.accept_sourceroute=0 # drop source routed packets since they ca
net.inet.ip.sourceroute=0 # if source routed packets are accepted th
net.inet.tcp.drop_synfin=1 # SYN/FIN packets get dropped on initial c
net.inet.udp.blackhole=1 # drop udp packets destined for closed soc
net.inet.tcp.blackhole=2 # drop tcp packets destined for closed por
security.bsd.see_other_uids=0
No relevant errors happen on the physical ix(4) or cxl(4) ports.
It's very easy to reproduce in my environment: I just need to boot a newer
system, and very soon some vlans start to drop packets that are not dropped on
10.1-STABLE. I can be contacted if a developer wants to ssh in. I can also
update this PR with more information if needed.