Poor PF performance with 2.5k rdr's
chris g
cgodspd at gmail.com
Sun Dec 11 16:22:40 UTC 2016
Hello,
I've decided to write here, as we have had no luck troubleshooting PF's
poor performance on a 1GE interface.
The network layout, simplified as much as possible, is:
ISP <-> BGP ROUTER <-> PF ROUTER with many rdr rules <-> LAN
The problem is reproducible on either of the PF ROUTER's connections - to the LAN and to the BGP ROUTER.
Both the BGP and PF routers share the same hardware, OS versions, and tunables:
Hardware: E3-1230 V2 with HT on, 8GB RAM, ASUS P8B-E, NICs: Intel I350 on PCIe
FreeBSD versions tested: 9.2-RELEASE amd64 with Custom kernel,
10.3-STABLE(compiled 4th Dec 2016) amd64 with Generic kernel.
Basic tunables (for 9.2-RELEASE):
net.inet.ip.forwarding=1
net.inet.ip.fastforwarding=1
kern.ipc.somaxconn=65535
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.recvspace=65536
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
kern.polling.idle_poll=1
BGP router doesn't have any firewall.
The PF router's PF options are:
set state-policy floating
set limit { states 2048000, frags 2000, src-nodes 384000 }
set optimization normal
Problem description:
We are experiencing low throughput when PF is enabled with all the
rdr rules. If 'skip' is set on the benchmarked interface, or the rdr
rules are commented out (not present), bandwidth is flawless. No
scrubbing is done in PF; most of the roughly 2500 rdr rules look like
the one below. Please note that no interface is specified, and that is
intentional:
rdr pass inet proto tcp from any to 1.2.3.4 port 1235 -> 192.168.0.100 port 1235
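Since PF evaluates translation rules top to bottom when it has to find a match for a new connection, one experiment worth trying is binding the rules to the one interface where redirected traffic actually arrives, so lookups on other interfaces skip the list. A minimal sketch, assuming $ext_if is the interface facing the BGP router (the macro name is an assumption, not from the original config):

```
# pf.conf sketch -- $ext_if is a placeholder for the interface
# on which the to-be-redirected traffic enters the PF router
ext_if = "igb0"
rdr pass on $ext_if inet proto tcp from any to 1.2.3.4 port 1235 -> 192.168.0.100 port 1235
```

Comparing iperf results with and without `on $ext_if` would at least show whether per-connection rule evaluation is the bottleneck.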
All measurements were taken using iperf 2.0.5 with options "-c <IP>"
or "-c <IP> -m -t 60 -P 8" on the client side and "-s" on the server
side. We tested both directions.
Please note that this is a production environment and there was some
other traffic on the benchmarked interfaces (roughly 20-100 Mbps) during
both tests, so iperf won't show a full Gigabit. There is no networking
equipment between 'client' and 'server' - just 2 NICs directly
connected with a Cat6 cable.
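For anyone who wants to reproduce a ruleset of this scale in a lab, a synthetic set of rdr rules can be generated with a small script. The 10.0.0.1 external address and 192.168.0.0/24 internal hosts below are made up for illustration:

```shell
#!/bin/sh
# Generate a synthetic ruleset of 2500 rdr rules, shaped like the
# production ones, for lab reproduction of the benchmark.
: > rdr-test.conf
i=0
while [ $i -lt 2500 ]; do
    ext=$((10000 + i))          # external port, one per rule
    host=$((100 + i % 100))     # spread targets over 100 internal hosts
    echo "rdr pass inet proto tcp from any to 10.0.0.1 port $ext -> 192.168.0.$host port $ext" >> rdr-test.conf
    i=$((i + 1))
done
wc -l rdr-test.conf
```

The generated file can be syntax-checked without loading it via `pfctl -nf rdr-test.conf`.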
Without further ado, here are benchmark results:
server's PF enabled with fw rules but without rdr rules:
root at client:~ # iperf -c server
------------------------------------------------------------
Client connecting to server, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[ 3] local clients_ip port 51361 connected with server port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 936 Mbits/sec
server's PF enabled with fw rules and around 2500 redirects present:
root at client:~ # iperf -c server
------------------------------------------------------------
Client connecting to server, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[ 3] local clients_ip port 45671 connected with server port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 402 MBytes 337 Mbits/sec
That much of a difference is 100% reproducible in the production
environment. Throughput varies with the time of day: 160-400 Mbps with
the rdr rules present, and always above 900 Mbps with the rdr rules
disabled.
Some additional information:
# pfctl -s info
Status: Enabled for 267 days 10:25:22 Debug: Urgent
State Table Total Rate
current entries 132810
searches 5863318875 253.8/s
inserts 140051669 6.1/s
removals 139918859 6.1/s
Counters
match 1777051606 76.9/s
bad-offset 0 0.0/s
fragment 191 0.0/s
short 518 0.0/s
normalize 0 0.0/s
memory 0 0.0/s
bad-timestamp 0 0.0/s
congestion 0 0.0/s
ip-option 4383 0.0/s
proto-cksum 0 0.0/s
state-mismatch 52574 0.0/s
state-insert 172 0.0/s
state-limit 0 0.0/s
src-limit 0 0.0/s
synproxy 0 0.0/s
# pfctl -s states | wc -l
113705
# pfctl -s memory
states hard limit 2048000
src-nodes hard limit 384000
frags hard limit 2000
tables hard limit 1000
table-entries hard limit 200000
# pfctl -s Interfaces|wc -l
75
# pfctl -s rules | wc -l
1226
In our opinion the hardware is not too weak: overall CPU usage is only
10-30%, it does not reach 100% during the benchmark, and not even a
single vcore is pinned at 100%.
I would be really grateful if someone could point me in the right direction.
Thank you,
Chris