Poor PF performance with 2.5k rdr's
Ian FREISLICH
ian.freislich at capeaugusta.com
Tue Dec 13 15:18:15 UTC 2016
Chris,
It's been a fairly long time since I ran a FreeBSD router in a
production environment (10-CURRENT at the time). tcp.sendspace/recvspace
will have no effect on forwarding performance: those are socket buffer
sizes, and forwarded traffic never touches a local socket. Digging
around in my email I've found some of my config:
--- /etc/pf.conf
# Options
# ~~~~~~~
set timeout { \
adaptive.start 900000, \
adaptive.end 1800000 \
}
set block-policy return
set state-policy if-bound
set optimization normal
set ruleset-optimization basic
set limit states 1500000
set limit frags 40000
set limit src-nodes 150000
---
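For context on the adaptive settings (this is just pf.conf(5)
paraphrased): once the state count passes adaptive.start, pf scales all
timeout values linearly, reaching zero at adaptive.end, roughly
timeout * (adaptive.end - states) / (adaptive.end - adaptive.start).
With the values above, timeouts only start shrinking beyond 900k states,
and at our 1.5M state limit idle states expire at a third of their
normal timeouts, which keeps the table from filling up under load.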
--- /etc/sysctl.conf ---
net.inet.ip.fastforwarding=1
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.carp.preempt=1
net.inet.icmp.icmplim_output=0
net.inet.icmp.icmplim=0
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
net.route.netisr_maxqlen=8192
---
--- /boot/loader.conf
console="comconsole"
net.isr.maxthreads="8"
net.isr.defaultqlimit="4096"
net.isr.maxqlimit="81920"
net.isr.direct="0"
net.isr.direct_force="0"
kern.ipc.nmbclusters="262144"
kern.maxusers="1024"
hw.bce.rx_pages="8"
hw.bce.tx_pages="8"
---
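If you want to see whether those netisr queue limits are actually big
enough, netstat -Q prints the netisr configuration and per-workstream
counters, including queue drops; that's the quickest sanity check I
know of:

# netstat -Q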
Our pfctl -s info at the time:
State Table                          Total             Rate
  current entries                   330022
  searches                       516720212        91910.4/s
  inserts                         24545254         4365.9/s
  removals                        24215232         4307.2/s
Counters
  match                           66166232        11769.2/s
We were using a different NIC from yours and eventually moved to ixgb(4)
and bxe(4) NICs to handle the traffic, but the principle is the same:
tune the queues. We didn't have as many rdr rules as you do, but the
rule set is only searched linearly when there is no matching state in
the state table, which means the full rule set is walked for the first
packet of each flow; see the sketches below.
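Since you're on igb(4) with the I350, the rough equivalents of our
bce(4) knobs would be something like the following in /boot/loader.conf.
This is a sketch from memory of the 9.x/10.x igb tunables, so check the
names and defaults against your release before trusting it:

--- /boot/loader.conf (sketch)
hw.igb.num_queues="0"             # 0 = autoconfigure, one queue per core
hw.igb.rxd="4096"                 # RX descriptors per queue (default 1024)
hw.igb.txd="4096"                 # TX descriptors per queue (default 1024)
hw.igb.rx_process_limit="-1"      # -1 = no per-pass packet limit
hw.igb.max_interrupt_rate="16000"
---

And to confirm the rule-set walk really is where the time goes, watch
the per-rule evaluation counters and the state insert rate while an
iperf run is in progress:

# pfctl -v -s rules               # Evaluations / Packets / States per rule
# pfctl -s info | grep inserts    # inserts/s is roughly new flows/s, each paying a full walk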
In my testing, the other large contributor to forwarding rate is L1
cache size. Intel CPUs have traditionally had very small L1 cache sizes
ranging from 12K to 32K and they're almost never quoted in marketing or
comparison material. Your CPU has 32K of L1 data and 32K of L1
instruction cache per core. You may want to try disabling HT, if that's
still possible these days, to reduce L1 contention between the two
hardware threads sharing each core; a sketch for doing that from
loader.conf follows. I may be talking total rubbish regarding HT and
cache architecture, but I think it's worth a try.
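If the BIOS on that board doesn't give you an HT switch, there's a
loader tunable for it (assuming it still exists in your release; it did
the last time I needed it):

--- /boot/loader.conf (sketch)
machdep.hyperthreading_allowed="0"   # keep the HT logical CPUs out of the scheduler
---

If it takes effect, hw.ncpu should drop from 8 to 4 on that E3-1230 V2
after the reboot.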
Ian
--
Ian Freislich
On 12/11/16 11:22, chris g wrote:
> Hello,
>
> I've decided to write here, as we had no luck troubleshooting PF's
> poor performance on 1GE interface.
>
> Network scheme, given as simplest as possible is:
>
> ISP <-> BGP ROUTER <-> PF ROUTER with many rdr rules <-> LAN
>
> The problem is reproducible on any of the PF ROUTER's connections - to the LAN and to the BGP ROUTER
>
>
> OS versions, tunables and hardware of both the BGP and PF routers:
>
> Hardware: E3-1230 V2 with HT on, 8GB RAM, ASUS P8B-E, NICs: Intel I350 on PCIe
>
> FreeBSD versions tested: 9.2-RELEASE amd64 with a custom kernel, and
> 10.3-STABLE (built 4 Dec 2016) amd64 with the GENERIC kernel.
>
> Basic tunables (for 9.2-RELEASE):
> net.inet.ip.forwarding=1
> net.inet.ip.fastforwarding=1
> kern.ipc.somaxconn=65535
> net.inet.tcp.sendspace=65536
> net.inet.tcp.recvspace=65536
> net.inet.udp.recvspace=65536
> kern.random.sys.harvest.ethernet=0
> kern.random.sys.harvest.point_to_point=0
> kern.random.sys.harvest.interrupt=0
> kern.polling.idle_poll=1
>
> BGP router doesn't have any firewall.
>
> PF options of PF router are:
> set state-policy floating
> set limit { states 2048000, frags 2000, src-nodes 384000 }
> set optimization normal
>
>
> Problem description:
> We are experiencing low throughput when PF is enabled with all the
> rdr's. If 'skip' is set on the benchmarked interface or the rdr rules
> are commented out (not present), the bandwidth is flawless. There is no
> scrubbing done in PF; most of the roughly 2,500 rdr rules look like
> this (note that no interface is specified, and that is intentional):
>
> rdr pass inet proto tcp from any to 1.2.3.4 port 1235 -> 192.168.0.100 port 1235
>
> All measurements were taken using iperf 2.0.5 with options "-c <IP>"
> or "-c <IP> -m -t 60 -P 8" on the client side and "-s" on the server
> side. We tested both directions as well.
> Please note that this is a production environment and there was some
> other traffic (roughly 20-100 Mbps) on the benchmarked interfaces
> during both tests, so iperf won't show a full Gigabit. There is no
> networking equipment between 'client' and 'server' - just two NICs
> directly connected with a Cat6 cable.
>
> Without further ado, here are benchmark results:
>
> server's PF enabled with fw rules but without rdr rules:
> root@client:~ # iperf -c server
> ------------------------------------------------------------
> Client connecting to server, TCP port 5001
> TCP window size: 65.0 KByte (default)
> ------------------------------------------------------------
> [  3] local clients_ip port 51361 connected with server port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec
>
>
>
> server's PF enabled with fw rules and around 2500 redirects present:
> root@client:~ # iperf -c server
> ------------------------------------------------------------
> Client connecting to server, TCP port 5001
> TCP window size: 65.0 KByte (default)
> ------------------------------------------------------------
> [  3] local clients_ip port 45671 connected with server port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   402 MBytes   337 Mbits/sec
>
>
> This difference is 100% reproducible in the production environment.
>
> Performance varies with the time of day: the result is 160-400 Mbps
> with the rdr rules present and always above 900 Mbps with the rdr
> rules disabled.
>
>
> Some additional information:
>
> # pfctl -s info
> Status: Enabled for 267 days 10:25:22           Debug: Urgent
>
> State Table                          Total             Rate
>   current entries                   132810
>   searches                      5863318875          253.8/s
>   inserts                        140051669            6.1/s
>   removals                       139918859            6.1/s
> Counters
>   match                         1777051606           76.9/s
>   bad-offset                             0            0.0/s
>   fragment                             191            0.0/s
>   short                                518            0.0/s
>   normalize                              0            0.0/s
>   memory                                 0            0.0/s
>   bad-timestamp                          0            0.0/s
>   congestion                             0            0.0/s
>   ip-option                           4383            0.0/s
>   proto-cksum                            0            0.0/s
>   state-mismatch                     52574            0.0/s
>   state-insert                         172            0.0/s
>   state-limit                            0            0.0/s
>   src-limit                              0            0.0/s
>   synproxy                               0            0.0/s
>
> # pfctl -s states | wc -l
> 113705
>
> # pfctl -s memory
> states        hard limit  2048000
> src-nodes     hard limit   384000
> frags         hard limit     2000
> tables        hard limit     1000
> table-entries hard limit   200000
>
> # pfctl -s Interfaces|wc -l
> 75
>
> # pfctl -s rules | wc -l
> 1226
>
>
> In our opinion the hardware is not too weak: overall CPU usage is only
> 10-30%, it does not reach 100% during the benchmark, and not even a
> single vcore is saturated.
>
>
> I would be really grateful if someone could point me in the right direction.
>
>
> Thank you,
> Chris
> _______________________________________________
> freebsd-pf at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-pf
> To unsubscribe, send any mail to "freebsd-pf-unsubscribe at freebsd.org"