fastforward/routing: a 3 million packet-per-second system?
Julian Elischer
julian at freebsd.org
Fri Jul 25 17:14:01 UTC 2014
On 7/22/14, 11:18 PM, John Jasen wrote:
> Feedback and/or tips and tricks more than welcome.
>
> Outstanding questions:
>
> Would increasing the number of processor cores help?
>
> Would a system where both processor QPI ports connect to each other
> mitigate QPI bottlenecks?
>
> Are there further performance optimizations I am missing?
>
> Server Description:
>
> The system in question is a Dell Poweredge R820, 16GB of RAM, and two
> Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz.
>
> Onboard, in a 16x PCIe slot, I have one Chelsio T-580-CR two-port 40GbE
> NIC, and in an 8x slot, another T-580-CR dual port.
>
> I am running FreeBSD 10.0-STABLE.
>
> BIOS tweaks:
>
> Hyperthreading (or Logical Processors) is turned off.
While this used to be a win, the newer processors have got this (more)
right, so logical processors can be a real win now.
Make sure you KNOW you need this turned off by doing tests.
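If you want to A/B that without a round trip through the BIOS, a loader
tunable should do it; a rough sketch, assuming 10.0 still honors
machdep.hyperthreading_allowed:

# in /boot/loader.conf: 0 parks the logical CPUs, 1 lets the scheduler use them
machdep.hyperthreading_allowed="0"
# after reboot, kern.smp.cpus should reflect whether the logical CPUs came up
sysctl hw.ncpu kern.smp.cpus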
> Memory Node Interleaving is turned off, but did not appear to impact
> performance.
>
> /boot/loader.conf contents:
> #for CARP+PF testing
> carp_load="YES"
> #load cxgbe drivers.
> cxgbe_load="YES"
> #maxthreads appears to not exceed CPU.
> net.isr.maxthreads=12
> #bindthreads may be indicated when using cpuset(1) on interrupts
> net.isr.bindthreads=1
> #random guess based on googling
> net.isr.maxqlimit=60480
> net.link.ifqmaxlen=90000
> #discussions with cxgbe maintainer and list led me to trying this. Allows
> #more interrupts to be fixed to CPUs, which in some cases, improves
> #interrupt balancing.
> hw.cxgbe.ntxq10g=16
> hw.cxgbe.nrxq10g=16
>
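A quick sanity check after boot is that the netisr knobs actually took, and
that the netisr queues aren't the thing dropping packets; something along
these lines (netstat -Q prints the per-protocol netisr queue and drop
counters):

sysctl net.isr.maxthreads net.isr.bindthreads net.isr.maxqlimit
netstat -Q    # watch the drop columns while the test is running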
> /etc/sysctl.conf contents:
>
> #the following is also enabled by rc.conf gateway_enable.
> net.inet.ip.fastforwarding=1
> #recommendations from BSD router project
> kern.random.sys.harvest.ethernet=0
> kern.random.sys.harvest.point_to_point=0
> kern.random.sys.harvest.interrupt=0
> #probably should be removed, as cxgbe does not seem to affect/be affected
> #by irq storm settings
> hw.intr_storm_threshold=25000000
> #based on Calomel.Org performance suggestions. 4x40GbE, seemed reasonable
> #to use 100GbE settings
> kern.ipc.maxsockbuf=1258291200
> net.inet.tcp.recvbuf_max=1258291200
> net.inet.tcp.sendbuf_max=1258291200
> #attempting to play with ULE scheduler, making it serve packets versus
> #netstat
> kern.sched.slice=1
> kern.sched.interact=1
>
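The big socket-buffer and TCP buffer values shouldn't matter for a pure
forwarding test (the router never terminates those flows), but it is worth
confirming the fast path is really being taken; the IP statistics break
fast-forwarded packets out separately, roughly:

# "N packets forwarded (M packets fast forwarded)" -- M should be close to N
netstat -s -p ip | grep -i forward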
> /etc/rc.conf contains:
>
> hostname="fbge1"
> #should remove, especially given below duplicate entry
> ifconfig_igb0="DHCP"
> sshd_enable="YES"
> #ntpd_enable="YES"
> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> dumpdev="AUTO"
> # OpenBSD PF options to play with later. very bad for raw packet rates.
> #pf_enable="YES"
> #pflog_enable="YES"
> # enable packet forwarding
> # these enable forwarding and fastforwarding sysctls. inet6 does not
> # have fastforward
> gateway_enable="YES"
> ipv6_gateway_enable="YES"
> # enable OpenBSD ftp-proxy
> # should comment out until actively playing with PF
> ftpproxy_enable="YES"
> #left in place, commented out from prior testing
> #ifconfig_mlxen1="inet 172.16.2.1 netmask 255.255.255.0 mtu 9000"
> #ifconfig_mlxen0="inet 172.16.1.1 netmask 255.255.255.0 mtu 9000"
> #ifconfig_mlxen3="inet 172.16.7.1 netmask 255.255.255.0 mtu 9000"
> #ifconfig_mlxen2="inet 172.16.8.1 netmask 255.255.255.0 mtu 9000"
> # -lro and -tso options added per mailing list suggestion from Bjoern A.
> # Zeeb (bzeeb-lists at lists.zabbadoz.net)
> ifconfig_cxl0="inet 172.16.3.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> ifconfig_cxl1="inet 172.16.4.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> ifconfig_cxl2="inet 172.16.5.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> ifconfig_cxl3="inet 172.16.6.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> # aliases instead of reconfiguring test clients. See above commented out
> # entries
> ifconfig_cxl0_alias0="172.16.7.1 netmask 255.255.255.0"
> ifconfig_cxl1_alias0="172.16.8.1 netmask 255.255.255.0"
> ifconfig_cxl2_alias0="172.16.1.1 netmask 255.255.255.0"
> ifconfig_cxl3_alias0="172.16.2.1 netmask 255.255.255.0"
> # for remote monitoring/admin of the test device
> ifconfig_igb0="inet 172.30.60.60 netmask 255.255.0.0"
>
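Since -lro and -tso matter a lot on a forwarding box, it's worth
double-checking that they really came off on all four ports after boot,
something like:

# TSO4/LRO should no longer show up in the options<...> line for each port
for i in 0 1 2 3; do ifconfig cxl$i | grep -E 'cxl|options'; done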
> Additional configurations:
> cpuset-chelsio-6cpu-high
> #!/usr/local/bin/bash
> # Original provided by Navdeep Parhar <nparhar at gmail.com>
> # takes vmstat -ai output into a list, and assigns interrupts in order to
> # the available CPU cores.
> # Modified: to assign only to the 'high CPUs', ie: on core1.
> # See: http://lists.freebsd.org/pipermail/freebsd-net/2014-July/039317.html
> ncpu=12
> irqlist=$(vmstat -ia | egrep 't4nex|t5nex|cxgbc' | cut -f1 -d: | cut -c4-)
> i=6
> for irq in $irqlist; do
>     cpuset -l $i -x $irq
>     i=$((i+1))
>     [ $i -ge $ncpu ] && i=6
> done
>
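After that script runs, it's easy to confirm where each Chelsio vector
landed and whether the load across them is roughly even; a sketch reusing
the same irq list:

for irq in $(vmstat -ia | egrep 't4nex|t5nex|cxgbc' | cut -f1 -d: | cut -c4-); do
    cpuset -g -x $irq      # prints the CPU mask each irq is pinned to
done
vmstat -ai | egrep 't4nex|t5nex'   # per-vector totals; a big skew stands out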
> Client Description:
>
> Two Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz processors
> 64 GB ram
> Mellanox Technologies MT27500 Family [ConnectX-3]
> Centos 6.4 with updates
> iperf3 installed from yum repositories: iperf3-3.0.3-3.el6.x86_64
>
> Test setup:
>
> I've found that about 3 streams between the CentOS clients is the best way
> to get the most out of them.
> Above certain points, the -b flag does not change results.
> -N is an artifact from using TCP
> -l is needed, as -M doesn't work for UDP.
>
> I usually use launch scripts similar to the following:
>
> for i in `seq 41 60`; do
>     ssh loader$i "export TIME=120; export STREAMS=1; export PORT=52$i;
>                   export PKT=64; export RATE=2000m; /root/iperf-test-8port-udp" &
> done
>
> The scripts execute the following on each host.
>
> #!/bin/bash
> PORT1=$PORT
> PORT2=$(($PORT+1000))
> PORT3=$(($PORT+2000))
> iperf3 -c loader41-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME \
>     -P$STREAMS -p$PORT1 &
> iperf3 -c loader42-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME \
>     -P$STREAMS -p$PORT1 &
> iperf3 -c loader43-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME \
>     -P$STREAMS -p$PORT1 &
> ... (through all clients and all three ports) ...
> iperf3 -c loader60-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME \
>     -P$STREAMS -p$PORT3 &
>
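For context on the offered load: with -l 64, iperf3 counts 64 bytes of UDP
payload per datagram, so each -b 10000m stream is asking for roughly
10e9 / (64 * 8) packets per second, far more than one stream will actually
generate, which is presumably why -b stops mattering past a point. A quick
back-of-the-envelope:

# requested pps per stream at -l 64 / -b 10000m (payload bits only;
# UDP/IP/Ethernet headers add ~46 bytes more per packet on the wire)
echo $(( 10000 * 1000000 / (64 * 8) ))    # ~19531250 pps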
>
> Results:
>
> Summarized, netstat -w 1 -q 240 -bd, run through:
> cat test4-tuning | egrep -v {'packets | input '} | \
>     awk '{ipackets+=$1} {idrops+=$3} {opackets+=$5} {odrops+=$9} \
>          END {print "input " ipackets/NR, "idrops " idrops/NR, \
>               "opackets " opackets/NR, "odrops " odrops/NR}'
>
> input 1.10662e+07 idrops 8.01783e+06 opackets 3.04516e+06 odrops 3152.4
>
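One nit on the summarizing pipeline: with that quoting the braces end up as
literal characters in the egrep pattern, so the repeating header lines may
not be getting filtered and will dilute the averages slightly. A sketch of
what I think was intended:

cat test4-tuning | egrep -v 'packets| input' | \
    awk '{ip += $1; id += $3; op += $5; od += $9}
         END {print "input", ip/NR, "idrops", id/NR,
                    "opackets", op/NR, "odrops", od/NR}'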
> Snapshot of raw output:
>
> input (Total) output
> packets errs idrops bytes packets errs bytes colls drops
> 11189148 0 7462453 1230805216 3725006 0 409750710 0 799
> 10527505 0 6746901 1158024978 3779096 0 415700708 0 127
> 10606163 0 6850760 1166676673 3751780 0 412695761 0 1535
> 10749324 0 7132014 1182425799 3617558 0 397930956 0 5972
> 10695667 0 7022717 1176521907 3669342 0 403627236 0 1461
> 10441173 0 6762134 1148528662 3675048 0 404255540 0 6021
> 10683773 0 7005635 1175215014 3676962 0 404465671 0 2606
> 10869859 0 7208696 1195683372 3658432 0 402427698 0 979
> 11948989 0 8310926 1314387881 3633773 0 399714986 0 725
> 12426195 0 8864415 1366877194 3562311 0 391853156 0 2762
> 13006059 0 9432389 1430661751 3570067 0 392706552 0 5158
> 12822243 0 9098871 1410443600 3715177 0 408668500 0 4064
> 13317864 0 9683602 1464961374 3632156 0 399536131 0 3684
> 13701905 0 10182562 1507207982 3523101 0 387540859 0 8690
> 13820227 0 10244870 1520221820 3562038 0 391823322 0 2426
> 14437060 0 10955483 1588073033 3480105 0 382810557 0 2619
> 14518471 0 11119573 1597028105 3397439 0 373717355 0 5691
> 14890287 0 11675003 1637926521 3199812 0 351978304 0 11007
> 14923610 0 11749091 1641594441 3171436 0 348857468 0 7389
> 14738704 0 11609730 1621254991 3117715 0 342948394 0 2597
> 14753975 0 11549735 1622935026 3207393 0 352812846 0 4798
>