Poor performance with natd/ipfw and TSO enabled on bce(4) card and 8.1-PRERELEASE
Garrett Cooper
yanefbsd at gmail.com
Fri Jul 2 05:30:41 UTC 2010
On Thu, Jul 1, 2010 at 9:19 PM, Ian Smith <smithi at nimnet.asn.au> wrote:
> On Thu, 1 Jul 2010, Garrett Cooper wrote:
> > On Thu, Jul 1, 2010 at 4:54 PM, Pyun YongHyeon <pyunyh at gmail.com> wrote:
> > > On Wed, Jun 30, 2010 at 07:00:53PM -0700, Garrett Cooper wrote:
> > >> Hi,
> > >> Just an observation I made while transferring a file:
> > >>
> > >> # time scp floppy.img somehost:
> > >> Password:
> > >> floppy.img 100% 1440KB 13.7KB/s 01:45
> > >>
> > >> real 1m59.400s
> > >> user 0m0.031s
> > >> sys 0m0.028s
> > >> # sysctl net.inet.tcp.tso=0
> > >> net.inet.tcp.tso: 1 -> 0
> > >> # time scp floppy.img somehost:
> > >> floppy.img 100% 1440KB 1.4MB/s 00:00
> > >>
> > >> real 0m0.712s
> > >> user 0m0.018s
> > >> sys 0m0.018s
> > >>
> > >> Going ISDN speeds transferring a 1.44MB file is sad when you have
> > >> a gigabit uplink :(... natd seems to be doing a LOT of spinning when
> > >> TSO is enabled (it's going up to 73% CPU on a dual-proc quad-core
> > >> machine).
> > >
> > > I would use pf(4) if I have to handle lots of NAT rules.
>
> There's only one NAT rule here, not clear how many active NAT sessions
> are involved. I'm tending to doubt this is really a natd issue; natd
> has no interaction with interface issues like TSO, that I know of,
> hopefully someone will correct me if I'm wrong about that.
>
> > >> Here are some other details:
> > >>
> > >> # ipfw list
> > >> 00050 divert 8668 ip4 from any to any via bce1
> > >> 00100 allow ip from any to any via lo0
> > >> 00200 deny ip from any to 127.0.0.0/8
> > >> 00300 deny ip from 127.0.0.0/8 to any
> > >> 00400 deny ip from any to ::1
> > >> 00500 deny ip from ::1 to any
> > >> 00600 allow ipv6-icmp from :: to ff02::/16
> > >> 00700 allow ipv6-icmp from fe80::/10 to fe80::/10
> > >> 00800 allow ipv6-icmp from fe80::/10 to ff02::/16
> > >> 00900 allow ipv6-icmp from any to any ip6 icmp6types 1
> > >> 01000 allow ipv6-icmp from any to any ip6 icmp6types 2,135,136
> > >> 65000 allow ip from any to any
> > >> 65535 deny ip from any to any
> > >> # ls /etc/natd*
> > >> ls: /etc/natd*: No such file or directory
>
> I assume that's the 'open' rc.firewall ruleset?
Yes.
$ grep ^firewall /etc/rc.conf
firewall_type="open"
> So you have no
> natd.conf, and are taking all defaults? Just to check the config:
Correct.
$ ls /etc/natd.conf
ls: /etc/natd.conf: No such file or directory
> # grep natd_ /etc/rc.conf
$ grep ^natd_ /etc/rc.conf
natd_enable="YES"
natd_interface="bce1"
> # ps axw | grep "[n]atd"
>
> Do you have options IPFIREWALL and IPDIVERT in kernel, or are you
> loading these as modules?
Modules.
$ egrep 'IPDIVERT|IPFIREWALL' /root/TAMESHI_STABLE
$ make -VMODULES_OVERRIDE -f /etc/src.conf foo
bce bge em bridgestp if_bridge ipdivert ipfw ipfw_nat libalias
i2c/smbus ipmi ipmi/ipmi_linux linprocfs linsysfs linux
> > >> # uname -a
> > >> FreeBSD tameshi.cisco.com 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #0
> > >> r209169: Mon Jun 14 12:41:49 PDT 2010
> > >> root@:/usr/obj/data/scratch/src/stable/8/sys/TAMESHI_STABLE amd64
> > >> # pciconf -lv | grep -A 4 bce
> > >> bce1 at pci0:7:0:0: class=0x020000 card=0x01b21028 chip=0x164c14e4
> > >> rev=0x12 hdr=0x00
> > >> vendor = 'Broadcom Corporation'
> > >> device = 'Broadcom NetXtreme II Gigabit Ethernet Adapter (BCM5708)'
> > >> class = network
> > >> subclass = ethernet
> > >> --
> > >> bce0 at pci0:3:0:0: class=0x020000 card=0x01b21028 chip=0x164c14e4
> > >> rev=0x12 hdr=0x00
> > >> vendor = 'Broadcom Corporation'
> > >> device = 'Broadcom NetXtreme II Gigabit Ethernet Adapter (BCM5708)'
> > >> class = network
> > >> subclass = ethernet
> > >>
> > >> Let me know what other info is required.
> > >
> > > Can you reproduce this issue on other TSO capable drivers?
> > > I'm not aware of any TSO issues on bce(4).
> >
> > Hi Pyun!
> >
> > I'll have to pop in a copper Intel card that we have lying around in
> > the lab. I think it's em(4) compatible, but I forget. I have a few
> > network things to test this weekend, so I'll try to reproduce this
> > then (say, Sunday?).
> >
> > I also have my msk(4) enabled machine in the lab I can test with, but
> > I'll have to install the machine to spec with the Poweredge 2950 I
> > have in the lab.
> >
> > I'm using ipfw because it was easy to set up according to the handbook,
> > but in reality if ipfw is this bad dealing with nat rules, then I need
> > to work with someone to improve how it scales.
>
> Unless there's something weird with tagging or something going on with
> divert sockets, this looks like something else;
Ok.
> natd usually works fine
> at much higher rates, but I can't talk about gigabit. Though in-kernel
> NAT should be better at the higher throughput end,
But this panics deterministically as I've shown in another thread on
8-STABLE, so unfortunately I can't use this.
> your 'ISDN' rate and the high CPU usage for natd is certainly not typical.
That I wouldn't doubt.
> Does this box have a public IP address on bce1?
Nope.
> It's not clear whether you're doing this transfer from this box, or from another, through it, ie what address translation is expected?
I'm doing the transfer from tameshi.cisco.com to ironport1.cisco.com
via what I would hope is the public interface, bce1, because my
routes are set up that way:
$ netstat -nr
Routing tables
Internet:
Destination Gateway Flags Refs Use Netif Expire
default 173.37.10.1 UGS 0 42504770 bce1
127.0.0.1 link#1 UH 0 2052 lo0
173.37.10.0/24 link#4 U 38 2752472 bce1
173.37.10.6 link#4 UHS 0 20258228 lo0
192.168.20.0/22 link#3 U 3 5570413 bce0
192.168.20.1 link#3 UHS 0 2572 lo0
192.168.21.1 link#3 UHS 0 0 lo0
192.168.22.1 link#3 UHS 0 0 lo0
192.168.23.1 link#3 UHS 0 0 lo0
192.168.24.0/22 link#3 U 0 0 bce0
192.168.24.1 link#3 UHS 0 0 lo0
Internet6:
Destination Gateway Flags Netif Expire
::1 ::1 UH lo0
fe80::%lo0/64 link#1 U lo0
fe80::1%lo0 link#1 UHS lo0
ff01:1::/32 fe80::1%lo0 U lo0
ff02::%lo0/32 fe80::1%lo0 U lo0
$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=3<RXCSUM,TXCSUM>
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
inet6 ::1 prefixlen 128
inet 127.0.0.1 netmask 0xff000000
nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
ipfw0: flags=8800<SIMPLEX,MULTICAST> metric 0 mtu 65536
bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
ether 00:1e:4f:38:65:ab
inet 192.168.20.1 netmask 0xfffffc00 broadcast 192.168.23.255
inet 192.168.21.1 netmask 0xfffffc00 broadcast 192.168.23.255
inet 192.168.22.1 netmask 0xfffffc00 broadcast 192.168.23.255
inet 192.168.23.1 netmask 0xfffffc00 broadcast 192.168.23.255
inet 192.168.24.1 netmask 0xfffffc00 broadcast 192.168.27.255
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
bce1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
ether 00:1e:4f:38:65:ad
inet 173.37.10.6 netmask 0xffffff00 broadcast 173.37.10.255
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
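As a rough cross-check of the ifconfig output above, the hex netmasks can be converted to prefix lengths and the broadcast addresses recomputed. This is only a sketch using Python's standard ipaddress module; the helper name describe() is made up for illustration.

```python
import ipaddress

def describe(addr, hex_mask):
    """Return (network, broadcast) for an address and an ifconfig-style hex netmask."""
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    iface = ipaddress.IPv4Interface(f"{addr}/{mask}")
    return iface.network, iface.network.broadcast_address

# Address/netmask pairs copied from the bce0 and bce1 entries above.
for addr, mask in [
    ("192.168.20.1", "0xfffffc00"),
    ("192.168.24.1", "0xfffffc00"),
    ("173.37.10.6", "0xffffff00"),
]:
    net, bcast = describe(addr, mask)
    print(f"{addr:<14} {net!s:<18} broadcast {bcast}")
```

The computed networks agree with the 192.168.20.0/22, 192.168.24.0/22 and 173.37.10.0/24 entries in the routing table.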
I would expect bce1 -> bce0 to hop a vlan, but apart from that
transfer speeds should be reasonably fast. It (ironport1) is a
semi-ancient Sparc machine, so I don't expect the speeds to be blazing
fast, but I've gotten up to 15 MBps on a good day.
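To put the numbers side by side, here is a small sketch of the arithmetic. It uses the scp figures quoted earlier in the thread (1440KB in 01:45 with TSO on) and assumes that "15 MBps" means 15 * 1024 KB/s; the constant and function names are just for illustration.

```python
SIZE_KB = 1440                  # floppy.img size as reported by scp
SLOW_SECONDS = 1 * 60 + 45      # scp's elapsed time of 01:45 with TSO on
GOOD_DAY_KB_PER_S = 15 * 1024   # assumption: "15 MBps" read as 15 MiB/s

def rate_kb_per_s(size_kb, seconds):
    """Average transfer rate in KB/s."""
    return size_kb / seconds

slow = rate_kb_per_s(SIZE_KB, SLOW_SECONDS)
print(f"TSO on: {slow:.1f} KB/s")
print(f"good day vs. TSO on: about {GOOD_DAY_KB_PER_S / slow:.0f}x faster")
```

The first figure matches the 13.7KB/s that scp printed. The run with TSO off finished in under a second, so its 1.4MB/s reading is probably limited by the file size rather than by the link.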
> Where is 'somehost'?
Ok, bleh... turns out that someone internally used somehost as a
CNAME, so rather than obfuscating things I'll just divulge the real
hostname because it's needed:
$ host ironport1.cisco.com ; host tameshi.cisco.com
ironport1.cisco.com has address 173.37.5.41
tameshi.cisco.com has address 173.37.10.6
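With both addresses known, the routing table above can be checked by hand. The sketch below is a simplified longest-prefix match written with Python's ipaddress module, not the kernel's actual lookup; the ROUTES list is copied from the netstat -nr output with the host routes to lo0 left out, and lookup() is a made-up helper.

```python
import ipaddress

# IPv4 network routes from the netstat -nr output (host routes to lo0 omitted).
ROUTES = [
    ("0.0.0.0/0", "bce1"),         # default via 173.37.10.1
    ("173.37.10.0/24", "bce1"),
    ("192.168.20.0/22", "bce0"),
    ("192.168.24.0/22", "bce0"),
]

def lookup(dst):
    """Return (network, interface) of the most specific route containing dst."""
    dst_ip = ipaddress.ip_address(dst)
    candidates = []
    for prefix, ifname in ROUTES:
        net = ipaddress.ip_network(prefix)
        if dst_ip in net:
            candidates.append((net, ifname))
    # The route with the longest prefix wins.
    return max(candidates, key=lambda c: c[0].prefixlen)

net, ifname = lookup("173.37.5.41")   # ironport1.cisco.com
print(f"173.37.5.41 -> {net} via {ifname}")
```

Under this model, ironport1 (173.37.5.41) is outside 173.37.10.0/24, so it falls through to the default route on bce1, which is the interface the divert rule is attached to.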
> Hence, knowing natd's config options and net topology might be helpful.
Fair enough .. security by obscurity isn't going to make any difference
because all of this crud is behind the corporate firewall anyhow :).
Another weird thing I noticed when I looked at it further is that
dhcpd's CPU usage spikes to 33% instead of staying near idle, and I
had no idea why. So I truss'ed the process, and there's a lot of
recvfrom chatter on 127.0.0.1:0. It looks like the traffic is being
broadcast to all ports instead of just ports 67/68, so something looks
horribly broken in the networking stack with TSO on. With TSO turned
off, _no_ traffic is intercepted via lo0 by dhcpd when I scp the file,
which is what I would expect.
I'll see whether or not there are any firmware upgrades for the NIC on
this machine because there might be some sort of hardware errata that
I need to take into consideration. I'll get back to you guys after I
do that because I'm concerned that that might be an issue.
Thanks,
-Garrett