Significant network latency when using ipfw and in-kernel NAT
Soren Dreijer
dreijer+bsd at echobit.net
Thu Sep 13 17:37:24 UTC 2012
> i'd start by disabling all accelerations (and jumbograms)
> and then move on from the results to figure out where is the problem.
So, I went ahead and disabled TSO on ix0. That seemed to fix the
intermittent connection issues I had been experiencing with keeping an
XMPP connection alive to one of our internal boxes. It hasn't done
anything for the ICMP or TCP traffic originating from the FreeBSD
box, of course.
I'm very puzzled how I can ping the box just fine from my home
connection, but I can't ping OUT of the box itself without seeing huge
latency. Similarly, proxying XMPP traffic through the box to an
internal server and proxying back the result is as fast as it should
be, but trying a simple wget on the box times out due to heavy
latency.
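
For anyone reading this later, here is a minimal sketch of what
"disabling all accelerations" could look like on this box. The flag
names come from FreeBSD's ifconfig(8); whether the ix driver honours
every one of them on 9.0-RELEASE is an assumption to verify (check the
options= line in the ifconfig output afterwards). The helper names
offload_off_cmd and disable_offloads, and the APPLY variable, are made
up for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: print (or, with APPLY=1, run) the ifconfig commands that
# turn off hardware offloads on the interfaces named in this thread.
# Assumption: the driver accepts all of these capability flags.

OFFLOADS="-tso -lro -rxcsum -txcsum -vlanhwtso"

offload_off_cmd() {
    # $1 = interface name, e.g. ix0
    echo "ifconfig $1 $OFFLOADS"
}

disable_offloads() {
    for ifn in "$@"; do
        cmd=$(offload_off_cmd "$ifn")
        if [ "${APPLY:-0}" = "1" ]; then
            $cmd            # needs root on a FreeBSD host
        else
            echo "$cmd"     # dry run: only show what would be executed
        fi
    done
}

# ix0 is the private nic (carries the gif tunnel), ix1 the public nic.
disable_offloads ix0 ix1
```

Running it with APPLY=1 as root applies the change until the next
reboot. Disabling one capability at a time instead of all of them at
once makes it easier to tell which offload is actually interacting
badly with the NAT code.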
/ Soren
On Thu, Sep 13, 2012 at 12:46 PM, Luigi Rizzo <rizzo at iet.unipi.it> wrote:
> On Thu, Sep 13, 2012 at 12:01:56PM -0500, Soren Dreijer wrote:
>> Luigi and Ian,
>>
>> As Ian mentioned, we had some off-list discussion by accident and he
>> suggested the TSO approach too (although I don't know how that would
>> affect e.g. ICMP traffic). It seems to have been a known issue for a
>> while (http://lists.freebsd.org/pipermail/freebsd-net/2010-July/025743.html).
>> Does anybody know if this is still the case in 9.0-RELEASE?
>>
>> I've already done "ifconfig ix1 -tso" to disable TSO on the public
>> nic, but there was no difference. I'm not sure what VLAN_HWTSO means,
>> though. Is the nic doing TSO on its own? Do I need to turn that off as
>> well? Also, do I need to turn off TSO on ix0, which is what the ip
>> tunnel runs over?
>
> i'd start by disabling all accelerations (and jumbograms)
> and then move on from the results to figure out where is the problem.
>
> When the nat code was written it assumed well-formed
> 1500-byte packets, and it uses the checksums when rebuilding the
> headers. TSO/RSC can generate large segments causing buffer overflows,
> whereas the *XCSUM can generate invalid packets that are sometimes
> recovered by retransmissions.
>
> cheers
> luigi
>
>> Thanks,
>> Soren
>>
>> On Thu, Sep 13, 2012 at 11:30 AM, Luigi Rizzo <rizzo at iet.unipi.it> wrote:
>> >
>> > [top posting for readability]
>> > i have seen this kind of issues related to bad interaction
>> > between the nat code and the various accelerations
>> > (mostly TSO/RSC, but i would also try to disable the
>> > checksums).
>> > Try to remove tso,csum, possibly rsc if you have it, and see
>> > if the problem continues. Please post the result so people
>> > reading this thread in the future can tell whether my suggestion
>> > was useful or not.
>> >
>> > cheers
>> > luigi
>> >
>> >
>> > On Thu, Sep 13, 2012 at 10:48:01AM -0500, Soren Dreijer wrote:
>> >> Definitely. Since this is a server in production, I've obfuscated some
>> >> of the IPs, etc.
>> >>
>> >> First off, here's the ifconfig. Our setup consists of a private (ix0)
>> >> and a public nic (ix1) and an ip tunnel (gif0), which is what we use
>> >> in ipfw to forward incoming packets to our internal boxes:
>> >>
>> >> ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>> >> options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>> >> ether XX:XX:XX:XX:XX:XX
>> >> inet <private VLAN IP> netmask 0xffffffc0 broadcast xx
>> >> inet6 xxxx::xxx:xxxx:xxxx:xxxx%ix0 prefixlen 64 scopeid 0x7
>> >> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>> >> media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
>> >> status: active
>> >> ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>> >> options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
>> >> ether XX:XX:XX:XX:XX:XX
>> >> inet <public IP> netmask 0xfffffff8 broadcast xx
>> >> inet6 xxxx::xxx:xxxx:xxxx:xxxx%ix1 prefixlen 64 scopeid 0x8
>> >> inet <alias public IP> netmask 0xffffffff broadcast xx
>> >> inet <alias public IP> netmask 0xffffffff broadcast xx
>> >> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>> >> media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
>> >> status: active
>> >> ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
>> >> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>> >> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>> >> options=3<RXCSUM,TXCSUM>
>> >> inet6 ::1 prefixlen 128
>> >> inet6 fe80::1%lo0 prefixlen 64 scopeid 0xa
>> >> inet 127.0.0.1 netmask 0xff000000
>> >> nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>> >> gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
>> >> tunnel inet <private VLAN IP> --> <private VLAN IP>
>> >> inet 172.16.1.1 --> 172.16.1.2 netmask 0xffff0000
>> >> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>> >> options=1<ACCEPT_REV_ETHIP_VER>
>> >>
>> >> The basic ruleset looks like this. One-pass is off so that packets are
>> >> reinjected after going through NAT'ing and pipes:
>> >>
>> >> 00001 16653 4417407 allow ip from any to any via ix0
>> >> 00003 14588 2860344 allow ip from any to any via gif1
>> >> 00006 0 0 allow ip from any to any via lo0
>> >> 00010 0 0 deny ip from 192.168.0.0/16 to any in via ix1
>> >> 00011 0 0 deny ip from 172.16.0.0/12 to any in via ix1
>> >> 00012 0 0 deny ip from 10.0.0.0/8 to any in via ix1
>> >> 00013 0 0 deny ip from 127.0.0.0/8 to any in via ix1
>> >> 00014 0 0 deny ip from 0.0.0.0/8 to any in via ix1
>> >> 00015 0 0 deny ip from 169.254.0.0/16 to any in via ix1
>> >> 00016 0 0 deny ip from 192.0.2.0/24 to any in via ix1
>> >> 00017 0 0 deny ip from 204.152.64.0/23 to any in via ix1
>> >> 00018 0 0 deny ip from 224.0.0.0/3 to any in via ix1
>> >> 00019 15 1020 allow icmp from any to any via ix1 # For
>> >> testing purposes, allow all ICMP in and out of the public adapter
>> >> 00020 7537 647951 nat 1 ip from any to any in via ix1 # NAT all
>> >> incoming traffic
>> >> 00030 0 0 check-state # For some reason, this never gets
>> >> matched even though rule #100 is matched
>> >> 00100 161 124340 skipto 805 tcp from any to any out via ix1
>> >> setup keep-state # For testing purposes, allow all TCP originating
>> >> from the box out of the public adapter
>> >> 00110 0 0 skipto 805 icmp from any to any out via ix1 keep-state
>> >> 00200 36557 1996626 skipto 500 tcp from any to 172.16.1.2 dst-port
>> >> 443 in via ix1 # Forward NAT'ed traffic for port 443 over the ip
>> >> tunnel
>> >> 00201 46593 63973143 skipto 805 tcp from 172.16.1.2 443 to any out via ix1
>> >> 00400 8 6192 deny ip from any to any via ix1
>> >> 00500 0 0 pipe 1 ip from any to any in via ix1 # Packet shaping
>> >> 00501 0 0 allow ip from any to any in via ix1
>> >> 00805 8963 3412995 nat 1 ip from any to any out via ix1
>> >> 00806 8963 3412995 allow ip from any to any
>> >> 10000 0 0 deny ip from any to any via ix1 # Last ditch catch
>> >> 65535 864357 867120912 allow ip from any to any
>> >>
>> >> 'ipfw nat show config' yields:
>> >>
>> >> ipfw nat 1 config if ix1 log reset redirect_port tcp 172.16.1.2:443
>> >> <public IP>:443
>> >>
>> >> And finally, here are the horrifying ping times (furthermore, all
>> >> outgoing TCP traffic originating from this box, such as wget or
>> >> pkg_add, times out. I've managed to get an outgoing telnet working, but
>> >> it's horribly slow and takes a while to establish):
>> >>
>> >> PING google.com (74.125.227.14): 56 data bytes
>> >> 64 bytes from 74.125.227.14: icmp_seq=0 ttl=56 time=2746.953 ms
>> >> 64 bytes from 74.125.227.14: icmp_seq=1 ttl=56 time=2097.460 ms
>> >> 64 bytes from 74.125.227.14: icmp_seq=2 ttl=56 time=2186.068 ms
>> >> 64 bytes from 74.125.227.14: icmp_seq=3 ttl=56 time=4292.776 ms
>> >> 64 bytes from 74.125.227.14: icmp_seq=4 ttl=56 time=5056.965 ms
>> >> 64 bytes from 74.125.227.14: icmp_seq=5 ttl=56 time=5323.720 ms
>> >> 64 bytes from 74.125.227.14: icmp_seq=6 ttl=56 time=5007.974 ms
>> >> 64 bytes from 74.125.227.14: icmp_seq=7 ttl=56 time=4756.587 ms
>> >>
>> >> It's worth mentioning that when I switch back to using natd and divert
>> >> in the ruleset (which really only changes the nat portions and
>> >> everything else stays the same), the ping time drops to ~300ms, which
>> >> is a big difference for simply "using" natd even when the ICMP packets
>> >> aren't supposed to be going through NAT'ing whatsoever. The ~300ms
>> >> ping time is still way too high, though, since our other boxes have a
>> >> ping time to Google of ~0.300ms...
>> >>
>> >> Any ideas?
>> >>
>> >> On Thu, Sep 13, 2012 at 7:41 AM, Ian Smith <smithi at nimnet.asn.au> wrote:
>> >> > On Wed, 12 Sep 2012 23:09:27 -0500, Soren Dreijer wrote:
>> >> > > Hi there,
>> >> > >
>> >> > > We're running freebsd 9.0-RELEASE on a box whose primary purpose is to
>> >> > > act as a firewall and a gateway. Up until today, we've been using ipfw
>> >> > > in conjunction with natd and the divert action in ipfw to forward
>> >> > > packets between the freebsd box (i.e. the public Internet) and our
>> >> > > private servers.
>> >> > >
>> >> > > Unfortunately, natd appears to be quite the CPU hog and we therefore
>> >> > > decided to switch to the in-kernel NAT support in ipfw. The issue
>> >> > > we're running into is that the network latency appears to be
>> >> > > skyrocketing when ipfw contains nat rules. Basically all TCP traffic
>> >> > > originating from the box times out and pinging google.com on the box
>> >> > > gives an average of ~10 SECONDS -- and that's even if I explicitly
>> >> > > allow all ICMP traffic before the packets even get to the nat rules in
>> >> > > ipfw.
>> >> > >
>> >> > > The really odd part, however, is that I can ping the freebsd box just
>> >> > > fine externally. For instance, pinging the server from my home
>> >> > > connection gives an average of 45 ms. I'm also able to communicate
>> >> > > just fine with the internal servers through the freebsd box.
>> >> > >
>> >> > > Does anybody have any idea what's going on? I assume I must've
>> >> > > misconfigured something big here...
>> >> >
>> >> > Or maybe only something small .. but without seeing your basic ruleset
>> >> > and network config - obscured as need be - we can only guess. Maybe an
>> >> > 'ifconfig', 'ipfw show' and 'ipfw nat show config' would illustrate?
>> >> >
>> >> > cheers, Ian