Re: FreeBSD 13: IPSec netisr overload causes unrelated packet loss

From: Timothy Pearson <tpearson_at_raptorengineering.com>
Date: Sun, 19 Jan 2025 01:46:47 UTC
Forcibly disabling RSS with the deferred IPsec input patch seems to have fixed the issue.  Given the wide-ranging deleterious effects with RSS on versus a modest loss of theoretical maximum IPsec bandwidth with it off, we'll take the bandwidth hit for now. ;)

Are there any significant concerns with running the patch for deferred IPSec input?  From my analysis of the code, I think the absolute worst case might be a reordered packet or two, but that's always possible with IPSec over UDP transport AFAIK.
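
For what it's worth, a quick way to keep an eye on whether the deferred path is
actually causing reorder/replay trouble on the tunnels is to watch the IPsec
counters; a rough sketch (exact counter names vary a bit by release):

netstat -s -p esp      # ESP statistics, including replay / out-of-window drops
netstat -s -p ipsec    # overall IPsec input/output statistics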

----- Original Message -----
> From: "Timothy Pearson" <tpearson@raptorengineeringinc.com>
> To: "freebsd-net" <freebsd-net@FreeBSD.org>
> Sent: Saturday, January 18, 2025 7:24:57 PM
> Subject: Re: FreeBSD 13: IPSec netisr overload causes unrelated packet loss

> Quick update -- tried the deferred IPsec input patch [1]; no change.
> 
> A few tunables I forgot to include as well:
> net.route.netisr_maxqlen: 256
> net.isr.numthreads: 32
> net.isr.maxprot: 16
> net.isr.defaultqlimit: 256
> net.isr.maxqlimit: 10240
> net.isr.bindthreads: 1
> net.isr.maxthreads: 32
> net.isr.dispatch: direct
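>
> (Most of the net.isr.* knobs above are boot-time tunables; purely as a sketch,
> assuming /boot/loader.conf is where they get set -- net.isr.dispatch can also be
> changed at runtime with sysctl:)
>
> # /boot/loader.conf
> net.isr.maxthreads="32"      # cap on netisr worker threads
> net.isr.bindthreads="1"      # pin netisr threads to CPUs
> net.isr.defaultqlimit="256"
> net.isr.dispatch="direct"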
> 
> [1] https://www.mail-archive.com/freebsd-net@freebsd.org/msg64742.html
> 
> ----- Original Message -----
>> From: "Timothy Pearson" <tpearson@raptorengineeringinc.com>
>> To: "freebsd-net" <freebsd-net@FreeBSD.org>
>> Sent: Saturday, January 18, 2025 4:16:29 PM
>> Subject: FreeBSD 13: IPSec netisr overload causes unrelated packet loss
> 
>> Hi all,
>> 
>> I've been pulling my hair out over a rather interesting problem that I've traced
>> to an interaction between IPsec and the rest of the network stack.  I'm not
>> sure if this is a bug or if there's a tunable I'm missing somewhere, so here
>> goes...
>> 
>> We have a pf-based multi-CPU firewall running FreeBSD 13.x with multiple subnets
>> directly attached, one per NIC, as well as multiple IPsec tunnels to remote
>> sites, alongside a UDP multicast proxy system (this becomes important later).
>> For the most part the setup works very well; however, we have discovered
>> through extensive trial and error / debugging that we can induce major packet
>> loss on the firewall host itself simply by flooding the system with small IPsec
>> packets (high PPS, low bandwidth).
>> 
>> The aforementioned (custom) multicast UDP proxy is an excellent canary for the
>> problem, as it checks for and reports any dropped packets in the receive data
>> stream.  Normally, there are no dropped packets even with saturated links on
>> any of the local interfaces or when *sending* high packet rates over IPsec.  As
>> soon as high packet rates are *received* over IPsec, the following happens:
>> 
>> 1.) netisr goes to 100% interrupt load, but only on a single core
>> 2.) net.inet.ip.intr_queue_drops starts incrementing rapidly
>> 3.) The multicast receiver, which only receives traffic from one of the *local*
>> interfaces (not any of the IPsec tunnels), begins to see packet loss despite
>> more than adequate buffers in place and no buffer overflows in the UDP stack or
>> the application's own buffering.  The packets simply never reach the kernel UDP
>> stack.
>> 4.) Other applications (e.g. NTP) start to see sporadic packet loss as well,
>> again on local traffic not over IPsec.
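>>
>> For anyone trying to reproduce this, a rough way to watch the above from the
>> firewall itself (thread and counter names may vary slightly by release):
>>
>> top -SH                              # an intr{swi1: netisr N} thread pegged at ~100%
>> sysctl net.inet.ip.intr_queue_drops  # climbs rapidly while the flood is running
>> netstat -Q                           # per-protocol netisr workstream stats, incl. queue drops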
>> 
>> As soon as the IPsec receive traffic is lowered enough to bring the netisr
>> interrupt load below 100% on that one CPU core, everything recovers and
>> functions normally.  Note this has to be done by lowering the IPsec transmit
>> rate on the remote system; I have not found any way to "protect" the receiver
>> from this kind of overload.
>> 
>> While I would expect packet loss in an overloaded IPSec link scenario like this
>> just due to the decryption not keeping up, I would also expect that loss to be
>> confined to the IPSec tunnel.  It should not spider out into the rest of the
>> system and start affecting all of the other applications and
>> routing/firewalling on the box -- this is what was miserable to debug, as the
>> IPSec link was originally only hitting the PPS limits described above
>> sporadically during overnight batch processing.  Now that I know what's going
>> on, I can provoke it easily with iperf3 in UDP mode.  On the boxes we are using,
>> the limit seems to be around 50kPPS before we hit 100% netisr CPU load -- this
>> limit is *much* lower with async crypto turned off.
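>>
>> For reference, the sort of iperf3 invocation that triggers it looks roughly like
>> the following (addresses are placeholders; the sender sits on the far side of
>> one of the IPsec tunnels and targets a host behind the firewall):
>>
>> iperf3 -s                                    # receiver, behind the firewall
>> iperf3 -c 10.0.1.10 -u -l 64 -b 100M -t 60   # remote sender: 64-byte UDP datagrams
>>
>> At 64-byte payloads, 100 Mbit/s is roughly 195 kPPS, comfortably past the
>> ~50 kPPS point where the netisr core saturates.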
>> 
>> Important tunables already set:
>> 
>> net.inet.ipsec.async_crypto=1 (turning this off just makes the symptoms appear
>> at lower PPS rates)
>> net.isr.dispatch=direct (deferred or hybrid does nothing to change the symptoms)
>> net.inet.ip.intr_queue_maxlen=4096
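>>
>> (All three are runtime sysctls; as a sketch, assuming /etc/sysctl.conf is where
>> they are made persistent -- net.isr.dispatch can also go in /boot/loader.conf:)
>>
>> net.inet.ipsec.async_crypto=1
>> net.isr.dispatch=direct
>> net.inet.ip.intr_queue_maxlen=4096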
>> 
>> Thoughts are welcome... if there's any way to stop the "spread" of the loss, I'm
>> all ears.  It seems that somehow the IPsec traffic (perhaps by nature of its
>> lengthy decryption process) is able to grab an unfair share of netisr queue 0,
>> and that interferes with the other traffic.  If there were a way to move the
>> IPsec decryption onto another netisr queue, that might fix the problem, but I
>> don't see any tunables to do so.
>> 
>> Thanks!