Traffic "corruption" in 12-stable

Joe Clarke jclarke at marcuscom.com
Mon Aug 3 21:22:52 UTC 2020

> On Jul 27, 2020, at 15:41, Joe Clarke <jclarke at marcuscom.com> wrote:
> 
>> On Jul 27, 2020, at 15:01, Mark Johnston <markj at freebsd.org> wrote:
>> 
>> On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
>>> About two weeks ago, I upgraded from the latest 11-stable to the latest 12-stable.  After that, I periodically see the network throughput come to a near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  It acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It runs ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a tap0 L2 VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses the default 1500.
>>> 
>>> Besides seeing massive packet loss and huge latency (~200 ms for on-LAN ping times), I know the problem has occurred because my lldpd reports:
>>> 
>>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on bridge0
>>> 
>>> And if I turn on ipfw verbose messages, I see tons of:
>>> 
>>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>>> 
>>> This leads me to believe packets are being corrupted on ingress.  I’ve applied all the recent iflib changes, but the problem persists.  What causes it, I don’t know.
>>> 
>>> The only thing that changed (and yes, it’s a big one) is that I upgraded to 12-stable.  That is, the rest of the network infrastructure and topology have remained the same.  This did not happen at all in 11-stable.
>>> 
>>> I’m open to suggestions.
>> 
>> There are some fixes for vmx not present in stable/12 (yet).  I did a
>> merge of a number of outstanding revisions.  Would you be able to test
>> the patch?  I haven't observed any problems with it on a host using igb,
>> but I have no ability to test vmx at the moment.
> 
> I’m down to test anything.  I did notice quite a few vmxnet3 changes around performance that appealed to me, and I tried a few of them on my last kernel.  With those, the problem took much longer to appear, but it eventually did.
> 
> I can tell you I don’t have all of these patches in, though.  I’ll build with this diff and start running it now.  I’ll let you know how it goes.

So it’s been just over a week of runtime with this full patch set.  I have seen no further issues with ingress packet “truncation”, and performance has been what I expect.  I’m going to keep running, but I think this seems like a good set to MFC.

Thanks again for your help.

Joe


---
PGP Key : http://www.marcuscom.com/pgp.asc

More information about the freebsd-stable mailing list