[Bug 262571] epair(4) interfaces stop forwarding traffic on moderate load
Date: Tue, 15 Mar 2022 14:08:19 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262571

            Bug ID: 262571
           Summary: epair(4) interfaces stop forwarding traffic on moderate load
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: Any
               URL: https://lists.freebsd.org/archives/freebsd-net/2022-March/001449.html
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: grembo@FreeBSD.org
                CC: bz@FreeBSD.org, kp@freebsd.org
             Flags: maintainer-feedback?(kp@freebsd.org)

Created attachment 232471
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=232471&action=edit
Patch that works around the problem

As discussed on the freebsd-net mailing list[0]. Also affects CURRENT.

When running on multicore systems, epair interfaces stop forwarding traffic
even under moderate load and do not recover unless they are recreated.

This is a critical problem, as it breaks vnet jails running non-trivial
workloads.

The problem can be reproduced easily using a shell script[1].

This was introduced when multi-core improvements were added to epair[2].

It happens because work is scheduled on the taskqueue(s) based on a check of
whether the mbuf ring buffers are empty, logic that is racy on multi-core
systems. The race occurs between epair_menq() and epair_tx_start_deferred().

The patch attached to this PR addresses the problem, but it needs to be
reviewed, profiled, and most likely improved by somebody who has a better
understanding of both the code in question and of writing lock-free code in
general.

[0] https://lists.freebsd.org/archives/freebsd-net/2022-March/001449.html
[1] https://people.freebsd.org/~grembo/hang_epair.sh
[2] https://cgit.freebsd.org/src/commit/?id=24f0bfbad57b9c3cb9b543a60b2ba00e4812c286
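
To make the "schedule only if the ring was empty" race easier to picture, here is a minimal single-threaded sketch that replays one problematic interleaving by hand. It is a toy model, not the epair(4) code: struct model_queue, producer_check_empty(), producer_enqueue(), consumer_run(), queue_stalled() and replay_interleaving() are all hypothetical names, and a plain counter stands in for the mbuf ring. The always_schedule switch is likewise only an assumption used for contrast; it is not meant to describe what the attached patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these are NOT the epair(4) data structures. */
struct model_queue {
	int  pending;        /* packets currently sitting in the ring */
	bool task_scheduled; /* drain task is queued on the taskqueue */
};

/* Producer, step 1: sample whether the ring is empty. */
static bool
producer_check_empty(const struct model_queue *q)
{
	return (q->pending == 0);
}

/*
 * Producer, step 2: enqueue the packet, then schedule the drain task
 * only if the earlier (possibly stale) sample said "empty".
 * always_schedule is a hypothetical switch used for comparison.
 */
static void
producer_enqueue(struct model_queue *q, bool was_empty, bool always_schedule)
{
	q->pending++;
	if (was_empty || always_schedule)
		q->task_scheduled = true;
}

/* Consumer: the drain task runs once, empties the ring, and goes idle. */
static void
consumer_run(struct model_queue *q)
{
	if (!q->task_scheduled)
		return;
	q->task_scheduled = false;
	q->pending = 0;
}

/* Packets are waiting, but no task is queued to drain them. */
static bool
queue_stalled(const struct model_queue *q)
{
	return (q->pending > 0 && !q->task_scheduled);
}

/*
 * Replay one fixed interleaving of two producers (A, B) and the
 * consumer. Returns true if the queue ends up stalled.
 */
bool
replay_interleaving(bool always_schedule)
{
	struct model_queue q = { 0, false };
	bool a_empty, b_empty;

	a_empty = producer_check_empty(&q);             /* A sees "empty" */
	producer_enqueue(&q, a_empty, always_schedule); /* A enqueues and schedules */
	b_empty = producer_check_empty(&q);             /* B sees "not empty" */
	consumer_run(&q);                               /* task drains A's packet, idles */
	producer_enqueue(&q, b_empty, always_schedule); /* B enqueues on a stale check */

	assert(q.pending == 1);                         /* B's packet is in the ring */
	return (queue_stalled(&q));
}
```

In this model, replay_interleaving(false) leaves one packet in the ring with no task scheduled. Every later producer then samples "not empty" and skips scheduling as well, which is consistent with the reported behaviour of interfaces that never recover until they are recreated. With always_schedule set, the same interleaving does not stall, at the cost of redundant task scheduling.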