From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 276870] mbuf cluster leak with on pf+bird2 bgp routers
Date: Wed, 07 Feb 2024 15:25:11 +0000
List-Archive: https://lists.freebsd.org/archives/freebsd-bugs
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276870

            Bug ID: 276870
           Summary: mbuf cluster leak with on pf+bird2 bgp routers
           Product: Base System
           Version: 13.2-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: thomas@gibfest.dk

Created attachment 248234
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=248234&action=edit
Screenshot of mbuf cluster total use as reported by netstat -m over time

Hello 🙂

Last month one of my FreeBSD routers stopped forwarding (it stopped responding
on the network at all and I had to IPMI in) because it ran out of mbuf
clusters. It usually operates far from the limit, but there is (was) something
leaking mbuf clusters badly, and I suspect it might be bird2, or a combination
of bird2 and a FreeBSD kernel bug.

----

Some background:

The boxes in question are BGP routers for a small network. They run bird2 and
only get a default route from upstream BGP, not a full table.

Due to a missing/misconfigured kernel export filter, bird was repeatedly trying
to export some routes to the kernel which the kernel already knew (from
statically configured blackhole routes). These errors have been repeating in
the logs for more than a year, so the failed exports alone were not a problem:

Jan 11 19:09:04 dgncr2a bird[30963]: KRT: Error sending route 2a09:94c0::/29 to kernel: File exists
Jan 11 19:10:04 dgncr2a syslogd: last message repeated 1 times
Jan 11 19:10:04 dgncr2a bird[30963]: KRT: Error sending route 85.209.116.0/22 to kernel: File exists
Jan 11 19:11:04 dgncr2a syslogd: last message repeated 1 times

Over the holidays I upgraded from bird 2.0.9 to bird 2.14, and upgraded FreeBSD
from 13-STABLE-384a885111ad to 13-STABLE-2cbd132986a7. I suspect one of these
two changes made this problem appear. I made no changes to the bird or router
config other than the upgrades.
----

The mbuf cluster leak was pretty bad: 8-10 clusters per second at a pretty
steady rate. The kern.ipc.nmbclusters limit on my routers was around 2 million,
and I have since raised it to 4 million.

Since I had no idea what was causing the leak and I was desperate for a fix, I
at one point tried adding the missing kernel export filter (to at least silence
the noisy warnings in the logs), and imagine my surprise when the mbuf cluster
leak stopped.

I tried removing the filters again: the leak started again, and it stopped
again when I re-added the filters. It appears that some combination of bird
2.14 and exporting routes already found in the kernel leaks mbuf clusters like
crazy.

I have no idea if this is a bird or a FreeBSD problem. I reported the issue to
the bird-users@ list
http://trubka.network.cz/pipermail/bird-users/2024-January/017314.html and was
encouraged in that thread to open this PR as well.

The attached grafana screenshot shows the per-second rate of increase (seen
over 5 minutes) of the "total" number in the "mbuf clusters in use" line of the
`netstat -m` output for both routers. The green line is the active router and
the yellow line is the passive router. The drop in the green line and the
following spike towards the end (2000-2100ish) is me filtering the blackhole
routes from the bird kernel export, removing the filter to confirm, and
re-adding it.

I can to some extent test stuff, but the routers are in production, so nothing
too wild.
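
For anyone wanting to reproduce the measurement without grafana, here is a
rough Python sketch of how the per-second rate can be derived from two
`netstat -m` samples. It assumes the cluster line looks like
"current/cache/total/max mbuf clusters in use (current/cache/total/max)";
the function names and the sample numbers are made up for illustration and
are not taken from the routers in this report.

```python
import re

# Assumed line format from FreeBSD `netstat -m` (an assumption, check locally):
#   "<current>/<cache>/<total>/<max> mbuf clusters in use (current/cache/total/max)"
CLUSTER_RE = re.compile(
    r"^\s*(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.MULTILINE
)


def parse_clusters(netstat_output):
    """Return (current, cache, total, max) parsed from `netstat -m` text."""
    match = CLUSTER_RE.search(netstat_output)
    if match is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    return tuple(int(group) for group in match.groups())


def leak_rate(total_before, total_after, seconds):
    """Average growth of the 'total' counter in clusters per second."""
    return (total_after - total_before) / seconds


def seconds_until_limit(total, limit, rate):
    """Time left until 'total' reaches the limit at a steady rate.

    Returns None when the counter is not growing.
    """
    if rate <= 0:
        return None
    return (limit - total) / rate


# Hypothetical samples taken 300 seconds (5 minutes) apart.
sample_a = "1000/200/1200/2000000 mbuf clusters in use (current/cache/total/max)"
sample_b = "3500/400/3900/2000000 mbuf clusters in use (current/cache/total/max)"

total_a = parse_clusters(sample_a)[2]
total_b = parse_clusters(sample_b)[2]
limit = parse_clusters(sample_b)[3]
rate = leak_rate(total_a, total_b, 300)
remaining = seconds_until_limit(total_b, limit, rate)
```

With a steady rate in the 8-10 clusters per second range and a limit of about
2 million, the same arithmetic gives roughly 200000 to 250000 seconds, i.e. a
bit over two days to nearly three days from an empty pool to exhaustion.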