From: Paul Mather
To: Alexander Leidinger
Cc: Current, Gleb Smirnoff
Date: Mon, 22 Apr 2024 11:35:16 -0400
Subject: Re: Strange network/socket anomalies since about a month
Message-Id: <55E45C9C-1878-4FAD-B46A-0EA1FFCCAE1D@gromit.dlib.vt.edu>
In-Reply-To: <1fe609f252e7fae6d746530d5035ec0e@Leidinger.net>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Apr 22, 2024, at 3:26 AM, Alexander Leidinger wrote:

> Hi,
>
> I have been seeing a higher failure rate of socket/network related
> stuff for a while. Those failures are transient. Directly executing
> the same thing again may or may not succeed. I'm not able to
> reproduce this at will. Sometimes they show up.
>
> Examples:
> - poudriere runs with the sccache overlay (like ccache but also works
>   for rust) sometimes fail to create the communication socket and as
>   such the build fails. I have 3 different poudriere bulk runs after
>   each other in my build script, and when the first one fails, the
>   second and third still run. If the first fails due to the sccache
>   issue, the second and third may or may not fail. Sometimes the first
>   fails and the rest are ok. Sometimes all fail, and if I then run one
>   by hand it works (the script does the same as the manual run, the
>   script is simply a "for type in A B C; do poudriere bulk -O sccache
>   -j $type -f ${type}.pkglist; done" which I execute from the same
>   shell, and the script doesn't do env-sanitizing).
> - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx
>   (webmail service) -> php -> imap) sees intermittent issues
>   sometimes. Opening the same email directly again afterwards normally
>   works. I've also seen transient issues with pgp signing (webmail
>   interface -> gnupg / gpg-agent on the server); simply hitting send
>   again after a failure works fine.
>
> Gleb, could this be related to the socket stuff you did 2 weeks ago?
> My world is from 2024-04-17-112537. I have noticed this since at
> least then, but I'm not sure if the failures were there before that
> and I simply didn't notice them. They are surely "new recently"; I
> hadn't seen that many issues in January.
> The last two updates of current I did before the last one were on
> 2024-03-31-120210 and 2024-04-08-112551.
>
> I could also imagine that some memory related transient failure could
> cause this, but with >3 GB free I do not expect this. Important here
> may be that I have https://reviews.freebsd.org/D40575 in my tree,
> which is memory related, but it's only a metric to quantify memory
> fragmentation.
>
> Any ideas how to track this down more easily than running the entire
> poudriere in ktrace (e.g. a hint/script which dtrace probes to use)?

No answers, I'm afraid, just a "me too."

I have the same problem you describe when using
ports-mgmt/sccache-overlay to build packages with Poudriere. In my
case, I'm using FreeBSD 14-STABLE (stable/14-13952fbca).

I actually stopped using ports-mgmt/sccache-overlay because it got to
the point where it failed more often than it worked. Then, a few months
ago, I decided on a whim to start using it again and it worked reliably
for me. Then, starting a few weeks ago, it reverted to the behaviour
you describe above. It is not as bad right now as it was when I quit
using it: it sometimes fails, but it succeeds when I re-run the
"poudriere bulk" run.

I'd love it to go back to when it was working 100% of the time.

Cheers,

Paul.
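
The re-run workaround described above can be sketched as a small retry wrapper around the quoted build loop. This is only a sketch under stated assumptions: the `retry` function, the attempt count of 3, and the `RETRY_DELAY` variable are hypothetical names introduced for illustration and are not part of Poudriere or sccache. The `poudriere bulk` invocation in the comment is the one quoted from the build script. It papers over the transient failures and does nothing to explain them.

```sh
#!/bin/sh
# retry MAX CMD [ARGS...]
# Run CMD until it succeeds or MAX attempts have been made.
# Returns 0 on success, otherwise the exit status of the last attempt.
# (Hypothetical helper, not provided by poudriere or sccache.)
retry() {
    max=$1
    shift
    attempt=1
    while :; do
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max" ]; then
            return "$status"
        fi
        echo "attempt $attempt of $max failed (exit $status), retrying" >&2
        attempt=$((attempt + 1))
        # Pause between attempts; RETRY_DELAY is an assumed knob, default 5 s.
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example use with the loop quoted above (left commented out so that
# sourcing this file does not start a build):
#   for type in A B C; do
#       retry 3 poudriere bulk -O sccache -j "$type" -f "${type}.pkglist"
#   done
```

Because each jail's run is retried independently, a failure in the first run does not change how the second and third are attempted, which matches the behaviour of the original loop.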