From nobody Sun Jan 12 07:17:48 2025 X-Original-To: freebsd-net@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4YW6DM4XFkz5kfSr for ; Sun, 12 Jan 2025 07:17:55 +0000 (UTC) (envelope-from paul@redbarn.org) Received: from util.redbarn.org (util.redbarn.org [24.104.150.222]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "*.redbarn.org", Issuer "RapidSSL TLS RSA CA G1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4YW6DL6ms5z42lr; Sun, 12 Jan 2025 07:17:54 +0000 (UTC) (envelope-from paul@redbarn.org) Authentication-Results: mx1.freebsd.org; none Received: from family.redbarn.org (family.redbarn.org [IPv6:2001:559:8000:cd::5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "*.redbarn.org", Issuer "RapidSSL TLS RSA CA G1" (not verified)) by util.redbarn.org (Postfix) with ESMTPS id 7D972160C2F; Sun, 12 Jan 2025 07:17:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=redbarn.org; s=util; t=1736666272; bh=yjfUBGpzqRwNxv3qPD8bb+plmyIyxJ7kvKpbYNzTyzc=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=qTDVekYgQ3W0TVuUpBwW/x9e+sm0BI3d3xgI7KBW9yNQr/+y0ONyNMy8BIAgr7rnO WG19OPwIR031TotMQBhIV1MNJXn6Fem6p5OIgBI7zJ3BG5wqHhHYGN1PdCx+QjF+Iv F68A6HET9mJyzW0NC6Zi7wO3UbNNPo9AZMxgAnBY= Received: from localhost.localnet (dhcp-159.access.rits.tisf.net [24.104.150.159]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by family.redbarn.org (Postfix) with ESMTPSA id 4BC59C3F22; Sun, 12 Jan 2025 07:17:52 +0000 (UTC) From: Paul Vixie To: Mark Johnston Cc: freebsd-net@freebsd.org Subject: Re: per-FIB socket binding Date: Sun, 12 Jan 2025 07:17:48 +0000 Message-ID: <2841433.BEx9A2HvPv@localhost> Organization: FW In-Reply-To: References: <7772475.EvYhyI6sBW@dhcp-151.access.rits.tisf.net> <3330519.aeNJFYEL58@localhost> List-Id: Networking and TCP/IP with FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-net List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-net@FreeBSD.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="iso-8859-1" X-Rspamd-Queue-Id: 4YW6DL6ms5z42lr X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:33651, ipnet:24.104.150.0/24, country:US] On Saturday, January 11, 2025 4:51:07 PM UTC Mark Johnston wrote: > On Sat, Jan 11, 2025 at 06:25:22AM +0000, Paul Vixie wrote: > > On Monday, January 6, 2025 3:56:55 PM UTC Mark Johnston wrote: > > > On Fri, Dec 27, 2024 at 08:48:48AM +0000, Paul Vixie wrote: > > ... > >=20 > > x =3D y || z; > >=20 > > so if this is forbidden by today's freebsd kernel rules, please educate > > me. i know that GCC and CLang will optimize down to the same instruction > > sequence for either, but i prefer the shorter form since in this rare > > case it is clearer. >=20 > We are not super consistent about it, but style(9) does prescribe > explicit comparisons, i.e., "if (count !=3D 0)" rather than "if (count)". > In any case, I'd add a comment, since that assignment is a bit subtle. i'll add this comment: /* ref. ISO/IEC 9899:2023 =A7 6.5.15 */ =2E..and let the reviewers sort it out. > > ... the SYN|ACK will always use the FIB from the interface where > > the SYN arrived (this is in tcp_syncache.c). >=20 > This isn't clear to me. The initial SYN will create a syncache entry in = the >=20 > if (tp->t_state =3D=3D TCPS_LISTEN && SOLISTENING(so)) >=20 > case in tcp_input_with_port(). In this case we define inc.inc_fibnum =3D > so->so_fibnum, i.e., the FIB of the listening socket. Then, > syncache_add() copies inc to sc_inc, so sc_inc.inc_fibnum for the > syncache entry comes from the listening socket, and syncache_respond() > sets the SYN|ACK mbuf FIB with M_SETFIB(m, sc->sc_inc.inc_fibnum). What > am I missing? (Yes, I should actually do an experiement to check the > behaviour.) this is exactly my understanding of that code. the FIB for the SYN|ACK will= be=20 the one from the interface who received the SYN. it's only later after we=20 receive the ACK that completes the 3-way handshake that we will begin to us= e=20 the FIB of the socket. in terms of your earlier question, this means if we'= re=20 going to drop a SYN because it came in on the wrong interface (some form of= =20 uRPF) then the firewall module (PF or IPFW) will be doing that to the SYN w= ell=20 before a syncache entry is ever created. this seems to imply that the chang= es=20 i'm proposing won't be able to permit connections to complete that would ha= ve=20 not have completed -- which you correctly predicted could be surprising. the kernel implementations of uRPF-like features are unaffected. /* * net.inet.ip.rfc1122_strong_es: the address matches, veri= fy * that the packet arrived via the correct interface. */ if (__predict_false(strong_es && ia->ia_ifp !=3D ifp)) { IPSTAT_INC(ips_badaddr); goto bad; } we're not changing the inbound packet's FIB. /* * net.inet.ip.source_address_validation: drop incoming * packets that pretend to be ours. */ if (V_ip_sav && !(ifp->if_flags & IFF_LOOPBACK) && __predict_false(in_localip_fib(ip->ip_src, ifp->if_fib)= )) =20 { IPSTAT_INC(ips_badaddr); goto bad; } unless i misunderstand, this only rejects packets if the source address is = the=20 same as one of the addresses of the interface it arrives on, whereas the=20 documentation made me expect that a source address which was any address of= =20 any interface would be the trigger. i think my proposal doesn't change this. therefore my focus was on the verrevpath, versrcreach, and antispoof featur= es=20 of IPFW; and the antispoof feature of PF. in those tools, the inbound SYN=20 would be stopped before it ever reached the SYN cache. what i then observed is that the only change resulting from my proposal wou= ld=20 be to the transmission of subsequent segments, which would use the interfac= e's=20 =46IB (if nonzero) rather than the socket's FIB (if zero). if upstream uRPF= =20 would have dropped these segments because they're coming from the wrong pla= ce,=20 then indeed failure would result, and even FIN and RST could not be=20 transmitted, which would mean the socket would die a slow death by eventual= =20 timeout. my proposal would prevent this, but i think any surprise in this c= ase=20 would be a welcome one. if actual segments could not be transmitted because= =20 the socket's FIB did not contain a route back to the connection's source, t= hen=20 failure would be more immediate but the resulting surprise (if any) no less= =20 positive. i hope that if that analysis is correct no-one will demand a socket option = or=20 sysctl. to be effective at fixing path symmetry for multihoming, this logic= =20 has to be the default. =2D-- having finished with accept(), i'm now testing the bind() and bindat()=20 changes, which ought to be effective for any UDP responder who uses a separ= ate=20 socket for each interface address it speaks through, which certainly includ= es=20 all NTP and DNS servers i'm aware of, and may also help QUIC. i've begun to wish for a setfib_fd(2) which would let sshd promote the FIB= =20 from an inbound connection to be process-wide (after fork() before exec().)= =20 that way "netstat -rn" would show the routing table actually being used by = the=20 stdin and stdout of that shell. or perhaps we could just rename SO_SETFIB t= o=20 be SO_FIB and allow getsockopt() to return the FIB? (i'm not sure why it wa= s=20 "set only" in the current implementation.) opinions welcomed, as before.=20 =2D-=20 Paul Vixie