From nobody Thu Dec 08 06:57:08 2022
List-Id: IPFW Technical Discussions
List-Archive: https://lists.freebsd.org/archives/freebsd-ipfw
Sender: owner-freebsd-ipfw@freebsd.org
MIME-Version: 1.0
From: John Hay
Date: Thu, 8 Dec 2022 08:57:08 +0200
Subject: Re: ipfw nat and smaller wan mtu
To: ipfw@freebsd.org
Content-Type: multipart/alternative; boundary="000000000000557d3e05ef4b8c8d"

--000000000000557d3e05ef4b8c8d
Content-Type: text/plain; charset="UTF-8"

Hi,

Adding this patch does make it work for me. There might be better ways to
do it. I have tested with ping and ssh. In ping's case, ping reported:
frag needed and DF set (MTU 1392)

In ssh's case I could see with tcpdump that the "need to frag (mtu 1392)"
was sent back and the next packet's length was adjusted.

#####
06:29:59.869677 IP (tos 0x48, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 1500)
    10.10.1.3.64344 > 10.10.7.7.22: Flags [.], cksum 0xb64d (correct), seq 39:1487, ack 39, win 1027, options [nop,nop,TS val 260430893 ecr 926374970], length 1448
06:29:59.869954 IP (tos 0x0, ttl 63, id 62454, offset 0, flags [none], proto ICMP (1), length 596)
    10.10.2.2 > 10.10.1.3: ICMP 10.10.7.7 unreachable - need to frag (mtu 1392), length 576
    IP (tos 0x48, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 1500, bad cksum e081 (->19b7)!)
    10.10.1.3.64344 > 10.10.7.7.22: Flags [.], seq 39:1487, ack 39, win 1027, options [nop,nop,TS val 260430893 ecr 926374970], length 1448
06:29:59.871301 IP (tos 0x48, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 1392)
    10.10.1.3.64344 > 10.10.7.7.22: Flags [.], cksum 0x6841 (correct), seq 39:1379, ack 39, win 1027, options [nop,nop,TS val 260430893 ecr 926374970], length 1340
#####
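
As a quick sanity check on the trace above: the segment sizes are consistent with the two MTUs if one assumes a 20-byte IPv4 header without options and a 32-byte TCP header (20 bytes plus the 12 bytes of nop,nop,timestamp options that tcpdump shows). A minimal sketch of that arithmetic follows; the helper name and the constants are made up for illustration and do not come from any FreeBSD source:

```c
#include <assert.h>

/* Assumed header sizes for this trace (not read from the packets). */
enum {
    IPV4_HDR_LEN = 20,      /* IPv4 header without options */
    TCP_HDR_LEN  = 20 + 12  /* TCP header plus nop,nop,timestamp */
};

/*
 * Hypothetical helper: largest TCP payload that fits into a single
 * unfragmented packet of `mtu` bytes with the given header sizes.
 */
int tcp_payload_for_mtu(int mtu, int ip_hdr_len, int tcp_hdr_len)
{
    assert(mtu > ip_hdr_len + tcp_hdr_len);
    return mtu - ip_hdr_len - tcp_hdr_len;
}
```

With these assumptions, tcp_payload_for_mtu(1500, IPV4_HDR_LEN, TCP_HDR_LEN) gives the 1448 bytes of the first segment and tcp_payload_for_mtu(1392, IPV4_HDR_LEN, TCP_HDR_LEN) gives the 1340 bytes of the retransmitted one.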

--- sys/netinet/libalias/alias.c.orig   2022-05-12 04:54:03.000000000 +0000
+++ sys/netinet/libalias/alias.c        2022-12-08 05:42:25.127980000 +0000
@@ -365,6 +365,19 @@
                lnk = NULL;
 
        if (lnk != NULL) {
+               /*
+                  If the packet was locally generated, it will have a
+                  loopback address as source, which will not be handled
+                  correctly. For now use the destination address as source
+                  address. The correct source address might be the
+                  interface address that the packet will be going out on.
+               */
+               if (IN_LOOPBACK(ntohl(pip->ip_src.s_addr)) &&
+                   !IN_LOOPBACK(ntohl(pip->ip_dst.s_addr))) {
+                       DifferentialChecksum(&pip->ip_sum,
+                           &pip->ip_dst, &pip->ip_src, 2);
+                       pip->ip_src = pip->ip_dst;
+               }
                if (ip->ip_p == IPPROTO_UDP || ip->ip_p == IPPROTO_TCP) {
                        int accumulate, accumulate2;
                        struct in_addr original_address;
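
For anyone wondering what the DifferentialChecksum() call in the patch is for: when a header field changes, the IP header checksum can be adjusted from the old and new 16-bit words instead of being recomputed over the whole header. Below is a stand-alone userland sketch of that idea using the RFC 1624 formula. It is not the libalias implementation; the function names and the sample header values (written as numeric 16-bit words for readability) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over `words` 16-bit words (RFC 1071). */
uint16_t full_checksum(const uint16_t *hdr, size_t words)
{
    uint32_t sum = 0;

    assert(words > 0);
    for (size_t i = 0; i < words; i++)
        sum += hdr[i];
    return (uint16_t)~fold(sum);
}

/*
 * Incremental update, RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'),
 * applied for each changed 16-bit word.
 */
uint16_t update_checksum(uint16_t old_sum, const uint16_t *old_words,
    const uint16_t *new_words, size_t words)
{
    uint32_t sum = (uint16_t)~old_sum;

    for (size_t i = 0; i < words; i++) {
        sum += (uint16_t)~old_words[i];
        sum += new_words[i];
    }
    return (uint16_t)~fold(sum);
}

/*
 * Hypothetical demo: replace a loopback source with the destination
 * address and compare the incremental result with a full recompute.
 * Returns 1 when both agree.
 */
int rewrite_matches_recompute(void)
{
    uint16_t hdr[10] = {
        0x4500, 0x0254, /* version/IHL/TOS, total length 596 */
        0xf3f6, 0x0000, /* id 62454, flags and fragment offset */
        0x3f01, 0x0000, /* ttl 63, proto ICMP, checksum placeholder */
        0x7f00, 0x0001, /* source 127.0.0.1 */
        0x0a0a, 0x0202  /* destination 10.10.2.2 */
    };
    uint16_t old_src[2], new_src[2], incremental, recomputed;

    hdr[5] = full_checksum(hdr, 10);
    old_src[0] = hdr[6]; old_src[1] = hdr[7];
    new_src[0] = hdr[8]; new_src[1] = hdr[9];
    incremental = update_checksum(hdr[5], old_src, new_src, 2);

    /* Apply the rewrite and recompute from scratch. */
    hdr[6] = new_src[0];
    hdr[7] = new_src[1];
    hdr[5] = 0;
    recomputed = full_checksum(hdr, 10);

    return incremental == recomputed;
}
```

The count of 2 passed to update_checksum() plays the same role as the 2 in the patch: an IPv4 address covers two 16-bit words. As in the patch, the checksum has to be adjusted using the old source words, so the adjustment is computed before the address is overwritten.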

On Wed, 7 Dec 2022 at 16:33, John Hay <john@sanren.ac.za> wrote:
> Hi,
>
> What would the proper ipfw rules be to make nat work and properly get the
> icmp too big packets back to a local host if the wan interface needs a
> smaller mtu?
>
> I'm using a FreeBSD machine as router/firewall, but its wan interface
> needs a smaller mtu (1392) than the default ethernet mtu. I have replicated
> this in a VM so I can test it. My simplified ipfw rules make it work for
> packets that are smaller than the wan mtu:
>
> #####
> net.inet.ip.fw.one_pass=0
> net.inet.ip.fw.verbose=1
> #####
> fwcmd="/sbin/ipfw -q"
> wan="vtnet0"
> lan="vtnet1"
> ${fwcmd} nat 123 config if ${wan} log
> ${fwcmd} add 1000 count log all from any to any
> ${fwcmd} add 5000 nat 123 ip4 from any to any via ${wan}
> ${fwcmd} add 6000 allow log all from any to any
> #####
> The wan ip of the firewall is 10.10.2.2 and the ip address of the host (on
> the lan side) I'm testing from is 10.10.1.3. And I did a ping to 10.10.5.5,
> which is on the other side of the wan interface.
>
> This works for packets smaller than the wan mtu. But if the packet is
> larger than the wan mtu, the icmp too big is generated, but with 127.0.0.1
> as the source and the wan ip as the destination and then sent via lo0 and
> it looks like this in the ipfw log:
>
> Dec  7 13:24:59 rtr kernel: ipfw: 1000 Count ICMP:3.4 127.0.0.1 10.10.2.2 out via lo0
>
> So I added a nat ipfw rule to catch that:
>
> ${fwcmd} add 5050 nat 123 ip4 from any to not 127.0.0.1 via lo0
>
> That helped partly because it was then able to recover the address of the
> host I was testing from and tried to send the packet out on the correct
> interface (vtnet1). Unfortunately it still had the source address of
> 127.0.0.1, which means it did not actually make it to the wire:
>
> ######
> Dec  7 14:17:31 rtr kernel: ipfw: 1000 Count ICMP:8.0 10.10.1.3 10.10.5.5 in via vtnet1
> Dec  7 14:17:31 rtr kernel: ipfw: 6000 Accept ICMP:8.0 10.10.1.3 10.10.5.5 in via vtnet1
> Dec  7 14:17:31 rtr kernel: ipfw: 1000 Count ICMP:8.0 10.10.1.3 10.10.5.5 out via vtnet0
> Dec  7 14:17:31 rtr kernel: ipfw: 6000 Accept ICMP:8.0 10.10.2.2 10.10.5.5 out via vtnet0
> Dec  7 14:17:31 rtr kernel: ipfw: 1000 Count ICMP:3.4 127.0.0.1 10.10.2.2 out via lo0
> Dec  7 14:17:31 rtr kernel: ipfw: 6000 Accept ICMP:3.4 127.0.0.1 10.10.2.2 out via lo0
> Dec  7 14:17:31 rtr kernel: ipfw: 1000 Count ICMP:3.4 127.0.0.1 10.10.2.2 in via lo0
> Dec  7 14:17:31 rtr kernel: ipfw: 6000 Accept ICMP:3.4 127.0.0.1 10.10.1.3 in via lo0
> Dec  7 14:17:31 rtr kernel: ipfw: 1000 Count ICMP:3.4 127.0.0.1 10.10.1.3 out via vtnet1
> Dec  7 14:17:31 rtr kernel: ipfw: 6000 Accept ICMP:3.4 127.0.0.1 10.10.1.3 out via vtnet1
> ######
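
The last two log entries show the ICMP 3.4 message leaving vtnet1 with 127.0.0.1 still as its source. The condition that identifies such a packet (loopback source, non-loopback destination) can be sketched in userland as below. This is only an illustration under my own assumptions: the helper names are hypothetical, addresses are taken in host byte order, and the mask test is written out by hand rather than relying on a system macro.

```c
#include <stdint.h>

/* Host-byte-order test for 127.0.0.0/8. */
static int is_loopback(uint32_t addr_host_order)
{
    return (addr_host_order & 0xff000000u) == 0x7f000000u;
}

/*
 * Hypothetical predicate: true for a locally generated packet that
 * still carries a loopback source but is headed to a real host.
 */
int needs_source_rewrite(uint32_t src, uint32_t dst)
{
    return is_loopback(src) && !is_loopback(dst);
}

/* Hypothetical helper: build a host-order IPv4 address from its parts. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
        ((uint32_t)c << 8) | (uint32_t)d;
}
```

For the logged packet, needs_source_rewrite(ipv4(127, 0, 0, 1), ipv4(10, 10, 1, 3)) is true, while a packet that already has 10.10.2.2 as its source, or one that stays on the loopback interface, is left alone.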

> Once I have this sorted, there seems to be a similar problem with nptv6.
>
> Regards
>
> John

--000000000000557d3e05ef4b8c8d--