possible regression handling packet fragmentation in 14.0 with tftp/pxe
Date: Fri, 19 Apr 2024 13:39:51 UTC
Hello,

I have found something that looks like a regression to me (but it may also be a bugfix, and I was just relying on the bug earlier :-). Anyway, I don't fully understand what is going on; maybe someone here has more insight than I do.

I have various router appliances based on FreeBSD. They act as NAT routers, DNS/DHCP servers and VPN servers (using tinc in switch mode as the VPN solution). I have used these in different incarnations for many years now (since 8.something afaicr), and the systems work fine up to 13.3.

With 14.0 I hit a strange issue: Some of the LANs that FreeBSD is acting as NAT gateway for (using pf for NAT, including scrubbing) contain diskless machines that need to boot off an NFS server that is located outside the LAN. To make this possible, the router and the NFS server run a tinc connection. On the router, tinc's virtual TAP interface is bridged with the physical interface of the LAN:

---
bridge0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=0
        ether 58:9c:fc:10:ff:ed
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 2000000
        member: ix3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 4 priority 128 path cost 2000
        groups: bridge
        nd6 options=9<PERFORMNUD,IFDISABLED>
---

The remote server runs both nfsd for the diskless root and tftpd for PXE booting. This was working fine up to 13.3. However, with the router under 14.0, the first step of the TFTP part (delivering pxelinux.0 from the syslinux package) fails and ends up in timeouts.

For the following:

192.168.130.3 is the diskless client trying to boot (Linux)
192.168.130.253 is the server for nfsroot and tftp (FreeBSD)
192.168.130.254 is the router and dhcp-server (FreeBSD 13.3/14.0)

The tftpd server logs the following events in /var/log/xferlog when the client tries to boot via PXE:

---
Apr 19 11:37:40 192.168.130.253 tftpd[49562]: Filename: 'pxelinux.0'
Apr 19 11:37:40 192.168.130.253 tftpd[49562]: Mode: 'octet'
Apr 19 11:37:40 192.168.130.253 tftpd[49564]: Filename: 'pxelinux.0'
Apr 19 11:37:40 192.168.130.253 tftpd[49564]: Mode: 'octet'
Apr 19 11:37:40 192.168.130.253 tftpd[49564]: 192.168.130.3: read request for //pxelinux.0: success
Apr 19 11:37:45 192.168.130.253 tftpd[49564]: receive_packet: timeout
Apr 19 11:37:45 192.168.130.253 tftpd[49564]: Timeout #0 on ACK 1
Apr 19 11:37:50 192.168.130.253 tftpd[49564]: receive_packet: timeout
Apr 19 11:37:50 192.168.130.253 tftpd[49564]: Timeout #1 on ACK 1
Apr 19 11:37:55 192.168.130.253 tftpd[49564]: receive_packet: timeout
Apr 19 11:37:55 192.168.130.253 tftpd[49564]: Timeout #2 on ACK 1
Apr 19 11:38:00 192.168.130.253 tftpd[49564]: receive_packet: timeout
Apr 19 11:38:00 192.168.130.253 tftpd[49564]: Timeout #3 on ACK 1
Apr 19 11:38:05 192.168.130.253 tftpd[49564]: receive_packet: timeout
Apr 19 11:38:05 192.168.130.253 tftpd[49564]: Timeout #4 on ACK 1
Apr 19 11:38:10 192.168.130.253 tftpd[49564]: receive_packet: timeout
Apr 19 11:38:10 192.168.130.253 tftpd[49564]: Timeout #5 send ACK 1 giving up
---

A tcpdump for the MAC of the PXE client, taken on the physical interface of the router, looks like this:

---
11:37:36.843770 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:25:90:69:bf:ae, length 548
11:37:36.844639 IP 192.168.130.254.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 357
11:37:40.853302 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:25:90:69:bf:ae, length 548
11:37:40.855024 IP 192.168.130.254.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 357
11:37:40.855653 ARP, Request who-has 192.168.130.253 tell 192.168.130.3, length 46
11:37:40.856543 ARP, Reply 192.168.130.253 is-at 00:bd:df:ce:fa:03, length 28
11:37:40.856584 IP 192.168.130.3.2070 > 192.168.130.253.69: TFTP, length 27, RRQ "pxelinux.0" octet tsize 0
11:37:40.860701 IP 192.168.130.253.38476 > 192.168.130.3.2070: UDP, length 14
11:37:40.860737 IP 192.168.130.3.2070 > 192.168.130.253.38476: UDP, length 17
11:37:40.860908 IP 192.168.130.3.2071 > 192.168.130.253.69: TFTP, length 32, RRQ "pxelinux.0" octet blksize 1456
11:37:40.891419 IP 192.168.130.253.31448 > 192.168.130.3.2071: UDP, length 15
11:37:40.891455 IP 192.168.130.3.2071 > 192.168.130.253.31448: UDP, length 4
11:37:40.910020 IP 192.168.130.253.31448 > 192.168.130.3.2071: UDP, length 1460
11:37:40.910037 IP 192.168.130.253 > 192.168.130.3: ip-proto-17
11:37:45.910310 IP 192.168.130.253.31448 > 192.168.130.3.2071: UDP, length 1460
11:37:45.910327 IP 192.168.130.253 > 192.168.130.3: ip-proto-17
11:37:50.915422 IP 192.168.130.253.31448 > 192.168.130.3.2071: UDP, length 1460
11:37:50.915439 IP 192.168.130.253 > 192.168.130.3: ip-proto-17
11:37:55.919340 IP 192.168.130.253.31448 > 192.168.130.3.2071: UDP, length 1460
11:37:55.919359 IP 192.168.130.253 > 192.168.130.3: ip-proto-17
11:38:00.934017 IP 192.168.130.253.31448 > 192.168.130.3.2071: UDP, length 1460
11:38:00.934033 IP 192.168.130.253 > 192.168.130.3: ip-proto-17
11:38:05.943631 IP 192.168.130.253.31448 > 192.168.130.3.2071: UDP, length 1460
11:38:05.943651 IP 192.168.130.253 > 192.168.130.3: ip-proto-17
---

It looks like TFTP packets are transmitted that are somehow never picked up by the client: the server keeps resending the same 1460-byte datagram every five seconds, and the client never answers it.
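
To make the trace easier to read, the UDP lengths can be matched against the TFTP packet formats. What follows is only a rough Python sketch based on my reading of RFC 1350 and RFC 2347/2348; the helper names are made up for illustration and nothing here is taken from tftpd or the PXE firmware:

```python
import struct

def rrq(filename, mode, options):
    """Read request: opcode 1, then NUL-terminated strings
    (filename, mode, and RFC 2347 option name/value pairs)."""
    packet = struct.pack("!H", 1)
    for field in (filename, mode, *options):
        packet += field.encode("ascii") + b"\x00"
    return packet

def oack(options):
    """Option acknowledgement: opcode 6, then NUL-terminated
    option name/value strings."""
    packet = struct.pack("!H", 6)
    for field in options:
        packet += field.encode("ascii") + b"\x00"
    return packet

def ack(block):
    """Acknowledgement: opcode 4 plus a 16-bit block number."""
    return struct.pack("!HH", 4, block)

def data(block, payload):
    """Data packet: opcode 3, 16-bit block number, then the payload."""
    return struct.pack("!HH", 3, block) + payload

print(len(rrq("pxelinux.0", "octet", ["tsize", "0"])))       # 27
print(len(rrq("pxelinux.0", "octet", ["blksize", "1456"])))  # 32
print(len(oack(["blksize", "1456"])))                        # 15
print(len(ack(0)))                                           # 4
print(len(data(1, bytes(1456))))                             # 1460
```

If that reading is right, the 1460-byte datagrams are DATA block 1 (4 bytes of TFTP header plus the negotiated 1456-byte block), and the 4-byte packet from the client before them is the ACK for the option acknowledgement.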

As 13.3 was running fine in this place, I compared the tcpdump output to what is happening there:

---
13:34:34.112855 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:90:69:bf:ae (oui Unknown), length 548
13:34:36.145073 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:90:69:bf:ae (oui Unknown), length 548
13:34:40.154596 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:90:69:bf:ae (oui Unknown), length 548
13:34:40.155930 ARP, Request who-has 192.168.130.253 tell 192.168.130.3, length 46
13:34:40.156176 ARP, Reply 192.168.130.253 is-at 00:bd:7b:2d:f7:05 (oui Unknown), length 28
13:34:40.156239 IP 192.168.130.3.2070 > 192.168.130.253.tftp: 27 RRQ "pxelinux.0" octet tsize 0
13:34:40.159338 IP 192.168.130.253.16697 > 192.168.130.3.2070: UDP, length 14
13:34:40.159406 IP 192.168.130.3.2070 > 192.168.130.253.16697: UDP, length 17
13:34:40.159574 IP 192.168.130.3.2071 > 192.168.130.253.tftp: 32 RRQ "pxelinux.0" octet blksize 1456
13:34:40.162327 IP 192.168.130.253.33393 > 192.168.130.3.2071: UDP, length 15
13:34:40.162388 IP 192.168.130.3.2071 > 192.168.130.253.33393: UDP, length 4
13:34:40.162708 IP 192.168.130.253.33393 > 192.168.130.3.2071: UDP, bad length 1460 > 1392
13:34:40.162758 IP 192.168.130.253 > 192.168.130.3: udp
13:34:40.162837 IP 192.168.130.3.2071 > 192.168.130.253.33393: UDP, length 4
13:34:40.163089 IP 192.168.130.253.33393 > 192.168.130.3.2071: UDP, bad length 1460 > 1392
13:34:40.163124 IP 192.168.130.253 > 192.168.130.3: udp
13:34:40.163670 IP 192.168.130.3.2071 > 192.168.130.253.33393: UDP, length 4
13:34:40.163920 IP 192.168.130.253.33393 > 192.168.130.3.2071: UDP, bad length 1460 > 1392
13:34:40.163956 IP 192.168.130.253 > 192.168.130.3: udp
13:34:40.164515 IP 192.168.130.3.2071 > 192.168.130.253.33393: UDP, length 4
13:34:40.164765 IP 192.168.130.253.33393 > 192.168.130.3.2071: UDP, bad length 1460 > 1392
[...]
---

Although this reports "bad length" all the time (whatever this means), it works and transfers bootloader, initramfs, kernel etc. for diskless Linux machines in the LAN.

But this looked suspiciously like an MTU problem. The VPN only offers an MTU of 1425 by default, while tftp appears to use 1460. After some searching and reading I found that the original TFTP default is 512-byte blocks, and the client obviously requests larger blocks explicitly, for speed reasons, with the "blksize 1456" option. Unfortunately, I found no way to configure the PXE firmware to use smaller packets. However, adding the "-o" option to FreeBSD's tftpd disables all extra options and forces both the server and the client to use smaller packets. TFTP and PXE booting were working fine again after that change.

On the other hand, this feels like a workaround. What is the actual problem here, and why did the very same setup "just work" up to FreeBSD 13.3 on the router?

The setup of pf.conf is quite minimal; the packet normalization part is just

---
set block-policy return
set optimization aggressive
scrub in all
---

Is this some kind of regression, or rather the fix of a bug I was relying upon earlier?

Any hints and insight would be greatly appreciated.

cu
Gerrit
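
PS: To spell out the MTU arithmetic, here is a minimal Python sketch of how a single TFTP data packet would be split by IPv4 fragmentation. It assumes a 20-byte IPv4 header without options and that the 1425-byte MTU applies to the whole IP packet; the function name is made up for illustration, and this is not how pf or the bridge code actually computes anything.

```python
IP_HEADER = 20   # assumption: IPv4 header without options
UDP_HEADER = 8

def fragment_sizes(udp_payload, mtu):
    """Return the number of IP payload bytes carried by each fragment
    of one UDP datagram sent over a link with the given MTU."""
    remaining = UDP_HEADER + udp_payload
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the fragment offset field counts in 8-byte units.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while remaining > per_fragment:
        sizes.append(per_fragment)
        remaining -= per_fragment
    sizes.append(remaining)
    return sizes

sizes = fragment_sizes(1460, 1425)
print(sizes)                   # [1400, 68]
print(sizes[0] - UDP_HEADER)   # 1392 UDP payload bytes in the first fragment
print(fragment_sizes(516, 1425))   # [524], i.e. no fragmentation
```

Under these assumptions the first fragment carries 1392 of the 1460 UDP payload bytes, which would be consistent with the "bad length 1460 > 1392" lines in the 13.3 trace. With the default 512-byte blocks (516 bytes of UDP payload) the datagram fits into one packet, which would explain why the "-o" workaround avoids the problem.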