Re: Pi3 answers ssh only if outbound ping is running on -current
Date: Sun, 13 Feb 2022 00:50:21 UTC
On 2022-Feb-12, at 13:32, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Feb-12, at 10:56, bob prohaska <fbsd@www.zefox.net> wrote:
>
>> For a few weeks now a Pi3 running -current will not respond to
>> an incoming ssh connection unless an outbound ping process is running.
>>
>> Once the outbound ping is started via the serial console, incoming
>> ssh connections are answered normally. Uname -a reports
>> FreeBSD www.zefox.org 14.0-CURRENT FreeBSD 14.0-CURRENT #10 main-n253073-6db44b0158c: Sat Feb 12 04:30:21 PST 2022 bob@www.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64
>>
>> A Pi4 running -current of a few days ago exhibits no such problems.
>>
>> Another Pi3 running stable/13 has been behaving in the same way.
>>
>> Both Pi3s successfully set time via ntp on reboot and will
>> very briefly (one or two minutes) prompt for an ssh password,
>> but no further progress is made and the login attempt times out.
>> If the ssh login is attempted a second time, not even a password
>> prompt comes back.
>>
>> Ping times (to an adjacent machine on the same subnet) are
>> 64 bytes from 50.1.20.26: icmp_seq=2 ttl=64 time=0.978 ms
>> 64 bytes from 50.1.20.26: icmp_seq=3 ttl=64 time=0.967 ms
>> 64 bytes from 50.1.20.26: icmp_seq=4 ttl=64 time=1.088 ms
>> 64 bytes from 50.1.20.26: icmp_seq=5 ttl=64 time=0.983 ms
>> 64 bytes from 50.1.20.26: icmp_seq=6 ttl=64 time=1.007 ms
>> 64 bytes from 50.1.20.26: icmp_seq=7 ttl=64 time=1.075 ms
>> 64 bytes from 50.1.20.26: icmp_seq=8 ttl=64 time=1.020 ms
>> 64 bytes from 50.1.20.26: icmp_seq=9 ttl=64 time=1.044 ms
>> 64 bytes from 50.1.20.26: icmp_seq=10 ttl=64 time=1.026 ms
>> 64 bytes from 50.1.20.26: icmp_seq=11 ttl=64 time=0.908 ms
>>
>> That might be considered slow, but the correspondent machine
>> is only a Pi2 running
>> FreeBSD www.zefox.com 14.0-CURRENT FreeBSD 14.0-CURRENT #3 main-71d2d5adfe: Tue Dec 21 00:23:51 PST 2021 bob@www.zefox.com:/usr/obj/usr/freebsd-src/arm.armv7/sys/GENERIC arm
>>
>> If the outbound ping is started, an incoming ssh connection established,
>> and the outbound ping subsequently stopped, the running ssh connection
>> silently freezes; no disconnect, but no response, not even echo. Some
>> tens of seconds later, all inputs were responded to. Tried a second time,
>> the stoppage recurred; restarting the outbound ping eventually restored
>> responsiveness.
>>
>> With the outbound ping stopped, an inbound ssh attempt silently failed:
>>
>> bob@raspberrypi:~ $ ssh -vvv 50.1.20.28
>> OpenSSH_7.9p1 Raspbian-10+deb10u2+rpt1, OpenSSL 1.1.1d 10 Sep 2019
>> debug1: Reading configuration data /etc/ssh/ssh_config
>> debug1: /etc/ssh/ssh_config line 19: Applying options for *
>> debug2: resolve_canonicalize: hostname 50.1.20.28 is address
>> debug2: ssh_connect_direct
>> debug1: Connecting to 50.1.20.28 [50.1.20.28] port 22.
>> [enter key echoed]
>> debug1: connect to address 50.1.20.28 port 22: Connection timed out
>> ssh: connect to host 50.1.20.28 port 22: Connection timed out
>> bob@raspberrypi:~ $
>>
>> Thanks for reading and any insights. If I've omitted useful
>> details or tests please indicate.
>>
>
> You have made multiple reports to the arm list for this issue
> without anyone having managed to help. This report does have
> more comparative context, which might help someone help.
>
> It may be time to try other lists like freebsd-net and,
> possibly, freebsd-hackers or freebsd-stable or
> freebsd-current .
>
> However, the best thing no matter where you go would be
> to (approximately) bisect toward the back-to-back FreeBSD
> version-pair on, say, stable/13 at which the problem
> goes from not-there to happening. ( stable/13 changes
> more slowly and so has fewer versions to deal with. Also its
> KBI may grow but is constrained to otherwise be more
> stable [ relative to releng/13.0 ]. So you are less
> likely to run into version compatibility problems
> for the below suggestion.)
>
> I'd recommend using kernel and world materials from:
>
> https://artifact.ci.freebsd.org/snapshot/stable-13/?C=M&O=D
>
> on a separate microsd card updated from a normal context,
> avoiding builds. Remember that older stable/13 worlds can
> generally run on newer kernels. So you might only need to
> update the kernel after getting an initial, somewhat older
> context in place. (It is not obvious if it is a kernel-only
> problem or not.) If it is a kernel problem, you might be
> able to put down a releng/13.0 world and never change it
> during the approximate bisect activity.
>
> For what https://artifact.ci.freebsd.org/snapshot/ has
> available, this avoids having to build the versions.
> It also allows checking if your builds are behaving
> differently than the official snapshots do.
>
> https://artifact.ci.freebsd.org/snapshot/ may not be able
> to get you to the back-to-back FreeBSD version-pair: the
> range might be wider. Sometimes the wider range is enough
> by inspection of the types of commits in the range. So
> I'd report whatever range you find without having done
> any builds.
>
> I'll note that I have no problem with connecting via ssh
> to a RPi3B running my build of (line split for readability):
>
> # uname -apKU
> FreeBSD Rock64_RPi_4_3_2v1p2 14.0-CURRENT FreeBSD 14.0-CURRENT #28
> main-n252475-e76c0108990b-dirty: Sat Jan 15 23:39:27 PST 2022
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA53-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA53
> arm64 aarch64 1400047 1400047
>
> I have no stable/13 context set up for a RPi3B, only
> stable/13's that have an untuned ZFS context. Still,
> I wonder if that might operate well enough to test
> the issue, despite the 1 GiByte of RAM limitation. I
> may test that later today.

Other than needing to put in place my u-boot.bin build
that has usb_pgood_delay=2000 built-in, I had no trouble
with booting and ssh'ing in to (line split for readability):

# uname -apKU
FreeBSD CA72_4c8G_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #25
stable/13-n249004-a5f698599560-dirty: Sun Jan 16 15:07:11 PST 2022
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1300524 1300524

# ~/fbsd-based-on-what-commit.sh -C /usr/13S-src/
branch: stable/13
merge-base: a5f69859956049b5153b0e1b67f8f4a99622dc6f
merge-base: CommitDate: 2022-01-15 12:55:32 +0000
a5f698599560 (HEAD -> stable/13, freebsd/stable/13) Ignore debugger-injected signals left after detaching
n249004 (--first-parent --count for merge-base)

SIDE NOTE
After the above, my patched top reports:

Mem: 32504Ki Active, 214888Ki Inact, 393248Ki Wired, 40960B Buf, 321468Ki Free, 75516Ki MaxObsActive, 394108Ki MaxObsWired, 469624Ki MaxObs(Act+Wir+Lndry)
ARC: 316408Ki Total, 201575Ki MFU, 111090Ki MRU, 143360B Anon, 1024Ki Header, 2551Ki Other
     259140Ki Compressed, 346379Ki Uncompressed, 1.34:1 Ratio
Swap: 3584Mi Total, 3584Mi Free, 75516Ki MaxObs(Act+Lndry+SwapUsed), 469624Ki MaxObs(Act+Wir+Lndry+SwapUsed)

So it is not an environment I'd want to do buildworld
buildkernel on. But it looks to be usable for less
memory-intensive activities.
END SIDE NOTE

So I've looked and found (from today):

https://artifact.ci.freebsd.org/snapshot/stable-13/371633ece3ae88e3b3d7a028c372d4ac4f72b503/arm64/aarch64/kernel.txz

and downloaded it. Then I decided to try it with my
normal boot media, leaving world as it is. So:

# ls -Tld /boot/ker*
drwxr-xr-x 2 root wheel 680 Jan 16 16:49:24 2022 /boot/kernel
drwxr-xr-x 2 root wheel 680 Jan 4 23:08:57 2022 /boot/kernel.old

# mv /boot/kernel /boot/kernorm
# tar -xpf kernel.txz -C /

# ls -Tld /boot/ker*
drwxr-xr-x 2 root wheel 679 Feb 12 11:14:27 2022 /boot/kernel
drwxr-xr-x 2 root wheel 680 Jan 4 23:08:57 2022 /boot/kernel.old
drwxr-xr-x 2 root wheel 680 Jan 16 16:49:24 2022 /boot/kernorm

(I chose not to replace the system's debug information
--that is not stored under /boot/ but with the world files.
So I did not download or install kernel-dbg.txz .)

So now a reboot with loader defaults (for that boot
environment in my context) will use the kernel that I
got from:

https://artifact.ci.freebsd.org/snapshot/stable-13/. . .

[Hmm. It looks like the u-boot.bin is not sufficient to be
sure that shutdown -r now will reboot the RPi3B. Booting
from power-on seems to work so far. I might need another
built-in setting added (or more) in order for the RPi3B
to handle shutdown -r now well with the USB3 NVMe based
SSD media that I'm using.]

Still no trouble connecting and logging-in via ssh.
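The kernel-only swap shown above can be sketched as a small POSIX sh script. This is a sketch under assumptions, not the exact procedure used above: it assumes FreeBSD's fetch(1) for the download (the message does not say how kernel.txz was downloaded), the URL layout of the snapshot path quoted above, and root privileges. The helper names and the DRY_RUN variable are hypothetical; /boot/kernorm is the backup name from the example.

```shell
#!/bin/sh
# Sketch: install a CI snapshot kernel.txz, keeping the current kernel as
# /boot/kernorm (name taken from the example above; otherwise arbitrary).
# Assumes FreeBSD, root privileges, and the artifact URL layout shown above.

SNAP_BASE=${SNAP_BASE:-https://artifact.ci.freebsd.org/snapshot}

# snapshot_kernel_url BRANCH COMMIT: print the arm64 kernel.txz URL
snapshot_kernel_url() {
    printf '%s/%s/%s/arm64/aarch64/kernel.txz\n' "$SNAP_BASE" "$1" "$2"
}

# run CMD...: echo the command, then execute it unless DRY_RUN=1
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# swap_in_snapshot_kernel BRANCH COMMIT (hypothetical helper)
swap_in_snapshot_kernel() {
    url=$(snapshot_kernel_url "$1" "$2")
    # An existing /boot/kernorm directory would make mv move /boot/kernel
    # *into* it instead of renaming it, so refuse in that case.
    if [ -e /boot/kernorm ]; then
        echo "/boot/kernorm already exists; not overwriting it" >&2
        return 1
    fi
    run fetch -o /tmp/kernel.txz "$url" || return 1
    run mv /boot/kernel /boot/kernorm || return 1
    # Extracting into / fills in a fresh /boot/kernel/ from the archive.
    run tar -xpf /tmp/kernel.txz -C / || return 1
    echo "Reboot to use the new kernel; /boot/kernorm holds the old one."
}
```

For example, `DRY_RUN=1 swap_in_snapshot_kernel stable-13 371633ece3ae88e3b3d7a028c372d4ac4f72b503` only prints the fetch, mv, and tar commands (provided /boot/kernorm does not exist yet). Debug information (kernel-dbg.txz) is deliberately left alone, as in the example.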
For reference (line split for readability):

# uname -apKU
FreeBSD CA72_4c8G_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #0
371633e: Sat Feb 12 19:06:49 UTC 2022
root@FreeBSD-stable-13-aarch64-build.jail.ci.FreeBSD.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC
arm64 aarch64 1300525 1300524

(I do not have matching source at this point.)


Recommended experiment . . .

Since I have a context working based on the kernel in:

https://artifact.ci.freebsd.org/snapshot/stable-13/371633ece3ae88e3b3d7a028c372d4ac4f72b503/arm64/aarch64/kernel.txz

I recommend that you try that exact same kernel in your
stable/13 context. I recommend renaming the existing
/boot/kernel before expanding the kernel.txz into / and
so causing a new /boot/kernel/ to be filled in.

If that makes things work after rebooting, then your
kernel can be blamed. (More investigation would then be
needed to know what is going on in your kernel build.)

But if the above does not make things work, that points
to investigating alternate worlds from:

https://artifact.ci.freebsd.org/snapshot/stable-13/. . .

That is a messier context. I only do that with media
that I can delete everything on, such as an independent
microsd card:

chflags -R noschg /mnt/ ; rm -fr /mnt/ ; various tar -xpf ???.txz -C /mnt/ commands

--while not booted from the microsd card. Repeat for
each snapshot tried.

It is preferable for the world not to be newer than the
kernel. But since stable/13 's 371633ece3ae seems to
work in my context, you might be able to hold the kernel
invariant and just try different world versions in this
messier context.

Also: You might be able to find:

https://artifact.ci.freebsd.org/snapshot/stable-13/. . .

materials for the specific builds that you have been
working with and compare and contrast their behavior
with that of your builds that had issues.

Note: The above does not consider other networking
configuration issues, which might not even be on the
RPi* devices. I'm not networking literate overall.

===
Mark Millard
marklmi at yahoo.com
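For the approximate bisect suggested earlier in the thread, choosing the next snapshot to try can be sketched in POSIX sh. This is a hypothetical helper, not something provided by the CI site: it assumes you have copied the snapshot commit hashes from the stable-13 listing into a text file, oldest first, one per line, and that a snapshot counts as good when ssh answers without an outbound ping running.

```shell
#!/bin/sh
# Sketch of one step of an approximate bisect over CI snapshots.
# Assumption: FILE is a hypothetical text file you create by hand from
# https://artifact.ci.freebsd.org/snapshot/stable-13/ listing snapshot commit
# hashes, oldest first, one per line. Nothing here is a CI-provided tool.

# midpoint_snapshot FILE GOOD BAD
#   GOOD: 1-based line number of a snapshot where ssh works without a ping.
#   BAD:  1-based line number of a later snapshot that shows the problem.
# Prints "LINE HASH" for the snapshot halfway between them. Prints nothing
# once GOOD and BAD are adjacent: that pair is then the narrowest range the
# available snapshots can give.
midpoint_snapshot() {
    file=$1 good=$2 bad=$3
    if [ "$((bad - good))" -le 1 ]; then
        return 0
    fi
    mid=$(( (good + bad) / 2 ))
    printf '%s %s\n' "$mid" "$(sed -n "${mid}p" "$file")"
}
```

Install and test the printed snapshot (for a kernel-only bisect, just its kernel.txz), then rerun with GOOD or BAD replaced by the printed line number, depending on whether ssh answered without an outbound ping. For example, `midpoint_snapshot snapshots.txt 1 40` reports line 20 first. The final adjacent pair gives the range whose commits can then be inspected.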