Re: sshd signal 11 on -current

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Fri, 19 Jan 2024 18:57:19 UTC
On Thu, Jan 18, 2024 at 02:11:03PM -0800, Mark Millard wrote:
> 
> This means I'm now focused on just:
> 
> MACHINE<->lan<->router<->switchA<->ns2.zefox.net
> 
> and possibly eliminating more stages as not required
> to get the problem.
> 
> Now can lan and/or router be eliminated by moving
> one of "Win10 laptop" or "pi4 RasPiOS workstation"
> temporarily? Moving to switchA would be testing not
> having either lan or router involved:

Not easily. I'd hesitate to configure either host
for exposure to the public WAN. I could move ns2,
but would want to configure a replacement first.
 
> MACHINE<->switchA<->ns2.zefox.net
> 
> Does such a move lead to still having the MAC
> failure? To no MAC failure?
> 

ns2.zefox.net is on switchA with four other FreeBSD
machines. Two of them, www.zefox.org (aarch64-current) 
and www.zefox.net (armv7 12.4.4) accept and hold ssh
connections without error from any host on WAN or LAN.
They also run the "grep -i ssh /var/log/messages" command
without eliciting the "corrupt MAC..." message and
disconnect.

> (Switches, routers, and the like do sometimes have
> errors that mess up just some protocol, not
> everything.)
> 

But wouldn't that affect all hosts? In this case only
ns2.zefox.net and ns1.zefox.net seem to be affected.
No other hosts (so far) have reported that particular
error.

> > It's somewhat curious that going from RPi4 workstation
> > via ssh to www.zefox.net and then ssh to ns2 does not
> > report corrupted MAC, but both machines run armv7
> > FreeBSD 12.4.4
> 
> So, listing the nested(!) ssh sequence more fully, that was(?):
> 
> "pi4 RasPiOS workstation"<->wifi<->lan<->router<->switchA<->www.zefox.net<->switchA<->ns2.zefox.net
> 
> Or, being more explicit about the nesting:
> 
> "pi4 RasPiOS workstation"<->wifi<->lan<->router<->switchA<->www.zefox.net
> 
> then, nested:
> 
> www.zefox.net<->switchA<->ns2.zefox.net
> 
> And it ends up getting the same result (no failure)
> as just doing:
> 
> www.zefox.net<->switchA<->ns2.zefox.net
> 
> without the involvement of any other MACHINE, if I
> understand right.
> 

Subject to the proviso that www.zefox.net is headless
and so must be accessed via ssh from some other host.
The choice of host seems to make no difference.

> Another related test would be by temporarily moving
> www.zefox.net to form one of:
> 
> www.zefox.net<->wifi<->lan<->router<->switchA<->www.zefox.net
> or:
> www.zefox.net<->lan<->router<->switchA<->www.zefox.net
> 
> Does such still not get a failure? Or does it then fail?
> 
> 
> > A three hop connection (RPiOS > www.zefox.net > ns2.zefox.net)
> > somehow inhibits the corrupted MAC error.  Evidently
> > there's something special going on among the hosts.
> > 
> >> Could you boot a FreeBSD microsd card in the pi4
> >> instead and try it as a FreeBSD system to see if
> >> it still has the problem (while in its usual
> >> place)? I'm still looking for the same hardware
> >> context but running a distinct but known OS
> >> context to see if the problem persists.
> >> 
> > 
> > Realistically I should probably just set up a microSD using
> > 14.0-RELEASE and configure it as ns2.zefox.net.
> 
> That is going a different direction than I asked about.
> It does not eliminate RasPiOS from involvement on the
> same hardware it was originally used on.
> 
> Both types of tests have their uses. But my focus for
> now in this area is on the replacement of RasPiOS by
> a FreeBSD version to eliminate RasPiOS's involvement
> for some tests but using the same hardware as before
> the replacement (other than boot media).
> 

I'm missing the point. RasPiOS behaves much like
Win10, FreeBSD-current and an armv7 instance of
FreeBSD 14.0-RELEASE-p4 elsewhere on my network. The
mischief seems tightly confined to the legacy 12.4.4
armv7 hosts ns1 and ns2.

> armv7 is Tier 2 for 13.x and 14.x
> 
> armv7 is projected to stay Tier 2 for the later official 15.x
> ( stable/15 and such ) but might not. It is possible that
> armv7 might only be supported via lib32/chroot/jail use for
> aarch64 that also supports EL0 AArch32 --and so AArch32/armv7
> could end up not being bootable any more by then.
> 
> aarch64 is Tier 1 for 13.x and 14.x
> aarch64 is projected to stay Tier 1 for the later official 15.x
> aarch64 hardware that does not support AArch32 at all would
> still be Tier 1.
> (The aarch64 tier 1 claims are somewhat strong for embedded
> aarch64. A possibility being that, at some point, a 1 GiByte
> RAM aarch64 might not be able to self-host buildworld
> buildkernel or various port->package builds even with
> substantial swap space --but that would not be likely to
> change the Tier 1 status of aarch64, for example.)

That summary makes me think abandoning armv7 in favor of
aarch64 is best. Even if it can't self-host, packages and
updates will be available. 

I do apologize for the fuss over this thread. When I first
reported the problem, I thought there might be something
malicious going on. Hobby hosts make good targets for
experimental attacks, and some of the sshd errors looked
suspicious to a naive eye.

Thanks to all (especially Mark!) who followed this saga.

bob prohaska