Re: sshd signal 11 on -current

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 19 Jan 2024 19:40:55 UTC
On Jan 19, 2024, at 10:57, bob prohaska <fbsd@www.zefox.net> wrote:

> On Thu, Jan 18, 2024 at 02:11:03PM -0800, Mark Millard wrote:
>> 
>> This means I'm now focused on just:
>> 
>> MACHINE<->lan<->router<->switchA<->ns2.zefox.net
>> 
>> and possibly eliminating more stages as not required
>> to get the problem.
>> 
>> Now can lan and/or router be eliminated by moving
>> one of "Win10 laptop" or "pi4 RasPiOS workstation"
>> temporarily? Moving to switchA would be testing not
>> having either lan or router involved:
> 
> Not easily. I'd hesitate to configure either host
> for exposure to the public WAN. I could move ns2,
> but would want to configure a replacement first.

You may need to do local experiments where the
dsl_modem is not connected. Or maybe move
switchA to the other side of the router temporarily?

Problem isolation may require things be off the
public WAN some of the time.

>> MACHINE<->switchA<->ns2.zefox.net
>> 
>> Does such a move still produce the MAC
>> failure? Or no MAC failure?
>> 
> 
> ns2.zefox.net is on switchA with four other FreeBSD
> machines. Two of them, www.zefox.org (aarch64-current) 
> and www.zefox.net (armv7 12.4.4) accept and hold ssh
> connections without error from any host on WAN or LAN.
> They also run the "grep -i ssh /var/log/messages" command
> but it doesn't elicit the "corrupt MAC..." message
> and disconnect.

That tells me nothing about whether RasPiOS or Win10
would also work. Those are all FreeBSD-only
combinations.

Remember that macOS being involved also did not get
the failure despite its context also being:

MACHINE<->wifi<->lan<->router<->switchA<->ns2.zefox.net

The only difference vs. RasPiOS was it being a macOS
system and hardware.

>> (Switches, routers, and the like do sometimes have
>> errors that mess up just some protocol, not
>> everything.)
>> 
> 
> But wouldn't that affect all hosts?

That is what I'm trying to test, not assume. The
OSes need not handle every part of the protocol the
same way, so the switches need not be processing
exactly the same packets.

> In this case
> ns2.zefox.net and ns1.zefox.net seem to be affected. 
> No other hosts (so far) have reported that particular
> error.

If you are unable to isolate the smallest configurations
that fail, and which Operating System combinations fail,
I do not see how to get evidence that explains the range
of observed oddities.
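
As a minimal sketch of the bookkeeping I have in mind (Python,
purely illustrative): list every client OS and network path
combination for connections to ns2.zefox.net, record what has
been observed, and see which combinations are still untested.
The names CLIENTS, PATHS, observed and untested are made up
for this example, and the recorded outcomes are only my
reading of the reports so far, not verified results.

```python
from itertools import product

# Client OSes and network paths (toward ns2.zefox.net) named in this thread.
CLIENTS = ["FreeBSD", "RasPiOS", "Win10", "macOS"]
PATHS = [
    "wifi<->lan<->router<->switchA",
    "lan<->router<->switchA",
    "switchA",
]

# (client, path) -> True if "corrupt MAC" was seen, False if it was not.
# ASSUMPTION: these entries reflect my understanding of the reports so
# far; correct them to match what was actually observed.
observed = {
    ("macOS", "wifi<->lan<->router<->switchA"): False,
    ("RasPiOS", "wifi<->lan<->router<->switchA"): True,
    ("FreeBSD", "switchA"): False,
}


def untested(clients, paths, results):
    """Return the (client, path) pairs that have no recorded result."""
    return [pair for pair in product(clients, paths) if pair not in results]


if __name__ == "__main__":
    # Each line printed is a combination we have no evidence about yet.
    for client, path in untested(CLIENTS, PATHS, observed):
        print(f"{client:8s} {path}")
```

Every line it prints is a combination for which there is no
evidence either way, which is the gap I keep pointing at.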

>>> It's somewhat curious that going from RPi4 workstation
>>> via ssh to www.zefox.net and then ssh to ns2 does not
>>> report corrupted MAC, but both machines run armv7
>>> FreeBSD 12.4.4
>> 
>> So, listing the nested(!) ssh sequence more fully, that was(?):
>> 
>> "pi4 RasPiOS workstation"<->wifi<->lan<->router<->switchA<->www.zefox.net<->switchA<->ns2.zefox.net
>> 
>> Or, being more explicit about the nesting:
>> 
>> "pi4 RasPiOS workstation"<->wifi<->lan<->router<->switchA<->www.zefox.net
>> 
>> then, nested:
>> 
>> www.zefox.net<->switchA<->ns2.zefox.net
>> 
>> And it ends up getting the same result (no failure)
>> as just doing:
>> 
>> www.zefox.net<->switchA<->ns2.zefox.net
>> 
>> without the involvement of any other MACHINE, if I
>> understand right.
>> 
> 
> Subject to the proviso that www.zefox.net is headless
> and so must be accessed via ssh from some other host.

Is there no tip session attached to www.zefox.net? That
type of connection does not go via ssh to www.zefox.net.

> The choice of host seems to make no difference.

macOS vs. RasPiOS/Win10 ?

>> Another related test would be by temporarily moving
>> www.zefox.net to form one of:
>> 
>> www.zefox.net<->wifi<->lan<->router<->switchA<->ns2.zefox.net
>> or:
>> www.zefox.net<->lan<->router<->switchA<->ns2.zefox.net
>> 
>> Does that still avoid the failure? Or does it then fail?
>> 
>> 
>>> A three-hop connection (RPiOS > www.zefox.net > ns2.zefox.net)
>>> somehow inhibits the corrupted MAC error.  Evidently
>>> there's something special going on among the hosts.
>>> 
>>>> Could you boot a FreeBSD microsd card in the pi4
>>>> instead and try it as a FreeBSD system to see if
>>>> it still has the problem (while in its usual
>>>> place)? I'm still looking for the same hardware
>>>> context but running a distinct but known OS
>>>> context to see if the problem persists.
>>>> 
>>> 
>>> Realistically I should probably just set up a microSD using
>>> 14-Release and configure it as ns2.zefox.net.
>> 
>> That is going a different direction than I asked about.
>> It does not eliminate RasPiOS from involvement on the
>> same hardware it was originally used on.
>> 
>> Both types of tests have their uses. But my focus for
>> now in this area is on the replacement of RasPiOS by
>> a FreeBSD version to eliminate RasPiOS's involvement
>> for some tests but using the same hardware as before
>> the replacement (other than boot media).
>> 
> 
> I'm missing the point. RasPiOS behaves much like
> Win10,

But not like macOS. As I understand it, no errors
have been produced when macOS is involved.

> FreeBSD-current and an armv7 instance of
> 14.0-RELEASE-p4 FreeBSD elsewhere in my network. The
> mischief seems tightly confined to the legacy 12.4.4
> armv7 hosts ns1 and ns2.  

Not true if involving macOS avoids the problem
completely.

>> armv7 is Tier 2 for 13.x and 14.x
>> 
>> armv7 is projected to stay tier 2 for the later official 15.x
>> ( stable/15 and such ) but might not. It is possible that
>> armv7 might only be supported via lib32/chroot/jail use for
>> aarch64 that also supports EL0 AArch32 --and so AArch32/armv7
>> could end up not being bootable any more by then.
>> 
>> aarch64 is Tier 1 for 13.x and 14.x
>> aarch64 is projected to stay tier 1 for the later official 15.x
>> aarch64 hardware that does not support AArch32 at all would
>> still be Tier 1.
>> (The aarch64 tier 1 claims are somewhat strong for embedded
>> aarch64. A possibility being that, at some point, a 1 GiByte
>> RAM aarch64 might not be able to self-host buildworld
>> buildkernel or various port->package builds even with
>> substantial swap space --but that would not be likely to
>> change the Tier 1 status of aarch64, for example.)
> 
> That summary makes me think abandoning armv7 in favor of
> aarch64 is best. Even if it can't self-host, packages and
> updates will be available.

You could have a builder RPi4B (or other) system with more
RAM, even 8 GiBytes.

> I do apologize for the fuss of this thread.  When first 
> reported I thought there might be something potentially 
> malicious going on. Hobby hosts make good targets for 
> experimental attacks and some of the sshd errors looked
> suspicious to a naive eye. 
> 
> Thanks to all (especially Mark!) who followed this saga.





===
Mark Millard
marklmi at yahoo.com