Re: sshd signal 11 on -current

From: Mark Millard <marklmi_at_yahoo.com>
Date: Wed, 17 Jan 2024 17:53:43 UTC

On Jan 17, 2024, at 09:34, Mark Millard <marklmi@yahoo.com> wrote:

> On Jan 17, 2024, at 08:00, bob prohaska <fbsd@www.zefox.net> wrote:
> 
>> A Pi4 running -current reported:
>> 
>> Jan 13 16:23:10 nemesis kernel: pid 53604 (sshd), jid 0, uid 22: exited on signal 11 (no core dump - bad address)
>> repeatedly.
> 
> I assume that the pid changed from message to message, along with
> the time, but that the rest of each message matched exactly.
> 
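
One way to check that assumption is sketched below. It is only a
sketch: the regular expression is modeled on the single message
quoted in this thread, and the names LINE_RE and signal_exits are
mine, not anything from the system. It pulls the timestamp and pid
out of each matching line so the pids can be compared.

```python
import re

# Modeled on the one kernel message quoted in this thread (an assumption);
# other kernel versions may word the line differently.
LINE_RE = re.compile(
    r"kernel: pid (?P<pid>\d+) \((?P<comm>[^)]+)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+): exited on signal (?P<sig>\d+)"
)


def signal_exits(lines, comm="sshd", sig=11):
    """Return (timestamp, pid) pairs for lines reporting comm dying on sig."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("comm") == comm and int(m.group("sig")) == sig:
            # A syslog timestamp such as "Jan 13 16:23:10" is 15 characters.
            hits.append((line[:15], int(m.group("pid"))))
    return hits
```

Feeding it the lines of /var/log/messages and comparing the number
of distinct pids with the number of hits would show whether each
message came from a different process.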
>> There's no obvious disruption of operation; existing
>> ssh connections seem undisturbed.
> 
> First, a reminder of what a process tree for sshd looks like
> (you need not be using root and would likely be running tip
> instead of ps):
> 
> 1546  -  Is       0:00.00 |-- sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
> 1628  -  Ss       0:00.10 | |-- sshd: root@pts/1 (sshd)
> 1642  1  Ss       0:00.04 | | `-- -sh (sh)
> 9531  1  R+       0:00.00 | |   `-- ps -xd
> 7512  -  Is       0:00.02 | `-- sshd: root@pts/0 (sshd)
> 7515  0  Is+      0:00.01 |   `-- -sh (sh)
> 
> The lack of disruption indicates that one of the "@pts/"
> sshd's got the signal but the other @pts/ ones did not,
> nor did the "/usr/sbin/sshd [listener]" sshd get such a
> signal, if I understand right.
> 
> Too bad there are no core files. Also, the system may lack
> symbols or debug information to make backtraces readable
> (if there was a core to look at).
> 
>> The messages occur in a group of
>> about fifteen, one second apart. The machine has been up about
>> three days, with only one occurrence so far. 
>> 
>> Can't tell if this is new or old behavior; I've never manually
>> checked /var/log/messages for sshd errors until now and didn't
>> save the security run email from the 13th.
>> 
>> Might it be of significance?
> 
> The "exited on signal 11" would possibly lead to the contained
> shell (tcsh in your case?) being killed. It would not be via
> SIGHUP. The tip run might also be killed, leaving the lock file
> around. Trying:
> 
> # ps -xd
> 
> on nemesis before starting up a new tip should indicate if there
> is a tip running that is no longer (indirectly) under a
> "sshd: root@pts/" process.
> 
> 
> It is a very good find. At the moment I do not see a way to
> end up with a backtrace showing what was involved when the
> signal happened. But it is highly likely that you have demonstrated
> the presence of an error of some kind:
> 
>     Num   Name             Default Action       Description
> . . .
>     11    SIGSEGV          create core image    segmentation violation
> 
> should not happen.
> 
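
For anyone unsure what "exited on signal 11" looks like from the
parent's side, here is a minimal sketch. It assumes a POSIX system
with Python available; CHILD and run_and_report are names made up
for the illustration. The child just sends itself SIGSEGV, so this
only imitates how the process ended and says nothing about what
actually went wrong in sshd.

```python
import signal
import subprocess
import sys

# Child program: disable core files, then send itself SIGSEGV.
# This imitates the exit status only; it is not a reproduction of the sshd fault.
CHILD = (
    "import os, resource, signal;"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0));"
    "os.kill(os.getpid(), signal.SIGSEGV)"
)


def run_and_report():
    """Run the child and return the name of the signal that killed it, if any."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    # subprocess reports death by signal N as a return code of -N.
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return None
```

The parent keeps running and merely learns how the child ended,
which fits the listener sshd carrying on after one of its children
dies.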

FYI in case it is unclear:

Once the "sshd: root@pts/" process has been killed, it is
no longer attempting to read what is sent by the other
side of the specific ssh connection. So the next write by
the other side will get a SIGPIPE that it ends up handling.
This ties back to prior notes.
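
A small sketch of that effect, under stated assumptions: a local
socket pair stands in for the ssh connection, and closing one end
stands in for the killed "sshd: root@pts/" process. The function
name is made up for the illustration. Python ignores SIGPIPE by
default, so the failed write shows up as the EPIPE error
(BrokenPipeError) rather than as a signal. Over a real TCP
connection the error may only arrive on a later write, or the
writer may first see end-of-file on a read.

```python
import socket


def write_after_peer_close():
    """Write to a stream socket whose peer has already gone away."""
    writer, reader = socket.socketpair()  # local stand-in for the connection
    reader.close()  # stand-in for the killed per-connection sshd
    try:
        writer.sendall(b"keystroke\n")
    except BrokenPipeError:
        # EPIPE: nobody is left to read what was written.
        return "EPIPE"
    finally:
        writer.close()
    return "ok"
```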

===
Mark Millard
marklmi at yahoo.com