Re: Troubles building world on stable/13 [an experiment-environment that leaves existing things alone]

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 05 Feb 2022 02:54:03 UTC
On 2022-Feb-4, at 18:06, bob prohaska <fbsd@www.zefox.net> wrote:

> On Fri, Feb 04, 2022 at 05:00:05PM -0800, Mark Millard wrote:
>> On 2022-Feb-4, at 16:08, bob prohaska <fbsd@www.zefox.net> wrote:
>> 
>>> On Fri, Feb 04, 2022 at 02:44:01PM -0800, Mark Millard wrote:
>>>> On 2022-Feb-4, at 13:44, bob prohaska <fbsd@www.zefox.net> wrote:
>>>> 
>>> 
>>> It sounds like I simply have a corrupted c++. Perhaps just
>>> set the old version aside and copy from the chroot directory
>>> to /usr/bin ? Granted, other things might be wrong as well. 
>> 
>> I'm not so sure. My expectation is that if you first
>> do (presuming not already in place at the time):
>> 
>> # sysctl kern.elf64.aslr.enable=0
>> 
> On checking, that's already the case. I didn't change it
> knowingly, likely it's been zero all along.

So you get the failures even when:

# sysctl kern.elf64.aslr.enable
kern.elf64.aslr.enable: 0

?

That is different than in my context. I've never
gotten the failure for the above type of context.

It may be that for stable/13's kernel the
default is 0.

I did test and one can actually set:

kern.elf64.aslr.enable

from inside a chroot context, at least
when one generally works as root. It
changed the system's overall
kern.elf64.aslr.enable status.
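
Because the setting is system-wide, an experiment that flips it
can easily leave it in the wrong state afterwards. A minimal
sh sketch of one way to guard against that: record the value,
change it for the duration of one command, then put it back.
The names with_aslr, aslr_get and the SYSCTL override variable
are hypothetical, made up for illustration; only sysctl(8)
and kern.elf64.aslr.enable come from the discussion above.

```sh
#!/bin/sh
# Hypothetical override so the helpers can be exercised without
# touching the real knob; defaults to the real sysctl(8).
: "${SYSCTL:=sysctl}"

# Print only the current value (-n omits the "name: " prefix).
aslr_get() {
    $SYSCTL -n kern.elf64.aslr.enable
}

# usage: with_aslr 0|1 command [args...]
# Sets the knob, runs the command, restores the saved value,
# and returns the command's exit status.
with_aslr() {
    _want=$1; shift
    _saved=$(aslr_get) || return 1
    $SYSCTL kern.elf64.aslr.enable="$_want" >/dev/null || return 1
    _rc=0
    "$@" || _rc=$?
    # Restore even if the command failed.
    $SYSCTL kern.elf64.aslr.enable="$_saved" >/dev/null
    return "$_rc"
}
```

Run as root, something like "with_aslr 1 make -j1 buildworld"
would cover the enabled case and still leave the system at its
prior setting. Note that other processes started while the
command runs see the changed value too.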

>> and then do your buildworld buildkernel, it will just work
>> -- using your existing c++ compiler (system clang/clang++).
>> 
> Well, that hasn't happened yet. On the theme that if a
> problem won't get better find out what makes it worse,
> I've set it to 1 and am re-running buildworld with -j1.

Okay. That you get the failures even when
kern.elf64.aslr.enable is 0 means that my
existing context for investigation is
still problematical.

>> 
>> It seems very odd that such a setting would "uncorrupt"
>> your clang/clang++ build (used under the name c++). I'm
>> not aware of the compiler doing anything like the ntpd
>> did, for which having ASLR enabled was a problem.
>> 
>> As far as I can tell, the setting changes the detailed
>> behavior of mmap calls (including implicit ones in
>> library code and such).
>> 
>> I've not found a way to look at the context just before
>> the failure (without disturbing things enough via debugger
>> activity that the failure does not happen). It is likely
>> that I'll not manage to get such evidence that includes
>> the failure.
>> 
>> I worry that the failures seen with your c++ involve a
>> kernel bug but I do not see a way to investigate that.
> 
> I share your feeling that something isn't right but am
> utterly ill equipped to posit what that might be. The
> most obvious recent strangeness, outbound network
> traffic not working unless accompanied by an outbound
> ping, is most peculiar.
> 
> 
> Might this be a reason to try Peter Holm's stress2 suite? I
> haven't played with it in a long time, not sure it'll even
> compile now. "Success" in stress2 terms is a kernel panic.

main [so: 14] has:

# ls -Tld /usr/main-src/tools/test/stress2/
drwxr-xr-x  8 root  wheel  33 Apr 28 15:20:54 2021 /usr/main-src/tools/test/stress2/

But I'm not sure if it would be of any help or not.
It may not have tests for causing vm.aslr_restarts
to increment during operation and then seeing
what works vs. what does not.

stable/13 and before do not seem to have stress2/.
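
Short of a real test suite, one crude way to see whether
vm.aslr_restarts moves during some operation is to sample it
before and after. A minimal sh sketch under that assumption;
restarts_during, aslr_restarts and the SYSCTL override
variable are hypothetical names. The counter is system-wide,
so unrelated processes can also change it and the delta is
only a hint.

```sh
#!/bin/sh
# Hypothetical override for testing; defaults to the real sysctl(8).
: "${SYSCTL:=sysctl}"

# Print the current value of the counter.
aslr_restarts() {
    $SYSCTL -n vm.aslr_restarts
}

# usage: restarts_during command [args...]
# Runs the command, prints how much the counter grew while it
# ran, and returns the command's exit status.
restarts_during() {
    _before=$(aslr_restarts) || return 1
    _rc=0
    "$@" || _rc=$?
    _after=$(aslr_restarts) || return 1
    echo "vm.aslr_restarts delta: $(( $_after - $_before )) (exit status $_rc)"
    return "$_rc"
}
```

For example, "restarts_during make -j1 buildworld" would show
whether any restarts were counted over a failing or a
succeeding build.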

>> Another option might be to use a copy of the
>> compiler from the chroot area to replace the
>> normal system's copies, possibly renaming the
>> old ones first (various names), including
>> dealing with clang.debug as well. This presumes
>> that the 2 stable/13 builds are sufficiently
>> compatible for such a substitution to work.
> 
> That sounds worth a try if no better ideas emerge.
> 
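
For reference, the rename-then-copy substitution could be
sketched roughly as below. This is an illustration only:
swap_in is a hypothetical helper, the .orig suffix is an
arbitrary choice, and the directories and file names to pass
depend on where the chroot lives and which compiler names
are installed.

```sh
#!/bin/sh
# usage: swap_in SRC_DIR DEST_DIR name...
# For each name: keep the existing DEST_DIR/name as name.orig,
# then copy SRC_DIR/name into place, preserving mode and times.
swap_in() {
    _src=$1; _dst=$2; shift 2
    # Check everything exists first so a typo does not leave
    # the destination half replaced.
    for _name in "$@"; do
        if [ ! -e "$_src/$_name" ]; then
            echo "missing: $_src/$_name" >&2
            return 1
        fi
    done
    for _name in "$@"; do
        if [ -e "$_dst/$_name" ]; then
            mv "$_dst/$_name" "$_dst/$_name.orig" || return 1
        fi
        cp -p "$_src/$_name" "$_dst/$_name" || return 1
    done
}
```

A call might look like "swap_in /path/to/chroot/usr/bin
/usr/bin clang clang++ c++", with /path/to/chroot standing in
for the actual chroot directory, plus a second call for
wherever clang.debug is installed. Undoing it means moving
the .orig files back.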




===
Mark Millard
marklmi at yahoo.com