Re: Troubles building world on stable/13 [an experiment-environment that leaves existing things alone]

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 05 Feb 2022 03:18:06 UTC
On 2022-Feb-4, at 18:54, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Feb-4, at 18:06, bob prohaska <fbsd@www.zefox.net> wrote:
> 
>> On Fri, Feb 04, 2022 at 05:00:05PM -0800, Mark Millard wrote:
>>> On 2022-Feb-4, at 16:08, bob prohaska <fbsd@www.zefox.net> wrote:
>>> 
>>>> On Fri, Feb 04, 2022 at 02:44:01PM -0800, Mark Millard wrote:
>>>>> On 2022-Feb-4, at 13:44, bob prohaska <fbsd@www.zefox.net> wrote:
>>>>> 
>>>> 
>>>> It sounds like I simply have a corrupted c++. Perhaps just
>>>> set the old version aside and copy from the chroot directory
>>>> to /usr/bin ? Granted, other things might be wrong as well. 
>>> 
>>> I'm not so sure. My expectation is that if you first
>>> do (presuming not already in place at the time):
>>> 
>>> # sysctl kern.elf64.aslr.enable=0
>>> 
>> On checking, that's already the case. I didn't change it
>> knowingly, likely it's been zero all along.
> 
> So you get the failures even when:
> 
> # sysctl kern.elf64.aslr.enable
> kern.elf64.aslr.enable: 0
> 
> ?
> 
> That is different than in my context. I've never
> gotten the failure for the above type of context.
> 
> It may be that for stable/13 's kernel the
> default is 0 .

I looked at the source and it does default to 0 for
stable/13 's source vintage that I have.

It is too late now for an immediate test, but at
some point after a reboot, with kern.elf64.aslr.enable
still 0, try looking at:

# sysctl vm.aslr_restarts

before and after the .sh/.cpp testing that shows
failures. If the value becomes non-zero at any
point then some ASLR activity was attempted
despite the 0 in kern.elf64.aslr.enable .
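
A minimal sketch of that before/after comparison, assuming a
FreeBSD system where vm.aslr_restarts is readable. The function
names (read_restarts, check_restarts) and the script name in the
usage comment are hypothetical, made up for illustration:

```shell
#!/bin/sh
# Sketch: report whether vm.aslr_restarts changed across one test run.

# Print the current counter; "sysctl -n" prints only the value.
read_restarts() {
    sysctl -n vm.aslr_restarts
}

# check_restarts CMD [ARGS...]
# Runs CMD and compares the counter read before and after it.
# Returns 0 if unchanged, 1 if it changed, 2 if it could not be read.
check_restarts() {
    before=$(read_restarts) || return 2
    "$@"
    status=$?
    after=$(read_restarts) || return 2
    echo "vm.aslr_restarts: before=$before after=$after (test exit status $status)"
    if [ "$after" -ne "$before" ]; then
        echo "ASLR restarts occurred during the test"
        return 1
    fi
    echo "no ASLR restarts observed"
    return 0
}

# Usage (hypothetical script name, substitute the real test):
# check_restarts sh ./failing-test.sh
```

The counter is system-wide, so unrelated activity during the run
could also change it. A changed value only shows that some ASLR
activity was attempted, not which process triggered it.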

> I did test and one can actually set:
> 
> kern.elf64.aslr.enable
> 
> from inside a chroot context, at least
> when one generally works as root. It
> changed the system's overall
> kern.elf64.aslr.enable status.
> 
>>> and then do your buildworld buildkernel, it will just work
>>> -- using your existing c++ compiler (system clang/clang++).
>>> 
>> Well, that hasn't happened yet. On the theme that if a
>> problem won't get better find out what makes it worse,
>> I've set it to 1 and am re-running buildworld with -j1.
> 
> Okay. That you get the failures even when
> kern.elf64.aslr.enable is 0 means that my
> existing context for investigation is
> still problematical.
> 
>>> 
>>> It seems very odd that such a setting would "uncorrupt"
>>> your clang/clang++ build (used under the name c++). I'm
>>> not aware of the compiler doing anything like what ntpd
>>> did, for which having ASLR enabled was a problem.
>>> 
>>> As far as I can tell, the setting changes the detailed
>>> behavior of mmap calls (including implicit ones in
>>> library code and such).
>>> 
>>> I've not found a way to look at the context just before
>>> the failure (without disturbing things enough via debugger
>>> activity that the failure does not happen). It is likely
>>> that I'll not manage to get such evidence that includes
>>> the failure.
>>> 
>>> I worry that the failures seen with your c++ involve a
>>> kernel bug but I do not see a way to investigate that.
>> 
>> I share your feeling that something isn't right but am
>> utterly ill equipped to posit what that might be. The
>> most obvious recent strangeness, outbound network
>> traffic not working unless accompanied by an outbound
>> ping, is most peculiar.
>> 
>> 
>> Might this be a reason to try Peter Holm's stress2 suite? I
>> haven't played with it in a long time, not sure it'll even
>> compile now. "Success" in stress2 terms is a kernel panic.
> 
> main [so: 14] has:
> 
> # ls -Tld /usr/main-src/tools/test/stress2/
> drwxr-xr-x  8 root  wheel  33 Apr 28 15:20:54 2021 /usr/main-src/tools/test/stress2/
> 
> But I'm not sure if it would be of any help or not.
> It may not have tests for causing vm.aslr_restarts
> to increment during operation and then seeing
> what works vs. what does not.
> 
> stable/13 and before do not seem to have stress2/ .
> 
>>> Another option might be to use a copy of the
>>> compiler from the chroot area to replace the
>>> normal system's copies, possibly renaming the
>>> old ones first (various names), including
>>> dealing with clang.debug as well. This presumes
>>> that the 2 stable/13 builds are sufficiently
>>> compatible for such a substitution to work.
>> 
>> That sounds worth a try if no better ideas emerge.
>> 





===
Mark Millard
marklmi at yahoo.com