Re: Troubles building world on stable/13 [an experiment-environment that leaves existing things alone]

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 05 Feb 2022 01:00:05 UTC
On 2022-Feb-4, at 16:08, bob prohaska <fbsd@www.zefox.net> wrote:

> On Fri, Feb 04, 2022 at 02:44:01PM -0800, Mark Millard wrote:
>> On 2022-Feb-4, at 13:44, bob prohaska <fbsd@www.zefox.net> wrote:
>> 
>>> On Thu, Feb 03, 2022 at 02:05:38PM -0800, Mark Millard wrote:
>>> [chroot setup snipped]
>>>> 
>>>> It would be good to know what experiments produce relative to
>>>> failures vs. successes: all one? the other? a mix? Part of the
>>>> point here is to test builds from official FreeBSD build
>>>> servers instead of personal builds.
>>>> 
>>> 
>>> I placed the chroot directory under a regular (wheel group) login,
>>> but otherwise followed the instructions successfully, I think.
>>> Since all the installed files were owned by root, I used the
>>> root login to work in the chroot.
>>> 
>>> Five attempts to run the .sh/.cpp file produced all successful
>>> results.
>> 
>> Interesting. Currently it looks like your specific compiler build
>> and the ASLR (Address Space Layout Randomization) somehow
>> interact, leading to sometimes getting the SIGSEGV's.
>> 
>> I have only reproduced the problem with the copy of your c++
>> -- but it stops reproducing in my environment when I disable
>> the system's ASLR mode of operation.
>> 
> 
> It sounds like I simply have a corrupted c++. Perhaps just
> set the old version aside and copy from the chroot directory
> to /usr/bin ? Granted, other things might be wrong as well. 

I'm not so sure. My expectation is that if you first
do (presuming not already in place at the time):

# sysctl kern.elf64.aslr.enable=0

and then do your buildworld buildkernel it will just work
-- using your existing c++ compiler (system clang/clang++).
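
As a rough sketch of that sequence (not something tested on your
system), the setting could be saved, turned off for the build, and
put back afterwards. The function name with_aslr_off and the
/usr/src location in the usage line are assumptions for
illustration; only the sysctl name comes from the text above.
Changing the sysctl requires root.

```shell
#!/bin/sh
# Sketch: run a command with kern.elf64.aslr.enable set to 0, then
# restore whatever value was in place before. Needs root on FreeBSD.
# The name with_aslr_off is hypothetical.
with_aslr_off() {
    # Remember the current setting so it can be restored afterwards.
    old=$(sysctl -n kern.elf64.aslr.enable) || return 1
    sysctl kern.elf64.aslr.enable=0 >/dev/null || return 1

    # Run the requested command and keep its exit status.
    "$@"
    status=$?

    # Put the previous setting back, whether or not the command worked.
    sysctl kern.elf64.aslr.enable="$old" >/dev/null
    return "$status"
}
```

Possible usage, assuming the source tree is in /usr/src:

# with_aslr_off make -C /usr/src buildworld buildkernel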

Note: There may be a way to set a specific file like
your c++ to force ASLR to not be enabled for it
when it runs. But I've not researched that (yet?).
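
One candidate worth checking, offered only as an unverified sketch:
proccontrol(1) can start a single command with ASLR forced off for
that process, leaving the system-wide sysctl alone. Whether this
avoids the SIGSEGVs seen with your c++ has not been tested, and the
wrapper name run_no_aslr is hypothetical.

```shell
#!/bin/sh
# Sketch: start one command with ASLR disabled for that process only,
# via FreeBSD's proccontrol(1). The wrapper name is hypothetical.
run_no_aslr() {
    # -m aslr selects the ASLR control; -s disable forces it off
    # for the command that follows.
    proccontrol -m aslr -s disable "$@"
}
```

Possible usage (test.cpp is a placeholder file name):

# run_no_aslr c++ -c test.cpp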

So far I've not had any example of failure with that
setting in place.

It seems very odd that such a setting would "uncorrupt"
your clang/clang++ build (used under the name c++). I'm
not aware of the compiler doing anything like what ntpd
did, for which having ASLR enabled was a problem.

As far as I can tell, the setting changes the detailed
behavior of mmap calls (including implicit ones in
library code and such).

I've not found a way to look at the context just before
the failure (without disturbing things enough via debugger
activity that the failure does not happen). It is likely
that I'll not manage to get such evidence that includes
the failure.

I worry that the failures seen with your c++ involve a
kernel bug but I do not see a way to investigate that.


>> You got later messages about the ASLR disabling experiments
>> that I did.
>> 
>>> Next I tried to use lldb. That produced the usual 
>>> preliminary output. However, on issuing the run command I got
>>> 
>>> error: DupDescriptor-open failed: No such file or directory
>>> 
>> 
>> That message happens when devfs has not been set up
>> for the dev directory inside the chroot. In my
>> instructions, before the chroot command, there was:
>> 
>> # mount -tdevfs devfs ~/13S-chroot/dev
>> 
>> that set up the dev that was in 13S-chroot/ . It
>> does not survive reboots and needs to be done again
>> after a reboot --from outside any chroot session.
>> In my context, the following shows some checkable
>> consequences of a correct, active devfs mount:
>> 
>> # df -m
>> Filesystem          1M-blocks   Used  Avail Capacity  Mounted on
>> /dev/gpt/Rock64root    823229 194087 563283    26%    /
>> devfs                       0      0      0   100%    /dev
>> 
>> # mount -t devfs devfs ~/13S-chroot/dev
>> 
>> # df -m
>> Filesystem          1M-blocks   Used  Avail Capacity  Mounted on
>> /dev/gpt/Rock64root    823229 194087 563283    26%    /
>> devfs                       0      0      0   100%    /dev
>> devfs                       0      0      0   100%    /root/13S-chroot/dev
>> 
> When repeated with the instructions followed correctly lldb works,
> the result is exit with status 0, success. Nothing to see here....

Good to know. Thanks.
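
Since the devfs mount does not survive reboots, a small check before
entering the chroot may help avoid the DupDescriptor-open error. The
following is a sketch under assumptions: the function name
ensure_devfs is hypothetical, and the path has to be the absolute
path of the dev directory for your chroot (in my context
/root/13S-chroot/dev). It relies on "mount -p" listing mounts in
fstab format. Run it as root, from outside any chroot session.

```shell
#!/bin/sh
# Sketch: mount devfs on a chroot's dev directory unless it is
# already mounted there. The name ensure_devfs is hypothetical.
ensure_devfs() {
    dev_dir=$1

    # "mount -p" prints one mount per line in fstab format:
    # device, mount point, type, options, ...
    if mount -p | awk -v d="$dev_dir" \
        '$1 == "devfs" && $2 == d { found = 1 } END { exit !found }'; then
        return 0    # already mounted, nothing to do
    fi

    mount -t devfs devfs "$dev_dir"
}
```

Possible usage, with the path from my context:

# ensure_devfs /root/13S-chroot/dev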

I expect that you can use:

# sysctl kern.elf64.aslr.enable=0

to make progress (buildworld buildkernel). You
might be lucky enough that after installing
the update the problematical combination will
not happen.

Another option might be to use a copy of the
compiler from the chroot area to replace the
normal system's copies, possibly renaming the
old ones first (various names), including
dealing with clang.debug as well. This presumes
that the 2 stable/13 builds are sufficiently
compatible for such a substitution to work.
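
A sketch of how such a substitution could be done while keeping the
old files around. This is untested against your system: the function
name swap_in_compiler is hypothetical, and the list of file names to
pass is an assumption that should be checked against what actually
exists in both directories.

```shell
#!/bin/sh
# Sketch: replace files in DST with same-named files from SRC,
# renaming each old file to NAME.orig first.
# Usage: swap_in_compiler SRC_DIR DST_DIR NAME...
# The name swap_in_compiler is hypothetical.
swap_in_compiler() {
    src=$1
    dst=$2
    shift 2

    # Check that every replacement exists before touching anything.
    for name in "$@"; do
        if [ ! -e "$src/$name" ]; then
            echo "missing $src/$name" >&2
            return 1
        fi
    done

    for name in "$@"; do
        # Set the old copy aside rather than overwriting it.
        if [ -e "$dst/$name" ]; then
            mv "$dst/$name" "$dst/$name.orig" || return 1
        fi
        cp -p "$src/$name" "$dst/$name" || return 1
    done
}
```

Possible usage, with an assumed (likely incomplete) list of names:

# swap_in_compiler ~/13S-chroot/usr/bin /usr/bin clang clang++ cc c++

The same function could be pointed at the directories holding
clang.debug.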

===
Mark Millard
marklmi at yahoo.com