Re: Troubles building world on stable/13

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 03 Feb 2022 00:18:07 UTC

On 2022-Feb-2, at 14:32, bob prohaska <fbsd@www.zefox.net> wrote:

> The latest Pi3 single-user -j1 buildworld stopped with clang error 139.
> Running the .sh and .cpp files produced an error. The .sh was
> re-run under lldb and backtraced.

I'll see if I can find any interesting information based on
the "fault address: 0x5" and the source code related to the
frames reported in the backtrace.

> The output is at
> http://www.zefox.net/~fbsd/rpi3/clang_trouble/20220202/lldb_session
> along with the buildworld log and related files in the same directory.

Your 20220202/readme reports:

QUOTE
The first attempt to run  lldb-gtest-all-fe760c.sh  finished with exit
code zero.  A subsequent try produced the expected output reported
here in lldb_session.
END QUOTE

(You had also reported (off list) a recent prior failure where the
.sh/.cpp pair did not repeat the failure when you tried it.)

Well:

http://www.zefox.net/~fbsd/rpi3/clang_trouble/20220202/gtest-all-fe760c.o

seems to be a possibly valid compile result to go along with the
run (under lldb) that finished normally (exit code zero).

I cannot see the dates/times on the files over the web interface.
Can you report the output of something like a

# ls -Tla

that lists the dates from the original *gtest-all* files (if they
have been preserved in copies somewhere)? As it stands, I'm
making guesses without knowing the time order of the files.

> Hopefully it sheds a glimmer of light.

We will see if I notice anything interesting looking at
source code related to the frames of the backtrace for
the lldb based failure reporting.

The variable results for the (lldb) .sh/.cpp runs for the
same file pair suggest possibilities like race conditions,
use of uninitialized memory, use of deallocated-and-reused
memory (now in use for something else), or flaky hardware.

(That the failure sometimes happened with the .sh/.cpp
pair means that no processing of include files was involved
in those failures: the .cpp of the pair is self-contained.)
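
Since the failure is intermittent, one way to characterize it is to
repeat the same run many times and count the failures. The following
is only a minimal sketch, assuming a POSIX sh; the function name
count_failures and the script name in the usage comment are
placeholders of mine, not anything from the build. A shell normally
reports a process killed by a signal as 128 plus the signal number,
so an exit status of 139 would correspond to signal 11 (SIGSEGV).

```sh
#!/bin/sh
# Sketch: run a command repeatedly and report how many runs failed.
# usage: count_failures RUNS COMMAND [ARGS...]
# (count_failures is a hypothetical helper, not part of any build tooling.
# POSIX sh has no "local", so the variables below are global.)
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's own output; only the exit status matters.
        "$@" >/dev/null 2>&1
        status=$?
        if [ "$status" -ne 0 ]; then
            failures=$((failures + 1))
            echo "run $((i + 1)): exit status $status"
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
}

# Example (placeholder script name):
#   count_failures 20 sh ./reproducer.sh
```

A failure rate that stays well away from both 0% and 100% for an
unchanged input pair would fit the non-deterministic possibilities
listed above better than it fits a problem with the input files.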

I'll note that the buildworld.log 's :

1.      /usr/obj/usr/src/arm64.aarch64/tmp/usr/include/private/gtest/internal/gtest-type-util.h:899:21: current parser token '{'
2.      /usr/obj/usr/src/arm64.aarch64/tmp/usr/include/private/gtest/internal/gtest-type-util.h:58:1: parsing namespace 'testing'

are the exact same places in the original source code
as were reported in other such logs for failures while
processing gtest-all.cc .

> Not sure what to try next. It is possible to build kernel-toolchain and
> new kernels,

kernel-toolchain builds a subset of the toolchain that
buildworld builds. I'm unsure if the buildworld
completed building what kernel-toolchain builds or not.

> if that might be useful.

For now, I need to explore source code.

> At present the machine reports
> bob@pelorus:/usr/src % uname -apKU
> FreeBSD pelorus.zefox.org 13.0-STABLE FreeBSD 13.0-STABLE #0 stable/13-n249120-dee0854a009: Sat Jan 22 23:32:23 PST 2022     bob@pelorus.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC  arm64 aarch64 1300525 1300523

Thanks for the reference to stable/13 's dee0854a009 .

(This was still the "boot -s" single-user context for
the testing. This note is for anyone else reading this.)

FYI: the variable result makes the corrupted-block
hypothesis less likely. You have seen the failure via
the original files and via the .cpp of the .sh/.cpp
pair. You appear to have had a successful build
with the .sh/.cpp pair, and an unsuccessful one. Also,
no stage under lldb reported illegal instructions or
the like.

===
Mark Millard
marklmi at yahoo.com