Re: Ping troubles, was Re: Troubles building world on stable/13

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 21 Jan 2022 20:54:02 UTC
On 2022-Jan-21, at 08:34, bob prohaska <fbsd@www.zefox.net> wrote:

> On Thu, Jan 20, 2022 at 10:00:34PM -0800, Mark Millard wrote:
>> On 2022-Jan-20, at 19:16, bob prohaska <fbsd@www.zefox.net> wrote:
>> 
>>> ********************
>>> 
>>> PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
>>> Preprocessed source(s) and associated run script(s) are located at:
>>> c++: note: diagnostic msg: /tmp/gmock-all-836ef8.cpp
>>> c++: note: diagnostic msg: /tmp/gmock-all-836ef8.sh
>>> c++: note: diagnostic msg: 
>>> 
>>> ********************
>>> *** [gmock-all.o] Error code 139
>> 
>> So: SIGSEGV (signal 11)
>> 
> 
> Aha! I didn't make the connection at all.
> 
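For reference, 139 follows the usual shell convention that an exit status above 128 means "terminated by signal (status - 128)". So 139 is 128 + 11, and signal 11 is SIGSEGV. Below is a minimal sh sketch of that decoding. The name describe_status is made up for illustration and is not an existing tool.

```sh
#!/bin/sh
# Hypothetical helper: decode a shell exit status such as the one in
# make's "Error code 139". By convention, a status above 128 means the
# process was terminated by signal (status - 128).
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l N prints the signal name without the SIG prefix.
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_status 139   # prints: killed by signal 11 (SIGSEGV)
```
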
>>> make[4]: stopped in /usr/src/lib
>>> --- all_subdir_lib/clang ---
>>> 
>>> FWIW, filemon is enabled in /boot/loader.conf and the build command was
>>> make -j2 -DWITH_META_MODE  buildworld > buildworld.log
>>> 
> 
>> 
>> "uname -apKU" output from the building environment?
>> 
> 
> root@pelorus:/usr/src # uname -apKU
> FreeBSD pelorus.zefox.org 13.0-STABLE FreeBSD 13.0-STABLE #6 stable/13-n248948-9418a626103: Thu Jan 13 12:12:06 PST 2022     bob@pelorus.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC  arm64 aarch64 1300523 1300523
> 
>> Commit identification for the /usr/src/ for stable/13
>> that is being built?
> 
> 
> Not sure what's meant by "commit identification"; hopefully it's somewhere in
> the uname output. Git status reports:

The commit-id 9418a626103 in the uname -apKU output
identifies what you are already running. I was also
after the id of what you are trying to build.
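In that uname output the kernel ident has the form branch-nCOUNT-HASH (here stable/13-n248948-9418a626103), so the abbreviated hash for the running system can be extracted mechanically. The following is a small sh sketch that assumes that naming pattern. The function name running_commit is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: extract the abbreviated commit hash from a
# FreeBSD uname -v style string containing "branch-nCOUNT-HASH".
# Prints nothing if the string does not follow that pattern.
running_commit() {
    printf '%s\n' "$1" |
        sed -n 's/.*-n[0-9][0-9]*-\([0-9a-f][0-9a-f]*\).*/\1/p'
}

running_commit "$(uname -v)"
```

This only identifies what is running. It says nothing about what /usr/src is currently checked out at.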

I have multiple source trees around, one of them for stable/13.
I use:

# more ~/fbsd-based-on-what-commit.sh 
#! /bin/sh
branch="$(git "$@" branch --show-current)" \
&& echo "branch: $branch" \
&& base="$(git "$@" merge-base "freebsd/$branch" HEAD)" \
&& git "$@" log --oneline --no-color "$base..HEAD" \
&& base_date="$(TZ=UTC git "$@" log --format=fuller --date=iso-local --no-color "$base^..$base" | grep CommitDate:)" \
&& echo "merge-base: $base" \
&& echo "merge-base: $base_date" \
&& git "$@" log --oneline --no-color "$base^..$base" \
&& echo "n$(git "$@" rev-list --first-parent --count "$base") (--first-parent --count for merge-base)"

and run it like so (for my stable/13 source tree; your path would
likely be /usr/src instead):

# ~/fbsd-based-on-what-commit.sh -C /usr/13S-src/
branch: stable/13
merge-base: a5f69859956049b5153b0e1b67f8f4a99622dc6f
merge-base: CommitDate: 2022-01-15 12:55:32 +0000
a5f698599560 (HEAD -> stable/13, freebsd/stable/13) Ignore debugger-injected signals left after detaching
n249004 (--first-parent --count for merge-base)

> root@pelorus:/usr/src # git status
> On branch stable/13
> Your branch is up to date with 'freebsd/stable/13'.
> 
> Untracked files:
>  (use "git add <file>..." to include in what will be committed)
>        buildkernel.log
>        buildscript
>        buildworld.log
>        installkernel.log
>        installmmcscript
>        installscript
>        installworld.log
>        mmcscript
>        poudriereupscript
>        poudup.log
>        rpi4script
> 
> 
> It took 2.17 seconds to enumerate untracked files. 'status -uno'
> may speed it up, but you have to be careful not to forget to add
> new files yourself (see 'git help status').
> nothing added to commit but untracked files present (use "git add" to track)

That output does not include a commit-id.
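For the tree being built, the commit that HEAD is at would be enough. The following sh sketch shows one way to print it. The name src_commit is hypothetical, and /usr/src as the default path is an assumption.

```sh
#!/bin/sh
# Hypothetical helper: show the commit a git source tree is checked
# out at, as "abbreviated-hash subject". Defaults to /usr/src.
src_commit() {
    git -C "${1:-/usr/src}" log -1 --oneline --no-decorate --no-color
}

# Usage:
#   src_commit                 # /usr/src
#   src_commit /usr/13S-src
```
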

>> Any console messages? dmesg -a output of interest?
>> /var/log/messages content of interest?
>> 
> 
> Nothing obvious, in particular no "killed, out of swap" type messages.
> 
>> Any messages of interest somewhat earlier in the
>> buildworld.log ?
>> 
> 
> Not that I can recognize. I started to put the buildworld.log file on my
> public webserver and was surprised to find that sftp didn't connect.
> Trying to connect from the server to pelorus so as to use get failed
> likewise. 
> 
> Next I tried to ping from the webserver to the stable/13 machine: no answer.
> Finally I started a ping from stable/13 to the webserver, at which point
> the opposing ping session woke up. That seems most strange.
> 
> With ping running once per second from webserver to stable/13 usually a 
> single packet is returned. Starting a ping in the reverse direction at
> 10 second intervals _usually_ results in a single packet reply; occasionally
> none or two. It isn't entirely consistent. 
> 
> Both machines are on wired public networks, so there is no NAT
> involved between them. Packet losses correspond roughly to rate: most of
> the 1-second packets are lost, while most of the 10-second packets are answered.
> 
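To compare the two rates more directly, fixed-count runs at each interval can be reduced to the loss figure from ping's final statistics line. Below is a minimal sh sketch. The name loss_of and the host webserver.example are placeholders, not names from this thread.

```sh
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print only the
# "NN% packet loss" figure from its summary line.
loss_of() {
    sed -n 's/.* \([0-9.]*%\) packet loss.*/\1/p'
}

# Usage (webserver.example is a placeholder host name):
#   ping -c 20 -i 1  webserver.example | loss_of
#   ping -c 20 -i 10 webserver.example | loss_of
```
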
>> Does the problem repeat via using the files:
>> 
>> /tmp/gmock-all-836ef8.cpp
>> /tmp/gmock-all-836ef8.sh
>> 
> 
> Not sure how to try that, but it seems to repeat on a simple repeat of
> the buildworld command.

The .sh compiles the .cpp with the same options that were in use
when the failure happened. Copy the files to an appropriate place
and then run the .sh script.
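Below is a minimal sh sketch of those steps. It assumes the .cpp sits next to the .sh with the same base name, as with /tmp/gmock-all-836ef8.sh and /tmp/gmock-all-836ef8.cpp. The name rerun_repro and its default scratch directory are hypothetical. The sketch runs the script from inside the scratch directory in case the script refers to the .cpp by its base name only.

```sh
#!/bin/sh
# Hypothetical helper: re-run a clang crash reproducer outside buildworld.
#   $1: the .sh reproducer (the matching .cpp is assumed to be next to it)
#   $2: scratch directory to run in (defaults to a made-up location)
rerun_repro() {
    repro_sh=$1
    workdir=${2:-$HOME/clang-repro}
    base=${repro_sh%.sh}
    mkdir -p "$workdir" || return 1
    cp "$repro_sh" "$base.cpp" "$workdir"/ || return 1
    # Run from the directory holding the copied .cpp.
    (cd "$workdir" && sh "./$(basename "$repro_sh")")
    status=$?
    # 139 (128 + 11) would mean the compiler hit SIGSEGV again.
    echo "reproducer exit status: $status"
    return "$status"
}

# Usage:
#   rerun_repro /tmp/gmock-all-836ef8.sh
```
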

>> on that RPi3? Elsewhere that has more resources, such
>> as more RAM?
> 
> I've only this one machine running stable/13, but a Pi3 and a Pi4 running
> -current don't seem to be affected, nor do several Pi2s running stable/12
> ARMv7.

The system clang is version 13 on all of those for now, so the
.sh and .cpp test should be runnable on all of those machines.

> The troublesome machine has been updated many times using git pull followed
> by buildworld -DWITH_META_MODE. Might it be necessary to occasionally run
> one of the cleaning targets? In other words, could META_MODE permit obsolete 
> files to persist across builds and reboots? 
> 

I only rarely rm -fr a build tree area to start from scratch, but
there is nothing wrong with such an experiment.
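For such an experiment, the following sh sketch discards the object tree and then repeats the buildworld command used before. OBJ defaults to the object directory shown in the uname output above, and SRC defaults to /usr/src. Both defaults are assumptions to adjust. The functions run and fresh_buildworld and the DRY_RUN switch are hypothetical. The chflags step is a precaution because some files in an object tree may carry the schg flag.

```sh
#!/bin/sh
# Hypothetical from-scratch rebuild: remove the object tree, then run
# buildworld again. DRY_RUN=1 only prints the commands.
SRC=${SRC:-/usr/src}
OBJ=${OBJ:-/usr/obj/usr/src/arm64.aarch64}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

fresh_buildworld() {
    if [ -d "$OBJ" ]; then
        # Clear schg flags first; otherwise rm -rf may fail on such files.
        run chflags -R noschg "$OBJ" || return 1
        run rm -rf "$OBJ" || return 1
    fi
    run make -C "$SRC" -j2 -DWITH_META_MODE buildworld
}

# Usage (as root):
#   DRY_RUN=1 fresh_buildworld
#   fresh_buildworld > buildworld.log 2>&1
```
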

===
Mark Millard
marklmi at yahoo.com