Re: armv7 kyua runs via chroot on aarch64: zfs tests leave behind processes from timed out tests

From: Mark Millard <marklmi_at_yahoo.com>
Date: Mon, 07 Aug 2023 19:40:14 UTC

On Aug 7, 2023, at 11:29, Enji Cooper <yaneurabeya@gmail.com> wrote:
> 
> 
>> On Aug 3, 2023, at 10:20 AM, Mark Millard <marklmi@yahoo.com> wrote:
>> 
>> .. .
> 
> Hi Mark,
> Could you please submit bugs and CC freebsd-testing or another appropriate mailing list? It looks like there are some Arm64 architecture specific issues that need to be addressed based on the limited information I have from these emails.
> Do you have DEADLKRES/INVARIANTS/WITNESS compiled into your kernel? If not, could you please do that?
> If that doesn’t give any helpful hints, could you please panic the kernel and dump some debug info from ddb, e.g.,
> 1. alltrace
> 2. show allchains
> 3. show alllocks

I tend to submit a bug only once I've done the work to
establish that the problem shows up in snapshots or
other builds that are not my own. Submitting based only
on my builds is a last resort for me. Also, when a bunch
of issues show up in the same time frame, I tend to
start with a subset and let the others sit until I get
to them. There are a bunch pending at this point. I'm
more willing to report to the lists when I have less
information and expect a longer wait before having
more, largely in case it prompts others that have
related observations or the like.

My recent submittals for cortex-A7 (armv7) kyua run
related panics are:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272965
("armv7 'Alignment Fault' on read panic during udp_input for
kyua's sys/netinet6/exthdr:exthdr ; other udp_input related
panics")

and:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272966
("armv7 Kernel page fault with non-sleepable locks held panic
during in6ifa_ifwithaddr for kyua's sys/netpfil/pf/killstate:v6;
more tests too")

An interesting oddity is that, in my somewhat older
environment, I do not get the udp_input related panics
(272965). I only got those with the snapshot (while
trying to gather information to create the other submittal
[272966]).

Michal Meloun has been having me test patches related
to these. But I could only effectively test the 272966
cases in the context I have, given that "interesting
oddity". (Michal also originally thought the 2 reports
were duplicates of each other. But they are not, at
least in my builds.)

I'll note that the "non-sleepable locks" wording
identifies the context more than the cause: it comes
from the part of the code that handles an alignment
abort (which shows up in the console output somewhat
later), or so I've been told.

It is true that I normally run non-debug builds. But
my normal build procedure builds both ways and I
substitute (install) the relevant debug build (WITNESS
and such included) when I get to the point that I'd
use it to advantage. My testing of Michal's patches
is based on my debug-build context.

===
Mark Millard
marklmi at yahoo.com