Re: A native armv7 panic during kyua runs: sys/netinet6/exthdr:exthdr -> Fatal kernel mode data abort: 'Alignment Fault' on read

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 05 Aug 2023 21:04:41 UTC
On Aug 5, 2023, at 11:27, Michal Meloun <meloun.michal@gmail.com> wrote:

> Hi Mark,
> can you please try this patch?
> https://github.com/strejda/tegra/commit/bd4390c5f6a8b66b2fc83966d4fadb945a19dc23

I'll take a stab at testing it.

But I'll note that the description of the patch is somewhat odd:

QUOTE
Pack IP structures directly used for access packet data.
All structures used to access data in byte buffers shall be marked
as packed. Otherwise, this is undefined behavior - formally on
every platform.
END QUOTE

__packed (and whatever it might be a macro for) is not part of
any vintage of the C standard, not even as explicitly
implementation defined nor as explicitly undefined. (C23's
"attribute specifier sequence" notation would give it an
implementation defined status as I understand it, but not via
explicit identification of the concept of packed in the standard.)
As far as the language is concerned, there is no guarantee that
a code generator will break things up into aligned accesses and
then assemble the overall value when the members are not aligned
in the first place, __packed or not. Nor does the language
guarantee a lack of padding in the layout for __packed.

Past that, it is toolchain specific whether __packed would avoid
unaligned accesses for simple member access notation yet also
avoid having pad bytes. We will see for this context.
(My history suggests a lack of overall uniformity in the
interpretations given to declaring structs as packed --or
analogous wording for other languages that are not explicit
about it.)
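
To make the distinction concrete, here is a minimal sketch. It is not
taken from the patch or from the FreeBSD sources. The struct names and
the helpers load_u32() and is_aligned() are made up for illustration,
and __attribute__((packed)) is the GCC/Clang spelling that I am
assuming __packed expands to. The memcpy() based load is the only part
whose behavior the C standard itself defines for an unaligned address.
The packed layout is whatever the toolchain decides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a wire-format header; not a FreeBSD structure. */
struct demo_hdr {
	uint16_t tag;
	uint32_t word;
};

/* Same members, with the GCC/Clang packed extension applied. */
struct demo_hdr_packed {
	uint16_t tag;
	uint32_t word;
} __attribute__((packed));

/*
 * Read a 32-bit value from an arbitrary byte address. memcpy() puts no
 * alignment requirement on its arguments, so this is defined by the C
 * standard whatever the toolchain does with packed structs.
 */
static uint32_t
load_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return (v);
}

/* Nonzero when addr is a multiple of align (align must be a power of 2). */
static int
is_aligned(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr & (align - 1)) == 0);
}
```

With GCC or Clang on common targets I would expect
offsetof(struct demo_hdr, word) to be 4 and
offsetof(struct demo_hdr_packed, word) to be 2, but neither number
comes from the C standard. For reference, is_aligned() applied to the
FAR value from the report quoted below (0xdb43ab76) says it is 2-byte
aligned but not 4-byte aligned.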

> I'm sorry, but I don't have the time or energy to fully test it... I only hope the actual patch is much easier than the one listed in PR271759.

> On 05.08.2023 8:11, Mark Millard wrote:
>> On Aug 4, 2023, at 20:58, Warner Losh <imp@bsdimp.com> wrote:
>>> It might make sense to work up a patch that skips this test on armv7 after filing a bug (the usual way)....
>>> 
>>> Warner
>> Actually, looking at the backtrace, it seems I've previously
>> listed the same sort of backtrace structure in:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271759
>> comment 12. Hans Petter Selasky had been working on that
>> bugzilla entry. I'll add a note that this time I got it
>> with the built-in Ethernet instead of the dongle used
>> previously --and that sys/netinet6/exthdr:exthdr is a
>> way of producing the panic. [Done.]
>> In /usr/main-src/tests/sys/netinet6/exthdr.sh , commenting
>> out one line would disable the specific test (leading
>> whitespace might not be preserved below):
>> atf_init_test_cases()
>> {
>> #        atf_add_test_case "exthdr"
>> }
>> [FYI: All my kyua activity has been for FreeBSD main,
>> generally targeting contexts with some armv7 code
>> involved. It is associated with my having been a
>> tester of early lib32 drafts.]
>> I already have another commented out line for an armv7
>> panic (leading whitespace might not be preserved):
>> # git -C /usr/main-src/ diff tests/sys/net/
>> diff --git a/tests/sys/net/if_bridge_test.sh b/tests/sys/net/if_bridge_test.sh
>> index eb3a792df449..dcdac75103cd 100755
>> --- a/tests/sys/net/if_bridge_test.sh
>> +++ b/tests/sys/net/if_bridge_test.sh
>> @@ -675,7 +675,7 @@ atf_init_test_cases()
>>         atf_add_test_case "delete_with_members"
>>         atf_add_test_case "mac_conflict"
>>         atf_add_test_case "stp_validation"
>> -       atf_add_test_case "gif"
>> +#      atf_add_test_case "gif"
>>         atf_add_test_case "mtu"
>>         atf_add_test_case "vlan"
>>  }
>> In the original discovery, having if_bridge.ko already loaded was
>> important to getting the "gif" panic.
>> But I've not yet put effort into isolating a cleaner/simpler test
>> than I got the failure with. Nor have I done a range of comparisons
>> of differing contexts yet.
>> There are other armv7 related issues, one in particular
>> being:
>> A) All the long timeouts [300s+] are for *.py style tests. (Lots of
>>    these.)
>> B) All the *.py style tests that do not have long timeouts have one of:
>>  ->  skipped: comment me to run the test
>>  ->  skipped: Current architecture 'armv7' not supported
>> __test_cases_list__  ->  broken: Test program did not exit cleanly
>> __test_cases_list__  ->  broken: Test case list wrote to stderr
>> There are about 10 of the "comment me" ones and 1 each of the other
>> (B) ones, if I remember right.
>> In other words, basically all the *.py based tests are broken or
>> skipped as kyua classifies things.
>> I've no clue yet if (A) is tied to the ports':
>> cryptography/hazmat/bindings/_openssl.abi3.so
>> openssl 3 incompatibility or not. But I've only seen the
>> issue in armv7 contexts so far.
>> I've spent time today on this issue but have made no progress
>> on identifying what leads to the kdump/truss-output being as
>> it is.
>> If the *.py tests were working, I'd not be surprised to then
>> find more armv7 panics than is now possible via the kyua tests.
>>> On Fri, Aug 4, 2023 at 12:59 AM Mark Millard <marklmi@yahoo.com> wrote:
>>> While discovered via an attempted overall kyua run, the following is
>>> sufficient to get the crash in my native armv7 context:
>>> 
>>> # /usr/bin/kyua test -k /usr/tests/Kyuafile sys/netinet6/exthdr:exthdr
>>> sys/netinet6/exthdr:exthdr  ->  Fatal kernel mode data abort: 'Alignment Fault' on read
>>> trapframe: 0xdfb97aa0
>>> FSR=00000001, FAR=db43ab76, spsr=60000013
>>> r0 =dfedd000, r1 =dfb97b34, r2 =00000000, r3 =00000000
>>> r4 =00000000, r5 =00000000, r6 =db43ab76, r7 =db43ab66
>>> r8 =c096383c, r9 =00000000, r10=db132400, r11=dfb97b60
>>> r12=00000000, ssp=dfb97b30, slr=c0b4e2c0, pc =c04e6b70
>>> 
>>> panic: Fatal abort
>>> cpuid = 0
>>> time = 1691131498
>>> KDB: stack backtrace:
>>> db_trace_self() at db_trace_self
>>>          pc = 0xc065f414  lr = 0xc007db80 (db_trace_self_wrapper+0x30)
>>>          sp = 0xdfb97858  fp = 0xdfb97970
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>>          pc = 0xc007db80  lr = 0xc031a834 (vpanic+0x140)
>>>          sp = 0xdfb97978  fp = 0xdfb97998
>>>          r4 = 0x00000100  r5 = 0x00000000
>>>          r6 = 0xc07c369a  r7 = 0xc0b32e58
>>> vpanic() at vpanic+0x140
>>>          pc = 0xc031a834  lr = 0xc031a6f4 (vpanic)
>>>          sp = 0xdfb979a0  fp = 0xdfb979a4
>>>          r4 = 0xdfb97aa0  r5 = 0x00000013
>>>          r6 = 0xdb43ab76  r7 = 0x00000001
>>>          r8 = 0x00000001  r9 = 0xdfedd000
>>>         r10 = 0xdb43ab76
>>> vpanic() at vpanic
>>>          pc = 0xc031a6f4  lr = 0xc06849dc (abort_align)
>>>          sp = 0xdfb979ac  fp = 0xdfb979d8
>>>          r4 = 0x00000001  r5 = 0x00000001
>>>          r6 = 0xdfedd000  r7 = 0xdb43ab76
>>>          r8 = 0xdfb979a4  r9 = 0xc031a6f4
>>>         r10 = 0xdfb979ac
>>> abort_align() at abort_align
>>>          pc = 0xc06849dc  lr = 0xc0684a50 (abort_align+0x74)
>>>          sp = 0xdfb979e0  fp = 0xdfb979f8
>>>          r4 = 0x00000013 r10 = 0xdb43ab76
>>> abort_align() at abort_align+0x74
>>>          pc = 0xc0684a50  lr = 0xc06846a8 (abort_handler+0x45c)
>>>          sp = 0xdfb97a00  fp = 0xdfb97a98
>>>          r4 = 0x00000000 r10 = 0xdb43ab76
>>> abort_handler() at abort_handler+0x45c
>>>          pc = 0xc06846a8  lr = 0xc0661cc8 (exception_exit)
>>>          sp = 0xdfb97aa0  fp = 0xdfb97b60
>>>          r4 = 0x00000000  r5 = 0x00000000
>>>          r6 = 0xdb43ab76  r7 = 0xdb43ab66
>>>          r8 = 0xc096383c  r9 = 0x00000000
>>>         r10 = 0xdb132400
>>> exception_exit() at exception_exit
>>>          pc = 0xc0661cc8  lr = 0xc0b4e2c0 (__pcpu)
>>>          sp = 0xdfb97b30  fp = 0xdfb97b60
>>>          r0 = 0xdfedd000  r1 = 0xdfb97b34
>>>          r2 = 0x00000000  r3 = 0x00000000
>>>          r4 = 0x00000000  r5 = 0x00000000
>>>          r6 = 0xdb43ab76  r7 = 0xdb43ab66
>>>          r8 = 0xc096383c  r9 = 0x00000000
>>>         r10 = 0xdb132400 r12 = 0x00000000
>>> in6ifa_ifwithaddr() at in6ifa_ifwithaddr+0x30
>>>          pc = 0xc04e6b70  lr = 0xc04f9030 (ip6_input+0xd38)
>>>          sp = 0xdfb97b68  fp = 0xdfb97c28
>>>          r4 = 0xdb43ab76  r5 = 0xdb43ab5e
>>>          r6 = 0x00000000  r7 = 0xdb43ab66
>>> ip6_input() at ip6_input+0xd38
>>>          pc = 0xc04f9030  lr = 0xc046d66c (netisr_dispatch_src+0xf8)
>>>          sp = 0xdfb97c30  fp = 0xdfb97c58
>>>          r4 = 0xdb43ab00  r5 = 0x00000006
>>>          r6 = 0x00000007  r7 = 0xc0b49d50
>>>          r8 = 0xdafea0c0  r9 = 0xdb43ab00
>>>         r10 = 0x00000086
>>> netisr_dispatch_src() at netisr_dispatch_src+0xf8
>>>          pc = 0xc046d66c  lr = 0xc04641b0 (ether_demux+0x18c)
>>>          sp = 0xdfb97c60  fp = 0xdfb97c78
>>>          r4 = 0x00000006  r5 = 0x00001201
>>>          r6 = 0xdb132400  r7 = 0x000000ff
>>>          r8 = 0xdafea0c0  r9 = 0xdb43ab00
>>>         r10 = 0x00000086
>>> ether_demux() at ether_demux+0x18c
>>>          pc = 0xc04641b0  lr = 0xc0465880 (ether_nh_input+0x490)
>>>          sp = 0xdfb97c80  fp = 0xdfb97ce0
>>>          r4 = 0xdb132400  r5 = 0xdb43ab00
>>>          r6 = 0xdb43ab50 r10 = 0x00000086
>>> ether_nh_input() at ether_nh_input+0x490
>>>          pc = 0xc0465880  lr = 0xc046d66c (netisr_dispatch_src+0xf8)
>>>          sp = 0xdfb97ce8  fp = 0xdfb97d10
>>>          r4 = 0xdb43ab00  r5 = 0x00000005
>>>          r6 = 0x0000000c  r7 = 0xc0b49d30
>>>          r8 = 0xdafea0c0  r9 = 0xdb43ab00
>>>         r10 = 0xc098d18f
>>> netisr_dispatch_src() at netisr_dispatch_src+0xf8
>>>          pc = 0xc046d66c  lr = 0xc04645c4 (ether_input+0x50)
>>>          sp = 0xdfb97d18  fp = 0xdfb97d48
>>>          r4 = 0xdb43ab00  r5 = 0x00000000
>>>          r6 = 0x00008803  r7 = 0x00000000
>>>          r8 = 0xdafea0c0  r9 = 0xdb43ab00
>>>         r10 = 0xc098d18f
>>> ether_input() at ether_input+0x50
>>>          pc = 0xc04645c4  lr = 0xdffb3f08 ($a.10+0x108)
>>>          sp = 0xdfb97d50  fp = 0xdfb97d78
>>>          r4 = 0xdb132400  r5 = 0xdaff8b00
>>>          r6 = 0xdaff8b10  r7 = 0x00000000
>>>          r8 = 0x00000000 r10 = 0xc098d18f
>>> $a.10() at $a.10+0x108
>>>          pc = 0xdffb3f08  lr = 0xc038cb2c (taskqueue_run_locked+0x1c4)
>>>          sp = 0xdfb97d80  fp = 0xdfb97dd8
>>>          r4 = 0xe0145100  r5 = 0xdaff8b2c
>>>          r6 = 0xe0145150  r7 = 0x00000001
>>>          r8 = 0x00000000  r9 = 0xdfb97d90
>>>         r10 = 0x00000001
>>> taskqueue_run_locked() at taskqueue_run_locked+0x1c4
>>>          pc = 0xc038cb2c  lr = 0xc038e4e4 (taskqueue_thread_loop+0x1b0)
>>>          sp = 0xdfb97de0  fp = 0xdfb97e10
>>>          r4 = 0xe0145100  r5 = 0xe0145140
>>>          r6 = 0xc07af4c4  r7 = 0x00000000
>>>          r8 = 0xc098d18f  r9 = 0x00000100
>>>         r10 = 0xc0b228a0
>>> taskqueue_thread_loop() at taskqueue_thread_loop+0x1b0
>>>          pc = 0xc038e4e4  lr = 0xc02cdf0c (fork_exit+0xc0)
>>>          sp = 0xdfb97e18  fp = 0xdfb97e38
>>>          r4 = 0xdfedd000  r5 = 0xc0b224e0
>>>          r6 = 0xc038e334  r7 = 0xdffc4f54
>>>          r8 = 0xdfb97e40  r9 = 0xc098d191
>>> fork_exit() at fork_exit+0xc0
>>>          pc = 0xc02cdf0c  lr = 0xc0661c5c (swi_exit)
>>>          sp = 0xdfb97e40  fp = 0x00000000
>>>          r4 = 0xc038e334  r5 = 0xdffc4f54
>>>          r6 = 0xc0b45d84  r7 = 0xd73bcba0
>>>          r8 = 0x00000001 r10 = 0xc0b228a0
>>> swi_exit() at swi_exit
>>>          pc = 0xc0661c5c  lr = 0xc0661c5c (swi_exit)
>>>          sp = 0xdfb97e40  fp = 0x00000000
>>> KDB: enter: panic
>>> [ thread pid 0 tid 100230 ]
>>> 
>>> For reference:
>>> 
>>> # uname -apKU
>>> FreeBSD OPiP2E-RPi2v1p1 14.0-CURRENT FreeBSD 14.0-CURRENT armv7 1400093 #6 main-n264334-215bab7924f6-dirty: Tue Jul 25 23:11:39 PDT 2023     root@CA72-16Gp-ZFS:/usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/sys/GENERIC-NODBG-CA7 arm armv7 1400093 1400093
>>> 
>>> The OrangePi+ 2Ed was the type of system booted and tested.
>>> 


===
Mark Millard
marklmi at yahoo.com