Re: A native armv7 panic during kyua runs: sys/netinet6/exthdr:exthdr -> Fatal kernel mode data abort: 'Alignment Fault' on read

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 05 Aug 2023 23:06:29 UTC
On Aug 5, 2023, at 14:40, Mark Millard <marklmi@yahoo.com> wrote:

> On Aug 5, 2023, at 14:04, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On Aug 5, 2023, at 11:27, Michal Meloun <meloun.michal@gmail.com> wrote:
>> 
>>> Hi Mark,
>>> can you please try this patch?
>>> https://github.com/strejda/tegra/commit/bd4390c5f6a8b66b2fc83966d4fadb945a19dc23
>> 
>> I'll take a stab at testing it.
>> 
>> But I'll note that the description of the patch is somewhat odd:
>> 
>> QUOTE
>> Pack IP structures directly used for access packet data.
>> All structures used to access data in byte buffers shall be marked
>> as packed. Otherwise, this is undefined behavior - formally on
>> every platform.
>> END QUOTE
>> 
>> __packed (and whatever it might be a macro for) is not part of
>> any vintage of the C standard, not even as explicitly
>> implementation-defined or as explicitly undefined. (As I
>> understand it, C23's "attribute specifier sequence" notation
>> would give it implementation-defined status, but not via
>> explicit identification of the concept of packed in the standard.)
>> As far as the language is concerned, there is no guarantee that
>> a code generator will break accesses up into aligned pieces and
>> assemble the overall value if the members are not aligned in the
>> first place, __packed or not. Nor does the language guarantee a
>> lack of padding in the layout for __packed.
>> 
>> Past that, it is toolchain specific whether __packed both avoids
>> unaligned accesses for simple member-access notation and also
>> avoids having pad bytes. We will see for this context. (My
>> history suggests a lack of overall uniformity in the
>> interpretations given to declaring structs as packed, or to
>> analogous wording in other languages that are not explicit
>> about it.)
>> 
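For comparison, here is a minimal sketch of the standard-C way to read a multi-byte field from a byte buffer without relying on __packed at all: copy the bytes with memcpy, or assemble them by hand. Both are defined behavior at any alignment. This is not the patch under test and not how sys/netinet6 does things; the function names are hypothetical. (FreeBSD's <sys/endian.h> already provides be32dec() for the big-endian case.)

```c
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value (host byte order) from any address by copying
 * it into an aligned local. The choice of instruction sequence is
 * left to the compiler's model of the target.
 */
uint32_t load32_host(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/*
 * Read a big-endian (network order) 32-bit field one byte at a time,
 * so neither the pointer's alignment nor the host's endianness matters.
 */
uint32_t load32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

For example, with bytes AA 12 34 56 78 in a buffer, load32_be(buf + 1) yields 0x12345678 even though buf + 1 is misaligned. Even the memcpy form still depends on the toolchain knowing the target's alignment rules (for GCC/Clang on ARM, for instance, whether -mno-unaligned-access is in effect), which is the same toolchain-specific caveat as above.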
>>> I'm sorry, but I don't have the time or energy to fully test it... I only hope the actual patch is much easier than the one listed in PR271759.
>> 
> 
> [Older history deleted.]
> 
> No notable change in behavior, I'm afraid . . .
> 
> sys/netinet6/exthdr:exthdr  ->  
> Fatal kernel mode data abort: 'Alignment Fault' on read
> trapframe: 0xc51fbaa0
> FSR=00000001, FAR=daed4476, spsr=60000013
> r0 =daf787c0, r1 =c51fbb34, r2 =00000000, r3 =00000000
> r4 =00000000, r5 =00000000, r6 =daed4476, r7 =daed4466
> r8 =c0965c3c, r9 =00000000, r10=db0e4400, r11=c51fbb60
> r12=00000000, ssp=c51fbb30, slr=c0b524c0, pc =c04e8e78
> 
> panic: Fatal abort
> cpuid = 1
> time = 1691270886
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
>         pc = 0xc0661824  lr = 0xc007db80 (db_trace_self_wrapper+0x30)
>         sp = 0xc51fb858  fp = 0xc51fb970
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>         pc = 0xc007db80  lr = 0xc031a834 (vpanic+0x140)
>         sp = 0xc51fb978  fp = 0xc51fb998
>         r4 = 0x00000100  r5 = 0x00000000
>         r6 = 0xc07c5a9a  r7 = 0xc0b36e58
> vpanic() at vpanic+0x140
>         pc = 0xc031a834  lr = 0xc031a6f4 (vpanic)
>         sp = 0xc51fb9a0  fp = 0xc51fb9a4
>         r4 = 0xc51fbaa0  r5 = 0x00000013
>         r6 = 0xdaed4476  r7 = 0x00000001
>         r8 = 0x00000001  r9 = 0xdaf787c0
>        r10 = 0xdaed4476
> vpanic() at vpanic
>         pc = 0xc031a6f4  lr = 0xc0686ddc (abort_align)
>         sp = 0xc51fb9ac  fp = 0xc51fb9d8
>         r4 = 0x00000001  r5 = 0x00000001
>         r6 = 0xdaf787c0  r7 = 0xdaed4476
>         r8 = 0xc51fb9a4  r9 = 0xc031a6f4
>        r10 = 0xc51fb9ac
> abort_align() at abort_align
>         pc = 0xc0686ddc  lr = 0xc0686e50 (abort_align+0x74)
>         sp = 0xc51fb9e0  fp = 0xc51fb9f8
>         r4 = 0x00000013 r10 = 0xdaed4476
> abort_align() at abort_align+0x74
>         pc = 0xc0686e50  lr = 0xc0686aa8 (abort_handler+0x45c)
>         sp = 0xc51fba00  fp = 0xc51fba98
>         r4 = 0x00000000 r10 = 0xdaed4476
> abort_handler() at abort_handler+0x45c
>         pc = 0xc0686aa8  lr = 0xc06640d8 (exception_exit)
>         sp = 0xc51fbaa0  fp = 0xc51fbb60
>         r4 = 0x00000000  r5 = 0x00000000
>         r6 = 0xdaed4476  r7 = 0xdaed4466
>         r8 = 0xc0965c3c  r9 = 0x00000000
>        r10 = 0xdb0e4400
> exception_exit() at exception_exit
>         pc = 0xc06640d8  lr = 0xc0b524c0 (__pcpu+0x200)
>         sp = 0xc51fbb30  fp = 0xc51fbb60
>         r0 = 0xdaf787c0  r1 = 0xc51fbb34
>         r2 = 0x00000000  r3 = 0x00000000
>         r4 = 0x00000000  r5 = 0x00000000
>         r6 = 0xdaed4476  r7 = 0xdaed4466
>         r8 = 0xc0965c3c  r9 = 0x00000000
>        r10 = 0xdb0e4400 r12 = 0x00000000
> in6ifa_ifwithaddr() at in6ifa_ifwithaddr+0x30
>         pc = 0xc04e8e78  lr = 0xc04fb338 (ip6_input+0xd38)
>         sp = 0xc51fbb68  fp = 0xc51fbc28
>         r4 = 0xdaed4476  r5 = 0xdaed445e
>         r6 = 0x00000000  r7 = 0xdaed4466
> ip6_input() at ip6_input+0xd38
>         pc = 0xc04fb338  lr = 0xc046d66c (netisr_dispatch_src+0xf8)
>         sp = 0xc51fbc30  fp = 0xc51fbc58
>         r4 = 0xdaed4400  r5 = 0x00000006
>         r6 = 0x00000001  r7 = 0xc0b4dd50
>         r8 = 0xdaf9d900  r9 = 0xdaed4400
>        r10 = 0x00000086
> netisr_dispatch_src() at netisr_dispatch_src+0xf8
>         pc = 0xc046d66c  lr = 0xc04641b0 (ether_demux+0x18c)
>         sp = 0xc51fbc60  fp = 0xc51fbc78
>         r4 = 0x00000006  r5 = 0x00001201
>         r6 = 0xdb0e4400  r7 = 0x000000ff
>         r8 = 0xdaf9d900  r9 = 0xdaed4400
>        r10 = 0x00000086
> ether_demux() at ether_demux+0x18c
>         pc = 0xc04641b0  lr = 0xc0465880 (ether_nh_input+0x490)
>         sp = 0xc51fbc80  fp = 0xc51fbce0
>         r4 = 0xdb0e4400  r5 = 0xdaed4400
>         r6 = 0xdaed4450 r10 = 0x00000086
> ether_nh_input() at ether_nh_input+0x490
>         pc = 0xc0465880  lr = 0xc046d66c (netisr_dispatch_src+0xf8)
>         sp = 0xc51fbce8  fp = 0xc51fbd10
>         r4 = 0xdaed4400  r5 = 0x00000005
>         r6 = 0x00000003  r7 = 0xc0b4dd30
>         r8 = 0xdaf9d900  r9 = 0xdaed4400
>        r10 = 0xc098f58f
> netisr_dispatch_src() at netisr_dispatch_src+0xf8
>         pc = 0xc046d66c  lr = 0xc04645c4 (ether_input+0x50)
>         sp = 0xc51fbd18  fp = 0xc51fbd48
>         r4 = 0xdaed4400  r5 = 0x00000000
>         r6 = 0x00008803  r7 = 0x00000000
>         r8 = 0xdaf9d900  r9 = 0xdaed4400
>        r10 = 0xc098f58f
> ether_input() at ether_input+0x50
>         pc = 0xc04645c4  lr = 0xdffb3f08 ($a.10+0x108)
>         sp = 0xc51fbd50  fp = 0xc51fbd78
>         r4 = 0xdb0e4400  r5 = 0xdaff4000
>         r6 = 0xdaff4010  r7 = 0x00000000
>         r8 = 0x00000000 r10 = 0xc098f58f
> $a.10() at $a.10+0x108
>         pc = 0xdffb3f08  lr = 0xc038cb2c (taskqueue_run_locked+0x1c4)
>         sp = 0xc51fbd80  fp = 0xc51fbdd8
>         r4 = 0xdaff2100  r5 = 0xdaff402c
>         r6 = 0xdaff2150  r7 = 0x00000001
>         r8 = 0x00000000  r9 = 0xc51fbd90
>        r10 = 0x00000001
> taskqueue_run_locked() at taskqueue_run_locked+0x1c4
>         pc = 0xc038cb2c  lr = 0xc038e4e4 (taskqueue_thread_loop+0x1b0)
>         sp = 0xc51fbde0  fp = 0xc51fbe10
>         r4 = 0xdaff2100  r5 = 0xdaff2140
>         r6 = 0xc07b18c4  r7 = 0x00000000
>         r8 = 0xc098f58f  r9 = 0x00000100
>        r10 = 0xc0b268a0
> taskqueue_thread_loop() at taskqueue_thread_loop+0x1b0
>         pc = 0xc038e4e4  lr = 0xc02cdf0c (fork_exit+0xc0)
>         sp = 0xc51fbe18  fp = 0xc51fbe38
>         r4 = 0xdaf787c0  r5 = 0xc0b264e0
>         r6 = 0xc038e334  r7 = 0xdffc4f54
>         r8 = 0xc51fbe40  r9 = 0xc098f591
> fork_exit() at fork_exit+0xc0
>         pc = 0xc02cdf0c  lr = 0xc066406c (swi_exit)
>         sp = 0xc51fbe40  fp = 0x00000000
>         r4 = 0xc038e334  r5 = 0xdffc4f54
>         r6 = 0xc0b48d84  r7 = 0xd73bd3e0
>         r8 = 0x00000001 r10 = 0xc0b268a0
> swi_exit() at swi_exit
>         pc = 0xc066406c  lr = 0xc066406c (swi_exit)
>         sp = 0xc51fbe40  fp = 0x00000000
> KDB: enter: panic
> [ thread pid 0 tid 100226 ]
> 
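For what it is worth, the fault address in the trace looks consistent with the classic 14-byte Ethernet header offset. Below is a small sketch of the arithmetic. It assumes that r6 = 0xdaed4450 in the ether_demux() frame is the start of the Ethernet header. That is my inference from the register values; the trace does not say so. It also uses the standard IPv6 header layout, with ip6_src at offset 8 and ip6_dst at offset 24. The helper names are hypothetical.

```c
#include <stdint.h>

/* Standard header sizes/offsets (not read from the trace). */
#define ETH_HDR_BYTES   14u /* dst MAC (6) + src MAC (6) + ethertype (2) */
#define IP6_SRC_OFFSET   8u /* ip6_src follows 8 bytes of fixed header fields */
#define IP6_DST_OFFSET  24u /* ip6_dst follows the 16-byte ip6_src */

/* Bytes past the previous align-byte boundary (align must be a power of 2). */
uint32_t misalignment(uint32_t addr, uint32_t align)
{
    return addr & (align - 1u);
}

/* Address of an IPv6 header that directly follows an Ethernet header. */
uint32_t ip6_hdr_addr(uint32_t ether_hdr)
{
    return ether_hdr + ETH_HDR_BYTES;
}

/* Address of ip6_src within that IPv6 header. */
uint32_t ip6_src_addr(uint32_t ether_hdr)
{
    return ip6_hdr_addr(ether_hdr) + IP6_SRC_OFFSET;
}

/* Address of ip6_dst within that IPv6 header. */
uint32_t ip6_dst_addr(uint32_t ether_hdr)
{
    return ip6_hdr_addr(ether_hdr) + IP6_DST_OFFSET;
}
```

Under that assumption, ip6_hdr_addr(0xdaed4450) = 0xdaed445e, ip6_src_addr() = 0xdaed4466 and ip6_dst_addr() = 0xdaed4476. Those match r5, r7 and r6 in the in6ifa_ifwithaddr() frame, with FAR equal to the ip6_dst address, and misalignment(0xdaed4476, 4) is 2. In other words, a 4-byte-aligned Ethernet header leaves both IPv6 address fields 2 bytes off a 4-byte boundary. Any 32-bit load from them can then take an alignment fault where alignment checking is enforced.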
> 
> I've just restored the sources involved but still have
> the .diff that I got via GitHub.
> 

By the way, your request led me to look at my
material in:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271759

some more. I've concluded that it is not obvious to me whether my
OrangePi+2Ed comments are evidence of an independent issue or
evidence that more than potential if_ure issues are involved.
I ended up adding Comment 35:


Hmm. Getting the problem on the built-in Ethernet did not
involve if_ure from what I can tell, looking at the dmesg -a
output:

. . .
awg0: <Allwinner Gigabit Ethernet> mem 0x1c30000-0x1c3ffff irq 27 on simplebus0
miibus0: <MII bus> on awg0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 0 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy1: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
. . .
Autoloading module: if_ure
ure0 on uhub6
ure0: <Realtek USB 10/100/1000 LAN, class 0/0, rev 2.10/30.00, addr 2> on usbus7
miibus1: <MII bus> on ure0
rgephy2: <RTL8251/8153 1000BASE-T media interface> PHY 0 on miibus1
rgephy2: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto
ue0: <USB Ethernet> on ure0

if_ure would only be involved with the dongle (ure0), for which no
Ethernet cable was attached.

So I'm unsure whether what I've reported for the OrangePi+2Ed context overall
is:

A) An independent problem.
vs.
B) Evidence that the original problem involves more than just potential
   if_ure issues.


===
Mark Millard
marklmi at yahoo.com