arm SMP on Cortex-A15
Wojciech Macek
wma at semihalf.com
Mon Mar 24 13:38:09 UTC 2014
Without the unconditional invalidation, the panic shows up right at the
beginning, after rootfs is mounted and while the init scripts are running.
When a userspace process exits, its memory resources are freed - this is the
moment pmap_remove_pages fails with a translation fault. It is the
"typical" crash I observe when the TLB holds an old entry. A backtrace is
included below, but I doubt it will be helpful.
Regarding old pte/tlb: the TLB contains an entry from the old process
context while the in-memory PTE value is already correct - at least this was
the scenario when I debugged it last year. So invalidating after *pte=0 is
definitely not our case. The issue shows up only on A15, where the
TLB prefetcher can cache PTE entries at any time.
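To make the stale-entry scenario concrete, here is a toy model in C. It is only a sketch: struct toy_mmu and the toy_* functions are hypothetical names made up for illustration, and none of this is the FreeBSD pmap code or the real Cortex-A15 behaviour. It only shows how a cached translation can outlive a PTE update when no invalidation follows.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NPAGES 4

/* Toy model (hypothetical): a page table in memory plus a per-page TLB copy. */
struct toy_mmu {
	uint32_t pte[TOY_NPAGES];       /* in-memory page table entries */
	uint32_t tlb[TOY_NPAGES];       /* cached translations */
	bool     tlb_valid[TOY_NPAGES]; /* true if tlb[] holds a cached copy */
};

/* Translate: use the cached entry if present, otherwise walk and cache. */
static uint32_t
toy_translate(struct toy_mmu *m, unsigned vpn)
{
	if (!m->tlb_valid[vpn]) {
		m->tlb[vpn] = m->pte[vpn];
		m->tlb_valid[vpn] = true;
	}
	return (m->tlb[vpn]);
}

/* Stand-in for a prefetcher: fills the TLB without any software access. */
static void
toy_prefetch(struct toy_mmu *m, unsigned vpn)
{
	(void)toy_translate(m, vpn);
}

/* Update a PTE; the caller decides whether the TLB entry is invalidated. */
static void
toy_set_pte(struct toy_mmu *m, unsigned vpn, uint32_t val, bool invalidate)
{
	m->pte[vpn] = val;
	if (invalidate)
		m->tlb_valid[vpn] = false;
}
```

In this model, if a prefetch caches the old context's entry and the PTE is then rewritten without invalidation, toy_translate() keeps returning the old value although pte[] is already correct. Repeating the update with invalidate set makes the next translation pick up the new value.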
I believe I don't have r263251 integrated. I'll give it a try - typically,
the TLB-caused crash appears only on pages containing shared library code
(with the executable attribute), so there is a chance Olivier's fix helps.
The fault:
vm_fault(0xc5b894f0, 0, 2, 0) -> 1
Fatal kernel mode data abort: 'Translation Fault (P)'
trapframe: 0xef2cca40
FSR=00000817, FAR=00000030, spsr=60000013
r0 =00000000, r1 =c320a048, r2 =00000000, r3 =c3208074
r4 =c5b7cd08, r5 =c5b7cd04, r6 =c5b05800, r7 =c5b895ac
r8 =c320a044, r9 =fffffffe, r10=c5b895ac, r11=ef2ccae0
r12=00000000, ssp=ef2cca90, slr=c0604148, pc =c0628a60
[ thread pid 83 tid 100050 ]
Stopped at pmap_remove_pages+0x270: streq r3, [r0, #0x030]
db> bt
Tracing pid 83 tid 100050 td 0xc5bc4320
db_trace_self() at db_trace_self
pc = 0xc061f62c lr = 0xc024ddbc (db_hex2dec+0x498)
sp = 0xef2cc738 fp = 0xef2cc750
r10 = 0xc0708270
db_hex2dec() at db_hex2dec+0x498
pc = 0xc024ddbc lr = 0xc024d76c (db_command_loop+0x2f0)
sp = 0xef2cc758 fp = 0xef2cc7f8
r4 = 0x00000000 r5 = 0x00000000
r6 = 0xc0695cf1
db_command_loop() at db_command_loop+0x2f0
pc = 0xc024d76c lr = 0xc024d4dc (db_command_loop+0x60)
sp = 0xef2cc800 fp = 0xef2cc810
r4 = 0xc0666f88 r5 = 0xc067b997
r6 = 0xc0752954 r7 = 0xc0748f80
r8 = 0xef2cca40 r9 = 0xc07084e0
r10 = 0xc0748f84
db_command_loop() at db_command_loop+0x60
pc = 0xc024d4dc lr = 0xc024ffb8 (X_db_symbol_values+0x254)
sp = 0xef2cc818 fp = 0xef2cc938
r4 = 0x00000000 r5 = 0xef2cc820
r6 = 0xc0748fb0
X_db_symbol_values() at X_db_symbol_values+0x254
pc = 0xc024ffb8 lr = 0xc0430554 (kdb_trap+0x164)
sp = 0xef2cc940 fp = 0xef2cc968
r4 = 0x00000000 r5 = 0x00000817
r6 = 0xc0748fb0 r7 = 0xc0748f80
kdb_trap() at kdb_trap+0x164
pc = 0xc0430554 lr = 0xc0632ef0 (data_abort_handler+0x7dc)
sp = 0xef2cc970 fp = 0xef2cc988
r4 = 0xef2cca40 r5 = 0x600000d3
r6 = 0x00000030 r7 = 0x00000817
r8 = 0xc5b894f0 r9 = 0x00000001
r10 = 0xef2cca40
data_abort_handler() at data_abort_handler+0x7dc
pc = 0xc0632ef0 lr = 0xc0632cc0 (data_abort_handler+0x5ac)
sp = 0xef2cc990 fp = 0xef2cca38
r4 = 0x00000817 r5 = 0xc5bc4320
r6 = 0xc5a47a0c r7 = 0x00000004
data_abort_handler() at data_abort_handler+0x5ac
pc = 0xc0632cc0 lr = 0xc0621214 (exception_exit)
sp = 0xef2cca40 fp = 0xef2ccae0
r4 = 0xc5b7cd08 r5 = 0xc5b7cd04
r6 = 0xc5b05800 r7 = 0xc5b895ac
r8 = 0xc320a044 r9 = 0xfffffffe
r10 = 0xc5b895ac
exception_exit() at exception_exit
pc = 0xc0621214 lr = 0xc0604148 (PHYS_TO_VM_PAGE+0x48)
sp = 0xef2cca94 fp = 0xef2ccae0
r0 = 0x00000000 r1 = 0xc320a048
r2 = 0x00000000 r3 = 0xc3208074
r4 = 0xc5b7cd08 r5 = 0xc5b7cd04
r6 = 0xc5b05800 r7 = 0xc5b895ac
r8 = 0xc320a044 r9 = 0xfffffffe
r10 = 0xc5b895ac r12 = 0x00000000
pmap_remove_pages() at pmap_remove_pages+0x270
pc = 0xc0628a60 lr = 0xc05f2d08 (vmspace_exit+0xd8)
sp = 0xef2ccae8 fp = 0xef2ccb10
r4 = 0xc5b895a8 r5 = 0xc5bc4320
r6 = 0x00000001 r7 = 0xc5a47960
r8 = 0xc5b895ac r9 = 0xc5b894f0
r10 = 0xc0753be0
vmspace_exit() at vmspace_exit+0xd8
pc = 0xc05f2d08 lr = 0xc03a7348 (exit1+0x930)
sp = 0xef2ccb18 fp = 0xef2ccb70
r4 = 0xc5a479fc r5 = 0x00000004
r6 = 0xc583861c r7 = 0x00000001
r8 = 0xc5a47960 r9 = 0xc5bc4320
r10 = 0xc5a47a0c
exit1() at exit1+0x930
pc = 0xc03a7348 lr = 0xc03f1604 (sigexit+0x8c4)
sp = 0xef2ccb78 fp = 0xef2ccd68
r4 = 0x00000002 r5 = 0xc5bc4320
r6 = 0xc5a47960 r7 = 0xc5a47a0c
r8 = 0xc5bc4320 r9 = 0xc5b7a000
r10 = 0x00000002
sigexit() at sigexit+0x8c4
pc = 0xc03f1604 lr = 0xc03f23a0 (postsig+0x39c)
sp = 0xef2ccd70 fp = 0xef2cce18
r4 = 0x00000001 r5 = 0xc5bc4320
r6 = 0xc5a47960 r7 = 0xc5b7aab8
r8 = 0xc5a47a0c r9 = 0xc5b7a000
r10 = 0x00000002
postsig() at postsig+0x39c
pc = 0xc03f23a0 lr = 0xc044388c (ast+0x4f4)
sp = 0xef2cce20 fp = 0xef2cce58
r4 = 0x00000001 r5 = 0xc5bc4320
r6 = 0xc5a47960 r7 = 0xc5a47a0c
r8 = 0xc5a47a0c r9 = 0x01020804
r10 = 0x00000ab8
ast() at ast+0x4f4
pc = 0xc044388c lr = 0xc0621080 (swi_entry+0x6c)
sp = 0xef2cce60 fp = 0xbfffe438
r4 = 0x40000013 r5 = 0xc5bc4320
r6 = 0x00000001 r7 = 0x00000154
r8 = 0x20037008 r9 = 0xbfffee5c
r10 = 0xbfffea10
swi_entry() at swi_entry+0x6c
pc = 0xc0621080 lr = 0xc0621080 (swi_entry+0x6c)
sp = 0xef2cce60 fp = 0xbfffe438
Unable to unwind further
db>
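For reference, the FSR value from the trap frame above can be decoded by hand. The sketch below assumes the ARMv7 short-descriptor DFSR layout (FS[3:0] in bits 3:0, domain in bits 7:4, FS[4] in bit 10, WnR in bit 11). The fsr_* helper names are hypothetical and are not the ones used in the kernel sources.

```c
#include <stdint.h>

/* Fault status: FS[4] is bit 10, FS[3:0] are bits 3:0. */
static unsigned
fsr_status(uint32_t fsr)
{
	return ((((fsr >> 10) & 0x1u) << 4) | (fsr & 0xfu));
}

/* Domain field, bits 7:4. */
static unsigned
fsr_domain(uint32_t fsr)
{
	return ((fsr >> 4) & 0xfu);
}

/* WnR, bit 11: nonzero if the aborted access was a write. */
static int
fsr_is_write(uint32_t fsr)
{
	return ((int)((fsr >> 11) & 0x1u));
}

/* Names for a few common short-descriptor status codes. */
static const char *
fsr_status_name(unsigned fs)
{
	switch (fs) {
	case 0x01: return ("alignment fault");
	case 0x05: return ("translation fault, section (first level)");
	case 0x07: return ("translation fault, page (second level)");
	case 0x09: return ("domain fault, section (first level)");
	case 0x0b: return ("domain fault, page (second level)");
	case 0x0d: return ("permission fault, section (first level)");
	case 0x0f: return ("permission fault, page (second level)");
	default:   return ("other");
	}
}
```

For FSR=00000817 this gives status 0x07 (translation fault, page), domain 1, and a write access. That agrees with the 'Translation Fault (P)' message and with the faulting instruction, a store to [r0, #0x030] with r0 = 0, which matches FAR=00000030.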
2014-03-22 14:20 GMT+01:00 Ian Lepore <ian at freebsd.org>:
> On Fri, 2014-03-21 at 07:20 +0100, Wojciech Macek wrote:
> > No, changing flushD to flushID did not make any difference, but I think it
> > should be there - D-only flushing might not be sufficient.
> >
>
> Olivier reminded me right after I posted that: last week I made a change
> to cpufunc.c that makes flushD and flushID the same. So of course it
> made no difference. :) It really should be flushID though, in case
> that ever changes.
>
> You didn't say whether you have that change, which was r263251.
>
> > Currently, I'm running pmap_kernel_internal attached below. It is doing
> > unconditional flushID at the end, just like the old comment was saying :)
> > SMP seems to be stable.
> >
>
> That seems to say that somehow there is a valid TLB entry even though
> the old pte for that entry is zero. That means there's a problem
> somewhere else in the code, but I don't see it. It looks to me like we
> do a TLB flush everywhere that we zero out a pte.
>
> You said without the unconditional flush it panics at startup. Where in
> startup? Early, or after init is launched or what? Where does the
> panic backtrace to?
>
> If we've got some other pte/tlb maintenance problem, I'd hate to hide it
> with this unconditional flush and have it appear as some other problem
> later that will be even harder to track down.
>
> -- Ian
>
>
>