[P2020] Infinite EXC_ISI on executing /sbin/init
Marcel Moolenaar
marcelm at juniper.net
Sat Jun 2 15:45:48 UTC 2012
All,
On a P2020 system (kernel configured without SMP -- see other email) we lose
forward progress due to a TLB issue. In a nutshell, this is what I'm seeing:
1. The kernel exits to execute the very first instruction of /sbin/init.
2. assumption: the kernel gets a TLB miss exception.
3. assumption: the miss cannot be handled so a fake TLB is created to
trigger an ISI.
4. The kernel gets an ISI and calls vm_fault().
The contents of TLB0 with respect to the process are:
125: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 2 mas1 = 0x00020100 mas2(va) = 0x7fffd004 mas3(pa) = 0x01ebe03f
253: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x00030100 mas2(va) = 0x7fffd004 mas3(pa) = 0xffff0000
380: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x00030100 mas2(va) = 0x7fffc004 mas3(pa) = 0xffff0000
381: (V ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x80030100 mas2(va) = 0x7fffd004 mas3(pa) = 0x01ed300f
508: (V ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x80030100 mas2(va) = 0x7fffc004 mas3(pa) = 0x01ed400f
I don't see the fake TLB entry for init's entry point (0x0180000) so I'm
not sure (3) above happened.
5. mmu_booke_enter() is called, which flushes the TLB (i.e. removes the
fake entry) and adds the real one to the PMAP's page tables.
6. assumption: the kernel exits from the ISI trap and gets a TLB miss
exception.
7. Normally this can be handled and everything is fine, except what I'm
seeing is that the kernel gets another ISI -- so it looks like we're back at
point 3. The TLB contents on the second and subsequent ISI exceptions are
effectively the same as given at (4) above:
253: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x00030100 mas2(va) = 0x7fffd004 mas3(pa) = 0xffff0000
380: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x00030100 mas2(va) = 0x7fffc004 mas3(pa) = 0xffff0000
381: (V ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x80030100 mas2(va) = 0x7fffd004 mas3(pa) = 0x01ed300f
508: (V ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x80030100 mas2(va) = 0x7fffc004 mas3(pa) = 0x01ed400f
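For anyone reading the dumps above, here is a minimal sketch of how the raw
MAS values break down into fields. The masks follow the e500 MAS1/MAS2/MAS3
layout as I understand it from the reference manual, so treat them as
assumptions; the function and constant names are made up for illustration
and are not taken from the kernel source.

```python
# Assumed e500 MAS register field masks (illustrative, not from the kernel).
MAS1_VALID = 0x80000000        # V: entry is valid
MAS1_TID_MASK = 0x00FF0000     # TID: translation ID (matched against the PID)
MAS1_TID_SHIFT = 16
MAS1_TS = 0x00001000           # TS: translation address space
MAS1_TSIZE_MASK = 0x00000F00   # TSIZE: page size is 4^TSIZE KB
MAS1_TSIZE_SHIFT = 8
MAS2_EPN_MASK = 0xFFFFF000     # effective page number (low bits are WIMGE flags)
MAS3_RPN_MASK = 0xFFFFF000     # real page number
# MAS3 permission bits, most significant first.
MAS3_PERMS = [(0x20, "UX"), (0x10, "SX"), (0x08, "UW"),
              (0x04, "SW"), (0x02, "UR"), (0x01, "SR")]


def decode_tlb_entry(mas1, mas2, mas3):
    """Split one TLB entry's MAS1/MAS2/MAS3 values into named fields."""
    tsize = (mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT
    return {
        "valid": bool(mas1 & MAS1_VALID),
        "tid": (mas1 & MAS1_TID_MASK) >> MAS1_TID_SHIFT,
        "as": 1 if mas1 & MAS1_TS else 0,
        "size": 1024 * 4 ** tsize,
        "va": mas2 & MAS2_EPN_MASK,
        "pa": mas3 & MAS3_RPN_MASK,
        "perms": [name for bit, name in MAS3_PERMS if mas3 & bit],
    }


if __name__ == "__main__":
    # Entry 381 from the dump above.
    entry = decode_tlb_entry(0x80030100, 0x7fffd004, 0x01ed300f)
    for field, value in entry.items():
        print(field, hex(value) if isinstance(value, int) else value)
```

Decoded this way, the fields agree with the sz, tsz and tid columns already
printed in the dump, and under the assumed MAS3 layout neither of the two
valid entries has UX or SX set.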
Questions:
1. Why don't I see the fake TLB 0 entry for init's entry point?
2. Assuming we're not looking at a TLB miss, what else can cause
the ISI? The RM states that endianness can be another reason
for the ISI, but I don't see anything wrong there.
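Question 1 can be checked mechanically against the first dump. The sketch
below is only an illustration: it assumes 4 KB pages (every dumped entry has
tsz = 1), assumes a match requires V set, TS equal to the address space, and
TID equal to 0 or the current PID, and takes the PID to be 3 because that is
the tid on the valid entries. The names are hypothetical, and the entry point
is used exactly as written above.

```python
PAGE_MASK = 0xFFFFF000  # assumes 4 KB pages, as in the dump (tsz = 1)

# (index, mas1, mas2) copied from the first TLB0 dump.
TLB0_DUMP = [
    (125, 0x00020100, 0x7fffd004),
    (253, 0x00030100, 0x7fffd004),
    (380, 0x00030100, 0x7fffc004),
    (381, 0x80030100, 0x7fffd004),
    (508, 0x80030100, 0x7fffc004),
]


def matching_entries(entries, va, pid, addr_space=0):
    """Return the indices of valid entries that would translate va."""
    hits = []
    for index, mas1, mas2 in entries:
        valid = bool(mas1 & 0x80000000)
        tid = (mas1 >> 16) & 0xFF
        ts = (mas1 >> 12) & 1
        if not valid or ts != addr_space:
            continue
        # Assumption: TID 0 matches any process, otherwise it must equal the PID.
        if tid not in (0, pid):
            continue
        if (mas2 & PAGE_MASK) == (va & PAGE_MASK):
            hits.append(index)
    return hits


if __name__ == "__main__":
    # init's entry point as quoted above; an empty list means no entry covers it.
    print(matching_entries(TLB0_DUMP, 0x0180000, pid=3))
```

With the dumped entries, nothing covers the page containing the entry point,
real or fake, which is the observation behind question 1.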
BTW: I already looked at the I-cache synchronization logic and tweaked it.
No change. I also revisited the I-cache & D-cache enable & invalidate code
and tweaked that too. No change.
In short: I'm running out of ideas.
Could this be related to the other P2020 issue I described:
[P2020] FreeBSD cannot enable 2nd core.
They're both pretty weird and together could indicate some hardware
problem, right?
Then again: A FreeBSD 6.1 derived version boots at least UP...
--
Marcel Moolenaar
marcelm at juniper.net