Panic mounting root on BeagleBone Black

Thu Sep 12 15:44:24 UTC 2013

On Sep 12, 2013, at 8:55 AM, Ian Lepore wrote:

> On Wed, 2013-09-11 at 06:43 -0700, Tim Kientzle wrote:
>> Just built a new image for BBB from SVN r255438.
>> 
>> At the second boot, I got this:
>> 
>> Mounting local file systems:.
>> mmcsd0: Error indicated: 1 Timeout
>> g_vfs_done():mmcsd0s2a[READ(offset=2016903168, length=4096)]error = 5
>> vnode_pager_getpages: I/O read error
>> vm_fault: pager read error, pid 126 (ps)
>> mmcsd0: Error indicated: 1 Timeout
>> g_vfs_done():mmcsd0s2a[READ(offset=131072, length=32768)]error = 5
>> sdhci_ti0-slot0: Got data interrupt 0x00000010, but there is no active command.
>> sdhci_ti0-slot0: ============== REGISTER DUMP ==============
>> sdhci_ti0-slot0: Sys addr: 0x00000000 | Version:  0x00003101
>> sdhci_ti0-slot0: Blk size: 0x00000200 | Blk cnt:  0x00000010
>> sdhci_ti0-slot0: Argument: 0x0024679e | Trn mode: 0x0000193a
>> sdhci_ti0-slot0: Present:  0x01f70000 | Host ctl: 0x00000006
>> sdhci_ti0-slot0: Power:    0x0000000d | Blk gap:  0x00000000
>> sdhci_ti0-slot0: Wake-up:  0x00000000 | Clock:    0x00000007
>> sdhci_ti0-slot0: Timeout:  0x0000000d | Int stat: 0x00000000
>> sdhci_ti0-slot0: Int enab: 0x017f00fb | Sig enab: 0x017f00fb
>> sdhci_ti0-slot0: AC12 err: 0x00000000 | Slot int: 0x00000000
>> sdhci_ti0-slot0: Caps:     0x06e10080 | Max curr: 0x00000000
>> sdhci_ti0-slot0: ===========================================
>> 
>> …. few more similar messages, then ….
>> 
>> mmcsd0: Error indicated: 1 Timeout
>> g_vfs_done():mmcsd0s2a[WRITE(offset=20808192, length=512)]error = 5
>> g_vfs_done():mmcsd0s2a[WRITE(offset=1276346368, length=24576)]error = 5
>> panic: brelse: inappropriate B_PAGING or B_CLUSTER bp 0xcd148778
>> [bt snipped]
>> 
> 
> This was a single occurrence, right?  Like you're not dead in the water
> or anything?
> 
> There's insanity in that info... the register dump shows a multi-block
> write (8kbytes) was set up, but the command that timed out was a read.
> If a prior write had timed out why isn't there a g_vfs_done() error
> logged for it?
> 
> I think what we really need is some better error recovery in the mmc and
> sd layers.  Retrying a failed IO is cheap and easy.  More complex
> recovery is possible too (power cycling and re-initializing the card
> and/or controller).  But that has its own difficulties -- what if the
> nature of the problem was that the user swapped cards? -- you don't want
> to retry a write under those conditions.
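
For reference, the 8 kbyte figure above follows from the "Blk size" and "Blk cnt" fields in the register dump (0x200 bytes per block, 0x10 blocks). A minimal sketch of that arithmetic in C; the helper name is hypothetical and is not taken from the sdhci driver, and the values are used exactly as printed in the dump:

```c
#include <stdint.h>

/*
 * Hypothetical helper for illustration only (not part of sdhci(4)):
 * total bytes in a multi-block transfer, given the block size and
 * block count as they appear in the register dump.
 */
static uint32_t
xfer_bytes(uint32_t blk_size, uint32_t blk_cnt)
{
	return (blk_size * blk_cnt);
}
```

With the dumped values, xfer_bytes(0x200, 0x10) is 512 * 16 = 8192 bytes.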

I'd disagree with this... Retrying is often the wrong thing to do. If the write didn't work the first time, why would it work the second? This looks like a programming bug in how we drive the sdhci controller: we got errors, and then we got an interrupt with no pending command. That suggests our timeout isn't quite right...

Warner
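
The policy question in this exchange (when, if ever, a failed request may be reissued) can be sketched roughly as below. This is only an illustration under stated assumptions: the error classes, the card identity value, and the function name are all hypothetical and do not correspond to the actual mmc/mmcsd code. The sketch refuses any retry once the card identity has changed, which covers the swapped-card write case raised above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical error classes; the real mmc layer has its own codes. */
enum io_err { IO_OK, IO_TIMEOUT, IO_BADCRC, IO_FAILED };

/*
 * Hypothetical retry policy, for illustration only.
 *
 * id_at_issue: card identity recorded when the request was queued
 * id_now:      card identity currently seen in the slot
 *
 * Returns true only if the error looks transient, the retry budget
 * is not used up, and the same card is still present.
 */
static bool
should_retry(enum io_err err, uint32_t id_at_issue, uint32_t id_now,
    int tries_done, int max_tries)
{
	if (err == IO_OK || tries_done >= max_tries)
		return (false);
	if (id_at_issue != id_now)
		return (false);	/* card was swapped: never reissue */
	return (err == IO_TIMEOUT || err == IO_BADCRC);
}
```

Note that a policy like this does nothing about the driver-side symptom in the log (a data interrupt arriving with no active command); if that is a timeout-handling bug, a retry would at best hide it.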