Panic mounting root on BeagleBone Black
Ian Lepore
ian at FreeBSD.org
Thu Sep 12 15:53:40 UTC 2013
On Thu, 2013-09-12 at 09:44 -0600, Warner Losh wrote:
> On Sep 12, 2013, at 8:55 AM, Ian Lepore wrote:
>
> > On Wed, 2013-09-11 at 06:43 -0700, Tim Kientzle wrote:
> >> Just built a new image for BBB from SVN r255438.
> >>
> >> At the second boot, I got this:
> >>
> >> Mounting local file systems:.
> >> mmcsd0: Error indicated: 1 Timeout
> >> g_vfs_done():mmcsd0s2a[READ(offset=2016903168, length=4096)]error = 5
> >> vnode_pager_getpages: I/O read error
> >> vm_fault: pager read error, pid 126 (ps)
> >> mmcsd0: Error indicated: 1 Timeout
> >> g_vfs_done():mmcsd0s2a[READ(offset=131072, length=32768)]error = 5
> >> sdhci_ti0-slot0: Got data interrupt 0x00000010, but there is no active command.
> >> sdhci_ti0-slot0: ============== REGISTER DUMP ==============
> >> sdhci_ti0-slot0: Sys addr: 0x00000000 | Version: 0x00003101
> >> sdhci_ti0-slot0: Blk size: 0x00000200 | Blk cnt: 0x00000010
> >> sdhci_ti0-slot0: Argument: 0x0024679e | Trn mode: 0x0000193a
> >> sdhci_ti0-slot0: Present: 0x01f70000 | Host ctl: 0x00000006
> >> sdhci_ti0-slot0: Power: 0x0000000d | Blk gap: 0x00000000
> >> sdhci_ti0-slot0: Wake-up: 0x00000000 | Clock: 0x00000007
> >> sdhci_ti0-slot0: Timeout: 0x0000000d | Int stat: 0x00000000
> >> sdhci_ti0-slot0: Int enab: 0x017f00fb | Sig enab: 0x017f00fb
> >> sdhci_ti0-slot0: AC12 err: 0x00000000 | Slot int: 0x00000000
> >> sdhci_ti0-slot0: Caps: 0x06e10080 | Max curr: 0x00000000
> >> sdhci_ti0-slot0: ===========================================
> >>
> >> …. few more similar messages, then ….
> >>
> >> mmcsd0: Error indicated: 1 Timeout
> >> g_vfs_done():mmcsd0s2a[WRITE(offset=20808192, length=512)]error = 5
> >> g_vfs_done():mmcsd0s2a[WRITE(offset=1276346368, length=24576)]error = 5
> >> panic: brelse: inappropriate B_PAGING or B_CLUSTER bp 0xcd148778
> >> [bt snipped]
> >>
> >
> > This was a single occurrence, right? Like you're not dead in the water
> > or anything?
> >
> > There's insanity in that info... the register dump shows a multi-block
> > write (8kbytes) was set up, but the command that timed out was a read.
> > If a prior write had timed out why isn't there a g_vfs_done() error
> > logged for it?
> >
> > I think what we really need is some better error recovery in the mmc and
> > sd layers. Retrying a failed IO is cheap and easy. More complex
> > recovery is possible too (power cycling and re-initializing the card
> > and/or controller). But that has its own difficulties -- what if the
> > nature of the problem was that the user swapped cards? -- you don't want
> > to retry a write under those conditions.
>
> I'd disagree with this... Retrying often is the wrong thing to do. If the write didn't work the first time, why would it work the second? Looks like a programming bug here in controlling the sdhci controller since we got errors, then we got an interrupt with no pending commands. This suggests that our timeout isn't quite right...
>
> Warner
>
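For reference, the 8 kbyte figure mentioned for the register dump quoted above can be checked with a little arithmetic. This is only a sketch: the two register values are copied from the dump, while the masks assume the standard SDHCI layout, where the low 12 bits of the block size register hold the transfer block size.

```python
# Values copied from the "Blk size" and "Blk cnt" fields of the quoted dump.
BLK_SIZE_REG = 0x00000200
BLK_CNT_REG = 0x00000010

# Assumption: standard SDHCI layout, transfer block size in bits 11:0.
block_size = BLK_SIZE_REG & 0x0FFF
block_count = BLK_CNT_REG & 0xFFFF
total_bytes = block_size * block_count

print(block_size, block_count, total_bytes)  # 512 16 8192
```

Sixteen blocks of 512 bytes is 8192 bytes, so the controller was set up for a multi-block transfer of 8 kbytes.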
Retrying too often or endlessly is wrong, but IMO so is not retrying at
all, especially when the standard specifies error recovery strategies.
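The middle ground between no retry and endless retry can be sketched as below. This is a minimal illustration, not the mmc/sd layer code: the exception classes, the `do_io` callable and the `max_retries` default are all hypothetical names invented for the example. A transient error is retried a bounded number of times, while a media change is never retried, since the card may have been swapped.

```python
class TransientIOError(Exception):
    """Hypothetical: a timeout or similar error that may succeed on retry."""


class MediaChanged(Exception):
    """Hypothetical: the card was replaced, so a retry could hit the wrong media."""


def submit_with_retry(do_io, max_retries=3):
    """Run do_io() and retry transient failures at most max_retries times.

    do_io is a hypothetical callable that performs one request. It raises
    TransientIOError on a timeout and MediaChanged if the card was swapped.
    Returns (result, attempts). MediaChanged is not caught, so it propagates
    to the caller on the first occurrence.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return do_io(), attempts
        except TransientIOError:
            if attempts > max_retries:
                raise  # retry budget exhausted, report the error upward


# Usage: a request that times out twice and then succeeds.
calls = {"n": 0}


def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientIOError("timeout")
    return b"data"


result, attempts = submit_with_retry(flaky_read)
print(result, attempts)  # b'data' 3
```

With `max_retries=3` a request is attempted at most four times before the error is passed up, which keeps the cost of a genuinely dead card bounded.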
-- Ian