Re: ARM64 system error
- In reply to: Andrew Turner : "Re: ARM64 system error"
Date: Wed, 03 Aug 2022 16:50:49 UTC
> On Aug 3, 2022, at 12:28, Andrew Turner <andrew@fubar.geek.nz> wrote:
>
>> On 31 Jul 2022, at 17:55, John F Carr <jfc@mit.edu> wrote:
>>
>> My OverDrive 1000 (Cortex A57) running CURRENT just crashed with the unhelpful message "panic: Unhandled System Error". Is there any way to get better information? The ESR value bf000000 translates to "system error with implementation-defined code 0" so that's not much use. The instruction associated with the interrupt can't fault ("subs w22, w22, #0x1") so it must be an asynchronous error. On other systems I've seen bits you can test or registers you can read to get details.
>
> By my reading of the Cortex-A57 documentation [1] I think the ESR value shows the exception can be attributed to the current core, is containable to a given code sequence, and is a decode error.
>
> It’s likely due to msk_phy_readreg accessing the phy, but it doesn’t respond quickly enough.
>
> Does an older kernel boot? If so can you try bisecting to find which commit caused the panic.

Thanks, I missed that bit of documentation.

The same kernel worked after reboot with the same networking configuration. The theory of a slow response from an I/O device sounds good.

Is there an easy way to trigger a system error to test error handling code? For example, I once debugged a machine check handler (IBM lingo) by using a control/debug register that could intentionally write bad ECC to RAM.
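
For anyone following along, here is a rough sketch of how the ESR value bf000000 breaks down into its architectural fields. It assumes the generic AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) and, for an SError, the IDS flag in ISS bit 24. The function and key names are made up for illustration and are not taken from the FreeBSD sources. The meaning of the low 24 bits is core-specific (the Cortex-A57 documentation referenced above covers it) and is deliberately left undecoded here.

```python
# Sketch only: split an AArch64 ESR_ELx value into its architectural fields.
# Assumed layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
# decode_esr and the dictionary keys are hypothetical names for this example.

EC_SERROR = 0x2F  # exception class reported for an SError interrupt


def decode_esr(esr):
    """Return the architectural fields of a 32-bit ESR value as a dict."""
    ec = (esr >> 26) & 0x3F   # exception class
    il = (esr >> 25) & 0x1    # instruction length bit
    iss = esr & 0x1FFFFFF     # instruction specific syndrome
    fields = {"ec": ec, "il": il, "iss": iss}
    if ec == EC_SERROR:
        # IDS = 1 means the remaining 24 bits are implementation defined,
        # so their interpretation has to come from the core's own manual.
        fields["ids"] = (iss >> 24) & 0x1
        fields["impl_syndrome"] = iss & 0xFFFFFF
    return fields


if __name__ == "__main__":
    for name, value in decode_esr(0xBF000000).items():
        print(f"{name} = {value:#x}")
```

With the value from the panic this gives an exception class of 0x2f (SError), IDS set, and an implementation-defined syndrome of zero, which matches the "implementation-defined code 0" reading in the original report.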