ThunderX Panic after r368370

Marcel Flores marcel at brickporch.com
Mon Dec 7 00:59:55 UTC 2020



> On Dec 6, 2020, at 3:51 AM, Michal Meloun <meloun.michal at gmail.com> wrote:
> 
> 
> 
> On 06.12.2020 10:47, Mark Millard wrote:
>> On 2020-Dec-6, at 00:17, Michal Meloun <meloun.michal at gmail.com> wrote:
>>> On 06.12.2020 3:21, Marcel Flores wrote:
>>>> Hi All,
>>>> Looks like the ThunderX started panicking at boot after r368370:
>>>> https://reviews.freebsd.org/rS368370
>>>> From a verbose boot, it looks like it bails in gic0 redistributor setup(?):
>>>> gic0: CPU29 Re-Distributor woke up
>>>> gic0: CPU24 enabled CPU interface via system registers
>>>> gic0: CPU17 enabled CPU interface via system registers
>>>> gic0: CPU29 enabled CPU interface via system registers
>>>> done
>>>> Full Verbose boot:
>>>> https://gist.github.com/mesflores/f026122495c8494d041bce04d30b15bb
>>>> I'm not really familiar with the details of the commit, but happy to test
>>>> anything if anyone has any ideas.
>>> 
>>> 
>>> Hi Marcel,
>>> are you able to get a crash dump and a backtrace?
>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html#kerneldebug-obtain
>>> and
>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>>> If not, I'll make a debug patch.
>>> 
>>> It's weird: even though the GIC is potentially affected by my patch, in this case the cpuid numbering was not changed.
>> (I've no access to a ThunderX. I just looked for my own curiosity.
>> Sorry if this is obvious and so is noise.)
>> When I looked at the code it appeared to be the last "->" in
>> the following that was dereferencing the nullptr value (via [x8]
>> in assembler notation):
>> static uint64_t
>> its_cmd_prepare(struct its_cmd *cmd, struct its_cmd_desc *desc)
>> {
>>         uint64_t target;
>>         uint8_t cmd_type;
>>         u_int size;
>>         cmd_type = desc->cmd_type;
>>         target = ITS_TARGET_NONE;
>>         switch (cmd_type) {
>>         case ITS_CMD_MOVI:      /* Move interrupt ID to another collection */
>>                 target = desc->cmd_desc_movi.col->col_target;
>> . . .
>> In other words: it appeared to me that the above desc->cmd_desc_movi.col
>> evaluated as 0 when used in what was reported.
> This is very probably the right analysis. But the problem is that cmd_desc_movi.col should not be NULL: it is initialized in its_cmd_movi() from sc->sc_its_cols, which should be allocated in gicv3_its_attach().
> 
> 
> Marcel, can you please also try this debug patch?
> https://github.com/strejda/freebsd/commit/a25ed736644b05672e3e813891af213c280daac3
> Unfortunately, I have only a single-socket board with GICv3 (Honeycomb), and it still boots fine.
> 
> Thanks, Michal

Debug patch output here (I also switched from GENERIC-NODEBUG to GENERIC):

https://gist.github.com/mesflores/27bd1cca45b04e5b938166c9f1f79a04

Having a little trouble getting the crashdump saved, but will update if I can sort it out.
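
For anyone following along, here is a minimal sketch of the failure mode Mark describes above, assuming the fault really is the final "->" reading through a NULL collection pointer. The sketch_* names and SKETCH_TARGET_NONE are hypothetical stand-ins and not the real gicv3_its.c definitions; the guard only shows where a check would catch the condition, it is not a proposed fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real gicv3_its.c definitions. */
#define	SKETCH_TARGET_NONE	UINT64_MAX

struct sketch_col {
	uint64_t col_target;
};

struct sketch_cmd_desc {
	/* Plays the role of desc->cmd_desc_movi.col in the quoted code. */
	struct sketch_col *col;
};

/*
 * Return the collection target, or SKETCH_TARGET_NONE when the
 * descriptor or its collection pointer is unset. Without the check,
 * desc->col->col_target would read through a NULL pointer, which is
 * the access the analysis above points at.
 */
static uint64_t
sketch_movi_target(const struct sketch_cmd_desc *desc)
{
	if (desc == NULL || desc->col == NULL)
		return (SKETCH_TARGET_NONE);
	return (desc->col->col_target);
}
```

In a debug kernel, a check like this would more likely be an assertion or a printf of the pointer than a silent fallback, so that the missing initialization shows up right away.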

-m


