Re: network crash in nhop_free

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Mon, 30 Aug 2021 07:32:19 UTC

On 30/08/2021 10:28, Andriy Gapon wrote:
> On 01/08/2021 16:36, Alexander V. Chernikov wrote:
>>
>>
>>> On 10 Jul 2021, at 10:07, Andriy Gapon <avg@FreeBSD.org> wrote:
>>>
>>> On 09/07/2021 00:02, Alexander V. Chernikov wrote:
>>>> Hi Andriy,
>>>> Could you by any chance provide a bit more info on the system networking 
>>>> configuration and the steps leading to panic?
>>>> No chance for a coredump?
>>>> destroy_nhgrp() suggests that there was a multipath route (default?) that 
>>>> was deleted.
>>>> nhops are created with UMA_ALIGN_PTR, so I suspect there is garbage inside
>>>> the nhgrp pointer.
>>>
>>> I've just reproduced the problem and got a crash dump.
>>> The new panic is a little bit different, but I think that it confirms your 
>>> analysis.
>>> Also, you are right about the multipath route, although its creation was not 
>>> intentional.
>>
>> Should be fixed by 
>> https://cgit.freebsd.org/src/commit/?id=054948bd81bb9e4e32449cf351b62e501b8831ff 
>> .
> 
> I have to report that, unfortunately, as of main 
> bb958dcf3d8af3a033dacbf8133681c9b0c73b2f I can still reproduce the same panic 
> using the same steps.
> To be clear, as I reported two similar but still distinct panics, it's the first 
> panic, "Misaligned access from kernel space!".
> 
> I should also add that the commit message does not really match my scenario.
> In my case the routes do not change quickly.  I have generous pauses between
> starting and stopping ppp.
> I have a feeling that there must be something more deterministic that leads to
> the crash.

Some more details from today's crash:

panic: Misaligned access from kernel space!
cpuid = 0
time = 1630308311
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x184
panic() at panic+0x44
align_abort() at align_abort+0xb8
handle_el1h_sync() at handle_el1h_sync+0x78
--- exception, esr 0x96000021
nhop_free() at nhop_free+0x100
destroy_nhgrp() at destroy_nhgrp+0x38
epoch_call_task() at epoch_call_task+0x158
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
Uptime: 11m17s
Dumping 150 out of 998 MB:..3%..11%..22%..32%..43%..51%..62%..72%..83%

get_curthread () at /usr/devel/git/rock/sys/arm64/include/pcpu.h:68
68      /usr/devel/git/rock/sys/arm64/include/pcpu.h: No such file or directory.
(kgdb) bt
#0  get_curthread () at /usr/devel/git/rock/sys/arm64/include/pcpu.h:68
#1  doadump (textdump=textdump@entry=1) at /usr/devel/git/rock/sys/kern/kern_shutdown.c:417
#2  0xffff0000003bebf0 in kern_reboot (howto=260) at /usr/devel/git/rock/sys/kern/kern_shutdown.c:504
#3  0xffff0000003bf10c in vpanic (fmt=<optimized out>, ap=...) at /usr/devel/git/rock/sys/kern/kern_shutdown.c:947
#4  0xffff0000003bee3c in panic (fmt=0x0) at /usr/devel/git/rock/sys/kern/kern_shutdown.c:871
#5  0xffff0000006c2054 in align_abort (td=<optimized out>, frame=<optimized out>, esr=2516582433, far=16045693110842147062, lower=<optimized out>) at /usr/devel/git/rock/sys/arm64/arm64/trap.c:212
#6  <signal handler called>
#7  atomic_fetchadd_32_llsc (p=0xdeadc0dedeadc0f6, val=4294967295) at /usr/devel/git/rock/sys/arm64/include/atomic.h:316
#8  atomic_fetchadd_32 (p=<optimized out>, val=4294967295) at /usr/devel/git/rock/sys/arm64/include/atomic.h:316
#9  refcount_releasen (count=0xdeadc0dedeadc0f6, n=1) at /usr/devel/git/rock/sys/sys/refcount.h:152
#10 refcount_release (count=0xdeadc0dedeadc0f6) at /usr/devel/git/rock/sys/sys/refcount.h:174
#11 nhop_free (nh=<optimized out>) at /usr/devel/git/rock/sys/net/route/nhop_ctl.c:669
#12 0xffff000000506268 in free_nhgrp_nhops (nhg_priv=0xffffa00000d31b98) at /usr/devel/git/rock/sys/net/route/nhgrp_ctl.c:423
#13 destroy_nhgrp (nhg_priv=0xffffa00000d31b98) at /usr/devel/git/rock/sys/net/route/nhgrp_ctl.c:380
#14 0xffff000000405434 in epoch_call_task (arg=<optimized out>) at /usr/devel/git/rock/sys/kern/subr_epoch.c:819
#15 0xffff000000408ee4 in gtaskqueue_run_locked (queue=queue@entry=0xffffa00000c03c00) at /usr/devel/git/rock/sys/kern/subr_gtaskqueue.c:371
#16 0xffff000000408c38 in gtaskqueue_thread_loop (arg=arg@entry=0xffff000089b69008) at /usr/devel/git/rock/sys/kern/subr_gtaskqueue.c:547
#17 0xffff00000037701c in fork_exit (callout=0xffff000000408b6c <gtaskqueue_thread_loop>, arg=0xffff000089b69008, frame=0xffff000087934990) at /usr/devel/git/rock/sys/kern/kern_fork.c:1087

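kgdb prints esr and far in frame #5 in decimal, which hides how they relate to
the rest of the dump. The sketch below only re-expresses those two numbers. It
assumes that 0xdeadc0dedeadc0de is the junk pattern written over freed memory
(an assumption, not something the dump states) and that the ESR fields follow
the usual Arm layout (exception class in bits 31:26, data fault status code in
bits 5:0). The decode() helper is hypothetical and exists only for
illustration.

```python
# Values copied from frame #5 (align_abort) of the kgdb backtrace above,
# where kgdb prints them in decimal.
FAR = 16045693110842147062  # faulting address
ESR = 2516582433            # exception syndrome register

# Assumption: 0xdeadc0dedeadc0de is the fill pattern written over freed memory.
JUNK = 0xdeadc0dedeadc0de


def decode(far: int, esr: int, junk: int = JUNK) -> dict:
    """Hypothetical helper: re-express the fault values for easier reading."""
    return {
        "far": hex(far),
        "esr": hex(esr),
        "ec": hex(esr >> 26),      # exception class, ESR bits [31:26]
        "dfsc": hex(esr & 0x3F),   # data fault status code, ESR bits [5:0]
        "offset_from_junk": far - junk,
        "aligned_for_u32": far % 4 == 0,
    }


info = decode(FAR, ESR)
for key, value in info.items():
    print(f"{key:>18}: {value}")
```

The decoded far equals the count pointer passed to refcount_release(), and the
decoded esr equals the value shown in the panic message. If the junk-pattern
assumption holds, the address is the pattern plus a small offset, which would
be consistent with a pointer being read from already-freed memory before the
32-bit atomic touched it.
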
-- 
Andriy Gapon