Re: network crash in nhop_free

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Mon, 30 Aug 2021 07:51:11 UTC

It seems that there is a mismatch between nhg_priv->nhg->nhops and 
nhg_priv->nhg_nh_weights.

(kgdb) p nhg_priv->nhg_nh_weights[0]
$8 = {nh = 0xffffa000369f1600, weight = 1}
(kgdb) p nhg_priv->nhg_nh_weights[1]
$9 = {nh = 0xffffa000369f1400, weight = 0}

(kgdb) p nhg_priv->nhg->nhops[0]
$10 = (struct nhop_object *) 0xffffa000369f1600
(kgdb) p nhg_priv->nhg->nhops[1]
$11 = (struct nhop_object *) 0xffffa00036bcce00

0xffffa000369f1600 appears in both nhg_priv->nhg_nh_weights[0] and
nhg_priv->nhg->nhops[0], but the second nhop differs between the two arrays.
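
To make the comparison explicit, here is a minimal userland sketch of the check
I did by hand in kgdb. It assumes that every entry of nhg->nhops is supposed to
be one of the nexthops listed in nhg_nh_weights; that is my reading of the data,
not something I have confirmed in the code. struct wn_sketch and
first_unmatched_nhop() are hypothetical names, not kernel API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one nhg_nh_weights[] entry.  Only the two
 * fields printed by kgdb are modelled, and the pointer is kept as a raw
 * 64-bit address so the sketch builds outside the kernel.
 */
struct wn_sketch {
	uint64_t nh;
	uint32_t weight;
};

/*
 * Return the index of the first nhops[] entry that does not appear in
 * wn[], or -1 if every entry has a match.
 */
int
first_unmatched_nhop(const uint64_t *nhops, size_t num_nhops,
    const struct wn_sketch *wn, size_t num_wn)
{
	for (size_t i = 0; i < num_nhops; i++) {
		size_t j;

		for (j = 0; j < num_wn; j++) {
			if (wn[j].nh == nhops[i])
				break;
		}
		if (j == num_wn)
			return ((int)i);
	}
	return (-1);
}
```

Fed the four addresses printed above, first_unmatched_nhop() returns 1, that
is, nhops[1] has no counterpart in nhg_nh_weights.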

nhg_priv->nhg->nhops[1] is okay, but nhg_priv->nhg_nh_weights[1].nh points to 
freed memory:
(kgdb) x/a nhg_priv->nhg_nh_weights[1].nh
0xffffa000369f1400:     0xdeadc0dedeadc0de
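
This also lines up with the fault address in the backtrace quoted below:
refcount_release() was called with count=0xdeadc0dedeadc0f6 (the same value as
far=16045693110842147062), which is the fill pattern plus 0x18 and is not 4-byte
aligned, hence the alignment abort in the 32-bit atomic. A small sketch of that
arithmetic follows. FREED_FILL, offset_from_fill() and is_aligned() are
hypothetical helper names, and treating 0x18 as the offset of a reference count
field behind the stale pointer is my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value kgdb shows at the stale nhop address. */
#define FREED_FILL 0xdeadc0dedeadc0deULL

/* Byte distance from the fill pattern to the faulting address. */
uint64_t
offset_from_fill(uint64_t fault_addr)
{
	return (fault_addr - FREED_FILL);
}

/* True if addr is a multiple of align; align must be a power of two. */
bool
is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr & (align - 1)) == 0);
}
```

For the address from the backtrace, offset_from_fill(0xdeadc0dedeadc0f6) is 0x18
and is_aligned(0xdeadc0dedeadc0f6, 4) is false.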

On 30/08/2021 10:32, Andriy Gapon wrote:
> Some more details from today's crash:
> 
> panic: Misaligned access from kernel space!
> cpuid = 0
> time = 1630308311
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x184
> panic() at panic+0x44
> align_abort() at align_abort+0xb8
> handle_el1h_sync() at handle_el1h_sync+0x78
> --- exception, esr 0x96000021
> nhop_free() at nhop_free+0x100
> destroy_nhgrp() at destroy_nhgrp+0x38
> epoch_call_task() at epoch_call_task+0x158
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc8
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> Uptime: 11m17s
> Dumping 150 out of 998 MB:..3%..11%..22%..32%..43%..51%..62%..72%..83%
> 
> get_curthread () at /usr/devel/git/rock/sys/arm64/include/pcpu.h:68
> 68      /usr/devel/git/rock/sys/arm64/include/pcpu.h: No such file or directory.
> (kgdb) bt
> #0  get_curthread () at /usr/devel/git/rock/sys/arm64/include/pcpu.h:68
> #1  doadump (textdump=textdump@entry=1) at 
> /usr/devel/git/rock/sys/kern/kern_shutdown.c:417
> #2  0xffff0000003bebf0 in kern_reboot (howto=260) at 
> /usr/devel/git/rock/sys/kern/kern_shutdown.c:504
> #3  0xffff0000003bf10c in vpanic (fmt=<optimized out>, ap=...) at 
> /usr/devel/git/rock/sys/kern/kern_shutdown.c:947
> #4  0xffff0000003bee3c in panic (fmt=0x0) at 
> /usr/devel/git/rock/sys/kern/kern_shutdown.c:871
> #5  0xffff0000006c2054 in align_abort (td=<optimized out>, frame=<optimized 
> out>, esr=2516582433, far=16045693110842147062, lower=<optimized out>) at 
> /usr/devel/git/rock/sys/arm64/arm64/trap.c:212
> #6  <signal handler called>
> #7  atomic_fetchadd_32_llsc (p=0xdeadc0dedeadc0f6, val=4294967295) at 
> /usr/devel/git/rock/sys/arm64/include/atomic.h:316
> #8  atomic_fetchadd_32 (p=<optimized out>, val=4294967295) at 
> /usr/devel/git/rock/sys/arm64/include/atomic.h:316
> #9  refcount_releasen (count=0xdeadc0dedeadc0f6, n=1) at 
> /usr/devel/git/rock/sys/sys/refcount.h:152
> #10 refcount_release (count=0xdeadc0dedeadc0f6) at 
> /usr/devel/git/rock/sys/sys/refcount.h:174
> #11 nhop_free (nh=<optimized out>) at 
> /usr/devel/git/rock/sys/net/route/nhop_ctl.c:669
> #12 0xffff000000506268 in free_nhgrp_nhops (nhg_priv=0xffffa00000d31b98) at 
> /usr/devel/git/rock/sys/net/route/nhgrp_ctl.c:423
> #13 destroy_nhgrp (nhg_priv=0xffffa00000d31b98) at 
> /usr/devel/git/rock/sys/net/route/nhgrp_ctl.c:380
> #14 0xffff000000405434 in epoch_call_task (arg=<optimized out>) at 
> /usr/devel/git/rock/sys/kern/subr_epoch.c:819
> #15 0xffff000000408ee4 in gtaskqueue_run_locked 
> (queue=queue@entry=0xffffa00000c03c00) at 
> /usr/devel/git/rock/sys/kern/subr_gtaskqueue.c:371
> #16 0xffff000000408c38 in gtaskqueue_thread_loop 
> (arg=arg@entry=0xffff000089b69008) at 
> /usr/devel/git/rock/sys/kern/subr_gtaskqueue.c:547
> #17 0xffff00000037701c in fork_exit (callout=0xffff000000408b6c 
> <gtaskqueue_thread_loop>, arg=0xffff000089b69008, frame=0xffff000087934990) at 
> /usr/devel/git/rock/sys/kern/kern_fork.c:1087


-- 
Andriy Gapon