inp_gcmoptions / imo_gc_task refers to free()d memory
Chris Torek
torek at ixsystems.com
Mon Feb 8 23:29:17 UTC 2016
Looking for clues from multicast gurus :-)
I turned on all the memory debug options looking for a ZFS issue,
and promptly stumbled over an IP multicast bug that occurs when
creating and deleting bridge and other virtual interfaces quickly.
Here is the actual crash, along with the relevant info (note that clang optimized away several stack
frames due to tail calls):
in_leavegroup_locked() at in_leavegroup_locked+0x6b/frame 0xfffffe0babd1aae0
inp_gcmoptions() at inp_gcmoptions+0x1e2/frame 0xfffffe0babd1ab20
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfffffe0babd1ab80
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfffffe0bab1dabb0
fork_exit() at ...
in_leavegroup_locked+0x6b: movq 0x10(%rax),%rax
rax 0xdeadc0dedeadc0de
That particular movq instruction is part of the
CURVNET_SET(inm->inm_ifp->if_vnet) macro invocation. Here is the
gdb disassembly around it (gdb points to that line,
netinet/in_mcast.c line 1243 or so):
0x11c7 <in_leavegroup_locked+103>: mov 0x18(%r14),%rax
0x11cb <in_leavegroup_locked+107>: mov 0x10(%rax),%rax
0x11cf <in_leavegroup_locked+111>: test %rax,%rax
Here %r14 holds "inm", so the instruction that fails is the load of
ifp->if_vnet into %rax, not any later load through the vnet pointer
(CURVNET_SET expands to an assertion that the vnet pointer is not NULL
and carries the right vnet magic number, and that check begins at the
"test %rax" instruction).
So inm itself points at readable memory (the first load succeeded), but
inm->inm_ifp is 0xdeadc0dedeadc0de, the pattern memory debugging writes
into freed memory. That means we have already done free(inm, M_IPMADDR).
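To make that reasoning concrete, here is a userland sketch. The mock_* structs are hypothetical and are laid out only so that inm_ifp sits at offset 0x18 and if_vnet at offset 0x10 on an LP64 machine, matching the two mov instructions above; they are not the real struct in_multi or struct ifnet. poison_fill() is likewise a hypothetical stand-in for the 0xdeadc0de trashing that memory debugging does on free(). The point is that any pointer-sized field read out of freed, trashed memory comes back as 0xdeadc0dedeadc0de, and the next dereference through it faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mock layouts: only the field offsets matter (LP64). */
struct mock_vnet {
	unsigned vnet_magic_n;
};

struct mock_ifnet {
	void *if_pad0;                  /* 0x00 */
	void *if_pad1;                  /* 0x08 */
	struct mock_vnet *if_vnet;      /* 0x10 */
};

struct mock_in_multi {
	void *inm_pad0;                 /* 0x00 */
	void *inm_pad1;                 /* 0x08 */
	void *inm_pad2;                 /* 0x10 */
	struct mock_ifnet *inm_ifp;     /* 0x18 */
};

/*
 * C equivalent of the two loads in the disassembly. If inm has been
 * freed and trashed, ifp is 0xdeadc0dedeadc0de and the second load faults.
 */
struct mock_vnet *
mock_inm_vnet(const struct mock_in_multi *inm)
{
	struct mock_ifnet *ifp = inm->inm_ifp;  /* mov 0x18(%r14),%rax */

	return (ifp->if_vnet);                  /* mov 0x10(%rax),%rax */
}

/* Hypothetical stand-in for trash-on-free: fill with 0xdeadc0de words. */
void
poison_fill(void *buf, size_t len)
{
	const uint32_t junk = 0xdeadc0deU;
	unsigned char *p = buf;

	assert(len % sizeof(junk) == 0);
	for (size_t off = 0; off < len; off += sizeof(junk))
		memcpy(p + off, &junk, sizeof(junk));
}

/* Raw bits of inm->inm_ifp, read without dereferencing anything. */
uintptr_t
inm_ifp_bits(const struct mock_in_multi *inm)
{
	uintptr_t bits;

	assert(sizeof(bits) == sizeof(inm->inm_ifp));
	memcpy(&bits, (const unsigned char *)inm +
	    offsetof(struct mock_in_multi, inm_ifp), sizeof(bits));
	return (bits);
}
```

Because both 32-bit halves of the trashed word are 0xdeadc0de, the 64-bit value reads the same on little- and big-endian machines.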
But struct in_multi has a reference count, so I'm at a bit of a loss as
to where the reference counting failed us...
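For reference, here is a generic sketch of what a reference count is supposed to guarantee, using hypothetical names (obj, obj_acquire(), obj_release()); it is not the in_multi code and not a diagnosis of this bug. The object is freed only when the last holder releases it, so a premature free means that some holder was using the pointer without owning a reference of its own, or that some path released one reference too many.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object; "freed" stands in for free(obj, ...). */
struct obj {
	unsigned refcount;
	bool freed;
};

/* Take a reference before keeping or handing off a pointer. */
void
obj_acquire(struct obj *o)
{
	assert(!o->freed);
	o->refcount++;
}

/* Drop a reference; the last release frees the object. */
void
obj_release(struct obj *o)
{
	assert(!o->freed && o->refcount > 0);
	if (--o->refcount == 0)
		o->freed = true;
}
```

If a deferred consumer (for example, work queued to a taskqueue) takes its own reference with obj_acquire(), the object survives until that consumer releases it. If it only copies the pointer, the creator's last obj_release() frees the object out from under it.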
(I did find a thread in the middle of deleting a bridge interface, but
I think at this point that this is a red herring.)
BTW, this is the wrong place for this fix, but I'll attach it anyway:
turning on memory debugging found this bug too, and if you turn on
memory debugging with a ZFS root, you might hang :-)
(I need to make a bit of progress with my commit bit, and just
commit the fix below...)
Chris
commit cc2a7f115fe69a5d91e54654ba1f210b7db19df6
Author: Chris Torek <torek at ixsystems.com>
Date: Sun Feb 7 13:36:34 2016 -0800
taskqueue_drain_all: reload after sleeping
In taskqueue_drain_all(), after we use TQ_SLEEP to wait
for pending tasks to run, we must reload the tail of the
task queue, since the task we were waiting on may have run
(and been freed) by now.
The symptom of this bug was a thread hanging in TQ_SLEEP if a task
lived in malloc()ed memory that was reused or (more likely) filled
with 0xdeadc0de on free() due to memory checking being turned on.
The latter would make the pending value become 49374 (0xc0de) and
we would wait forever.
diff --git a/sys/kern/subr_taskqueue.c b/sys/kern/subr_taskqueue.c
index f104bb5..55a1ca1 100644
--- a/sys/kern/subr_taskqueue.c
+++ b/sys/kern/subr_taskqueue.c
@@ -440,10 +440,9 @@ taskqueue_drain_all(struct taskqueue *queue)
 		WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, __func__);
 
 	TQ_LOCK(queue);
-	task = STAILQ_LAST(&queue->tq_queue, task, ta_link);
-	if (task != NULL)
-		while (task->ta_pending != 0)
-			TQ_SLEEP(queue, task, &queue->tq_mutex, PWAIT, "-", 0);
+	while ((task = STAILQ_LAST(&queue->tq_queue, task, ta_link)) != NULL &&
+	    task->ta_pending != 0)
+		TQ_SLEEP(queue, task, &queue->tq_mutex, PWAIT, "-", 0);
 	taskqueue_drain_running(queue);
 	KASSERT(STAILQ_EMPTY(&queue->tq_queue),
 	    ("taskqueue queue is not empty after draining"));
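The hang described in the commit message can be reproduced in a small userland model. This is a sketch under simplifying assumptions, not kernel code: struct mtask, struct mqueue, mq_sleep() and the drain_* functions are hypothetical; TQ_SLEEP is modeled as "the worker runs the head task, whose owner then frees it"; and the free is modeled by trashing the task with 0xdeadc0de in place rather than releasing the memory, so the model itself has no undefined behavior. drain_old() loads the tail once, as the removed lines do; drain_new() reloads it on every iteration, as the added lines do. A max_sleeps bound stands in for "waits forever".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mtask {
	struct mtask *next;     /* stands in for ta_link */
	uint16_t pending;       /* stands in for ta_pending (16 bits wide) */
};

struct mqueue {
	struct mtask *head;
	struct mtask *tail;
};

static void
mq_enqueue(struct mqueue *q, struct mtask *t)
{
	t->next = NULL;
	t->pending = 1;
	if (q->tail != NULL)
		q->tail->next = t;
	else
		q->head = t;
	q->tail = t;
}

/* Model of one TQ_SLEEP: run the head task, then its owner "frees" it. */
static void
mq_sleep(struct mqueue *q)
{
	const uint32_t junk = 0xdeadc0deU;
	struct mtask *t = q->head;

	if (t == NULL)
		return;
	q->head = t->next;
	if (q->head == NULL)
		q->tail = NULL;
	t->pending = 0;
	/* Trash the whole task 4 bytes at a time, as memory debugging would. */
	for (size_t off = 0; off + sizeof(junk) <= sizeof(*t); off += sizeof(junk))
		memcpy((unsigned char *)t + off, &junk, sizeof(junk));
}

/* Old logic: tail loaded once. Returns false if it would sleep forever. */
static bool
drain_old(struct mqueue *q, int max_sleeps)
{
	struct mtask *t = q->tail;

	if (t != NULL) {
		while (t->pending != 0) {
			if (max_sleeps-- == 0)
				return (false);
			mq_sleep(q);
		}
	}
	return (true);
}

/* Fixed logic: reload the tail after every sleep. */
static bool
drain_new(struct mqueue *q, int max_sleeps)
{
	struct mtask *t;

	while ((t = q->tail) != NULL && t->pending != 0) {
		if (max_sleeps-- == 0)
			return (false);
		mq_sleep(q);
	}
	return (true);
}
```

With two queued tasks, drain_old() ends up re-reading pending from the trashed tail (0xc0de, i.e. 49374, on a little-endian machine, since the counter starts at a 4-byte-aligned offset) and gives up after max_sleeps, while drain_new() sees the tail become NULL and returns. The model leaves out the real function's taskqueue_drain_running() step.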
More information about the freebsd-net mailing list