three multicast fixes
Adrian Chadd
adrian.chadd at gmail.com
Wed Apr 26 21:53:27 UTC 2017
hiya,
can you throw these in bugs?
-a
On 17 April 2017 at 06:58, Chris Torek <torek at torek.net> wrote:
> The first is mostly cosmetic, it's just something I observed when
> I turned on multicast debug to find the bug that the second patch
> is for.
>
> The second patch is a kludge and if anyone has a better fix, have
> at it. :-) Note that it's also hard to provoke the bug. The only
> way I know to observe it somewhat reliably is to create and
> destroy bridge interfaces, using a debug kernel with memory debug
> enabled, while adding and deleting hardware interfaces to the
> bridge so that they repeatedly join and leave multicast groups.
> Eventually you hit the race and crash.
>
> The third is simple enough to observe. It may affect only the lagg
> driver, which implements SIOCDELMULTI by removing and re-adding all
> the multicast addresses other than the one actually deleted (which
> is of course really-removed by then). If running lagg on a VIMAGE
> kernel, leaving a multicast group causes this occasional crash:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x28
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80adf6ea
> stack pointer = 0x28:0xfffffe009117e810
> frame pointer = 0x28:0xfffffe009117e8b0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 (thread taskq)
>
> rt_newmaddrmsg() at rt_newmaddrmsg+0x2a/frame 0xfffffe009117e8b0
> if_addmulti() at if_addmulti+0x24c/frame 0xfffffe009117e940
> lagg_ether_cmdmulti() at lagg_ether_cmdmulti+0x168/frame 0xfffffe009117e980
> lagg_ioctl() at lagg_ioctl+0xad/frame 0xfffffe009117ea60
> in_leavegroup() at in_leavegroup+0xb9/frame 0xfffffe009117eab0
> inp_gcmoptions() at inp_gcmoptions+0x1a8/frame 0xfffffe009117eb20
> taskqueue_run_locked() at taskqueue_run_locked+0x117/frame 0xfffffe009117eb80
> taskqueue_thread_loop() at taskqueue_thread_loop+0xc8/frame 0xfffffe009117ebb0
> fork_exit() at fork_exit+0x85/frame 0xfffffe009117ebf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009117ebf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>
> Here, "curvnet" is NULL in the lagg_ioctl(), so that when
> rt_newmaddrmsg() does:
>
> if (V_route_cb.any_count == 0)
>
> we crash with the offset of the per-vnet route_cb data.
>
> (The patches are in Git format, I applied them to the read-only
> freebsd master tree on GitHub.)
>
> Chris
>
> From 65145b74d3dae68feddcf8a2aad235c3f0a981e9 Mon Sep 17 00:00:00 2001
> From: Chris Torek <chris.torek at gmail.com>
> Date: Mon, 8 Feb 2016 18:55:30 -0800
> Subject: [PATCH 1/3] ip multicast debug: fix strings vs defines
>
> Turning on multicast debug made multicast failure worse
> because the strings and #define values no longer matched
> up. Fix them, and make sure they stay matched-up.
> ---
> sys/netinet/in_mcast.c | 31 +++++++++++++++++++++----------
> 1 file changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c
> index c7fff9463128..eef560e7aca7 100644
> --- a/sys/netinet/in_mcast.c
> +++ b/sys/netinet/in_mcast.c
> @@ -2920,7 +2920,14 @@ sysctl_ip_mcast_filters(SYSCTL_HANDLER_ARGS)
>
> #if defined(KTR) && (KTR_COMPILE & KTR_IGMPV3)
>
> -static const char *inm_modestrs[] = { "un", "in", "ex" };
> +static const char *inm_modestrs[] = {
> + [MCAST_UNDEFINED] = "un",
> + [MCAST_INCLUDE] = "in",
> + [MCAST_EXCLUDE] = "ex",
> +};
> +_Static_assert(MCAST_UNDEFINED == 0 &&
> + MCAST_EXCLUDE + 1 == nitems(inm_modestrs),
> + "inm_modestrs: no longer matches #defines");
>
> static const char *
> inm_mode_str(const int mode)
> @@ -2932,16 +2939,20 @@ inm_mode_str(const int mode)
> }
>
> static const char *inm_statestrs[] = {
> - "not-member",
> - "silent",
> - "idle",
> - "lazy",
> - "sleeping",
> - "awakening",
> - "query-pending",
> - "sg-query-pending",
> - "leaving"
> + [IGMP_NOT_MEMBER] = "not-member",
> + [IGMP_SILENT_MEMBER] = "silent",
> + [IGMP_REPORTING_MEMBER] = "reporting",
> + [IGMP_IDLE_MEMBER] = "idle",
> + [IGMP_LAZY_MEMBER] = "lazy",
> + [IGMP_SLEEPING_MEMBER] = "sleeping",
> + [IGMP_AWAKENING_MEMBER] = "awakening",
> + [IGMP_G_QUERY_PENDING_MEMBER] = "query-pending",
> + [IGMP_SG_QUERY_PENDING_MEMBER] = "sg-query-pending",
> + [IGMP_LEAVING_MEMBER] = "leaving",
> };
> +_Static_assert(IGMP_NOT_MEMBER == 0 &&
> + IGMP_LEAVING_MEMBER + 1 == nitems(inm_statestrs),
> + "inm_statetrs: no longer matches #defines");
>
> static const char *
> inm_state_str(const int state)
> --
> 2.12.1
>
>
> From 099e1b2c059fe962c1dd91af0a6735f02f611ac3 Mon Sep 17 00:00:00 2001
> From: Chris Torek <torek at ixsystems.com>
> Date: Wed, 10 Feb 2016 18:41:39 -0800
> Subject: [PATCH 2/3] ip multicast: fix use-after-free
>
> During if_detach(), we get a race where a closing socket is
> releasing multicast data (via inp_freemoptions()) at the same
> time as igmp_ifdetach() is releasing all multicast data for
> the interface, resulting in a potential double teardown and
> double free.
>
> This bug has been present since late 2011:
>
> Author: jhb <jhb at FreeBSD.org>
>
> Defer the work of freeing IPv4 multicast options from a
> socket to an asychronous task. ...
>
> It is very hard to trip over. You must create and delete
> interfaces (bridges that join real interfaces are good
> candidates) repeatedly, and even then, if M_IPMADDR (in_multi
> data structure) memory is not reused for something else during
> the race, the reference count in inm->inm_refcount is an
> unsigned int, so it decrements from the left-over 0 to
> 4294967295, avoiding a second free.
>
> Turning on the memory debug options (scribbling values over
> freed memory) catches the problem more quickly, though you
> still must create and destroy interfaces that partcipate in
> multicast to see it.
>
> The fix here is a kludge, but should serve until the entire
> network locking code and up/downcall system is reworked.
> ---
> sys/netinet/in.c | 4 ++++
> sys/netinet/in_mcast.c | 11 +++++++++++
> sys/netinet/ip_var.h | 1 +
> 3 files changed, 16 insertions(+)
>
> diff --git a/sys/netinet/in.c b/sys/netinet/in.c
> index e1611d50b4f0..50cfb188c442 100644
> --- a/sys/netinet/in.c
> +++ b/sys/netinet/in.c
> @@ -1013,6 +1013,8 @@ in_purgemaddrs(struct ifnet *ifp)
> struct in_multi *inm, *tinm;
> struct ifmultiaddr *ifma;
>
> + inp_freemopt_wait(); /* XXX kludge */
> +
> LIST_INIT(&purgeinms);
> IN_MULTI_LOCK();
>
> @@ -1043,6 +1045,8 @@ in_purgemaddrs(struct ifnet *ifp)
> igmp_ifdetach(ifp);
>
> IN_MULTI_UNLOCK();
> +
> + inp_freemopt_wait(); /* XXX kludge */
> }
>
> struct in_llentry {
> diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c
> index eef560e7aca7..1d1258701fa2 100644
> --- a/sys/netinet/in_mcast.c
> +++ b/sys/netinet/in_mcast.c
> @@ -1585,6 +1585,17 @@ inp_freemoptions(struct ip_moptions *imo)
> taskqueue_enqueue(taskqueue_thread, &imo_gc_task);
> }
>
> +/*
> + * Wait for inp_freemoptions() to complete any current work.
> + * Used because inp_freemoptions is no longer synchronous.
> + */
> +void
> +inp_freemopt_wait(void)
> +{
> +
> + taskqueue_drain(taskqueue_thread, &imo_gc_task);
> +}
> +
> static void
> inp_freemoptions_internal(struct ip_moptions *imo)
> {
> diff --git a/sys/netinet/ip_var.h b/sys/netinet/ip_var.h
> index f7e58d18635a..e51bcffc4fc2 100644
> --- a/sys/netinet/ip_var.h
> +++ b/sys/netinet/ip_var.h
> @@ -200,6 +200,7 @@ extern struct pr_usrreqs rip_usrreqs;
> #define V_drop_redirect VNET(drop_redirect)
>
> void inp_freemoptions(struct ip_moptions *);
> +void inp_freemopt_wait(void);
> int inp_getmoptions(struct inpcb *, struct sockopt *);
> int inp_setmoptions(struct inpcb *, struct sockopt *);
>
> --
> 2.12.1
>
>
> From e27ff26bd086c46f2b1cce8eb95f4625cc4333fc Mon Sep 17 00:00:00 2001
> From: Chris Torek <torek at ixsystems.com>
> Date: Sat, 5 Mar 2016 15:02:33 -0800
> Subject: [PATCH 3/3] ip_multicast: defer vnet restore in in_leavegroup
>
> Note: in_leavegroup() does all its real work in
> in_leavegroup_locked(). For discussion here the two]
> functions might as well be equivalent.
>
> In in_leavegroup_locked(), when we're shedding a multicast
> group, we may (or may not) delete it from an interface via
> the igmp_change_state() call. This is where we currently
> set the multicast's vnet, and then restore the old vnet on
> return.
>
> However, a few lines later we use inm_release_locked() to
> release the inet multicast data structure, and that in turn
> may -- not necessarily will, only if the inm really is being
> freed -- call if_delmulti_ifma(), which may -- not necessarily
> will, again -- call the interface's SIOCDELMULTI ioctl
> (if and only if there is an interface and this was the last
> ref to this multicast address).
>
> For (at least) the lagg interface, we still need the current
> vnet to be valid during the SIOCDELMULTI. So, don't restore
> the old vnet until we've not only finished the IGMP code but
> also inm_release_locked().
> ---
> sys/netinet/in_mcast.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c
> index 1d1258701fa2..4549c952e058 100644
> --- a/sys/netinet/in_mcast.c
> +++ b/sys/netinet/in_mcast.c
> @@ -1275,12 +1275,12 @@ in_leavegroup_locked(struct in_multi *inm, /*const*/ struct in_mfilter *imf)
> CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__);
> CURVNET_SET(inm->inm_ifp->if_vnet);
> error = igmp_change_state(inm);
> - CURVNET_RESTORE();
> if (error)
> CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__);
>
> CTR2(KTR_IGMPV3, "%s: dropping ref on %p", __func__, inm);
> inm_release_locked(inm);
> + CURVNET_RESTORE();
>
> return (error);
> }
> --
> 2.12.1
>
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
More information about the freebsd-net
mailing list