[FIX] dummynet breaks IP reassembly
Andre Oppermann
andre at freebsd.org
Wed Feb 15 12:27:36 PST 2006
Ruslan Ermilov wrote:
>
> Hi Andre,
>
> If I'm not mistaken, *you* converted ipfw to use pfil(9).
> During the conversion, the following bug was introduced.
>
> When forwarding fragmented packets through a dummynet pipe
> (ip_input -> ip_forward -> ip_output -> pipe -> ip_output)
> the last ip_output() in the chain that does the actual IP
> delivery sets ip_id of all fragments to different values,
> making it impossible to reassemble the packet at receive
> side.
>
> Example of forwarding a fragmented IP datagram from em0
> to xl0:
>
> # tcpdump -nvi em0 net 192.168.2
> tcpdump: listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes
> 00:04:35.170994 IP (tos 0x0, ttl 64, id 2186, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 2819, seq 0, length 1480
> 00:04:35.171242 IP (tos 0x0, ttl 64, id 2186, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp
>
> # tcpdump -nvi xl0 net 192.168.2
> tcpdump: listening on xl0, link-type EN10MB (Ethernet), capture size 96 bytes
> 00:04:35.171028 IP (tos 0x0, ttl 63, id 10560, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 2819, seq 0, length 1480
> 00:04:35.171266 IP (tos 0x0, ttl 63, id 10561, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp
>
> Note that IDs were rewritten from the original 2186 and
> worse, they are different.
>
> In 4,x, the "flags" field (see the patch below) was used
> to keep the IP_FORWARDING bit passed to ip_output() by
> ip_forward(). This bit was kept in the dummynet packet
> tag (in the "flags" field) and later passed to the second
> ip_output() call, causing the latter to NOT touch the IP
> header again. This feature was lost, resulting in a bug.
>
> Since dummynet never calls ip_output() with unfilled
> header, it's safe to always call ip_output() from dummynet
> with the IP_FORWARDING bit set, to indicate it's forwarded
> from another ip_output() and so it shouldn't attempt to
> modify the header.
That is correct.
> Surprisingly, it "seemed" to work for packets exceeding
> MTU and originating from the dummynetting host, mainly
> because the packet wasn't fragmented while in dummynet.
> Yet, the ip_id field was always incremented in my tests
> (pipe with no bandwidth limitation), comparing it before
> and after the dummynet processing. Now, the ip_id is
> always preserved.
Packets that originated on the dummynetting host don't get
fragmented until they actually hit the if_output() in ip_output().
The entire over-sized packet stays intact while in dummynet
and only get fragmented later.
> # tcpdump -nvi em0 net 192.168.2
> tcpdump: listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes
> 00:12:04.654669 IP (tos 0x0, ttl 64, id 2192, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 7939, seq 0, length 1480
> 00:12:04.654917 IP (tos 0x0, ttl 64, id 2192, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp
>
> # tcpdump -nvi xl0 net 192.168.2
> tcpdump: listening on xl0, link-type EN10MB (Ethernet), capture size 96 bytes
> 00:12:04.654703 IP (tos 0x0, ttl 63, id 2192, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 7939, seq 0, length 1480
> 00:12:04.654939 IP (tos 0x0, ttl 63, id 2192, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp
>
> (Note the ip_id is preserved when forwarding.)
>
> I'm not sure about the IPv6 portion but it's consistent
> with the normal forwarding path so I believe it's correct.
> Comments?
Looks correct. Dunno about the effect on IPv6. IIRC IPv6 doesn't
support fragmenting packets and always has to do PMTU discovery.
Thanks for tracking down and fixing this bug.
--
Andre
> %%%
> Index: ip_dummynet.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.c,v
> retrieving revision 1.98
> diff -u -p -r1.98 ip_dummynet.c
> --- ip_dummynet.c 3 Feb 2006 11:38:19 -0000 1.98
> +++ ip_dummynet.c 10 Feb 2006 21:20:59 -0000
> @@ -769,7 +769,7 @@ dummynet_send(struct mbuf *m)
> pkt = dn_tag_get(m);
> switch (pkt->dn_dir) {
> case DN_TO_IP_OUT:
> - ip_output(m, NULL, NULL, pkt->flags, NULL, NULL);
> + ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL);
> break ;
> case DN_TO_IP_IN :
> ip = mtod(m, struct ip *);
> @@ -783,7 +783,7 @@ dummynet_send(struct mbuf *m)
> break;
>
> case DN_TO_IP6_OUT:
> - ip6_output(m, NULL, NULL, pkt->flags, NULL, NULL, NULL);
> + ip6_output(m, NULL, NULL, IPV6_FORWARDING, NULL, NULL, NULL);
> break;
> #endif
> case DN_TO_IFB_FWD:
> @@ -1129,7 +1129,6 @@ locate_pipe(int pipe_nr)
> * ifp the 'ifp' parameter from the caller.
> * NULL in ip_input, destination interface in ip_output,
> * rule matching rule, in case of multiple passes
> - * flags flags from the caller, only used in ip_output
> *
> */
> static int
> @@ -1213,8 +1212,6 @@ dummynet_io(struct mbuf *m, int dir, str
> pkt->dn_dir = dir ;
>
> pkt->ifp = fwa->oif;
> - if (dir == DN_TO_IP_OUT || dir == DN_TO_IP6_OUT)
> - pkt->flags = fwa->flags;
>
> if (q->head == NULL)
> q->head = m;
> Index: ip_dummynet.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.h,v
> retrieving revision 1.38
> diff -u -p -r1.38 ip_dummynet.h
> --- ip_dummynet.h 29 Nov 2005 00:11:01 -0000 1.38
> +++ ip_dummynet.h 10 Feb 2006 21:13:45 -0000
> @@ -130,7 +130,6 @@ struct dn_pkt_tag {
>
> dn_key output_time; /* when the pkt is due for delivery */
> struct ifnet *ifp; /* interface, for ip_output */
> - int flags ; /* flags, for ip_output (IPv6 ?) */
> struct _ip6dn_args ip6opt; /* XXX ipv6 options */
> };
> #endif /* _KERNEL */
> Index: ip_fw.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/netinet/ip_fw.h,v
> retrieving revision 1.103
> diff -u -p -r1.103 ip_fw.h
> --- ip_fw.h 13 Dec 2005 12:16:02 -0000 1.103
> +++ ip_fw.h 10 Feb 2006 21:15:41 -0000
> @@ -510,8 +510,6 @@ struct ip_fw_args {
> struct ip_fw *rule; /* matching rule */
> struct ether_header *eh; /* for bridged packets */
>
> - int flags; /* for dummynet */
> -
> struct ipfw_flow_id f_id; /* grabbed from IP header */
> u_int32_t cookie; /* a cookie depending on rule action */
> struct inpcb *inp;
> %%%
>
> Cheers,
> --
> Ruslan Ermilov
> ru at FreeBSD.org
> FreeBSD committer
>
> --------------------------------------------------------------------------------
> Part 1.2Type: application/pgp-signature
More information about the freebsd-net
mailing list