PF or "traceroute -e -P TCP" bug?
Rostislav Krasny
rosti.bsd at gmail.com
Mon Aug 21 15:10:14 UTC 2006
On Mon, 21 Aug 2006 11:23:50 +0200
Daniel Hartmeier <daniel at benzedrine.cx> wrote:
> [ I'm CC'ing Crist, maybe he can explain why -e behaves like it does ]
>
> On Fri, Aug 18, 2006 at 11:57:56PM +0300, Rostislav Krasny wrote:
>
> > I've tried the new "-e" traceroute option on today's RELENG_6 and
> > found the following problem:
> >
> > > traceroute -nq 1 -e -P TCP -p 80 216.136.204.117
>
> As I understand the -e option, that should send a sequence of TCP SYNs
> with
>
> - constant source port (randomly picked per invocation)
> - constant destination port 80
> - increasing TTL per probe
>
> Assuming you pass the packets with pf, it matters whether you create
> state or not. Filtering statelessly (without 'keep state'), there should
> be no problem at all. I assume you're filtering statefully.
I don't use 'keep state' in any pf rule. But I use a nat rule like this:
nat on $ext_if from $internal_net to any -> ($ext_if)
and according to 'pfctl -s state' any NAT-ed TCP connection creates a
state. For example, during the above traceroute:
self tcp 192.168.1.2:34345 -> xxx.xxx.xxx.xxx:50646 -> 216.136.204.117:80 SYN_SENT:CLOSED
> With constant source and destination ports, the first probe should
> create a state entry and all further probes (of the same traceroute
> invocation) should match that state entry.
>
> What you changed in your patch is switching to a sequential (instead of
> constant) source port. This forces creation of one state per probe,
> treating each probe as a separate connection.
Correct.
> I don't think that's in
> the spirit of the -e option. There's really no need for that, once the
> underlying problem is fixed.
>
> So, why doesn't -e without your patch produce probes that all match a
> single state entry?
By the way, I asked a friend from IRC to try "traceroute -e -P TCP"
through his router, which does NAT with natd, and it worked there.
> Look at how the TCP sequence numbers are generated across the probes:
>
> tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport +
> (fixedPort ? outdata->seq : 0));
>
> This is the problem. traceroute increments the sequence number with each
> probe. I don't know why that is done. Why not use the same th_seq for
> all probes, like an ISN (initial sequence number) would be re-used in
> retransmissions in a real TCP handshake?
>
> If you create state on the first TCP SYN pf sees, pf will note the ISN
> from the traceroute side. When pf sees further SYNs from that side, it
> will deal with them like with any client retransmitting the SYN of the
> handshake (before the peer replies with a SYN+ACK, giving its side's
> ISN). Subsequent TCP SYNs with different ISN matching the address/port
> pairs will be blocked by pf.
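
The behaviour described above can be illustrated with a toy model. This is
only a sketch under stated assumptions: struct syn_state and syn_passes() are
hypothetical names, not pf's real data structures or code, and pf's actual
sequence tracking is more involved than a single equality test. The model
records the ISN of the first SYN and afterwards passes only SYNs that repeat
it, the way a retransmission would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified state entry. NOT pf's real structure. */
struct syn_state {
	bool     in_use;	/* has a first SYN been seen yet? */
	uint16_t sport;		/* source port of the first SYN */
	uint16_t dport;		/* destination port of the first SYN */
	uint32_t isn;		/* sequence number noted from the first SYN */
};

/*
 * Toy model of the check described in the text: the first SYN creates the
 * state and passes; a later SYN on the same port pair passes only if it
 * carries the same sequence number (i.e. looks like a retransmission).
 */
bool
syn_passes(struct syn_state *st, uint16_t sport, uint16_t dport, uint32_t seq)
{
	assert(st != NULL);

	if (!st->in_use) {
		st->in_use = true;
		st->sport = sport;
		st->dport = dport;
		st->isn = seq;
		return true;
	}
	/* A different port pair would be a separate connection; not modeled. */
	if (st->sport != sport || st->dport != dport)
		return false;

	return seq == st->isn;
}
```

In this model, a second probe with the same ports but an incremented sequence
number is rejected, while a probe that repeats the first sequence number is
accepted.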
>
> If this happens on the IP forwarding path (i.e. pf blocks the packet
> outgoing), the stack produces the ICMP host unreachable error that shows
> up as "!H" in traceroute. I assume you have a "pass out on $ext_if keep
> state" rule, and don't filter on the internal interface. If you add
> stateful filtering on the internal interface, I think you'll find that
> subsequent TCP SYNs are blocked without eliciting the ICMP error.
>
> I suggest traceroute with -e uses fixed th_seq, as in
>
> - tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport +
> - (fixedPort ? outdata->seq : 0));
> + tcp->th_seq = (tcp->th_sport << 16) tcp->th_dport;
Even if I add the accidentally deleted '|', it doesn't fix the problem:
> traceroute -nq 1 -e -P TCP -p 80 www.freebsd.org
traceroute to www.freebsd.org (216.136.204.117), 64 hops max, 52 byte packets
1 192.168.1.1 0.525 ms
2 10.0.0.138 2.122 ms
3 *
4 *
5 *
6 *
7 *
8 *
9 *
10 152.63.3.122 191.562 ms
11 *
12 *
^C
I can decrease the number of "*" hops with the -w option:
> traceroute -nq 1 -e -w 10 -P TCP -p 80 www.freebsd.org
traceroute to www.freebsd.org (216.136.204.117), 64 hops max, 52 byte packets
1 192.168.1.1 0.506 ms
2 10.0.0.138 1.886 ms
3 *
4 *
5 *
6 *
7 212.143.12.45 151.282 ms
8 *
^C
Running 'pfctl -s state | grep 216.136.204.117' repeatedly shows that this
really is related to the TCP states in pf. Before the 212.143.12.45 hop the
state closed, and after that hop a new state was created.
And by the way, I think the tcp_check() function checks tcp->th_seq
incorrectly:
tcp->th_seq == (ident << 16) | (port + seq)
In the original version or after my patch it should be changed to this:
tcp->th_seq == (htons(ident) << 16) | (port + (fixedPort ? seq : 0))
and after your patch to this:
tcp->th_seq == (htons(ident) << 16) | port
It looks like the return value of the tcp_check() isn't used anywhere
anyway.
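
To make the sequence number expressions above easier to compare, here is a
minimal standalone sketch. The function names seq_original(), seq_fixed() and
seq_matches() are made up for illustration and do not exist in traceroute.c.
Byte-order conversion is deliberately left out, so the arguments simply stand
for whatever values end up in th_sport and th_dport.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Formula from the quoted traceroute code: with fixed_port set, the low
 * half of the sequence number grows with every probe.
 */
uint32_t
seq_original(uint16_t th_sport, uint16_t th_dport, int fixed_port,
    int probe_seq)
{
	assert(probe_seq >= 0);
	return ((uint32_t)th_sport << 16) |
	    (uint32_t)(th_dport + (fixed_port ? probe_seq : 0));
}

/*
 * Formula from the suggested patch, with the '|' restored: the same
 * sequence number for every probe of one invocation.
 */
uint32_t
seq_fixed(uint16_t th_sport, uint16_t th_dport)
{
	return ((uint32_t)th_sport << 16) | th_dport;
}

/*
 * Matching check for the fixed formula. In C, == binds tighter than |,
 * so the right-hand side is parenthesized explicitly.
 */
int
seq_matches(uint32_t th_seq, uint16_t th_sport, uint16_t th_dport)
{
	return th_seq == (((uint32_t)th_sport << 16) | th_dport);
}
```

With fixed_port set, seq_original() returns a different value for each probe
number, while seq_fixed() depends only on the two ports, which is what a
retransmitted SYN would look like.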