Handling 100.000 packets/sec or more
Tom Pavel
pavel at NetworkPhysics.COM
Wed Jan 14 14:04:42 PST 2004
>>>>> On Wed, 14 Jan 2004, Richard Wendland <richard at starburst.demon.co.uk> writes:
> > device polling(8) really does help _a lot_ for packet floods/storms.
> > for device polling to work properly (imho) you would need to set HZ
> > to 1000.
> > I don't recommend any higher HZ on a PIII.
>
> Incidentally, setting HZ > 1000 would cause FreeBSD TCP to not comply
> with RFC1323, as it would make the TCP timestamp option clock tick faster
> than 1ms. RFC1323 4.2.2 specifies the clock rate to be in the range
> 1 ms to 1 sec per tick.
>
> Really the TCP timestamp option clock should be divorced from HZ before
> too long, as a time will come when people will want HZ > 1000.
>
> Actually a bit faster tick-rate is unlikely to run into much trouble in
> practice, but it will cause the PAWS algorithm to stop a long running
> TCP connection, see 4.2.3 of RFC1323.
>
> Richard
The PAWS thing is real. Idle SSH or telnet connections can easily get
hosed by wraparound if you crank up HZ too much. We encountered this
at Network Physics.
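
To put rough numbers on the wraparound, here is a minimal sketch in plain user-space C. It assumes the timestamp clock is a 32-bit counter compared with signed arithmetic; the helper names ts_less_than and sign_wrap_days are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: nonzero if timestamp a looks "older" than b under a
 * signed 32-bit comparison, the usual way wrapping counters are compared.
 */
static int ts_less_than(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/*
 * Days until a 32-bit timestamp clock has advanced by 2^31 ticks. Past that
 * point a newer timestamp compares as older than the one remembered from
 * before the idle period.
 */
static double sign_wrap_days(unsigned ticks_per_sec)
{
    return 2147483648.0 / (double)ticks_per_sec / 86400.0;
}
```

sign_wrap_days(1000) comes out at roughly 24.8 days, which matches the 24-day PAWS idle limit; sign_wrap_days(10000) is only about 2.5 days, so a connection idle over a long weekend is enough to hit it.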
I had been meaning to submit a PR about this (and probably several
others as well) for quite a while now, but I always got distracted by
some other urgent matter... However, given the prod, I was able to
dig up the fix we used for this particular problem. Pretty sure these
diffs will not apply cleanly, even to -stable, but no doubt the gist
of the idea should be clear enough. Hopefully, this can save someone
some work on getting a fix into the tree.
Tom Pavel
Network Physics
pavel at networkphysics.com / pavel at alum.mit.edu
Index: tcp_input.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_input.c,v
retrieving revision 1.41
retrieving revision 1.42
diff -u -r1.41 -r1.42
--- tcp_input.c 2 Apr 2002 23:27:33 -0000 1.41
+++ tcp_input.c 3 Apr 2002 22:24:24 -0000 1.42
@@ -1185,7 +1185,7 @@
*/
if ((to.to_flag & TOF_TS) != 0 &&
SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
- tp->ts_recent_age = ticks;
+ GETCURTS(tp->ts_recent_age);
tp->ts_recent = to.to_tsval;
}
@@ -1228,9 +1228,12 @@
&& ((!(sack_check(tp))) ||
to.to_tsecr)
#endif
- )
- tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
- else {
+ ) {
+ u_long cur_ts, rtt_ticks;
+ GETCURTS(cur_ts);
+ rtt_ticks = TSTMPTOTICK (cur_ts - to.to_tsecr);
+ tcp_xmit_timer(tp, rtt_ticks + 1);
+ } else {
#ifdef LTSTMP
tcp_xmit_timer(tp, tp->t_rtttime);
#else
@@ -1941,9 +1944,11 @@
*/
if ((to.to_flag & TOF_TS) != 0 && tp->ts_recent &&
TSTMP_LT(to.to_tsval, tp->ts_recent)) {
+ u_long cur_ts;
/* Check to see if ts_recent is over 24 days old. */
- if ((int)(ticks - tp->ts_recent_age) > TCP_PAWS_IDLE) {
+ GETCURTS(cur_ts);
+ if ((int)(cur_ts - tp->ts_recent_age) > TCP_PAWS_IDLE) {
/*
* Invalidate ts_recent. If this segment updates
* ts_recent, the age will be reset later and ts_recent
@@ -2120,7 +2125,7 @@
*/
if ((to.to_flag & TOF_TS) != 0 &&
SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
- tp->ts_recent_age = ticks;
+ GETCURTS(tp->ts_recent_age);
tp->ts_recent = to.to_tsval;
}
@@ -2754,9 +2759,12 @@
/* bug fix from Mark Allman */
&& ((!sack_check(tp)) || to.to_tsecr)
#endif
- )
- tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
- else {
+ ) {
+ u_long cur_ts, rtt_ticks;
+ GETCURTS(cur_ts);
+ rtt_ticks = TSTMPTOTICK (cur_ts - to.to_tsecr);
+ tcp_xmit_timer(tp, rtt_ticks + 1);
+ } else {
#ifdef LTSTMP /* use local timestamp */
tcp_xmit_timer(tp, tp->t_rtttime);
@@ -3293,7 +3301,7 @@
if (th->th_flags & TH_SYN) {
tp->t_flags |= TF_RCVD_TSTMP;
tp->ts_recent = to->to_tsval;
- tp->ts_recent_age = ticks;
+ GETCURTS(tp->ts_recent_age);
}
break;
Index: tcp_output.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_output.c,v
retrieving revision 1.32
retrieving revision 1.33
diff -u -r1.32 -r1.33
--- tcp_output.c 3 Apr 2002 01:55:20 -0000 1.32
+++ tcp_output.c 3 Apr 2002 22:24:24 -0000 1.33
@@ -616,7 +616,8 @@
/* Form timestamp option as shown in appendix A of RFC 1323. */
*lp++ = htonl(TCPOPT_TSTAMP_HDR);
- *lp++ = htonl(ticks);
+ GETCURTS(*lp);
+ *lp = htonl(*lp); lp++;
*lp = htonl(tp->ts_recent);
optlen += TCPOLEN_TSTAMP_APPA;
}
Index: tcp_seq.h
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_seq.h,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- tcp_seq.h 16 Jul 2001 18:18:44 -0000 1.2
+++ tcp_seq.h 3 Apr 2002 22:24:24 -0000 1.3
@@ -88,8 +88,19 @@
(tp)->iss
#endif
-#define TCP_PAWS_IDLE (24 * 24 * 60 * 60 * hz)
- /* timestamp wrap-around time */
+/* clock macros for RFC1323 timestamps */
+#define TSTMP_UNITS (10) /* in ms (RFC1323 says 1-1000 ms) */
+#define GETCURTS(ts) \
+ do { \
+ struct timeval tv; \
+ getmicrouptime(&tv); \
+ (ts) = (u_long)tv.tv_sec * 1000 + tv.tv_usec / 1000; \
+ (ts) /= TSTMP_UNITS; \
+ } while (0)
+#define TSTMPTOTICK(ts) (((int64_t)(ts))*hz*TSTMP_UNITS/1000)
+
+#define TCP_PAWS_IDLE (24 * 24 * 60 * 60 * 1000/TSTMP_UNITS)
+ /* timestamp wrap-around time (24 days in 10ms units) */
#ifdef _KERNEL
extern tcp_cc tcp_ccgen; /* global connection count */
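
For anyone who wants to check the unit conversions outside the kernel, the arithmetic behind GETCURTS and TSTMPTOTICK can be sketched in user-space C roughly as follows. This is an illustration only: ts_from_uptime, ts_to_ticks and the two constants are hypothetical names, the uptime is passed in as plain integers instead of coming from getmicrouptime(), and the sketch uses a 64-bit intermediate (the macro in the diff multiplies in u_long, which is 32 bits wide on i386).

```c
#include <stdint.h>

/* Assumed to mirror TSTMP_UNITS from the diff: one timestamp unit is 10 ms. */
#define TS_UNITS_MS 10

/* 24 days expressed in 10 ms units, as in the patched TCP_PAWS_IDLE. */
#define PAWS_IDLE_UNITS (24LL * 24 * 60 * 60 * 1000 / TS_UNITS_MS)

/* Convert an uptime (seconds plus microseconds) to 10 ms timestamp units. */
static uint32_t ts_from_uptime(int64_t sec, int32_t usec)
{
    uint64_t ms = (uint64_t)sec * 1000 + (uint64_t)(usec / 1000);

    /* Divide before truncating so the result wraps cleanly at 2^32 units. */
    return (uint32_t)(ms / TS_UNITS_MS);
}

/* Convert a timestamp difference back to scheduler ticks for a given hz. */
static int64_t ts_to_ticks(uint32_t ts, int hz)
{
    return (int64_t)ts * hz * TS_UNITS_MS / 1000;
}
```

For example, an uptime of 1.5 seconds is 150 timestamp units, which converts back to 1500 ticks at hz = 1000 and 150 ticks at hz = 100. The timestamp unit stays at 10 ms whatever hz is set to, which is the point of the patch.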