question about trimming data "len" conditions in TSO in tcp_output.c
Cui, Cheng
Cheng.Cui at netapp.com
Wed Apr 13 15:18:43 UTC 2016
Hello Hans,
Did my previous email reach you?
Thanks,
--Cheng Cui
NetApp Scale Out Networking
On 4/10/16, 4:44 PM, "Cui, Cheng" <Cheng.Cui at netapp.com> wrote:
>Hi Hans,
>
>I would like to continue this discussion with a different change. The
>change is shown below, and I have also attached it as "change.patch"
>against the FreeBSD HEAD code line.
>
>diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
>index 2043fc9..43b0737 100644
>--- a/sys/netinet/tcp_output.c
>+++ b/sys/netinet/tcp_output.c
>@@ -939,23 +939,15 @@ send:
> 			 * emptied:
> 			 */
> 			max_len = (tp->t_maxseg - optlen);
>-			if ((off + len) < sbavail(&so->so_snd)) {
>+			if (len > (max_len << 1)) {
> 				moff = len % max_len;
> 				if (moff != 0) {
> 					len -= moff;
> 					sendalot = 1;
> 				}
> 			}
>-
>-			/*
>-			 * In case there are too many small fragments
>-			 * don't use TSO:
>-			 */
>-			if (len <= max_len) {
>-				len = max_len;
>-				sendalot = 1;
>-				tso = 0;
>-			}
>+			KASSERT(len >= max_len,
>+			    ("[%s:%d]: len < max_len", __func__, __LINE__));
> 			/*
> 			 * Send the FIN in a separate segment
>
>
>
>
>I think this change can avoid extra passes through tcp_output() that
>send single-MSS-size packets. It should also save some CPU cycles,
>because it reduces the number of software sends and pushes more data to
>the TSO offload path.
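>
>To make the effect concrete, here is a small user-space sketch of the
>old and new trimming logic. It is only my illustration, not the kernel
>code: trim_old(), trim_new(), struct trim_result and the numbers in
>main() are made up; max_len stands for tp->t_maxseg - optlen and avail
>for sbavail(&so->so_snd).
>
>#include <stdbool.h>
>#include <stdio.h>
>
>struct trim_result {
>	long len;	/* bytes handed to this send */
>	bool tso;	/* TSO still enabled for this send? */
>	bool sendalot;	/* another pass through tcp_output() needed? */
>};
>
>/* Old logic: trim to a segment boundary whenever more data remains in
> * the socket buffer, and fall back to a single non-TSO MSS once only
> * one segment's worth (or less) is left. */
>static struct trim_result
>trim_old(long off, long len, long avail, long max_len)
>{
>	struct trim_result r = { len, true, false };
>
>	if ((off + len) < avail && (len % max_len) != 0) {
>		r.len -= len % max_len;
>		r.sendalot = true;
>	}
>	if (r.len <= max_len) {
>		r.len = max_len;
>		r.sendalot = true;
>		r.tso = false;	/* single-MSS software send */
>	}
>	return (r);
>}
>
>/* New logic from the patch: only trim when more than two full segments
> * remain, so the tail of the data still goes out as one TSO burst. */
>static struct trim_result
>trim_new(long off, long len, long avail, long max_len)
>{
>	struct trim_result r = { len, true, false };
>
>	(void)off;
>	(void)avail;
>	if (len > (max_len << 1) && (len % max_len) != 0) {
>		r.len -= len % max_len;
>		r.sendalot = true;
>	}
>	return (r);
>}
>
>int
>main(void)
>{
>	/* 2000 bytes ready to send, more data behind it in the buffer. */
>	long off = 0, len = 2000, avail = 10000, max_len = 1448;
>	struct trim_result o = trim_old(off, len, avail, max_len);
>	struct trim_result n = trim_new(off, len, avail, max_len);
>
>	printf("old: len=%ld tso=%d sendalot=%d\n", o.len, o.tso, o.sendalot);
>	printf("new: len=%ld tso=%d sendalot=%d\n", n.len, n.tso, n.sendalot);
>	/*
>	 * old: len=1448 tso=0 sendalot=1 -> single-MSS software send
>	 * new: len=2000 tso=1 sendalot=0 -> one TSO burst covers it all
>	 */
>	return (0);
>}
>
>(The numbers are only for illustration; in the kernel, len has already
>been bounded by the send window and the hardware TSO limit before this
>point.)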
>
>Here is my test. The iperf command below pushes 100 MBytes of data onto
>the wire, with the default TCP sendspace set to 1MB and recvspace set
>to 2MB via sysctl. I tested the TCP connection performance on a pair of
>10Gbps FreeBSD 10.2 nodes (s1 and r1) with a switch in between. Both
>nodes have TSO and delayed ACK enabled.
>
>root at s1:~ # ping -c 3 r1
>PING r1-link1 (10.1.2.3): 56 data bytes
>64 bytes from 10.1.2.3: icmp_seq=0 ttl=64 time=0.045 ms
>64 bytes from 10.1.2.3: icmp_seq=1 ttl=64 time=0.037 ms
>64 bytes from 10.1.2.3: icmp_seq=2 ttl=64 time=0.038 ms
>
>--- r1-link1 ping statistics ---
>3 packets transmitted, 3 packets received, 0.0% packet loss
>round-trip min/avg/max/stddev = 0.037/0.040/0.045/0.004 ms
>
>1M snd buffer/2M rcv buffer
>sysctl -w net.inet.tcp.hostcache.expire=1
>sysctl -w net.inet.tcp.sendspace=1048576
>sysctl -w net.inet.tcp.recvspace=2097152
>
>iperf -s <== iperf command at receiver
>iperf -c r1 -m -n 100M <== iperf command at sender
>
>root at s1:~ # iperf -c r1 -m -n 100M
>------------------------------------------------------------
>Client connecting to r1, TCP port 5001
>TCP window size: 1.00 MByte (default)
>------------------------------------------------------------
>[ 3] local 10.1.2.2 port 22491 connected with 10.1.2.3 port 5001
>[ ID] Interval Transfer Bandwidth
>[ 3] 0.0- 0.3 sec 100 MBytes 2.69 Gbits/sec
>[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>
>root at r1:~ # iperf -s
>------------------------------------------------------------
>Server listening on TCP port 5001
>TCP window size: 2.00 MByte (default)
>------------------------------------------------------------
>[ 4] local 10.1.2.3 port 5001 connected with 10.1.2.2 port 22491
>[ ID] Interval Transfer Bandwidth
>[ 4] 0.0- 0.3 sec 100 MBytes 2.62 Gbits/sec
>
>Each test sent 100MBytes of data, and I collected packet traces from
>both nodes with tcpdump. I ran the test twice to confirm that the
>result is reproducible.
>
>From the trace files of both nodes before my code change, I see a lot
>of single-MSS-size packets. See the attached trace files in
>"before_change.zip". For example, in a sender trace file I see 43480
>single-MSS-size packets (tcp.len == 1448) out of 57005 packets that
>contain data (tcp.len > 0). That's 76.2%.
>
>I then did the same iperf test after the change and gathered trace
>files. This time I did not find many single-MSS packets. See the
>attached trace files in "after_change.zip". For example, in a sender
>trace file I see zero single-MSS-size packets (tcp.len == 1448) out of
>35729 data packets (tcp.len > 0).
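>
>(The packet counts above correspond to the Wireshark-style display
>filters tcp.len == 1448 and tcp.len > 0.) For counting the same thing
>outside of Wireshark, here is a minimal libpcap sketch; it is my own
>illustration, assumes plain Ethernet/IPv4 framing and fully captured
>packets, and hard-codes the 1448-byte MSS of this setup.
>
>#include <sys/types.h>
>#include <netinet/in.h>
>#include <netinet/ip.h>
>#include <netinet/tcp.h>
>#include <pcap/pcap.h>
>#include <stdio.h>
>
>/* Build with: cc -o tcpcount tcpcount.c -lpcap */
>int
>main(int argc, char **argv)
>{
>	char errbuf[PCAP_ERRBUF_SIZE];
>	struct pcap_pkthdr *h;
>	const u_char *pkt;
>	long data_pkts = 0, mss_pkts = 0;
>	pcap_t *p;
>
>	if (argc != 2) {
>		fprintf(stderr, "usage: tcpcount <trace.pcap>\n");
>		return (1);
>	}
>	if ((p = pcap_open_offline(argv[1], errbuf)) == NULL) {
>		fprintf(stderr, "%s\n", errbuf);
>		return (1);
>	}
>	while (pcap_next_ex(p, &h, &pkt) == 1) {
>		const struct ip *iph;
>		const struct tcphdr *th;
>		int payload;
>
>		if (h->caplen < 14 + sizeof(*iph) + sizeof(*th))
>			continue;	/* runt or truncated capture */
>		iph = (const struct ip *)(pkt + 14);	/* skip Ethernet */
>		if (iph->ip_v != 4 || iph->ip_p != IPPROTO_TCP)
>			continue;
>		th = (const struct tcphdr *)((const u_char *)iph +
>		    iph->ip_hl * 4);
>		/* tcp.len = IP total length - IP header - TCP header */
>		payload = ntohs(iph->ip_len) - iph->ip_hl * 4 -
>		    th->th_off * 4;
>		if (payload > 0)
>			data_pkts++;
>		if (payload == 1448)	/* exactly one MSS in this setup */
>			mss_pkts++;
>	}
>	pcap_close(p);
>	printf("data packets (tcp.len > 0): %ld\n", data_pkts);
>	printf("single-MSS packets (tcp.len == 1448): %ld\n", mss_pkts);
>	return (0);
>}
>
>Note that on the sending node the capture sees the large pre-TSO frames
>before the NIC splits them, which is why the sender's data-packet count
>can be well below the number of on-wire segments.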
>
>Comparing the receiver traces, I did not see significantly more
>fractional packets received after the change.
>
>I also ran tests using netperf, although not every snd/rcv buffer size
>combination reached the requested 95% confidence interval. Attached are
>my netperf results for different snd/rcv buffer sizes before and after
>the change (netperf_before_change.txt and netperf_after_change.txt),
>which also look good.
>
>The netperf command used:
>netperf -H s1 -t TCP_STREAM -C -c -l 400 -i 10,3 -I 95,10 -- -s
>${LocalSndBuf} -S ${RemoteSndBuf}
>
>
>Thanks,
>--Cheng Cui
>NetApp Scale Out Networking
>