FreeBSD 7.0 / Recv-Q full ? / win 0 ?
Andreas Carbin
andreas.carbin at run.se
Sun Nov 23 18:00:50 PST 2008
Hello all,
I have the following issue with my (quite newly installed) FreeBSD 7.0 machines:
(I use "FreeBSD 7.0-RELEASE-p5 #0: Wed Oct 1 07:51:58 UTC 2008" on Dell PowerEdge 2970.)
When I copy large files with SCP from one host to another, the destination host's receive queue seems to fill up after a random number of seconds (10 to 300) with about 89,000 bytes, and the destination host then advertises a window size of 0 to the sender. This means no data is transferred and the connection has "locked up" in some way (is that correct?).
This almost always happens when I copy a file between two hosts that have a WAN connection between them, although I have also seen it happen between two locally connected machines. I have checked the firewall rules; they are open to almost any traffic. When the SCP copy starts, it runs perfectly at about 10 MB/s (over a 100 Mbit/s WAN link). A copy of a 3 GB file succeeds in fewer than 5% of attempts. The error occurs after about 10 to 300 seconds; then all payload traffic stops, while the TCP connection stays open.
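For scale, here is the arithmetic as a small Python sketch (treating 1 GB as 1024 MB is an assumption). At about 10 MB/s a 3 GB copy needs roughly five minutes without interruption, which is at the upper end of the 10 to 300 seconds after which the stall has been seen, so it is not surprising that such copies almost never finish.

```python
# Rough transfer-time estimate: how long a copy must run without stalling.
# Assumes the ~10 MB/s rate reported above and 1 GB = 1024 MB (an assumption).

def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Return the time in seconds needed to move size_mb at rate_mb_per_s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_per_s

three_gb_mb = 3 * 1024
t = transfer_seconds(three_gb_mb, 10)
print(f"{t:.0f} s (~{t / 60:.1f} min)")  # 307 s (~5.1 min)
```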
My guess was that the errors might come from copying this fast, close to the theoretical limit, so I used "scp -l <num>" with <num> set for 50 and 5 Mbit/s. This reduces the speed as expected, but I get the same errors as at full speed.
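Note that OpenSSH's scp takes the -l limit in Kbit/s, not Mbit/s. The Python sketch below builds such a command line; the helper name, file name and destination path are hypothetical placeholders, and it assumes 1 Mbit/s = 1000 Kbit/s (scp's internal scaling may differ slightly).

```python
# Hypothetical helper: build an scp command line with a bandwidth cap.
# scp's -l option is in Kbit/s; we assume 1 Mbit/s = 1000 Kbit/s.
import shlex

def scp_limited(src: str, dest: str, mbit_per_s: float) -> list[str]:
    """Return an argv list for scp limited to roughly mbit_per_s."""
    kbit = int(mbit_per_s * 1000)  # convert Mbit/s to the Kbit/s scp expects
    return ["scp", "-l", str(kbit), src, dest]

# Placeholder file and host names, for illustration only.
for rate in (50, 5):
    print(shlex.join(scp_limited("bigfile.img", "destination_host:/tmp/", rate)))
# scp -l 50000 bigfile.img destination_host:/tmp/
# scp -l 5000 bigfile.img destination_host:/tmp/
```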
I have also tried the following, with no success:
* net.inet.tcp.rfc1323 (on and off)
* net.inet.tcp.tso (on and off)
* RXCSUM and TXCSUM (on and off)
* changing from the on-board bce0 / Broadcom NetXtreme II BCM5708 1000Base-T to em0 / Intel(R) PRO/1000 Network Connection Version - 6.7.3
* setting net.inet.tcp.recvbuf_max: 16777216
* setting net.inet.tcp.sendbuf_max: 16777216
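When trying several knobs in turn it helps to record which sysctl values were active for each test run. A minimal Python sketch, assuming the "name: value" line format that FreeBSD's sysctl prints (as quoted in the list); the helper names are hypothetical, and the interface checksum flags (set with ifconfig) are not covered.

```python
# Hypothetical helper: snapshot the TCP tunables tried above before each run.
# Assumes `sysctl name ...` prints one "name: value" pair per line (FreeBSD).
import subprocess

TUNABLES = (
    "net.inet.tcp.rfc1323",
    "net.inet.tcp.tso",
    "net.inet.tcp.recvbuf_max",
    "net.inet.tcp.sendbuf_max",
)

def parse_sysctl(output: str) -> dict[str, str]:
    """Parse 'name: value' lines into a dict; values are kept as strings."""
    settings = {}
    for line in output.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines without the "name: value" separator
            settings[name.strip()] = value.strip()
    return settings

def snapshot(names=TUNABLES) -> dict[str, str]:
    """Run sysctl on the local (FreeBSD) host and parse its output."""
    out = subprocess.run(["sysctl", *names], capture_output=True,
                         text=True, check=True).stdout
    return parse_sysctl(out)
```

For example, parse_sysctl("net.inet.tcp.recvbuf_max: 16777216") returns {'net.inet.tcp.recvbuf_max': '16777216'}.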
One really strange thing is that I can make the copy continue (!) with full data transfer if I truss the ssh process on the destination machine. If I leave truss running in the background with its output sent to /dev/null, the whole copy completes (!!!!).
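The workaround amounts to attaching truss(1) to the receiving ssh process and discarding the trace, i.e. "truss -p <pid> -o /dev/null" in the background. A minimal Python sketch of that step; the function names are hypothetical, and finding the right ssh PID (e.g. with ps) is left out.

```python
# Sketch of the observed workaround: attach truss(1) to the receiving ssh
# process in the background and discard its output. FreeBSD only; the PID
# must be looked up separately and is a placeholder here.
import subprocess

def truss_cmd(pid: int, outfile: str = "/dev/null") -> list[str]:
    """Build a truss command that follows `pid` and writes its trace to `outfile`."""
    return ["truss", "-p", str(pid), "-o", outfile]

def start_truss(pid: int) -> subprocess.Popen:
    """Start truss in the background; call .terminate() once the copy is done."""
    return subprocess.Popen(truss_cmd(pid))
```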
This is a tcpdump, taken on the destination host, of the SCP TCP connection while no data is being transferred:
15:56:17.798079 IP sender_host.51296 > destination_host.ssh: . 8:9(1) ack 1 win 33304 <nop,nop,timestamp 1435178754 1291017157>
15:56:17.897407 IP destination_host.ssh > sender_host.51296: . ack 9 win 0 <nop,nop,timestamp 1291022157 1435178754>
15:56:22.797808 IP sender_host.51296 > destination_host.ssh: . 9:10(1) ack 1 win 33304 <nop,nop,timestamp 1435183754 1291022157>
15:56:22.897457 IP destination_host.ssh > sender_host.51296: . ack 10 win 0 <nop,nop,timestamp 1291027157 1435183754>
15:56:27.797913 IP sender_host.51296 > destination_host.ssh: . 10:11(1) ack 1 win 33304 <nop,nop,timestamp 1435188754 1291027157>
15:56:27.897508 IP destination_host.ssh > sender_host.51296: . ack 11 win 0 <nop,nop,timestamp 1291032157 1435188754>
15:56:32.798016 IP sender_host.51296 > destination_host.ssh: . 11:12(1) ack 1 win 33304 <nop,nop,timestamp 1435193754 1291032157>
15:56:32.897559 IP destination_host.ssh > sender_host.51296: . ack 12 win 0 <nop,nop,timestamp 1291037157 1435193754>
15:56:37.798119 IP sender_host.51296 > destination_host.ssh: . 12:13(1) ack 1 win 33304 <nop,nop,timestamp 1435198754 1291037157>
15:56:37.897610 IP destination_host.ssh > sender_host.51296: . ack 13 win 0 <nop,nop,timestamp 1291042157 1435198754>
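The dump looks like zero-window probing: the sender pushes a single byte about every 5 seconds and the destination keeps answering with win 0. A minimal Python sketch that scans tcpdump text in exactly the layout above, counts the zero-window segments and measures the spacing of the other side's probes; the regex assumes this output format and the function names are hypothetical.

```python
# Scan tcpdump text output (layout as quoted above) for zero-window adverts
# and measure the interval between the peer's probe segments.
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<ts>\d\d:\d\d:\d\d\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): .*? win (?P<win>\d+)"
)

def parse(lines):
    """Yield (time, src, dst, window) for each matching tcpdump line."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
            yield ts, m["src"], m["dst"], int(m["win"])

def zero_window_report(lines):
    """Return (count of win 0 segments, gaps in seconds between peer probes)."""
    records = list(parse(lines))
    zero = [r for r in records if r[3] == 0]
    if not zero:
        return 0, []
    receiver = zero[0][1]  # host advertising the zero window
    probes = [r[0] for r in records if r[1] != receiver]
    gaps = [round((b - a).total_seconds(), 3) for a, b in zip(probes, probes[1:])]
    return len(zero), gaps

sample = [
    "15:56:17.798079 IP sender_host.51296 > destination_host.ssh: . 8:9(1) ack 1 win 33304 <nop,nop,timestamp 1435178754 1291017157>",
    "15:56:17.897407 IP destination_host.ssh > sender_host.51296: . ack 9 win 0 <nop,nop,timestamp 1291022157 1435178754>",
    "15:56:22.797808 IP sender_host.51296 > destination_host.ssh: . 9:10(1) ack 1 win 33304 <nop,nop,timestamp 1435183754 1291022157>",
    "15:56:22.897457 IP destination_host.ssh > sender_host.51296: . ack 10 win 0 <nop,nop,timestamp 1291027157 1435183754>",
]
print(zero_window_report(sample))  # (2, [5.0])
```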
Does anyone have an idea what this might be?
The error occurs when the receiving host is a FreeBSD 7.0 host (the sender can be 7.0 or 6.2 according to my tests).
Thank you,
//Andreas
-------------------------------------------------------
Andreas Carbin
RUN Communications AB
http://www.run.se
E-mail: andreas.carbin at run.se
-------------------------------------------------------
More information about the freebsd-net mailing list