buildup of Windows time_wait talking to fbsd 4.10
Lars Erik Gullerud
lerik at nolink.net
Mon Jan 10 18:23:45 PST 2005
On Mon, 10 Jan 2005, Len Conrad wrote:
> We have a windows mailserver that relays its outbound to a fbsd gateway. We
> changed to a different fbsd gateway running 4.10. Windows then began having
> trouble sending to 4.10. Windows "netstat -an" shows dozens of lines like
> this:
>
> source IP destination IP
> ======================================================================
> TCP 10.1.16.3:1403 192.168.200.59:25 TIME_WAIT
[snip]
> Eventually, the windows SMTP logs line like "cannot connect to remote IP" or
> "address already in use" because no local tcp/ip sockets are available, we
> think.
>
> The new gateway/fbsd 4.10 "sockstat -4" shows no corresponding tcp
> connections when the Windows server is showing as above. On the fbsd 4.10
> machines, smtp logs, syslog, and dmesg show no errors.
>
> We switch the windows box to smtp gateway towards the old box/fbsd 4.7, all
> is cool.
OK, let me play a wild hunch here: if you look at netstat -na output on
the 4.7 machine (the one that works) while you are using that one, do you
see a large number of connections in the TIME_WAIT state on that side, and
none on the Windows server?
I had a similar situation with an application we use that also opens a
large number of TCP sessions from a Windows server to a FreeBSD server,
and that suddenly stopped working when the application in question was
upgraded on the server it connected to. In our case, it turned out to be a
timing issue introduced by the new version of the application.
When a TCP connection is closing, one side of the connection typically
initiates the close, and sends a FIN,ACK packet to the other side. After
going through the steps of closing down the socket, the side that
initiated the close leaves the socket in TIME-WAIT state for 2 MSL
(Maximum Segment Lifetime, which defaults to 2 minutes, so a 4 minute wait),
while the other end transitions to CLOSED state (and tears down the
socket) immediately, without this wait period. (The exception being if
both ends send FIN,ACK at the same time, in which case they both go to
TIME-WAIT).
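
The close sequence described above can be sketched as a small state table.
This is a simplified model loosely following the RFC 793 state diagram, not
the code of any real TCP stack; the event names (close, recv_fin, recv_ack)
and the run() helper are made up for illustration.

```python
# Simplified TCP teardown transitions, keyed by (current state, event).
# Event names are illustrative labels, not part of any real API.
TRANSITIONS = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",     # active close: we send FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",  # passive close: peer sent FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",   # our FIN was acknowledged
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",      # both sides sent FIN at once
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",    # peer's FIN arrives, wait 2 MSL
    ("CLOSING", "recv_ack"): "TIME_WAIT",       # simultaneous close also waits
    ("CLOSE_WAIT", "close"): "LAST_ACK",        # passive side sends its FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",         # passive side is done, no wait
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",    # active side is done after 2 MSL
}


def run(events, state="ESTABLISHED"):
    """Apply a sequence of events and return the resulting state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


# The side that closes first ends up holding the socket in TIME_WAIT.
active_closer = run(["close", "recv_ack", "recv_fin"])
# The side that closes second goes straight to CLOSED.
passive_closer = run(["recv_fin", "close", "recv_ack"])
# If both send FIN at the same time, each side ends up in TIME_WAIT.
simultaneous = run(["close", "recv_fin", "recv_ack"])

print(active_closer, passive_closer, simultaneous)
```

The point of the model is only that whichever end issues the first close
pays the 2 MSL wait, which is why it matters which machine gets there first.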
What happened in our case, with the old version of the application,
was that as soon as the client started to log off the session, the
server-side application (on the FreeBSD server) would initiate closing of
the TCP session, making it the originator of the close (and leaving it with
a large number of sessions in TIME-WAIT, which was not a problem for the BSD
box). Meanwhile, the Windows machine closed its socket immediately and was
happy all the time.
However, after we upgraded the application, when the client logged off
at the application level, the server-side app would first take 2-3 seconds
to process various shutdown-related activities, and the client end (on
the Windows machine) got "impatient" and initiated the TCP session close
from its side, leaving all the TIME-WAIT sockets hanging on the Windows
side rather than the FreeBSD side.
Now, newer versions of Windows have a ridiculously low number of max
simultaneous connections configured, and we started seeing exactly the
same kinds of errors you are describing, due to a large number of
TIME-WAIT sockets. We had to adjust the server-side application to tear
down the TCP socket first, THEN do its internal shutdown processing, in
order to not leave the Windows client in a jam. The alternative was to
increase the number of simultaneous connections on the Windows machine,
which involves some registry black magic, so we found this to be the
easier way out at the time (we will probably hack the Windows regkeys if
we start seeing the issue again).
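
The "close first, clean up afterwards" ordering can be sketched roughly as
below. This is not the code of the application mentioned above; the
handle_logoff() function and its cleanup callback are hypothetical names used
only to show the ordering.

```python
import socket


def handle_logoff(conn, cleanup):
    """Tear down the TCP connection first, then run slow shutdown work.

    `conn` is a connected socket and `cleanup` is any callable doing the
    application's internal shutdown processing (both hypothetical here).
    """
    try:
        # Send our FIN right away so this side is the one that initiates
        # the close (and therefore the one that holds TIME_WAIT).
        conn.shutdown(socket.SHUT_RDWR)
    except OSError:
        # The peer may already have closed; nothing more to signal.
        pass
    conn.close()
    # Only now do the slow work; the client is no longer waiting on us.
    cleanup()
```

The 2-3 seconds of shutdown processing still happen, but after the server
has already started the close, so the client has no reason to get impatient
and close from its own side.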
You didn't mention what MTA you are using, so I don't know if this is a
similar (application-level) issue, or if it's FreeBSD 4.10 that causes
some additional delay before initiating a TCP CLOSE, but either way, this
might be the behaviour you are observing, in which case you will need to
figure out how to get the FreeBSD side to tear down the connection, or
preferably you should look at tuning some registry settings on your
Windows server, like setting the MSL time (default 2 minutes) to a much
lower value, and perhaps upping the number of max simultaneous connections.
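
For a rough feel of what those two knobs buy you, here is a back-of-the-envelope
calculation. The numbers are assumptions, not measurements from this thread:
an ephemeral port range of 1025-5000 and a 240 second TIME_WAIT, which are the
commonly documented Windows defaults of this era (the MaxUserPort and
TcpTimedWaitDelay registry values; check them on your own system). It also
assumes every outbound connection ties up one local port for the full wait.

```python
def max_sustained_rate(first_port, last_port, time_wait_seconds):
    """Upper bound on new outbound connections per second before the
    client runs out of local ports, if each closed connection keeps its
    port in TIME_WAIT for time_wait_seconds."""
    usable_ports = last_port - first_port + 1
    return usable_ports / time_wait_seconds


# Assumed defaults: ports 1025-5000, 240 s in TIME_WAIT.
default_rate = max_sustained_rate(1025, 5000, 240)
# Same port range, TIME_WAIT lowered to 30 s.
short_wait_rate = max_sustained_rate(1025, 5000, 30)
# Default TIME_WAIT, port range raised to 65534.
more_ports_rate = max_sustained_rate(1025, 65534, 240)

print(round(default_rate, 1), round(short_wait_rate, 1), round(more_ports_rate, 1))
```

Under these assumptions the default works out to roughly 16 to 17 new
connections per second, which a busy mail relay can exceed without much
effort once the Windows side is the one doing the active close.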
HTH,
/leg