Debugging dropped shell connections over a VPN
Paul Keusemann
pkeusem at visi.com
Tue Jul 26 18:35:20 UTC 2011
On 07/26/11 08:05, Gary Palmer wrote:
> On Tue, Jul 26, 2011 at 06:53:59AM -0500, Paul Keusemann wrote:
>> Again, sorry for the sluggish response.
>>
>> On 07/20/11 15:15, Gary Palmer wrote:
>>> On Tue, Jul 12, 2011 at 02:26:34PM -0500, Paul Keusemann wrote:
>>>> On 07/07/11 14:39, Chuck Swiger wrote:
>>>>> On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
>>>>>> My setup is something like this:
>>>>>> - My local network is a mix of AIX, HP-UX, Linux, FreeBSD and Solaris
>>>>>> machines running various OS versions.
>>>>>> - My gateway / firewall machine is running FreeBSD-8.1-RELEASE-p1 with
>>>>>> ipfw, nat and racoon for the firewall and VPN.
>>>>>>
>>>>>> The problem is that rlogin, ssh and telnet connections over the VPN get
>>>>>> dropped after some period of inactivity.
>>>>> You're probably getting NAT timeouts against the VPN connection if it is
>>>>> left idle. racoon ought to have a config setting called natt_keepalive
>>>>> which sends periodic keepalives-- see whether that's disabled.
>>>>>
>>>>> Regards,
>>>> Thanks for the suggestions Chuck, sorry it's taken so long to respond
>>>> but I had to reconfigure and rebuild my kernel to enable IPSEC_NAT_T in
>>>> order to try this out.
>>>>
>>>> One thing that I did not explicitly mention before is that I am routing
>>>> a network over the VPN.
>>> Hi Paul,
>>>
>>> Even if you are not being NAT'd on the VPN, there may be a firewall (or
>>> other active network component like a load balancer) with an
>>> overflowing state table somewhere at the remote end. We see this
>>> frequently where I work with customer networks, even when the
>>> firewall/VPN/network admin denies that it's a timeout issue. There is
>>> likely some device in the network that keeps a state table, and if the
>>> connection is idle for a few minutes it gets dropped.
>> Hmmm, this seems likely. Have you had any luck in finding the culprit
>> and resolving the problem?
> Unfortunately no. We know the problem exists, but as a vendor we have
> very little success in getting the customer to identify the problematic
> device inside their network. It only seems to affect our connections
> to them when we are helping them with problems, so there is almost
> always something more important going on and the timeout issue gets put
> on the back burner and forgotten. We've worked around it in some
> places by using the ssh 'ServerAliveInterval' directive to make ssh
> send packets and keep the session open even if we're idle, but that
> doesn't always work.
OK, I found the ClientAliveInterval and ClientAliveCountMax settings in
the sshd_config man page. I assume these are what you are referring to.
I tried setting ClientAliveInterval to 15 seconds with
ClientAliveCountMax set to 3 and this seems to help. I've only tried
this a couple of times but I have seen an ssh session stay alive for
over an hour. The bad news is that the sessions are still getting
dropped, but at least now I know when it happens. Now I'm getting the
following message:
    Received disconnect from 10.64.20.69: 2: Timeout, your session not responding.
From a quick perusal of the openssh source, it is not obvious whether
this message is coming from the client or the server side. Initially,
because the keepalive timer is a server-side setting, I assumed the
message was coming from the server side, but if the session is not
responding, how is the message getting to the client? If it is a
client-side problem, then I have much more flexibility to fix it. All I
can do is whine about server-side problems.
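
For reference, here is a minimal sketch of the two keepalive knobs discussed above. It assumes the usual OpenSSH split: ClientAliveInterval and ClientAliveCountMax live in sshd_config (server side), while ServerAliveInterval and ServerAliveCountMax are the ssh_config (client side) equivalents that can be passed with -o. The helper names and the host name are hypothetical, chosen only for illustration.

```python
import shlex


def keepalive_timeout(interval_s: int, count_max: int) -> int:
    """Approximate seconds of unanswered keepalives before the peer gives up.

    With an interval of 15 and a count of 3 this is roughly 45 seconds.
    """
    return interval_s * count_max


def build_ssh_command(host: str, interval_s: int = 15, count_max: int = 3) -> list:
    """Build an ssh argv that enables client-side keepalives.

    ServerAliveInterval / ServerAliveCountMax are client options, so they
    can be set without touching the server's sshd_config.
    """
    return [
        "ssh",
        "-o", f"ServerAliveInterval={interval_s}",
        "-o", f"ServerAliveCountMax={count_max}",
        host,
    ]


if __name__ == "__main__":
    # "remote.example.org" is a placeholder host, not one from this thread.
    cmd = build_ssh_command("remote.example.org")
    print(shlex.join(cmd))
    print("gives up after about", keepalive_timeout(15, 3), "seconds")
```

Setting the options on the client side has the advantage that it needs no cooperation from whoever administers the server.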
Paul
> Gary
--
Paul Keusemann pkeusem at visi.com
4266 Joppa Court (952) 894-7805
Savage, MN 55378