pf state disappearing [ adaptive timeout bug ]
Matthew Grooms
mgrooms at shrew.net
Fri Jan 22 22:02:19 UTC 2016
On 1/22/2016 3:35 PM, Nick Rogers wrote:
> On Thu, Jan 21, 2016 at 11:44 AM, Matthew Grooms <mgrooms at shrew.net> wrote:
>
>> # pfctl -si
>> Status: Enabled for 0 days 02:25:41 Debug: Urgent
>>
>> State Table                        Total             Rate
>>   current entries                  77759
>>   searches                     483831701        55352.0/s
>>   inserts                         825821           94.5/s
>>   removals                        748060           85.6/s
>> Counters
>>   match                         27118754         3102.5/s
>>   bad-offset                           0            0.0/s
>>   fragment                             0            0.0/s
>>   short                                0            0.0/s
>>   normalize                            0            0.0/s
>>   memory                               0            0.0/s
>>   bad-timestamp                        0            0.0/s
>>   congestion                           0            0.0/s
>>   ip-option                         6655            0.8/s
>>   proto-cksum                          0            0.0/s
>>   state-mismatch                       0            0.0/s
>>   state-insert                         0            0.0/s
>>   state-limit                          0            0.0/s
>>   src-limit                            0            0.0/s
>>   synproxy                             0            0.0/s
>>
>> # pfctl -st
>> tcp.first 120s
>> tcp.opening 30s
>> tcp.established 86400s
>> tcp.closing 900s
>> tcp.finwait 45s
>> tcp.closed 90s
>> tcp.tsdiff 30s
>> udp.first 600s
>> udp.single 600s
>> udp.multiple 900s
>> icmp.first 20s
>> icmp.error 10s
>> other.first 60s
>> other.single 30s
>> other.multiple 60s
>> frag 30s
>> interval 10s
>> adaptive.start 90000 states
>> adaptive.end 120000 states
>> src.track 0s
>>
>> I think there may be a problem with the code that calculates the adaptive
>> timeout values that is making it way too aggressive. If by default the
>> timeouts are supposed to scale down linearly as the state count grows from
>> 60% to 120% of the state table limit, I shouldn't be losing TCP connections
>> that are only idle for a few minutes when the state table is < 70% full.
>> Unfortunately that appears to be the case. At most this should have
>> decreased the 86400s timeout by 17%, to roughly 72000s, for established
>> TCP connections.
> That doesn't make sense to me either. Even if the math is off by a factor
> of 10, the state should still live for a couple of hours.
>
>> I've tested this for a few hours now and all my idle SSH sessions have
>> been rock solid. If anyone else is scratching their head over a problem
>> like this, I would suggest disabling the adaptive timeout feature or
>> increasing it to a much higher value. Maybe one of the pf maintainers can
>> chime in and shed some light on why this is happening. If not, I'm going to
>> file a bug report as this certainly feels like one.
>>
> Did you go with making the adaptive timeout less aggressive, or with
> disabling it entirely? I would think that if the adaptive timeout were
> really that broken, more people would notice this problem, especially me,
> since I have many servers running a very short tcp.established timeout.
> Still, the fact that you are noticing this kind of weirdness has me
> concerned about how the adaptive setting is affecting my environment.
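To recap the scaling rule that the 17% / 72000s figures above assume: per my
reading of pf.conf(5), once the state count passes adaptive.start every
timeout is multiplied by (adaptive.end - states) / (adaptive.end -
adaptive.start). The little helper below is only my own illustration of that
formula, using the documented defaults for a 100K state limit, not a copy of
the pf source:

#include <stdio.h>
#include <stdint.h>

/*
 * Linear scaling as described in pf.conf(5): timeouts are unchanged below
 * adaptive.start, drop to zero above adaptive.end, and are multiplied by
 * (adaptive.end - states) / (adaptive.end - adaptive.start) in between.
 * This is my reading of the man page, not the kernel code.
 */
static uint32_t
scaled_timeout(uint32_t timeout, uint32_t states, uint32_t start, uint32_t end)
{
	if (start == 0 || end <= start || states <= start)
		return (timeout);	/* scaling not active */
	if (states >= end)
		return (0);		/* table overloaded, expire at once */
	return ((uint64_t)timeout * (end - states) / (end - start));
}

int
main(void)
{
	uint32_t start = 60000, end = 120000;	/* defaults for a 100K limit */

	/* 70% full: 86400s should only drop to 72000s (the 17% figure). */
	printf("at 70000 states: %us\n",
	    (unsigned)scaled_timeout(86400, 70000, start, end));

	/* my table right now, ~78K states: still around 60800s */
	printf("at 77759 states: %us\n",
	    (unsigned)scaled_timeout(86400, 77759, start, end));

	return (0);
}

Even at my current ~78K states the established timeout should still be over
60000 seconds, nowhere near short enough to drop an SSH session that has only
been idle for a few minutes.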
I increased the adaptive.start value to 90K for the 100K state limit. Yes,
it's concerning.
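For anyone who wants to adjust the same knobs, this is roughly the shape of
it in pf.conf (the 100K limit is my setup; per pf.conf(5), setting both
adaptive values to 0 disables the scaling entirely):

  set limit states 100000
  set timeout tcp.established 86400
  set timeout { adaptive.start 90000, adaptive.end 120000 }
  # or, to take adaptive scaling out of the picture while testing:
  # set timeout { adaptive.start 0, adaptive.end 0 }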
Today I set up a test environment at about 1/10th the connection volume to
see if I could reproduce the issue on a smaller scale, but had no luck. I'm
trying to find a command-line test program that will generate enough TCP
connections to reproduce it at a scale similar to my production environment.
So far I haven't found anything that will do the trick, so I may end up
rolling my own. I'll reply back to the list if I can find a way to reproduce
this.
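The kind of tool I have in mind is just a loop that opens a few tens of
thousands of TCP connections through the firewall and then sits idle, so the
states have a chance to age out (or vanish early, if the bug is real). A
rough, untested sketch of that idea; host, port and count are placeholders,
and the open-files limit has to be raised well above the connection count
before running it:

#include <err.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/*
 * Open "count" idle TCP connections to host:port and hold them open, so the
 * firewall in between has to track that many states. Sketch only: no rate
 * limiting, no keepalives, no cleanup.
 */
int
main(int argc, char *argv[])
{
	struct addrinfo hints, *res;
	int i, s, error, count;

	if (argc != 4)
		errx(1, "usage: %s host port count", argv[0]);
	count = atoi(argv[3]);

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	error = getaddrinfo(argv[1], argv[2], &hints, &res);
	if (error != 0)
		errx(1, "getaddrinfo: %s", gai_strerror(error));

	for (i = 0; i < count; i++) {
		s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
		if (s < 0)
			err(1, "socket (after %d connections)", i);
		if (connect(s, res->ai_addr, res->ai_addrlen) < 0)
			err(1, "connect (after %d connections)", i);
		/* Keep the descriptor open on purpose; the state should linger. */
	}

	printf("%d connections established, sleeping\n", count);
	for (;;)
		sleep(60);	/* watch pfctl -si / pfctl -ss on the firewall */
}

A single source address tops out around the ephemeral port range, so getting
to production scale probably means running this from several machines or
binding multiple source addresses.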
Thanks again,
-Matthew