FreeBSD boxes as a 'router'...
khatfield at socllc.net
Wed Nov 21 02:16:57 UTC 2012
I may be misstating.
Specifically, under high-rate burst floods (whether routed or dropped by pf) we would see the system go unresponsive to user-level applications, SSH for example.
The system would still function, but it was inaccessible. To clarify, this behavior occurred with any number of flood or attack types against any ports; the SSH ports themselves were not the ones being hit.
We did a lot of sysctl resource tuning, which corrected this for some floods, but high packet rates would still trigger the behavior. Other times the system would simply drop all traffic, as if a buffer had filled or a connection limit had been reached, but neither was actually the case.
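For context, the kind of sysctl resource tuning I mean looked roughly like the following. These are illustrative knobs and values, not our exact production settings; tune for your own hardware and load:

```shell
# Illustrative FreeBSD sysctl tuning for flood resilience.
# Example values only -- not a recommendation.

# More mbuf clusters so traffic bursts don't exhaust network buffers
sysctl kern.ipc.nmbclusters=262144

# Deeper IP input queue before packets start getting dropped
sysctl net.inet.ip.intr_queue_maxlen=4096

# Larger listen-queue backlog so user-level daemons survive connection floods
sysctl kern.ipc.somaxconn=4096

# SYN cookies so half-open floods don't exhaust the syncache
sysctl net.inet.tcp.syncookies=1
```

Settings like these raised the threshold at which the box fell over, but as noted above they never fully eliminated the problem on their own.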
The attacks were also well within the bandwidth capabilities of the pipe and the network gear.
All of these issues stopped once we enabled polling, or at least the threshold at which they appeared increased tremendously.
Yet polling has some downsides, not necessarily due to FreeBSD itself but to application issues. Haproxy is one example: with polling enabled we saw handshake failures and prematurely terminated connections, and those issues were not present with polling disabled.
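For anyone who wants to try it, device polling on FreeBSD of that era was enabled roughly like this (this assumes your NIC driver supports polling and the kernel was built with the polling option; interface name is just an example):

```shell
# Kernel config must include:
#   options DEVICE_POLLING
# (typically paired with a higher tick rate, e.g. options HZ=1000)

# Enable polling on a supported interface (em0 is an example)
ifconfig em0 polling

# Limit how much CPU the polling loop may consume in user context (percent)
sysctl kern.polling.user_frac=50

# Verify polling shows up in the interface flags
ifconfig em0 | grep -i polling
```

Whether polling helps or hurts depends heavily on the driver and workload, which is consistent with the mixed results described above.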
So that is my reasoning for saying that it was perfect for some things and not for others.
In the end, we spent years tinkering and it was always satisfactory but never perfect. Finally we grew to the point of replacing the edge with MX80s and left BSD to load balancing and the like. That finally resolved all issues for us.
Albeit, we were a DDoS mitigation company running high PPS with lots of bursting. BSD was beautiful until we ended up needing 10Gbps+ on the edge, and at that point it was time to go Juniper.
I still say BSD took us from nothing to a $30M company. So despite some things requiring tinkering, I think it is still worth the effort to put in the testing to find what is best for your gear and environment.
I got off-track, but we did find one other thing: ipfw seemed to reduce interrupt load (likely because we couldn't do nearly the scrubbing with it that we could with pf). At any rate, doing less filtering may also fix the issue for the OP.
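As a rough illustration of what "less filtering" means here, a minimal stateless ipfw ruleset does far less per-packet work than a pf setup doing heavy scrubbing. Interface name and networks below are hypothetical:

```shell
# Minimal stateless ipfw ruleset: cheap per-packet work, no state, no scrubbing.
# em0 and the spoofed-source network are hypothetical examples.
ipfw -q flush
ipfw add 100 allow ip from any to any via lo0
ipfw add 200 deny ip from 10.0.0.0/8 to any in via em0   # drop an obvious spoof range
ipfw add 65000 allow ip from any to any                   # pass everything else
```

Every rule the packet has to traverse costs cycles at high PPS, so trimming the ruleset to the bare minimum is itself a tuning step.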
On forwarding: we found that forwarding via a simple pf rule and a GRE tunnel to an app server, or by running a tool like haproxy on the router itself, eliminated a large majority of our original stability issues (versus pure firewall-based packet forwarding).
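A sketch of what forwarding via a pf rule plus a GRE tunnel might look like; all addresses and interface names here are made up for illustration:

```shell
# Create a GRE tunnel from the router to the app server (hypothetical IPs)
ifconfig gre0 create
ifconfig gre0 tunnel 192.0.2.1 192.0.2.2                       # outer endpoints
ifconfig gre0 inet 10.99.0.1 10.99.0.2 netmask 255.255.255.252 up

# pf.conf fragment (pre-4.7 rdr syntax, as used on FreeBSD pf of that era):
#   rdr on em0 proto tcp from any to 192.0.2.1 port 80 -> 10.99.0.2 port 80
#   pass on gre0 all
```

The router then only rewrites and tunnels the flows it cares about instead of making the firewall forward and inspect everything.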
*I also agree, because as I mentioned in a previous email, our overall PPS capacity seemed to decrease from FreeBSD 7 to 9. No idea why, but polling seemed to give us less benefit than it did on 7.4.
Not to say this wasn't due to error on our part or some issue with the Juniper switches, but we seemed to run into more performance issues with newer releases on Intel 1Gbps NICs. This later caused us to move more app servers to Linux, because we never could get to the bottom of some of those problems. We do intend to revisit BSD with our new CDN company to see whether we can restandardize on it for high-volume traffic servers.
Best,
Kevin
On Nov 20, 2012, at 7:19 PM, "Adrian Chadd" <adrian at freebsd.org> wrote:
> Ok, so since people are talking about it, and i've been knee deep in
> at least the older intel gige interrupt moderation - at maximum pps,
> how exactly is the interrupt moderation giving you a livelock
> scenario?
>
> The biggest benefit I found when doing some forwarding work a few
> years ago was to write a little daemon that actually sat there and
> watched the interrupt rates and packet drop rates per-interface - and
> then tuned the interrupt moderation parameters to suit. So at the
> highest pps rates I wasn't swamped with interrupts.
>
> I think polling here is hiding some poor choices in driver design and
> network stack design..
>
>
>
> adrian