Diagnose co-location networking problem

Bill Vermillion bv at wjv.com
Wed Dec 27 06:07:22 PST 2006


Earlier in the linear time track, on approximately Tue, Dec 26, 2006 at 18:45 ,
Stephan Wehner divulged this public information:

> I just got a server and put it in a co-location.

> It runs RELEASE FreeBSD 6.1-RELEASE #0, pound, lighttpd and ruby
> on rails.

> Most of the times I find the server responds nicely. But periodically
> it doesn't respond properly when accessing its webpages: Type URL in
> browser, hit return, no page appears. Try again and again and after a
> few times it appears.

That sounds like a transport problem between your machine and the
server.  It could be anywhere on the link.  Is the colo doing any
rate-limiting?

I see this now and then with dropped packets from my machine to my
servers.  And I control the colo with a rack we have in the Level 3
space so I can trace the problems.  One of the strangest - with
intermittent long delays in packet returns - made me think I had a
problem with Level 3.

I contacted the NOC in the Denver area, and they checked, and saw no
problems on their net, but they checked further, and what was
happening was that my packets were taking a different route back to
me than going to the server. [this is not a bug but it doesn't happen
very often - usually when someone screws things up in routers].

Packets left Orlando via Sprint, went to Texas, crossed over to
Level 3 there, and came back to Orlando and my rack.  The replies
would then go out onto Level 3, go to a Sprint router in Washington,
and come back through Atlanta.

So the first thing I'd suggest is checking your connections via
traceroute. And >>IF<< your provider does not block RECORD ROUTE
and if the hop count is under 8 - you can try  ping -R .

That will show you the IP addresses from which the packets are
leaving, as opposed to the addresses they are going to.
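
Once you have the hops out (from traceroute) and the hops back (from
ping -R, or from a traceroute run on the server toward your machine),
comparing them is mechanical.  Below is a minimal Python sketch of that
comparison, not a finished tool.  The function name compare_paths and
the hop labels are hypothetical, loosely based on the route I described
above.  It assumes you have already mapped the raw IP addresses to
router names or locations by hand, since the same router usually shows
a different interface address in each direction.

```python
def compare_paths(outbound, inbound):
    """Compare two lists of hop labels (router names, not raw IPs).

    outbound: hops from your machine to the server, in order.
    inbound:  hops from the server back to your machine, in order.
    Returns (symmetric, only_out, only_back).
    """
    # On a symmetric route the return path is the outbound path reversed.
    symmetric = outbound == list(reversed(inbound))
    # Hops seen in one direction only are the ones worth asking a NOC about.
    only_out = [hop for hop in outbound if hop not in inbound]
    only_back = [hop for hop in inbound if hop not in outbound]
    return symmetric, only_out, only_back


# Hypothetical labels, loosely following the Orlando example.
outbound = ["orlando-sprint", "texas-sprint", "texas-level3", "orlando-level3"]
inbound = ["orlando-level3", "washington-sprint", "atlanta"]

symmetric, only_out, only_back = compare_paths(outbound, inbound)
print("symmetric:", symmetric)
print("only on the way out:", only_out)
print("only on the way back:", only_back)
```

An asymmetric result is not proof of a fault by itself, but it tells
you which extra networks to look at when delays show up.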

> Other sites are accessible during these problematic times. Also, in
> parallel I am connected to the server through ssh, and there are not
> problems with that. Even during those times when the web pages don't
> appear, I can type and see the result.

When you say 'other sites are accessible' do you mean other sites
on your machine, or other sites on the 'net?  And what about other
sites that are located in that colo that you don't control?
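
One way to answer those questions with numbers instead of impressions
is to time plain TCP connections to each host and port you care about:
your server on port 80, your server on port 22, another site in the
same colo, and a site elsewhere on the 'net.  Here is a minimal Python
sketch, assuming nothing beyond the standard library.  The names probe
and failure_rate are hypothetical, and the host names in the closing
comment are placeholders.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection does the name lookup and the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and failed name lookups.
        return None


def failure_rate(host, port, attempts=20, pause=1.0, timeout=3.0):
    """Probe repeatedly and return the fraction of attempts that failed."""
    failures = 0
    for i in range(attempts):
        if probe(host, port, timeout) is None:
            failures += 1
        if i < attempts - 1:
            time.sleep(pause)
    return failures / attempts


# Example calls (placeholder host names, run these by hand):
#   failure_rate("your-server.example.com", 80)
#   failure_rate("your-server.example.com", 22)
#   failure_rate("neighbour-in-same-colo.example.com", 80)
```

If port 80 on your server fails far more often than port 22 on the
same server, or than port 80 on a neighbour in the same colo, that
points at your box rather than at the path.  If everything in the
colo fails together, look at the link.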

> Before installing it at the datacentre, the server was working without
> problems on the local network.

Well there is always the chance that moving it created a problem -
something shook loose.  I've had the reverse when I was heading up
a recording studio.  Some of the early digital equipment we had
would get flaky.  We'd ship it by FedEX to the factory, and they'd
find nothing, but change out something that may have caused it.

Three times FedEX cured the problem in shipping - and each time
another piece was changed.  Finally - on number 4 - it worked at
the factory, but they changed ALL the internal cables - and that
fixed it permanently.  It was the vibration in shipment that
temporarily fixed things - but shipping an item out wasn't what I
call a good fix :-)

> So I am thinking the problem may be with the co-location operation.

As above - it could be the colo - or it could be your network
connections to the colo.


Bill
-- 
Bill Vermillion - bv @ wjv . com

