Long Day's Journey into <Bleep>
Jerome Herman
jherman at dichotomia.fr
Fri Jun 10 23:53:17 UTC 2011
On 09/06/2011 02:56, Gary Kline wrote:
> Well, people,
>
> It's been a long, long century. I've been down for 5 days.
> Couldn't understand _why_ I couldn't ping anywhere [except the
> Server itself]. Finally, tho, it became more and more likely that
> my FreeBSD was fine ... even tho I kept stripping the most likely
> problem points. My large 16-port LinkSys router was either *it* or
> it was some kind of bug unknown to geekdom. After a friend bought
> me a new (and tiny) 8-port switch, yes! I could ping everywhere.
>
> I'm still bringing back the dozens of things I removed from ethic.
> And testing new ideas. But I have a general question: have any of
> you wizards who run your own domains or otherwise use a switch [or
> hub] *ever* had it just-quit?! It is solid-state. Yes, the box is
> within my feet/foot reach. I have accidentally kicked it, I suppose,
> but still.
>
> After wandering in the wilderness for 5 days,<<mmph>>, dunno.
>
> gary
>
> PS: yes, this is a serious question. 1) I like things-Cisco, and
> LinkSys. I just bought this switch about 2.5 years ago, so I really
> am looking for feedback.
>
> PPS: Another question to ask about upgrading is next.
>
>
I have had a lot of faulty switches, either dying outright on their own
or doing stranger things.
The most common failure is of course the defective port: one port will
start spewing errors and eventually die, with little to no impact on
the rest of the ports. (Easy to detect: ping through one port vs. ping
through another port.)
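
A rough way to run that port-vs-port comparison, as a Python sketch. It
assumes you have saved the summary line that ping prints for the same
target through each port, and it only pulls out the packet-loss figure.
The port names, the sample text and the 5% threshold are made up for
illustration.

```python
import re

# Matches the loss figure in a ping summary line such as
# "10 packets transmitted, 7 packets received, 30.0% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def packet_loss(ping_output: str) -> float:
    """Return the packet loss percentage reported in a ping summary."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet loss summary found")
    return float(match.group(1))


def suspect_ports(outputs_by_port: dict[str, str], threshold: float = 5.0) -> list[str]:
    """List the ports whose loss exceeds the (arbitrary) threshold."""
    return sorted(
        port for port, text in outputs_by_port.items()
        if packet_loss(text) > threshold
    )


# Hypothetical captures: same target, host plugged into port 3, then port 7.
samples = {
    "port 3": "10 packets transmitted, 10 packets received, 0.0% packet loss",
    "port 7": "10 packets transmitted, 6 packets received, 40.0% packet loss",
}
print(suspect_ports(samples))
```

If only one port shows up in the list while the others stay clean, that
port is the first thing to stop using.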
Another common error is the "I want full duplex" error. The switch will
announce itself as full duplex and then immediately fall back to half
duplex. Most of the time the port will behave fine, but under heavy
load you will get a nice assortment of network errors, one after the
other. (Also easy to detect: force the connected devices to half duplex
as a test; if everything starts working again, you have found your
problem.)
Of course there is also the problem of "not so anti-loopback" switches
that let packets go round and round and round and round. (Ping timing
will be very inconsistent, going from a few ms to entire seconds.)
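
To put a number on "very inconsistent", here is a small Python sketch
that reads the "time=... ms" values out of saved ping output and checks
how far apart the fastest and slowest replies are. The factor of 100 is
an arbitrary assumption, not a known threshold, and the sample output
is invented.

```python
import re

# Matches the round-trip time in a ping reply line, e.g. "time=0.421 ms".
RTT_RE = re.compile(r"time=(\d+(?:\.\d+)?) ms")


def rtts_ms(ping_output: str) -> list[float]:
    """Extract every round-trip time (in ms) from ping reply lines."""
    return [float(value) for value in RTT_RE.findall(ping_output)]


def looks_erratic(rtts: list[float], ratio: float = 100.0) -> bool:
    """True when the slowest reply is at least `ratio` times the fastest."""
    if len(rtts) < 2:
        return False
    fastest = max(min(rtts), 0.001)  # guard against a 0.000 ms reading
    return max(rtts) >= ratio * fastest


# Hypothetical output from a LAN where replies jump from ms to seconds.
sample = """\
64 bytes from 192.0.2.1: icmp_seq=0 ttl=64 time=2.1 ms
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=1850.0 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=3.4 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=1204.0 ms
"""
print(looks_erratic(rtts_ms(sample)))
```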
On pure layer 2 switches I have had few other problems, though two took
me days to figure out:
1 - Faulty power source: the switch simply could not bear full load
anymore. Various errors, packet corruption, DHCP errors, misrouting and
so on. When tested port by port, function by function, the switch would
work wonders. I spent an entire week checking every box for
viruses/trojans/rootkits/rogue DHCP servers. The problem was only solved
after I replaced every element of the network one by one. The final
diagnosis was made by Netgear.
2 - Memory corruption (suspected, not validated): everything would work
fine from 9 A.M. until 3 or 4 P.M. for an entire branch, then the
network would slow to a crawl. Rebooting the switches would solve the
problem for a while, and then it would be a nightmare again in less than
an hour. Some boxes would complain about duplicate IP addresses. We
managed to find that most of the conflicting IP addresses converged on
just one switch. From there we theorized that there was a problem with
the ARP cache of the switch that would make it blow up after a
sufficient number of updates (since a lot of VPN connections were made
after 3 P.M., we imagined that this was the triggering factor). We took
the switch out and replaced it, but the manufacturer never shed any
light on the matter to either confirm or refute our theories.
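
The "which switch do the duplicates converge on" step can be sketched in
Python as below. This is not what we ran at the time, just an
illustration: it assumes you have somehow collected (IP, MAC, switch)
sightings, for example from ARP tables and a wiring map, and every
address and switch name in the sample is made up.

```python
from collections import Counter, defaultdict


def duplicate_ips(sightings):
    """Map each IP seen with more than one MAC to the set of those MACs."""
    macs_by_ip = defaultdict(set)
    for ip, mac, _switch in sightings:
        macs_by_ip[ip].add(mac)
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


def switches_by_conflicts(sightings):
    """Rank switches by how many conflicting IPs were seen behind them."""
    dupes = duplicate_ips(sightings)
    ips_by_switch = defaultdict(set)
    for ip, _mac, switch in sightings:
        if ip in dupes:
            ips_by_switch[switch].add(ip)
    counts = Counter({sw: len(ips) for sw, ips in ips_by_switch.items()})
    return counts.most_common()


# Hypothetical sightings: (IP address, MAC address, switch it was seen behind).
sightings = [
    ("192.0.2.5", "00:00:5e:00:53:01", "sw-a"),
    ("192.0.2.5", "00:00:5e:00:53:02", "sw-c"),
    ("192.0.2.9", "00:00:5e:00:53:03", "sw-b"),
    ("192.0.2.9", "00:00:5e:00:53:04", "sw-c"),
    ("192.0.2.7", "00:00:5e:00:53:05", "sw-a"),
]
print(switches_by_conflicts(sightings))
```

The switch at the top of the ranking is the one involved in the most
conflicts, which is only a hint about where to look, not proof of a
fault.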
Jerome