Some days, it doesn't pay to upgrade ...

Sat Mar 3 03:13:29 UTC 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Based on a suggestion from someone on this list, I set up a screen session 
with top running, to watch things ... again, after 3 days, the server ran 
'out of processes' ... this time, of course, I could get in to look around and 
kill off processes ...

From what I can tell, the culprit is a cron job that runs once a minute, and 
all it does is:

ping -c 1 <host>

with a 300 sec timeout ... the copies started to 'run over top of' each other 
out of cron ... the host that it is pinging is on the same switch and has been 
running fine for 20 days now, and it wasn't until I did the last upgrade on 
the server causing the problems that these problems started ...

Coincidence? :)

I'm going to fix the script so that it doesn't try to run over itself ... 
anyone know of a problem with the fxp driver in 6-STABLE that might cause the 
ping to hang?
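
For anyone wanting to do the same, here is a rough sketch of one way to keep a 
cron job from running over itself: take a non-blocking lock, and exit at once 
if the previous run still holds it. This is not the actual script; the lock 
path, the 50 second timeout (picked only to stay under the one minute cron 
interval) and the function names are all assumptions for illustration.

```python
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/pingcheck.lock"  # hypothetical lock file location


def acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting its turn.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def run_check(cmd, timeout):
    """Run cmd, killing it after timeout seconds; True only on exit status 0."""
    try:
        result = subprocess.run(cmd, timeout=timeout,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return False  # the child has already been killed and reaped
    return result.returncode == 0


if __name__ == "__main__" and len(sys.argv) > 1:
    lock = acquire_lock(LOCK_PATH)
    if lock is None:
        sys.exit(0)  # previous run still active, so do not pile up
    # Assumed timeout, shorter than the cron interval.
    ok = run_check(["ping", "-c", "1", sys.argv[1]], timeout=50)
    sys.exit(0 if ok else 1)
```

The lock goes away when the process exits, so a killed or crashed run leaves 
nothing stale behind, which is the main reason to prefer it over a pid file.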

- --On Thursday, March 01, 2007 09:51:13 +1100 Antony Mawer 
<fbsd-stable at mawer.org> wrote:

> On 27/02/2007 11:59 PM, Marc G. Fournier wrote:
>> After 155 days of problem free uptime, I upgraded my 6-STABLE system the
>> other  day to the latest cvsup ... 3 days later, the whole thing hung solid
>> with:
>>
>>
>> Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
>> Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see
>> tuning(7) and login.conf(5).
>> Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see
>> tuning(7) and login.conf(5).
>>
>> Stupid question: why isn't there some mechanism that prevents new processes
>> from starting up, instead of locking up the whole server?  I'm not asking
>> for  the evilness of Linux, where it arbitrarily kills off existing
>> processes, but  if maxproc is hit, why continue to try and start up new ones?
>
> What do you define as 'hung solid'? You are unable to get in via SSH? Or at a
> console via iLO/etc?
>
> I've seen this on some of our 6.0-RELEASE machines (along with maxpipekva
> exhausted errors), and you can't SSH in from that point... because sshd forks
> to handle the connection, and all available process slots are used up.
>
> I've thought about writing a background daemon to monitor the logs for signs
> of this (or even to just try and create a short-lived child process by
> fork()ing every 5 minutes or so), and dump information to disk then reboot
> the system when this occurs... it's a work-around for something that
> "shouldn't happen", but it does anyway... once I'm able to identify _what_ is
> causing the build-up of processes, then I might be able to do something about
> killing them...!!!
>
>
> It's quite deceptive from an end-user point of view, because things like
> Apache that are already running keep running, so all they see are strange
> bits and pieces that don't work... and as always, it's one of those things
> that only happens on some clients' machines, but never on any of our test
> machines...
>
> --Antony
>
>
> PS. I haven't disappeared off the face of the earth.. though close.. my
> fiance and I have been busy planning the wedding, and wound up buying a house
> at the same time..!! Will catch up shortly once I get a chance to come up for
> air!!
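
The fork() probe Antony describes above could be sketched roughly like this. 
It is only an illustration of the idea, not something either of us runs: the 
names can_fork and watch, and the on_failure callback, are made up here.

```python
import os
import time


def can_fork():
    """Try to create a short-lived child; False means fork() itself failed."""
    try:
        pid = os.fork()
    except OSError:
        return False  # e.g. EAGAIN when the process limit has been reached
    if pid == 0:
        os._exit(0)  # child: exit at once without running cleanup handlers
    os.waitpid(pid, 0)  # parent: reap the child so no zombie is left behind
    return True


def watch(interval, on_failure, probe=can_fork, max_checks=None):
    """Probe every interval seconds; call on_failure once and stop on failure."""
    checks = 0
    while max_checks is None or checks < max_checks:
        if not probe():
            on_failure()
            return False
        checks += 1
        time.sleep(interval)
    return True
```

Something like watch(300, dump_state) would match the "every 5 minutes or so" 
in the quote, with dump_state being a hypothetical handler. Whatever that 
handler does has to work without starting new processes, since by then there 
are no slots left to start them in.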

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy at hub.org                              MSN . scrappy at hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF6Ofd4QvfyHIvDvMRAmoqAJ9ka8ZQxq0Ciidyy4R60bTmYfxeggCeLz7i
/De9C0Hmdqb22nErxhyUaZA=
=Seo0
-----END PGP SIGNATURE-----