cron pile up! Lots of "cron: running job (cron)"
Kevin Kinsey
kdk at daleco.biz
Thu Dec 13 14:41:10 PST 2007
Rudy wrote:
> Dan Nelson wrote:
>> In the last episode (Dec 03), Support (Rudy) said:
>>> Below is part of the cron... Seems like any random cronjob can get
>>> clogged up... load varies from 0.2 to 1.0 on this dual-core box. I
>>> rebooted the box -- cron processes continue to slowly pile up.
>>>
>>> One of the cronjobs that is 'stuck' is this one:
>>> /root/bin/raid-status.sh
>>> which can be found here:
>>> http://www.monkeybrains.net/~rudy/example/raid_status.html
>>>
>>> Forgot to mention, I am running:
>>> 6.2-STABLE FreeBSD 6.2-STABLE #3: Thu May 31 01:18:15 PDT 2007
>>>
>>> OH, ps shows this:
>>> 58383 ?? D 0:00.00 cron: running job (cron)
>>> 58384 ?? IVs 0:00.00 cron: running job (cron)
>>
>> In general, when troubleshooting, "ps axlw" is a more useful command.
>> It adds, among other columns, MWCHAN, which shows exactly what a
>> process stuck in the D state is waiting on.
>> Anyway, cron does a fork and then a vfork, creating a child and a
>> grandchild process. I'm sort of surprised at the amount of code
>> between vfork and exec in the grandchild in
>> /src/usr.sbin/cron/cron/do_command.c. Since process 3 is actually
>> using process 2's address space one must be extremely careful not to
>> modify static variables or change other global state that would affect
>> the parent once it resumes execution, and all the logging,
>> environment-setting, and user-context calls are certain to mess with
>> the parent's state, especially with nss modules in the mix. I'd
>> personally recompile cron with all vforks replaced with fork and see
>> what happens.
>>
>> It couldn't hurt to update to a newer kernel version along the RELENG_6
>> branch as a test, I guess. Note that your uname will change to
>> 6.3-PRERELEASE, but apart from causing lsof to complain, you should be
>> okay.
>>
>>> /var/log/cron has this entry:
>>> Dec 3 20:16:00 pita /usr/sbin/cron[58384]: (root) CMD
>>> (/root/bin/raid-status.sh CRON)
>>>
>>> BUT there is no 'raid-status.sh' stuck in the "ps axw". Seems like
>>> the vfork set off the cronjob, it ran, but then cron didn't 'stop'
>>> executing. Any debugging tips?
>>
>> Can you tell if raid-status.sh ever ran? I.e., is process 2
>> stuck at the start of vfork or at the end?
>
> I added this line to the top of my cronjob:
> logger -t DEBUG "$0: $$"
> and cron seems stuck BEFORE the script is ever run. Whether it sticks
> or not appears random, as plenty of log lines are showing up with the
> output of the logger command in my /var/log/messages.
>
> # tail /var/log/messages
> Dec 13 11:16:00 pita DEBUG: /root/bin/raid-status.sh: 64414
> Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80115
> Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80119
> Dec 13 12:11:00 pita DEBUG: /root/bin/raid-status.sh: 84283
>
> Here is the ps output:
> # ps axlw
> UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
> 0 85939 82253 0 8 0 2148 1560 ppwait D ?? 0:00.00 cron: running job (cron)
> 0 85940 85939 0 4 0 2148 1560 sbwait IVs ?? 0:00.00 cron: running job (cron)
> # grep 85940 /var/log/cron
> Dec 13 12:16:00 pita /usr/sbin/cron[85940]: (root) CMD
> (/root/bin/raid-status.sh CRON)
>
> - Rudy
Just as a favor to an old coot, could you change your
crontab entry to read like this:
*/16 * * * * "/root/bin/raid-status.sh"
and see if it makes any difference?
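
If you want to compare before and after the change, a rough sketch
like the one below might help count the stuck cron children. The
function name stuck_cron is made up for illustration, and it assumes
its input has the columns pid, ppid, mwchan, stat, command in that
order (which I believe "ps -axo pid,ppid,mwchan,stat,command" gives
you on FreeBSD, but check ps(1) on your box).

```shell
#!/bin/sh
# stuck_cron (hypothetical helper): read lines of the assumed form
#   pid ppid mwchan stat command...
# on stdin and print pid, ppid, mwchan and stat for every
# "cron: running job" process whose state starts with D.
stuck_cron() {
    awk '$4 ~ /^D/ && $0 ~ /cron: running job/ { print $1, $2, $3, $4 }'
}

# Intended use on the affected machine (assumed column list, not run here):
#   ps -axo pid,ppid,mwchan,stat,command | stuck_cron
```

Run it from a shell every so often; if the list stops growing after
the crontab change, that tells us something.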
Kevin Kinsey
--
There are never any bugs you haven't found yet.