Periodic jobs lockf timeout

Borja Marcos borjam at sarenet.es
Tue Oct 24 15:06:09 UTC 2017



> On 24 Oct 2017, at 16:41, Alan Somers <asomers at freebsd.org> wrote:
> 
> On Tue, Oct 24, 2017 at 3:07 AM, Borja Marcos <borjam at sarenet.es> wrote:
> Are you talking about the lockf in /usr/sbin/periodic?  It already has
> a timeout of 0, which should prevent overlapping periodic jobs.  Or is
> there some other lockf involved?  Without knowing which lockf you're
> talking about, I can't understand your problem.

Sorry, my explanation was awful, now that I read it again. Yes, I mean the lockf in /usr/sbin/periodic. And
no, I didn’t mean that jobs overlap (they certainly don’t, thanks to the lockf), but they can pile up. Today I had
a machine with three daily jobs waiting to start because the first one had been running for four days (a combination
of lots of files and datasets, heavy system load, a ZFS pool almost full…).

The problem with a timeout of 0 is that it’s unlimited. In case something is wrong you can end up with a growing queue of
daily periodic jobs waiting to run. Imagine you have a very high system load for several days and for some reason the daily job
won’t complete. Next day a new daily job will try to start but it will have to wait for the first one to finish. And so on.

The proposal is to replace the “0” timeout for lockf with a sane timeout, so that periodic will attempt to run the job but give
up if it can’t start within a reasonable time. The timeout shouldn’t actually be long. If periodic must wait in order to
start a job, it means that you have a serious performance problem and it’s pointless to keep your machine doing “find”
24/7.

Given the nature of the periodic jobs, I don’t think it should be a problem to attempt to run them on a best-effort basis
rather than guaranteeing that they will eventually run, even if awfully late.

I would add a configurable timeout for /usr/sbin/periodic. I think it’s better done with a different variable for each 
class and their default values can be 0 so that nothing changes.

daily_start_timeout
weekly_start_timeout
monthly_start_timeout
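
As a rough sketch of how these per-class variables could be wired in (the function names, the lock file
path and the "LOCKED" re-invocation below are assumptions made for illustration, not copied from the actual script):

```sh
#!/bin/sh
# Sketch only: per-class start timeouts for periodic.
# A default of 0 means the value passed to "lockf -t" stays what it is today.
: "${daily_start_timeout:=0}"
: "${weekly_start_timeout:=0}"
: "${monthly_start_timeout:=0}"

# Print the start timeout configured for a class (daily, weekly or monthly).
# Any other class name falls back to 0.
start_timeout_for() {
    case "$1" in
        daily|weekly|monthly)
            # Safe to eval: $1 is restricted to the three names above.
            eval "echo \"\${${1}_start_timeout:-0}\""
            ;;
        *)
            echo 0
            ;;
    esac
}

# Print (not run) the lockf command line periodic could use for a class.
# The lock file path and the LOCKED argument are hypothetical.
lockf_command_for() {
    echo "lockf -t $(start_timeout_for "$1") /var/run/periodic.$1.lock /bin/sh /usr/sbin/periodic LOCKED $1"
}
```

For example, with daily_start_timeout=600 set (say, in periodic.conf), "lockf_command_for daily" prints a
command line starting with "lockf -t 600", while the weekly and monthly classes keep their default of 0.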



> The anticongestion_sleeptime variable is unrelated to lockf.

Understood, I stand corrected. I assumed it was. 

Hope it’s better now. It’s pretty easy to do, but I’m interested in opinions on this matter :)


Thank you!





Borja.

