Timekeeping [Was: Re: cvs commit: src/usr.bin/vmstat vmstat.c src/usr.bin/w w.c]
Bruce Evans
bde at zeta.org.au
Sat Oct 22 03:17:27 PDT 2005
On Fri, 21 Oct 2005, Poul-Henning Kamp wrote:
> In message <01DFB595-5279-4D3A-BEDA-5F0285E9519B at xcllnt.net>, Marcel Moolenaar
> writes:
>
>>> I think we need the definition to consider if (process- ?)state is
>>> retained while the system is unconscious or not.
>>
>> I'm not sure. I think that might be what makes the definition
>> complex.
>
> Actually I don't think it does, it simplifies it.
I agree, except for statistics programs: for them it is necessary to keep as
much history as practical; in particular, don't forget the original boot time,
and keep supporting averages since boot in vmstat and systat.
> If a process survives across the "unconscious" period, then it follows
> that CLOCK_MONOTONIC cannot be reset to zero in relation to the
> unconscious period.
What is survival? Everything might be restarted virtually.
> But we are only just scratching the surface here, there are tons of
> ambiguities we need to resolve, for instance:
>
> select(...., {3m0s})
> suspend
> [ 2 minutes pass ]
> resume
>
> When does select time out ?
>
> One minute after the resume ?
>
> Three minutes after the resume ?
>
> Right after the resume with a special errno ?
As close as possible to 3m0s after select() was called.
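A small worked example of the arithmetic behind that answer, assuming the deadline is fixed when select() is called and measured on a clock that keeps running while the system is suspended, contrasted with a clock that stops during suspend (as the *uptime() functions discussed below do). Both helper names are hypothetical; they are not kernel interfaces.

```c
#include <assert.h>

/*
 * Hypothetical helper: seconds after resume at which a timeout fires when
 * the deadline is fixed at call time on a clock that counts suspend time.
 * If the deadline passed during the suspend, it fires at resume (0).
 */
static long
fire_after_resume_deadline(long timeout, long before_suspend, long suspended)
{
	long remaining;

	assert(timeout >= 0 && before_suspend >= 0 && suspended >= 0);
	remaining = timeout - before_suspend - suspended;
	return (remaining > 0 ? remaining : 0);
}

/*
 * Hypothetical helper: the same quantity when the timeout is measured on
 * an uptime clock that stops while suspended.
 */
static long
fire_after_resume_uptime(long timeout, long before_suspend, long suspended)
{
	long remaining;

	assert(timeout >= 0 && before_suspend >= 0 && suspended >= 0);
	(void)suspended;	/* the uptime clock never sees the suspend */
	remaining = timeout - before_suspend;
	return (remaining > 0 ? remaining : 0);
}
```

For the example in the quoted message (a 3m0s timeout, suspend right away, 2 minutes suspended), the first helper gives 60 seconds after resume and the second gives 180 seconds; a suspend longer than the timeout makes the first one fire immediately on resume.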
There are many longstanding bugs in this area. I remember the following:
- the stillborn non-option APM_FIXUP_CALLTODO attempts to fix some of
them, by reducing all timeouts by the suspend time. (It was stillborn
because it is for the pre-callwheel implementation of timeouts but was
committed after callwheel timeouts, so it never compiled in any committed
version. The uselessness of APM_FIXUP_CALLTODO was hidden by not making
it a normal option.)
The problem of wrong timeouts after suspend is very old. Not fixing it
avoids thundering herds of timeout expiries after suspend.
- nanosleep(), select() and poll() use getnanouptime(), getmicrouptime() and
getmicrouptime(), respectively, to not-so-carefully check that the timeout
has expired after they wake up (the wakeup is sometimes early or late due to
minor inaccuracies; when it is early, we detect that not-so-carefully and go
back to sleep; when it is late, we can't recover, so we should always request
the timeout a little early so that we can be as close to on time as possible).
These syscalls should use non-get*() versions
and non-*uptime() versions so that they actually know if the timeout
expired. Using *uptime() doesn't work because it doesn't count suspend
time. Using non-*uptime() doesn't quite work either, since the system's
best idea of the real time may jump backwards. A monotonic clock that
jumps forwards by the suspend time is needed.
- realitexpire() has the same bug as nanosleep() and friends. The very
name of this function shows that it should not be using *uptime().
According to setitimer(2), "ITIMER_REAL decrements in real time".
Using get*() in it is more justified than in nanosleep() since it is
lower level so its efficiency may be important.
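A user-space simulation of the wake-and-recheck loop described in the list above, assuming a clock that counts suspend time (uptime plus total time suspended) and a low-level timer that is fixed up across suspend. Every name here (`struct sim`, `sim_now()`, `sim_sleep()`, `sleep_for()`) is hypothetical; the point is the shape of the loop: the deadline is fixed once, each wakeup re-reads the clock, and an early wakeup simply sleeps again for the remainder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated machine; times are in arbitrary integer units. */
struct sim {
	int64_t uptime;		/* advances only while the system runs */
	int64_t suspended;	/* total time spent suspended */
	int64_t suspend_len;	/* suspend taken during the next sleep, 0 if none */
	int64_t early;		/* each sleep is requested this much early */
};

/* Clock that keeps counting across suspend: uptime plus suspended time. */
static int64_t
sim_now(const struct sim *s)
{
	return (s->uptime + s->suspended);
}

/*
 * Low-level timed sleep of 'len' units, requested 'early' units early
 * (but always making progress).  A pending suspend happens during this
 * sleep; the timer is assumed to be fixed up so that the sleep still ends
 * at the same point on the clock that counts suspend time, or right at
 * resume if that point passed while suspended.
 */
static void
sim_sleep(struct sim *s, int64_t len)
{
	int64_t run = len - s->early;

	if (run < 1)
		run = 1;
	if (s->suspend_len > 0) {
		s->suspended += s->suspend_len;
		run -= s->suspend_len;
		if (run < 0)
			run = 0;
		s->suspend_len = 0;
	}
	s->uptime += run;
}

/*
 * Sleep until 'timeout' units after the call, as measured by sim_now().
 * Returns the number of low-level sleeps used.
 */
static int
sleep_for(struct sim *s, int64_t timeout)
{
	int64_t deadline, left;
	int sleeps = 0;

	assert(timeout >= 0);
	deadline = sim_now(s) + timeout;
	/* Re-read the clock after every wakeup; early wakeups sleep again. */
	while ((left = deadline - sim_now(s)) > 0) {
		sim_sleep(s, left);
		sleeps++;
	}
	return (sleeps);
}
```

Because each low-level sleep is requested slightly early and then topped up, the loop in this model never overshoots the deadline unless the suspend itself runs past it, in which case it returns at resume. With `early = 2` and a timeout of 100 units, it takes three sleeps and returns exactly at 100.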
> Some code should obviously know about the suspend/resume event,
> dhclient, wep, wpa, bgpd, sshd, just to mention a few
Code like cron should get enough notification by having timeouts expire
as soon as possible after resume (if they would have expired during the
suspend interval had there been no suspend). Such code can then check the
actual time on the correct clock like nanosleep() and friends to see if
a critical time has been reached.
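A sketch of that resume behavior, in the spirit of APM_FIXUP_CALLTODO's approach of reducing all timeouts by the suspend time: on resume, every pending timeout is charged with the suspend interval, and any that would have expired during the suspend fire at once. The `struct pending` table and `fixup_after_resume()` are hypothetical and are not callout(9) internals.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical pending-timeout table entry (not the callout(9) layout):
 * ticks left until expiry, and whether it has already fired.
 */
struct pending {
	long ticks_left;
	int fired;
};

/*
 * On resume, charge the suspend interval to every pending timeout.
 * Those that would have expired during the suspend fire at once; their
 * handlers must then check the real clock themselves.  Returns how
 * many fired immediately (the burst of expiries right after resume).
 */
static size_t
fixup_after_resume(struct pending *p, size_t n, long suspended_ticks)
{
	size_t i, fired = 0;

	assert(suspended_ticks >= 0);
	for (i = 0; i < n; i++) {
		if (p[i].fired)
			continue;
		p[i].ticks_left -= suspended_ticks;
		if (p[i].ticks_left <= 0) {
			p[i].ticks_left = 0;
			p[i].fired = 1;	/* stands in for running the handler */
			fired++;
		}
	}
	return (fired);
}
```

The return value is the size of the burst of expiries right after resume, i.e. the thundering herd mentioned above; each woken handler still has to read the correct clock to decide whether its critical time has actually been reached.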
Bruce