kqueue periodic timer confusion
John Baldwin
jhb at freebsd.org
Fri Jul 13 14:42:07 UTC 2012
On Friday, July 13, 2012 8:22:13 am Davide Italiano wrote:
> On Thu, Jul 12, 2012 at 5:25 PM, John Baldwin <jhb at freebsd.org> wrote:
> > On Thursday, July 12, 2012 11:08:47 am Davide Italiano wrote:
> >> On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin <jhb at freebsd.org> wrote:
> >> > On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
> >> >> On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
> >> >> > On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
> >> >> > > On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
> >> >> > > > Hi,
> >> >> > > >
> >> >> > > > Sorry about this repost but I'm confused about the responses I
received
> >> >> > > > in my last post so I'm looking for some clarification.
> >> >> > > >
> >> >> > > > Specifically, I though I could use the kqueue timer as
essentially a
> >> >> > > > "drop in" replacement for linuxfd_create/read, but was surprised
that
> >> >> > > > the accuracy of the kqueue timer is much less than what I need
for my
> >> >> > > > application.
> >> >> > > >
> >> >> > > > So my confusion at this point is whether this is consider to be
a bug or
> >> >> > > > "feature"?
> >> >> > > >
> >> >> > > > Here's some test code if you want to verify the problem:
> >> >> > > >
> >> >> > > > #include <stdio.h>
> >> >> > > > #include <stdlib.h>
> >> >> > > > #include <string.h>
> >> >> > > > #include <unistd.h>
> >> >> > > > #include <errno.h>
> >> >> > > > #include <sys/types.h>
> >> >> > > > #include <sys/event.h>
> >> >> > > > #include <sys/time.h>
> >> >> > > >
> >> >> > > > int
> >> >> > > > main(void)
> >> >> > > > {
> >> >> > > > int i,msec;
> >> >> > > > int kq,nev;
> >> >> > > > struct kevent inqueue;
> >> >> > > > struct kevent outqueue;
> >> >> > > > struct timeval start,end;
> >> >> > > >
> >> >> > > > if ((kq = kqueue()) == -1) {
> >> >> > > > fprintf(stderr, "kqueue error!? errno = %s",
> >> >> > strerror(errno));
> >> >> > > > exit(EXIT_FAILURE);
> >> >> > > > }
> >> >> > > > EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0,
20, 0);
> >> >> > > >
> >> >> > > > gettimeofday(&start, 0);
> >> >> > > > for (i = 0; i < 50; i++) {
> >> >> > > > if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1,
NULL)) ==
> >> >> > -1) {
> >> >> > > > fprintf(stderr, "kevent error!? errno =
%s",
> >> >> > strerror(errno));
> >> >> > > > exit(EXIT_FAILURE);
> >> >> > > > } else if (outqueue.flags & EV_ERROR) {
> >> >> > > > fprintf(stderr, "EV_ERROR: %s\n",
> >> >> > strerror(outqueue.data));
> >> >> > > > exit(EXIT_FAILURE);
> >> >> > > > }
> >> >> > > > }
> >> >> > > > gettimeofday(&end, 0);
> >> >> > > >
> >> >> > > > msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000
+
> >> >> > end.tv_usec - start.tv_usec) / 1000) - 1000);
> >> >> > > >
> >> >> > > > printf("msec = %d\n", msec);
> >> >> > > >
> >> >> > > > close(kq);
> >> >> > > > return EXIT_SUCCESS;
> >> >> > > > }
> >> >> > > >
> >> >> > > >
> >> >> > >
> >> >> > > What you are seeing is "just the way FreeBSD currently works."
> >> >> > >
> >> >> > > Sleeping (in most all of its various forms, and I've just looked
at the
> >> >> > > kevent code to verify this is true there) is handled by converting
the
> >> >> > > amount of time to sleep (usually specified in a timeval or
timespec
> >> >> > > struct) to a count of timer ticks, using an internal routine
called
> >> >> > > tvtohz() in kern/kern_time.c. That routine rounds up by one tick
to
> >> >> > > account for the current tick. Whether that's a good idea or not
(it
> >> >> > > probably was once, and probably not anymore) it's how things
currently
> >> >> > > work, and could explain the fairly consistant +1ms you're seeing.
> >> >> >
> >> >> > This is all true, but mostly irrelevant for his case. EVFILT_TIMER
> >> >> > installs a periodic callout that executes KNOTE() and then resets
itself (via
> >> >> > callout_reset()) each time it runs. This should generally be closer
to
> >> >> > regulary spaced intervals than something that does:
> >> >> >
> >> >>
> >> >> In what way is it irrelevant? That is, what did I miss? It appears
to
> >> >> me that the next callout is scheduled by calling timertoticks()
passing
> >> >> a count of milliseconds, that count is converted to a struct timeval
and
> >> >> passed to tvtohz() which is where the +1 adjustment happens. If you
ask
> >> >> for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
> >> >> There is some time, likely a small number of microseconds, that you've
> >> >> consumed of the current tick, and that's what the +1 in tvtohz() is
> >> >> supposed to account for according to the comments.
> >> >>
> >> >> The tvtohz() routine both rounds up in the usual way
(value+tick-1)/tick
> >> >> and then adds one tick on top of that. That seems not quite right to
> >> >> me, except that it is a way to g'tee that you don't return early, and
> >> >> that is the one promise made by sleep routines on any OS; those
magical
> >> >> "at least" words always appear in the docs.
> >> >>
> >> >> Actually what I'm missing (that I know of) is how the scheduler works.
> >> >> Maybe the +1 adjustment to account for the fraction of the current
tick
> >> >> you've already consumed is the right thing to do, even when that
> >> >> fraction is 1uS or less of a 1mS tick. That would depend on scheduler
> >> >> behavior that I know nothing about.
> >> >
> >> > Ohhhhh. My bad, sorry. You are correct. It is a bug to use +1 in
this
> >> > case. That is, the +1 makes sense when you are computing a one-time
delta
> >> > for things like nanosleep(). It is incorrect when computing a periodic
> >> > delta such as for computing the interval for an itimer (setitimer) or
> >> > EVFILT_TIMER().
> >> >
> >> > Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:
> >> >
> >> > sys/kern/kern_time.c:
> >> >
> >> > /*
> >> > * Real interval timer expired:
> >> > * send process whose timer expired an alarm signal.
> >> > * If time is not set up to reload, then just return.
> >> > * Else compute next time timer should go off which is > current time.
> >> > * This is where delay in processing this timeout causes multiple
> >> > * SIGALRM calls to be compressed into one.
> >> > * tvtohz() always adds 1 to allow for the time until the next clock
> >> > * interrupt being strictly less than 1 clock tick, but we don't want
> >> > * that here since we want to appear to be in sync with the clock
> >> > * interrupt even when we're delayed.
> >> > */
> >> > void
> >> > realitexpire(void *arg)
> >> > {
> >> > struct proc *p;
> >> > struct timeval ctv, ntv;
> >> >
> >> > p = (struct proc *)arg;
> >> > PROC_LOCK(p);
> >> > kern_psignal(p, SIGALRM);
> >> > if (!timevalisset(&p->p_realtimer.it_interval)) {
> >> > timevalclear(&p->p_realtimer.it_value);
> >> > if (p->p_flag & P_WEXIT)
> >> > wakeup(&p->p_itcallout);
> >> > PROC_UNLOCK(p);
> >> > return;
> >> > }
> >> > for (;;) {
> >> > timevaladd(&p->p_realtimer.it_value,
> >> > &p->p_realtimer.it_interval);
> >> > getmicrouptime(&ctv);
> >> > if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) {
> >> > ntv = p->p_realtimer.it_value;
> >> > timevalsub(&ntv, &ctv);
> >> > callout_reset(&p->p_itcallout, tvtohz(&ntv) -
1,
> >> > realitexpire, p);
> >> > PROC_UNLOCK(p);
> >> > return;
> >> > }
> >> > }
> >> > /*NOTREACHED*/
> >> > }
> >> >
> >> > Paul, try this patch for sys/kern/kern_event.c. It uses the same
approach as
> >> > seitimer() above:
> >> >
> >> > Index: kern_event.c
> >> > ===================================================================
> >> > --- kern_event.c (revision 238365)
> >> > +++ kern_event.c (working copy)
> >> > @@ -538,7 +538,7 @@ filt_timerexpire(void *knx)
> >> >
> >> > if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
> >> > calloutp = (struct callout *)kn->kn_hook;
> >> > - callout_reset_curcpu(calloutp, timertoticks(kn-
>kn_sdata),
> >> > + callout_reset_curcpu(calloutp, timertoticks(kn-
>kn_sdata) - 1,
> >> > filt_timerexpire, kn);
> >> > }
> >> > }
> >> >
> >> > --
> >> > John Baldwin
> >> > _______________________________________________
> >> > freebsd-hackers at freebsd.org mailing list
> >> > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> >> > To unsubscribe, send any mail to "freebsd-hackers-
unsubscribe at freebsd.org"
> >>
> >> John,
> >> I don't think it's good to decrease by a unit the 'ticks' you pass to
> >> callout_reset_* KPI.
> >> If this have to be fixed it should be fixed at the callout level and
> >> not at the consumer level. In other words, subsystems that makes use
> >> of callout_reset_* should not deal with the inherent limitations of
> >> callout precision, as it is right now.
> >
> > Given that you are reworking callout already, it would seem a bit odd
> > to rework callout a separate time just to fix this bug. A simple fix
> > like this (which follows the same pattern we already use for setitimer())
> > is something we can easily merge back to 8 and 9. Reworking callout in
> > a different way than you are already doing, OTOH, is not something we can
> > merge trivially.
> >
> > --
> > John Baldwin
>
> I understand what you mean. Indeed I hadn't thought about the
> difficulties of merging that work back. Only a thing: if I'd were you
> before committing I'd add a comment explaining the reasons of the
> negative correction in the timeout value passed.
Ok. Likely what I will do is just point the reader at the comments for
realitexpire() which already cover this case in detail.
--
John Baldwin
More information about the freebsd-hackers
mailing list