[RFC/RFT] calloutng
Davide Italiano
davide at freebsd.org
Fri Dec 14 14:42:20 UTC 2012
On Fri, Dec 14, 2012 at 3:21 PM, Oliver Pinter <oliver.pntr at gmail.com> wrote:
> Hi!
> -	return tticks;
> +	getbinuptime(&pbt);
> +	bt.sec = data / 1000;
> +	bt.frac = (data % 1000) * (uint64_t)1844674407309000LL;
> +	bintime_add(&bt, &pbt);
> +	return bt;
>  }
> What is this 1844674407309000LL constant?
> @@ -275,7 +288,7 @@
>  	do {
>  		th = timehands;
>  		gen = th->th_generation;
> -		bintime2timeval(&th->th_offset, tvp);
> +		Bintime2timeval(&th->th_offset, tvp);
>  	} while (gen == 0 || gen != th->th_generation);
>  }
>
> Is the capital B there possibly a typo?
Hi Oliver,
Thanks for reporting. Yes, both are typos.
The constant is /* 18446744073709 = int(2^64 / 1000000) */, used to
convert from timeval to bintime.
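
For illustration only, here is a minimal user-space sketch of the
millisecond conversion that hunk appears to intend. The names
(bintime_sketch, MS_TO_FRAC, ms_to_bintime) are hypothetical stand-ins,
not the kernel definitions, and the constant assumes the intended value
is 18446744073709 scaled by 1000, since "data" is in milliseconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct bintime: whole seconds
 * plus a 64-bit binary fraction of a second (units of 2^-64 s).
 */
struct bintime_sketch {
	int64_t  sec;
	uint64_t frac;
};

/*
 * Assumed constant: 18446744073709 is int(2^64 / 1000000), i.e. one
 * microsecond in units of 2^-64 s; times 1000 gives one millisecond.
 */
#define MS_TO_FRAC ((uint64_t)18446744073709000ULL)

/* Convert a duration in milliseconds to the sec/frac representation. */
static struct bintime_sketch
ms_to_bintime(uint64_t ms)
{
	struct bintime_sketch bt;

	/* The sub-second part is at most 999, so the product cannot wrap. */
	assert(ms % 1000 <= UINT64_MAX / MS_TO_FRAC);
	bt.sec = (int64_t)(ms / 1000);
	bt.frac = (ms % 1000) * MS_TO_FRAC;
	return (bt);
}
```

With these assumptions, ms_to_bintime(1500) yields sec = 1 and a frac
just below 2^63, i.e. roughly half a second.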
> On 12/14/12, Davide Italiano <davide at freebsd.org> wrote:
>> On Fri, Dec 14, 2012 at 1:57 PM, Davide Italiano <davide at freebsd.org>
>> wrote:
>>> On Fri, Dec 14, 2012 at 7:41 AM, Luigi Rizzo <rizzo at iet.unipi.it> wrote:
>>>> On Fri, Dec 14, 2012 at 12:12 AM, Davide Italiano <davide at freebsd.org>
>>>> wrote:
>>>>> Hi.
>>>>> This patch takes callout(9) and redesigns the KPI and the
>>>>> implementation. The main objective of this work is making the
>>>>> subsystem tickless. In the last several years, this possibility has
>>>>> been discussed widely (http://markmail.org/message/q3xmr2ttlzpqkmae),
>>>>> but until now no one has actually implemented it.
>>>>> If you want a complete history of what has been done in the last
>>>>> months you can check the calloutng project repository
>>>>> http://svnweb.freebsd.org/base/projects/calloutng/
>>>>> For lazy people, here's a summary:
>>>> thanks for the work and the detailed summary.
>>>> Perhaps it would be useful if you could provide a few high level
>>>> details on the use and performance of the new scheme, such as:
>>>> - is the old callout KPI still available ? (i am asking because it would
>>>> help maintaining third party kernel modules that are expected to
>>>> work on different FreeBSD releases)
>>> Obviously the old KPI is still available. callout(9) is a very popular
>>> interface and I don't think removing the old interface is a good idea,
>>> because it could upset vendors whose code would no longer build on
>>> FreeBSD.
>>>> - do you have numbers on what is the fastest rate at which callouts
>>>> can be fired (e.g. say you have a callout which increments a
>>>> counter and schedules the next callout in (struct bintime){0,1} ) ?
>> Right now, all the services rely on the old interface. This means they
>> cannot be fired before 1 tick has elapsed, i.e. 1 millisecond with
>> hz = 1000, as on most machines.
>> Now that nanosleep() relies on the new interface, we measured 4-5
>> microseconds of processing latency before the callout is actually
>> fired. I can't say if we can still lower this value, but I cannot
>> imagine, for now, a consumer that actually requests a shorter timeout.
>>>> - is there a possibility that if callout requests are too close to each
>>>> other (e.g. the above test) the thread dispatching callouts will
>>>> run forever ? if so, is there a way to make such thread yield
>>>> after a while ?
>> Most of the processing is still done in a SWI thread, "at a later
>> time", so I don't think this is a problem.
>>>> - since you mentioned nanosleep() poll() and select() have been
>>>> ported to the new callout, is there a way to guarantee that users
>>>> calling these functions with a very short timeout are actually
>>>> descheduled as opposed to "interval too short, don't bother" ?
>>>> - do you have numbers on how many calls per second we can
>>>> have for a process that does
>>>> for (;;) { nanosleep(min_value_that_causes_descheduling); }
>> I don't follow you here.
>>>> I also have some comments on the diff:
>>>> - can you provide a diff -p ?
>>>> - for several functions the only change is the name of an argument
>>>> from "busy" to "us". Can you elaborate on the reason for the change,
>>>> and whether "us" means microseconds or the pronoun ?
>>> Please see r242905 by mav at .
>>>> Finally, a more substantial comment:
>>>> - a lot of functions which formerly had only a "timo" argument
>>>> now have "timo, bt, precision, flags". Take seltdwait() as an example.
>>> seltdwait() is not part of the public KPI. It has been modified to
>>> avoid code duplication. Having seltdwait() and seltdwait_bt(), i.e.
>>> two separate functions, even though they could share most of the code,
>>> is not a clever approach, IMHO.
>>> As I said before, seltdwait() is not exposed, so we can modify its
>>> arguments without breaking anything.
>>>> It seems that you have been undecided between two approaches:
>>>> for some of these functions you have preserved the original function
>>>> that deals with ticks and introduced a new one that deals with the
>>>> bintime,
>>>> whereas in other cases you have modified the original function to add
>>>> "bt, precision, flags".
>>> I'm not. All the functions which are part of the public KPI (e.g.
>>> condvar(9), sleepq(9), *) are still available. *_flags variants have
>>> been introduced so that consumers can take advantage of the new
>>> 'precision tolerance mechanism' implemented. Also, *_bt variants have
>>> been introduced. I don't see any "undecision" between the two
>>> approaches.
>>> Please note that now the callout backend deals with bintime, so every
>>> time callout_reset_on() is called, the 'tick' argument passed is
>>> silently converted to bintime.
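
To make the tick-to-bintime conversion concrete, here is a rough
user-space sketch. It is not the code in the patch: bintime_sketch and
ticks_to_bintime are hypothetical names, and the way the per-tick
fraction is computed is only one reasonable choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bintime (frac is in units of 2^-64 s). */
struct bintime_sketch {
	int64_t  sec;
	uint64_t frac;
};

/* Convert a relative timeout in ticks to sec/frac, given the tick rate hz. */
static struct bintime_sketch
ticks_to_bintime(uint64_t ticks, uint32_t hz)
{
	struct bintime_sketch bt;
	uint64_t tick_frac;

	assert(hz > 0);
	/*
	 * Length of one tick as a binary fraction, about 2^64 / hz.
	 * 2^64 does not fit in 64 bits, so divide 2^63 and double.
	 */
	tick_frac = (((uint64_t)1 << 63) / hz) << 1;
	bt.sec = (int64_t)(ticks / hz);
	/* ticks % hz < hz, so the product stays below 2^64. */
	bt.frac = (ticks % hz) * tick_frac;
	return (bt);
}
```

For example, with hz = 1000 a timeout of 2500 ticks becomes 2 seconds
plus a fraction of about one half. Since the backend works with absolute
time, a real caller would still have to add the current uptime to this
relative value.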
>>>> I would suggest a more uniform approach, namely:
>>>> - preserve all the existing functions (T) that take a timeout in
>>>> ticks;
>>>> - add a new set of corresponding functions (BT) that take
>>>> bt, precision, flags _instead_ of the ticks
>>>> - the functions (T) immediately convert from ticks to
>>>> bintime(s), using macros or inline functions
>>>> - optionally, convert kernel components to the new (BT) functions
>>>> where this makes sense (e.g. we can exploit the finer-granularity
>>>> of the new calls, etc.)
>> This is the strategy we followed.
>>>> cheers
>>>> luigi
>>>>> 1) callout(9) is no longer constrained to the resolution a periodic
>>>>> "hz" clock can give. To achieve that, the eventtimers(4) subsystem
>>>>> is used as the backend.
>>>>> 2) Contrary to what was discussed in the past, we kept the callwheel
>>>>> as the underlying data structure for keeping track of the outstanding
>>>>> timeouts. This choice has a couple of advantages, in particular we
>>>>> still benefit from the O(1) average complexity of the wheel for
>>>>> all the operations. Also, we thought the code duplication that would
>>>>> arise from the use of a two-staged backend for callout (e.g. use a
>>>>> wheel for coarse resolution events and another data structure, such
>>>>> as a heap, for high resolution events) is unacceptable. In fact, since
>>>>> callout gained the ability to migrate from one CPU to another, having
>>>>> a double backend would mean doubling the code for the migration path.
>>>>> 3) A way to dispatch callouts from hardware interrupt context has
>>>>> been implemented, using a special callout flag. This has limited
>>>>> applicability, but avoids dispatching a SWI thread to handle
>>>>> specific callouts, and with it the wakeup of another CPU for
>>>>> processing and a (relatively useless) context switch.
>>>>> 4) Since the new callout mechanism deals with bintime and no longer
>>>>> with ticks, time is specified as absolute rather than relative. To
>>>>> get the current time, binuptime() or getbinuptime() is used, and a
>>>>> sysctl is introduced to selectively choose the function to use, based
>>>>> on a precision threshold.
>>>>> 5) A mechanism for specifying precision tolerance has been
>>>>> implemented. The callout processing mechanism has been adapted and the
>>>>> callout data structure augmented so that the codepath can take
>>>>> advantage and aggregate events which overlap in time.
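
As a rough illustration of how a precision tolerance lets events be
aggregated, here is a small user-space sketch. It is not the algorithm
in the patch: ev_sketch and count_wakeups are hypothetical, times are
plain integers rather than bintime, and it assumes each event may fire
anywhere in the window [time, time + pr].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event: earliest firing time plus allowed lateness. */
struct ev_sketch {
	uint64_t time;	/* absolute time the event is due */
	uint64_t pr;	/* tolerance: may fire as late as time + pr */
};

/*
 * Count the timer wakeups needed for events sorted by 'time', letting
 * one wakeup serve every event whose window contains it.
 */
static size_t
count_wakeups(const struct ev_sketch *ev, size_t n)
{
	size_t i, wakeups = 0;
	uint64_t hi = 0;	/* latest time the current wakeup may occur */

	for (i = 0; i < n; i++) {
		uint64_t latest = ev[i].time + ev[i].pr;

		assert(i == 0 || ev[i - 1].time <= ev[i].time);
		if (wakeups == 0 || ev[i].time > hi) {
			/* No overlap with the current group: new wakeup. */
			wakeups++;
			hi = latest;
		} else if (latest < hi) {
			/* Overlaps: shrink the shared window. */
			hi = latest;
		}
	}
	return (wakeups);
}
```

With events at times 100, 120, 140 and 300 and tolerances 50, 10, 100
and 0, the first two share a wakeup (anywhere in [120, 130]) while the
other two each need their own, so three wakeups serve four events.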
>>>>> The new proposed KPI for callout is the following:
>>>>> callout_reset_bt_on(..., struct bintime time, struct bintime pr, ...,
>>>>>     int flags)
>>>>> where the 'time' argument represents the time at which the callout
>>>>> should fire, 'pr' represents the precision tolerance expressed as an
>>>>> absolute value, and 'flags' can be used to specify new features, i.e.
>>>>> for now, the possibility to run the callout from fast interrupt
>>>>> context.
>>>>> The old KPI has been extended by introducing the callout_reset_flags()
>>>>> function, which is the same as callout_reset*(), but takes an
>>>>> additional argument 'int flags' that can be used in the same way
>>>>> as the 'flags' argument of the new KPI. Using 'flags', consumers
>>>>> can also specify a relative precision tolerance as a power-of-two
>>>>> fraction of the timeout passed in ticks.
>>>>> Using this strategy, the new precision mechanism can be used for the
>>>>> existing services without major modifications.
>>>>> Some consumers have been ported to the new KPI, in particular
>>>>> nanosleep(), poll() and select(), because they take immediate
>>>>> advantage of the arbitrary precision offered by the new infrastructure.
>>>>> For some statistics about the outcome of the conversion to the new
>>>>> service, please refer to the end of this e-mail:
>>>>> http://lists.freebsd.org/pipermail/freebsd-arch/2012-July/012756.html
>>>>> We didn't measure any significant performance regressions with
>>>>> hwpmc(4), using some benchmark programs:
>>>>> http://people.freebsd.org/~davide/poll_test/poll_test.c
>>>>> http://people.freebsd.org/~mav/testsleep.c
>>>>> http://people.freebsd.org/~mav/testidle.c
>>>>> We tested the code on amd64, MIPS and arm. Any kind of testing or
>>>>> comment would be really appreciated. The full diff of the work against
>>>>> HEAD can be found at: http://people.freebsd.org/~davide/calloutng.diff
>>>>> If no one has objections, we plan to merge the repository to HEAD in a
>>>>> week or so.
>>>>> Thanks,
>>>>> Davide
>>>>> _______________________________________________
>>>>> freebsd-current at freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>>>>> To unsubscribe, send any mail to
>>>>> "freebsd-current-unsubscribe at freebsd.org"
>>>> --
>>>> -----------------------------------------+-------------------------------
>>>> Prof. Luigi RIZZO, rizzo at iet.unipi.it . Dip. di Ing. dell'Informazione
>>>> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa
>>>> TEL +39-050-2211611 . via Diotisalvi 2
>>>> Mobile +39-338-6809875 . 56122 PISA (Italy)
>>>> -----------------------------------------+-------------------------------