cvs commit: src/sys/sys time.h src/sys/kern kern_time.c
Robert Watson
rwatson at FreeBSD.org
Sun Nov 27 13:17:48 GMT 2005
On Sun, 27 Nov 2005, Bruce Evans wrote:
>> Add experimental low-precision clockid_t names corresponding to these
>> clocks, but implemented using cached timestamps in kernel rather than
>> a full time counter query.
>
> The existence of these interfaces is a mistake even in the kernel. On
> all machines that I've looked at, the calls to the high-precision
> binuptime() outnumber calls to all the other high-level timecounter
> routines combined by a large factor. E.g., on pluto1.freebsd.org (which
> seems typical) now, after an uptime of ~8 days, there have been ~1200
> million calls to binuptime(), ~124 million calls to getmicrouptime(),
> million calls to getmicrotime(), and relatively few other calls.
>
> Thus we get a small speedup at the cost of some complexity and large
> interface bloat.
>
> This is partly because there are too many context switches (which
> necessarily use a precise timestamp), and file timestamps are
> under-represented since they normally access time_second directly.
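(For concreteness, the new clock IDs are intended to be used from userland
roughly along these lines. This is only a sketch: the identifier below
assumes the committed name is CLOCK_REALTIME_FAST, so see the time.h change
for the actual set of names.)

#include <stdio.h>
#include <time.h>

int
main(void)
{
        struct timespec ts;

#ifdef CLOCK_REALTIME_FAST
        /* Cached, roughly 1/HZ-granularity timestamp: no timecounter read. */
        if (clock_gettime(CLOCK_REALTIME_FAST, &ts) == 0)
#else
        /* Fall back to the ordinary high-precision clock. */
        if (clock_gettime(CLOCK_REALTIME, &ts) == 0)
#endif
                printf("%ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
        return (0);
}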
Interestingly, I've now observed several application workloads where the
rate of user space high-precision time queries far exceeds the rate of
timestamp queries in the kernel. Specifically, this happens in event-driven
applications that need to generate timeouts to pass to poll() and select().
Applications like BIND9 issue two gettimeofday() system calls for every
select() call in order to manage their own internal event engine. As
select() itself has a precision keyed to 1/HZ, using timestamps of similarly
low precision to drive an internal scheduler based on select() or poll()
makes some amount of sense. Using the libwrapper.so I attached to my
previous e-mail and setting 'FAST' mode, I see a 4% improvement in
throughput for BIND9. David Xu has reported a similar improvement in MySQL
performance using libwrapper.so. For BIND9 under high load, the rate of
context switches is much lower than the rate of select() calls, as multiple
queries are delivered to the UDP socket per interrupt due to interrupt
coalescing and the like.
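To make the pattern concrete, here is a minimal sketch of that sort of event
loop. It is not BIND9's actual code, just an illustration of why the time
query rate tracks the select() rate rather than the context switch rate:

#include <sys/select.h>
#include <sys/time.h>

void
event_loop(int fd, struct timeval *next_deadline)
{
        struct timeval now, timeout;
        fd_set rfds;

        for (;;) {
                gettimeofday(&now, NULL);               /* query #1 */
                timersub(next_deadline, &now, &timeout);
                if (timeout.tv_sec < 0)
                        timerclear(&timeout);

                FD_ZERO(&rfds);
                FD_SET(fd, &rfds);
                (void)select(fd + 1, &rfds, NULL, NULL, &timeout);

                gettimeofday(&now, NULL);               /* query #2 */
                /* ...dispatch expired timers and any ready descriptors... */
        }
}

Both time queries here only need to be accurate to about 1/HZ, since that is
all select() can deliver anyway.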
Given the way applications are being written to manage their own event
loops using select() or similar interfaces, the ability to quickly request
low precision timestamps for use with those interfaces makes a fairly
significant difference in macro-level performance. How we expose
"cheaper, suckier time" is something I'm quite willing to discuss, but the
evidence seems to suggest that if we want to improve the performance of
this class of applications, we need to provide timekeeping services that
match their usage pattern (frequent queries with fairly weak precision
requirements). I'm entirely open to exposing this service in different ways,
or offering a different notion of "cheaper, suckier". For example, I
could imagine exposing an interface intended to return timing information
specifically for HZ-driven sleep mechanisms, such as poll() and select().
The advantage, for experimental purposes, of the approach I committed is
that it allows us to easily test the impact of such changes on applications
without modifying the application. The disadvantage is that we'll want to
change it, but given that it is not yet clear to me that we fully understand
the requirements, that is probably inevitable.
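To illustrate the "no application changes" part: the interposition approach
boils down to something like the sketch below. This is not the libwrapper.so
code as posted; the environment variable name and the CLOCK_REALTIME_FAST
identifier here are illustrative.

#include <sys/syscall.h>
#include <sys/time.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#ifndef CLOCK_REALTIME_FAST
#define CLOCK_REALTIME_FAST     CLOCK_REALTIME  /* build on older headers */
#endif

/* LD_PRELOADed replacement for gettimeofday(). */
int
gettimeofday(struct timeval *tp, struct timezone *tzp)
{
        static int fast = -1;
        struct timespec ts;

        if (fast == -1)
                fast = (getenv("TIME_WRAPPER_FAST") != NULL); /* illustrative knob */

        /* Only the common tzp == NULL case is short-circuited. */
        if (fast && tp != NULL && tzp == NULL &&
            clock_gettime(CLOCK_REALTIME_FAST, &ts) == 0) {
                tp->tv_sec = ts.tv_sec;
                tp->tv_usec = ts.tv_nsec / 1000;
                return (0);
        }
        /* Otherwise, fall through to the real system call. */
        return (syscall(SYS_gettimeofday, tp, tzp));
}

Running an unmodified binary with LD_PRELOAD pointing at such a shim and the
environment knob set is then enough to switch it over to the cached clock.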
FWIW, once we have an interface that says "here's how you get bad time",
we can implement it in other ways than I've done -- for example, exporting
a kernel memory page with the necessary information to somewhat reliably
convert rdtsc() into an estimated time stamp without ever doing a system
call (this is what Darwin does, btw).
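Sketching that idea (the structure and field names below are purely
illustrative, not an existing interface): the kernel would periodically
update an exported, read-only page with a recent timestamp, the TSC value
captured at the same instant, and a scale factor, plus a generation counter
so userland can detect a concurrent update. Userland then scales rdtsc()
deltas against it:

#include <stdint.h>
#include <time.h>

/* Illustrative layout of a page the kernel updates periodically. */
struct timepage {
        volatile uint32_t gen;          /* odd while an update is in flight */
        uint64_t        tsc_base;       /* rdtsc() sampled at base_ts */
        struct timespec base_ts;        /* wall clock at tsc_base */
        uint64_t        ns_per_tick_frac; /* nanoseconds per TSC tick, <<32 */
};

static inline uint64_t
rdtsc(void)
{
        uint32_t lo, hi;

        __asm __volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32 | lo);
}

/*
 * Estimate the current time without entering the kernel.  Overflow of the
 * 64-bit multiply is ignored here; a real implementation would refresh
 * tsc_base often enough (or use a wider multiply) to avoid it.
 */
void
timepage_gettime(const struct timepage *tp, struct timespec *ts)
{
        uint32_t gen;
        uint64_t ns;

        do {
                gen = tp->gen;
                ns = ((rdtsc() - tp->tsc_base) * tp->ns_per_tick_frac) >> 32;
                ts->tv_sec = tp->base_ts.tv_sec + ns / 1000000000;
                ts->tv_nsec = tp->base_ts.tv_nsec + ns % 1000000000;
                if (ts->tv_nsec >= 1000000000) {
                        ts->tv_sec++;
                        ts->tv_nsec -= 1000000000;
                }
        } while (gen != tp->gen || (gen & 1));
}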
Your proposals on how this should be done are most welcome, but the trick
will be balancing the needs of several parties -- people interested in
highly precise time measurement due to a preoccupation with NTP and atomic
clocks, people who just want their applications to run faster, and people
who want the system to be clean. I think we can meet most of the needs of
most of these people if we do it right, but I'm not sure what right is
since (to be honest) I don't have a detailed understanding of what each of
these communities really needs (let alone wants).
Robert N M Watson