[RFC] BPF timestamping
Jung-uk Kim
jkim at FreeBSD.org
Fri Jun 11 16:38:42 UTC 2010
On Friday 11 June 2010 09:08 am, Bruce Evans wrote:
> On Thu, 10 Jun 2010, Jung-uk Kim wrote:
> > On Thursday 10 June 2010 05:45 am, Bruce Evans wrote:
> >> On Wed, 9 Jun 2010, Jung-uk Kim wrote:
> >>> bpf(4) can only timestamp packets with microtime(9). I want to
> >>> expand it to be able to use different formats and resolutions.
> >>> The ...
> >>
> >> This has too many timestamp types, yet not one timestamp type
> >> which is any good except possibly BPF_T_NONE, and not one
> >> monotonic timestamp type. Only external uses and compatibility
> >> require use of CLOCK_REALTIME.
> >> ...
> >
> > Please note that I am not trying to solve timecounter issues
> > here. The current BPF timestamping is not too good for two main
> > reasons: 1) it is too slow with some timecounter hardware, as
> > you have noted, and 2) we have no API to change timestamp
> > resolution, accuracy, format, offset, or whatever *at all*.
> >
> > The most common trick for the first problem is using
> > getmicrotime(9) instead of microtime() if the users don't care
> > much about its accuracy. For those people who want to collect as
> > many packets as possible without spending fortunes, it works
> > pretty well. However, suppose you have multiple interfaces. You
> > want good timestamps from a slower controller (LAN side) and less
> > accurate timestamps from a super fast controller (WAN side), but
> > you can't. My patch solves this problem by assigning time
> > stamping function per descriptor. So, you can use the same
> > resolution but different accuracies, for example.
>
> I now think you should provide exactly the same timestamping
> features as provided to userland by clock_gettime(2),
> clock_getres(2) and clock_getaccprecres(2missing), using
> essentially the same interface and code. The userland interface
> involves clock ids of type clockid_t with names like CLOCK_REALTIME
> instead of bpf-specific names and types. Unfortunately it only
> supports the timespec format.
I thought about using them but struct timespec isn't good enough. It
has exactly the same problem as struct timeval does, i.e.,
sizeof(time_t) and sizeof(long) vary by arch. Note that
struct bpf_xhdr uses int64_t and uint64_t to work around the problem.
At least in theory, it should be good enough until we have to support
a 16-byte aligned arch. :-)
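
To illustrate the size problem, here is a minimal userland sketch. The
struct name ts64, its field names, and the helper are hypothetical and
are not the definitions from the patch; they only show why fixed-width
members give the same layout on 32-bit and 64-bit archs.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical fixed-width timestamp (an assumption for illustration,
 * not the layout used by struct bpf_xhdr in the patch).
 */
struct ts64 {
	int64_t		sec;	/* seconds */
	uint64_t	frac;	/* sub-second part; unit depends on format */
};

/* Two 8-byte members: 16 bytes whether long is 32 or 64 bits wide. */
_Static_assert(sizeof(struct ts64) == 16, "ts64 must be 16 bytes");

/* Widen a native struct timespec into the fixed-width form. */
static struct ts64
ts64_from_timespec(const struct timespec *ts)
{
	struct ts64 out;

	out.sec = (int64_t)ts->tv_sec;
	out.frac = (uint64_t)ts->tv_nsec;	/* nanoseconds here */
	return (out);
}
```

By contrast, sizeof(struct timespec) depends on the widths of time_t
and long, so a header embedding it changes size between archs.
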
> > The second problem is a little bit harder for us to solve
> > without breaking libpcap and its consumers, as it expects struct
> > timeval and nothing else. That's why I had to introduce a new header format
> > with compat shims. In fact, struct bpf_hdr (and struct
> > pcap_sf_pkthdr) is really obsolete and people have been talking
> > about pcap NG for many years, which can store timestamps in
> > variable resolutions and offsets.
>
> Does it prefer or support bintimes?
It supports bintime. It does not prefer anything although the default
resolution is 1 usec for backward compatibility with old pcap format.
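
For readers unfamiliar with bintime, here is a rough userland sketch of
scaling a binary fraction down to 1 usec or 1 nsec. The struct bt and
both helper names are hypothetical stand-ins for struct bintime, which
keeps the sub-second part as a 64-bit fraction in units of 2^-64
second; this is not code from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct bintime. */
struct bt {
	int64_t		sec;	/* seconds */
	uint64_t	frac;	/* fraction of a second, units of 2^-64 s */
};

/*
 * Scale the fraction to microseconds. Only the top 32 bits of the
 * fraction are used, so the product fits in 64 bits; the result is
 * truncated, not rounded.
 */
static uint32_t
bt_frac_to_usec(const struct bt *b)
{
	return ((uint32_t)(((uint64_t)1000000 *
	    (uint32_t)(b->frac >> 32)) >> 32));
}

/* Same scaling, to nanoseconds. */
static uint32_t
bt_frac_to_nsec(const struct bt *b)
{
	return ((uint32_t)(((uint64_t)1000000000 *
	    (uint32_t)(b->frac >> 32)) >> 32));
}
```

A fraction of 0x8000000000000000 is half a second, so the helpers give
500000 usec and 500000000 nsec respectively.
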
> > However, we can only use the default resolution even if libpcap
> > gets the new format because we are stuck with struct bpf_hdr[1].
> >
> > BTW, I updated my patch, which includes monotonic clocks now.
> >
> > BPF_T_MICROTIME_MONOTONIC         microuptime(9)
> > BPF_T_NANOTIME_MONOTONIC          nanouptime(9)
> > BPF_T_BINTIME_MONOTONIC           binuptime(9)
> > BPF_T_MICROTIME_MONOTONIC_FAST    getmicrouptime(9)
> > BPF_T_NANOTIME_MONOTONIC_FAST     getnanouptime(9)
> > BPF_T_BINTIME_MONOTONIC_FAST      getbinuptime(9)
> >
> > http://people.freebsd.org/~jkim/bpf_tstamp2.diff
> >
> > Thanks for the hint, Bruce, although you may say there are more
> > bogus clock types now. ;-)
>
> Yes, there are far too many, but many are still missing:
> - aliases BPF_T_*TIME_PRECISE for BPF_T_*TIME corresponding to the
> corresponding aliases for clockid_t's. This gives 18 clock ids
> per timecounter instead of only 12. clock_gettime() only
> supports 6 of these (it doesn't support the micro or bin time
> formats).
> - aliases BPF_T_UPTIME* for BPF_*TIME_MONOTONIC. This gives 27
> clock ids per timecounter instead of only 18. clock_gettime()
> only supports 9 of these.
> - BPF_T_SECOND corresponding to CLOCK_SECOND. clock_gettime()
> supports this.
> - BPF_T_THREAD_CPUTIME corresponding to CLOCK_THREAD_CPUTIME_ID,
> but without the bogus _ID suffix. The latter gives the runtime of
> the current thread in nanoseconds. This might be almost useful for
> bpf if all the packets are stamped by the same kernel or user
> thread. Then it would function as a packet id with extra info
> about the time spent processing packets.
> - BPF_T_VIRTUAL and BPF_T_PROF corresponding to CLOCK_VIRTUAL and
> CLOCK_PROF. The latter give user and user+sys times for
> processes. They would be about as useful as BPF_T_THREAD_CPUTIME
> for bpf.
> - the total is now 31 for bpf (19 missing) and 13 for
> clock_gettime().
> - multiply this by the number of timecounters. Non-primary
> timecounters should be available iff something has a use for
> them.
> - raw cputicker timestamps. CLOCK_THREAD_CPUTIME_ID's timer uses
> these. These are not available in userland. They are easily
> available in the kernel, by calling cpu_tick(). Scaling them is
> nontrivial.
> - raw timecounter reads. These are already available in userland
> via sysctlbyname("kern.timecounter.tc.<name>.counter", ...).
> Strangely, they are hard to call from the kernel.
That's really far too many for my taste. :-( It'll significantly
increase the number of special cases in the switch statement, but I
cannot avoid it (please see below). I added _MONOTONIC because it was
relatively cheap to implement and important. I may add some aliases
for _REALTIME, _PRECISE, and _UPTIME if you insist, though.
> By using normal clock ids and calling kern_clock_gettime(), you can
> avoid lots of duplication (including documentation of the bpf clock
> ids) and automatically support new normal clock ids. However, I
> can't see how to implement the following features as efficiently:
> - direct scaling to the final precision (kern_clock_gettime() only
> returns timespecs -- see above)
> - delayed scaling to the final precision (bpf seems to make
> timestamps as binuptimes and scale them later)
> - avoiding going through layers and switches. bpf goes through
> several layers and switches now, but perhaps it can go directly to
> the *time() function in kern_tc.c via a single function pointer,
> where kern_clock_gettime() and delayed scaling have to use a switch
> or an indexed function pointer since their clock id is highly
> variable.
As I said, we cannot use kern_clock_gettime() and clockid_t. The code
duplication is also a necessary evil because multiple descriptors may
be attached to a single interface, unless you are effectively asking
me to revert the following commit:
http://docs.freebsd.org/cgi/mid.cgi?200607241542.k6OFg5ck098374
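
To make the per-descriptor idea concrete, here is a rough userland
sketch. Every name in it (struct desc, stamp_fn, the STAMP_* values,
desc_set_stamp, desc_tap) is hypothetical, and clock_gettime(2) stands
in for the kernel *time() functions; it is not the code in the patch.
The switch runs once when the descriptor is configured, so the
per-packet path is a single indirect call, and two descriptors on the
same interface can each use a different clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

struct ts64 {
	int64_t		sec;
	uint64_t	nsec;
};

typedef void (*stamp_fn)(struct ts64 *);

/* Hypothetical per-descriptor clock choices. */
enum stamp_type { STAMP_NONE, STAMP_REALTIME, STAMP_MONOTONIC };

static void
stamp_from_clock(clockid_t id, struct ts64 *out)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	out->sec = (int64_t)ts.tv_sec;
	out->nsec = (uint64_t)ts.tv_nsec;
}

static void stamp_none(struct ts64 *out) { out->sec = 0; out->nsec = 0; }
static void stamp_realtime(struct ts64 *out) { stamp_from_clock(CLOCK_REALTIME, out); }
static void stamp_monotonic(struct ts64 *out) { stamp_from_clock(CLOCK_MONOTONIC, out); }

/* Stand-in for a bpf descriptor; only the stamping hook is modeled. */
struct desc {
	stamp_fn	stamp;
};

/* Configuration path: resolve the type to a function exactly once. */
static int
desc_set_stamp(struct desc *d, enum stamp_type t)
{
	switch (t) {
	case STAMP_NONE:	d->stamp = stamp_none; break;
	case STAMP_REALTIME:	d->stamp = stamp_realtime; break;
	case STAMP_MONOTONIC:	d->stamp = stamp_monotonic; break;
	default:		return (-1);	/* unknown type */
	}
	return (0);
}

/* Per-packet path: no switch, just the descriptor's own function. */
static void
desc_tap(struct desc *d, struct ts64 *out)
{
	d->stamp(out);
}
```
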
Cheers,
Jung-uk Kim