KVM Clock
John Baldwin
jhb at freebsd.org
Mon Jan 20 20:50:25 UTC 2014
On Wednesday 15 January 2014 17:59:10 Julian Stecklina wrote:
> On Mi, 2014-01-15 at 17:04 +0100, Roger Pau Monné wrote:
> > On 15/01/14 14:45, Julian Stecklina wrote:
> > > On Mi, 2014-01-15 at 14:08 +0100, Roger Pau Monné wrote:
> > >> On 15/01/14 13:05, Julian Stecklina wrote:
> > >>> On 01/14/2014 05:13 PM, Peter Grehan wrote:
> > >>>> Hi Julian,
> > >>>>
> > >>>>> is anyone working on KVM clock support for FreeBSD? If not, I
> > >>>>> might take a shot at it.
> > >>>>
> > >>>> None I know of: go for it :)
> > >>>
> > >>> Works for me so far:
> > >>> https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bde
> > >>> dc65c3148f> >>
> > >> Looking
> > >>
> > >> at the code it seems some common routines could be shared
> > >> between the KVM PV clock and the Xen PV clock
> > >> (sys/dev/xen/timer/timer.c). The data passed from the hypervisor to
> > >> the guest has exactly the same format (see struct vcpu_time_info in
> > >> Xen public headers).
> > >
> > > Yes, I know. Didn't get around to making it pretty yesterday evening. ;)
> > > I'll post an updated patch, when I have some time. Any suggestions where
> > > to put the two common functions?
> > >
> > >> At a first sight the KVM clock can benefit from using scale_delta
> > >> (which is going to be faster than the C version implemented in
> > >> kvmclock_get_timecount),
> > >
> > > At least somewhat on amd64. 32-bit looks pretty identical.
> > >
> > >> and xen_fetch_vcpu_tinfo is exactly the same
> > >>
> > >> as kvmclock_fetch.
> > >
> > > I think xen_fetch_vcpu_tinfo has a subtle bug:
> > > 217 do {
> > > 218 dst->version = src->version;
> > > 219 rmb();
> > > 220 dst->tsc_timestamp = src->tsc_timestamp;
> > > 221 dst->system_time = src->system_time;
> > > 222 dst->tsc_to_system_mul = src->tsc_to_system_mul;
> > > 223 dst->tsc_shift = src->tsc_shift;
> > > 224 rmb();
> > > 225 } while ((src->version & 1) | (dst->version ^
> > >
> > > src->version));
> > >
> > > In line 225 src->version is potentially read twice. If you consider the
> > > following schedule:
> > >
> > > host starts updating data, sets src->version to 1
> > > guest reads src->version (1) and stores it into dst->version.
> > > guest copies inconsistent data
> > > guest reads src->version (1) and computes xor with dst->version.
> > > host finishes updating data and sets src->version to 2
> > > guest reads src->version (2) and checks whether lower bit is not set.
> > > while loop exits with inconsistent data!
> > >
> > > I think the C standard (at least C11) permits this as it does not
> > > specify in which order the two reads in line 225 need to happen.
> >
> > According to the operator precedence and associativity in C
> > (http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_preceden
> > ce), if I'm reading it right, the condition in the while line will be
> > evaluated as follows (because of the left-to-right associativity of the
> >
> > | operator):
> > 1. (src->version & 1)
> > 2. (dst->version ^ src->version)
> > 3. result of 1 | result of 2
> >
> > So AFAICT the flow that you describe could never happen, because
> > (src->version & 1) is always done before (dst->version ^ src->version).
>
> Operator precedence doesn't matter. Consider it written like this:
>
> or(and(src->version, 1), xor(dst->version, src->version))
>
> C gives you no guarantees whether the and or the xor will be executed
> first. There is no sequence point in between. And even with a sequence
> point, the C11 memory model gives you no guarantees about how the reads
> are ordered.
>
> This discussion is somewhat theoretical, because
> a) the hypervisor will probably update the data structure in the
> current vCPU context (making memory consistency issues go away).
> b) the compiler will likely not be an asshole. ;)
>
> Pseudocode for a better fetch could be:
>
> dst->version = src->version;
> rmb();
> ... read data ...
> rmb();
> version_post = src->version;
> rmb(); <- keeps the compiler from reading src->version multiple times
>
> and then doing the test with version_post and dst->version.
> Unfortunately, rmb() expands into lfence, even if there is no need for
> that here.
There is the __compiler_membar() macro in <sys/cdefs.h> that you could use if
this code is x86-specific (and thus knows it only needs a compiler barrier).
--
John Baldwin
More information about the freebsd-virtualization
mailing list