Xen Dom0, are we making progress?
Matthew Dillon
dillon at apollo.backplane.com
Thu Mar 29 17:37:11 UTC 2007
: It seems very similar to User Mode Linux, rather than a true VM environment. http://user-mode-linux.sourceforge.net/ Each DragonFlyBSD vkernel runs as a process. I don't know why this is interesting to anyone but kernel developers. Improving BSD jails to the same level as Solaris Containers (Solaris Containers are Solaris Zones with resource control) would be widely useful for many BSD users.
:
: In a VM environment, like Xen, each VM has its own kernel and possibly a different OS. Xen has managed to get a lot of people interested in its VM environment, so there are a lot of OSes that support the Xen "architecture". And for those that don't, there is early support for booting them using the virtualization features in newer CPUs (e.g. Windows). Microsoft has joined the Xen bandwagon, even though the core is all open source, because it is threatened in the enterprise space by the VMWare juggernaut and its own Virtual Server/Virtual PC product is so bland that no one cares.
:
: UML has been available for longer than Xen, but Xen already outperforms it. I don't see a lot of future in the "virtual kernel" concept.
:
:Tom
Well, judging by the history of how UML is used, the biggest uses
appear to be (A) Kernel development, (B) Machine virtualization for
sale to third parties (virtual servers), and (C) Security separation.
You can't really compare BSD jails to a virtual kernel. From a security
standpoint, it's night and day. Jails require a ton of hooks all over the
kernel, and even with those hooks they have no real ability to
compartmentalize resource use, nor is their security assurable with any
real level of confidence. You are still running directly under the real
kernel and it shows. Virtual kernels are far more secure, even more
so once we give them a new syscall table map that disables all
real-kernel system calls other than read, write, the vmspace_*() calls, and
a few other things required for operation once the vkernel has
initialized. They can be made extremely secure in ways that jails
cannot.
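To make the idea concrete, here is a rough sketch of what such a
per-vkernel syscall table map could look like. This is purely
illustrative; the structure, names, and sizes are hypothetical, not
actual DragonFly code:

    /* Hypothetical sketch of a restricted syscall table map.
     * Names and sizes here are illustrative only. */
    #include <stdbool.h>

    #define NSYSCALLS 512           /* hypothetical table size */

    struct syscall_map {
        bool allowed[NSYSCALLS];
    };

    /* Once the vkernel has initialized, disable every real-kernel
     * syscall except the short list it needs to keep running. */
    static void
    vkernel_lockdown(struct syscall_map *map,
                     const int *keep, int nkeep)
    {
        int i;

        for (i = 0; i < NSYSCALLS; ++i)
            map->allowed[i] = false;
        for (i = 0; i < nkeep; ++i)
            map->allowed[keep[i]] = true;
    }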
Regarding Xen, there is not much of a difference between a virtual
kernel implementation like UML or DragonFly's vkernel and something
like Xen. Both use the same concepts and have approximately the
same overhead, so it's mainly an issue of algorithms and coding. I
do believe that Xen and vkernel environments are easier to optimize
than complete machine virtualization (vmware-like) environments in the
long term, simply because the kernels running under Xen or as virtual
kernels *know* they are operating virtually and can be heavily
optimized for that fact. For example, it would be possible to truly
free pages marked 'free' in the VM page queues.
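As a sketch of that last optimization (an illustration of the concept
using standard madvise(), not the vkernel's actual code), a kernel that
knows it is running as a process can hand its free pages back to the
host:

    /* Sketch: a virtual kernel returning pages it considers free
     * to the host kernel. MADV_FREE lets the host reclaim the
     * backing pages while keeping the mapping itself valid.
     * Illustrative only. */
    #include <sys/mman.h>
    #include <stddef.h>

    static void
    vkernel_release_free_pages(void *base, size_t bytes)
    {
        /* The host may now take these pages back under memory
         * pressure instead of swapping out stale 'free' data. */
        (void)madvise(base, bytes, MADV_FREE);
    }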
As with many Linux-centric projects, a great deal of effort is made
by certain individuals to optimize performance for particular types
of applications, with varying results and varying implications for
maintainability. It is not a direction I really care to go. Xen
suffers from this myopia to a degree, though probably not as badly
as VMWare does.
My primary reason for doing it in DragonFly is as a kernel development
aid. Testing kernel code in a virtual kernel environment reduces the
engineering cycle time from ~7-10 minutes to about 20 seconds. It's
really amazing. But there are already a number of subsystems that I
think I may move into a virtual kernel for security reasons. Our wiki
is a good example. I just don't trust all the myriad applications we
have to run to support the site.
--
The two biggest issues in machine virtualized environments are
(1) system calls and (2) page table faults. At the moment (and without
any real effort on my part to optimize it), system calls are about
10 times as expensive:
vkernel# /tmp/sc1
timing standard getuid() syscall
getuid() 0.978s 302100 loops = 3.237uS/loop
test28# /tmp/sc1
timing standard getuid() syscall
getuid() 0.940s 3178900 loops = 0.296uS/loop
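For reference, a minimal loop along the lines of the sc1 test above
would look something like this (my reconstruction; the actual test
program may differ):

    /* Reconstruction of a getuid() syscall timing loop, similar
     * in spirit to the sc1 output above (not the actual source). */
    #include <sys/time.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct timeval start, now;
        long loops = 0;
        double elapsed;
        int i;

        printf("timing standard getuid() syscall\n");
        gettimeofday(&start, NULL);
        do {
            /* batch the calls so timing overhead stays small */
            for (i = 0; i < 10000; ++i)
                getuid();
            loops += 10000;
            gettimeofday(&now, NULL);
            elapsed = (now.tv_sec - start.tv_sec) +
                      (now.tv_usec - start.tv_usec) / 1e6;
        } while (elapsed < 1.0);

        printf("getuid() %.3fs %ld loops = %.3fuS/loop\n",
               elapsed, loops, elapsed * 1e6 / loops);
        return 0;
    }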
Page table faults are somewhat less expensive, but still not cheap.
It depends on the type of fault. Read faults are highly optimizable,
but the 'dirty' bit in the virtualized page table has to be emulated:
writable VM maps have to be mapped read-only on a read, rather than
read-write, so that a later write takes a write fault and we get the
chance to set the dirty bit in the virtualized page table. With the
vmspace_*() system calls the page faults are still handled by the real
kernel, so it isn't as bad as one might imagine.
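The general technique can be illustrated in plain user space with
mprotect() and a SIGSEGV handler (the vkernel itself gets this via
MAP_VPAGETABLE and the real kernel's fault path rather than signals,
so treat this purely as an illustration of the read-only-until-first-
write trick):

    /* Sketch of software dirty-bit emulation using plain POSIX
     * primitives. Pages the guest considers writable are mapped
     * read-only; the first write faults, we record the dirty bit
     * in a (hypothetical) virtualized page table, then upgrade
     * the page to read-write so the store can retry. */
    #include <sys/mman.h>
    #include <signal.h>
    #include <stdint.h>

    #define PGSIZE    4096UL
    #define NPAGES    16
    #define PG_VDIRTY 0x1ULL        /* hypothetical dirty flag */

    static uint64_t vpte[NPAGES];   /* hypothetical vkernel PTEs */
    static char *region;

    static void
    wfault_handler(int sig, siginfo_t *si, void *ctx)
    {
        uintptr_t off = (uintptr_t)si->si_addr - (uintptr_t)region;

        (void)sig; (void)ctx;
        vpte[off / PGSIZE] |= PG_VDIRTY;   /* emulate the dirty bit */
        /* upgrade to read-write; the faulting write then retries */
        mprotect(region + (off & ~(PGSIZE - 1)), PGSIZE,
                 PROT_READ | PROT_WRITE);
    }

    static void
    dirty_emulation_setup(void)
    {
        struct sigaction sa;

        sa.sa_sigaction = wfault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* logically writable, but mapped read-only so that the
         * first write to each page is observable as a fault */
        region = mmap(NULL, NPAGES * PGSIZE, PROT_READ,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    }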
So, e.g., compiles are still fairly reasonable. I haven't done a full
buildworld test, but compile overhead seems to be only about 30% more.
Long-running services whose main interaction with the system is
through fairly optimal network and file I/O calls seem to do the best.
Virtual kernels won't be winning any awards, but they sure can be
convenient. Most of my kernel development is now done in virtual
kernels. It also makes kernel development more approachable for people
who are not traditionally kernel coders. The synergy is very good.
--
In any case, as usual I rattle on. If FreeBSD is interested, I recommend
simply looking at the cool features I added to DragonFly's kernel to
make virtual kernels possible. It's really just three major items:
Signal mailboxes, a new MAP_VPAGETABLE for mmap, and the new vmspace_*()
system calls for managing VM spaces. Once those features were in place
it didn't take long for me to create a 'vkernel' platform that linked
against libc and used the new system calls.
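To give a flavor of the mmap side of this (a minimal sketch; a real
vkernel's setup is considerably more involved, and error handling here
is mostly elided):

    /* Sketch: backing guest 'physical memory' with an anonymous
     * mapping whose translations are driven by a user-managed
     * page table via DragonFly's MAP_VPAGETABLE. Illustrative
     * only. */
    #include <sys/mman.h>
    #include <stddef.h>

    static void *
    guest_memory_create(size_t bytes)
    {
        void *base;

        base = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                    MAP_ANON | MAP_PRIVATE | MAP_VPAGETABLE, -1, 0);
        return (base == MAP_FAILED) ? NULL : base;
    }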
-Matt
Matthew Dillon
<dillon at backplane.com>