Xen Dom0, are we making progress?
Matthew Dillon
dillon at apollo.backplane.com
Thu Mar 29 17:37:11 UTC 2007
: It seems very similar to User Mode Linux, rather than a true VM environment. http://user-mode-linux.sourceforge.net/ Each DragonFlyBSD vkernel runs as a process. I don't know why this is interesting to anyone but kernel developers. Improving BSD jails to the same level as Solaris Containers (Solaris Containers are Solaris Zones with resource control) would be widely useful for many BSD users.
:
: In a VM environment, like Xen, each VM has its own kernel and possibly a different OS. Xen has managed to get a lot of people interested in its VM environment, so there are a lot of OSes that support the Xen "architecture". And for those that don't, there is early support for booting them using the virtualization features in newer CPUs (e.g. Windows). Microsoft has joined the Xen bandwagon, even though the core is all open source, because it is threatened in the enterprise space by the VMWare juggernaut and its own Virtual Server/Virtual PC product is so bland that no one cares.
:
: UML has been available for longer than Xen, but Xen already outperforms it. I don't see a lot of future in the "virtual kernel" concept.
:
:Tom
Well, judging by the history of how UML is used, the biggest uses
appear to be (A) Kernel development, (B) Machine virtualization for
sale to third parties (virtual servers), and (C) Security separation.
You can't really compare BSD jails to a virtual kernel. From a security
standpoint, it's night and day. Jails require a ton of hooks all over the
kernel, and even with those hooks they have no real ability to
compartmentalize resource use, nor is their security assurable with any
real level of confidence. You are still running directly under the real
kernel and it shows. Virtual kernels are far more secure, even more
so once we give them a new syscall table map that disables all
real-kernel system calls other than read, write, the vmspace_*() calls, and
a few other things required for operation once the vkernel has
initialized. They can be made extremely secure in ways that jails
cannot.
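To make the idea concrete, here is a rough sketch of what such a
per-vkernel syscall table map could look like. This is purely
illustrative; the structure, names, and sizes are hypothetical, not
actual DragonFly code:

    /* Hypothetical sketch of a restricted syscall table map.
     * Names and sizes here are illustrative only. */
    #include <stdbool.h>

    #define NSYSCALLS 512           /* hypothetical table size */

    struct syscall_map {
        bool allowed[NSYSCALLS];
    };

    /* Once the vkernel has initialized, disable every real-kernel
     * syscall except the short list it needs to keep running. */
    static void
    vkernel_lockdown(struct syscall_map *map,
                     const int *keep, int nkeep)
    {
        int i;

        for (i = 0; i < NSYSCALLS; ++i)
            map->allowed[i] = false;
        for (i = 0; i < nkeep; ++i)
            map->allowed[keep[i]] = true;
    }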
Regarding Xen, there is not much of a difference between a virtual
kernel implementation like UML or DragonFly's vkernel and something
like Xen. Both use the same concepts and have approximately the
same overhead, so it's mainly an issue of algorithms and coding. I
do believe that Xen and vkernel environments are easier to optimize
than complete machine virtualization (vmware-like) environments in the
long term, simply because the kernels running under Xen or as virtual
kernels *know* they are operating virtually and can be heavily
optimized for that fact. For example, it would be possible to truly
free pages marked 'free' in the VM page queues.
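As a sketch of that last optimization (an illustration of the concept
using standard madvise(), not the vkernel's actual code), a kernel that
knows it is running as a process can hand its free pages back to the
host:

    /* Sketch: a virtual kernel returning pages it considers free
     * to the host kernel. MADV_FREE lets the host reclaim the
     * backing pages while keeping the mapping itself valid.
     * Illustrative only. */
    #include <sys/mman.h>
    #include <stddef.h>

    static void
    vkernel_release_free_pages(void *base, size_t bytes)
    {
        /* The host may now take these pages back under memory
         * pressure instead of swapping out stale 'free' data. */
        (void)madvise(base, bytes, MADV_FREE);
    }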
As with many Linux-centric projects, a great deal of effort is made
by certain individuals to optimize performance for particular types
of applications, with varying results and varying implications for
maintainability. It is not a direction I really care to go. Xen
suffers from this myopia to a degree, though probably not as badly
as VMWare does.
My primary reason for doing it in DragonFly is as a kernel development
aid. Testing kernel code in a virtual kernel environment reduces the
engineering cycle time from ~7-10 minutes to about 20 seconds. It's
really amazing. But there are already a number of subsystems that I
think I may move into a virtual kernel for security reasons. Our wiki
is a good example. I just don't trust all the myriad applications we
have to run to support the site.
--
The two biggest issues in machine virtualized environments are
(1) system calls and (2) page table faults. At the moment (and without
any real effort on my part to optimize it), system calls are about
10 times as expensive:
vkernel# /tmp/sc1
timing standard getuid() syscall
getuid() 0.978s 302100 loops = 3.237uS/loop
test28# /tmp/sc1
timing standard getuid() syscall
getuid() 0.940s 3178900 loops = 0.296uS/loop
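For reference, a minimal loop along the lines of the sc1 test above
would look something like this (my reconstruction; the actual test
program may differ):

    /* Reconstruction of a getuid() syscall timing loop, similar
     * in spirit to the sc1 output above (not the actual source). */
    #include <sys/time.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct timeval start, now;
        long loops = 0;
        double elapsed;
        int i;

        printf("timing standard getuid() syscall\n");
        gettimeofday(&start, NULL);
        do {
            /* batch the calls so timing overhead stays small */
            for (i = 0; i < 10000; ++i)
                getuid();
            loops += 10000;
            gettimeofday(&now, NULL);
            elapsed = (now.tv_sec - start.tv_sec) +
                      (now.tv_usec - start.tv_usec) / 1e6;
        } while (elapsed < 1.0);

        printf("getuid() %.3fs %ld loops = %.3fuS/loop\n",
               elapsed, loops, elapsed * 1e6 / loops);
        return 0;
    }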
Page table faults are somewhat less expensive, but still not cheap.
It depends on the type of fault. Read faults are highly optimizable,
but the 'dirty' bit in the virtualized page table has to be emulated:
writable VM maps have to be mapped read-only on a read, rather than
read-write, so that a later write takes a write fault and we get the
chance to set the dirty bit in the virtualized page table. With the
vmspace_*() system calls the page faults are still handled by the real
kernel, so it isn't as bad as one might imagine.
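The general technique can be illustrated in plain user space with
mprotect() and a SIGSEGV handler (the vkernel itself gets this via
MAP_VPAGETABLE and the real kernel's fault path rather than signals,
so treat this purely as an illustration of the read-only-until-first-
write trick):

    /* Sketch of software dirty-bit emulation using plain POSIX
     * primitives. Pages the guest considers writable are mapped
     * read-only; the first write faults, we record the dirty bit
     * in a (hypothetical) virtualized page table, then upgrade
     * the page to read-write so the store can retry. */
    #include <sys/mman.h>
    #include <signal.h>
    #include <stdint.h>

    #define PGSIZE    4096UL
    #define NPAGES    16
    #define PG_VDIRTY 0x1ULL        /* hypothetical dirty flag */

    static uint64_t vpte[NPAGES];   /* hypothetical vkernel PTEs */
    static char *region;

    static void
    wfault_handler(int sig, siginfo_t *si, void *ctx)
    {
        uintptr_t off = (uintptr_t)si->si_addr - (uintptr_t)region;

        (void)sig; (void)ctx;
        vpte[off / PGSIZE] |= PG_VDIRTY;   /* emulate the dirty bit */
        /* upgrade to read-write; the faulting write then retries */
        mprotect(region + (off & ~(PGSIZE - 1)), PGSIZE,
                 PROT_READ | PROT_WRITE);
    }

    static void
    dirty_emulation_setup(void)
    {
        struct sigaction sa;

        sa.sa_sigaction = wfault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* logically writable, but mapped read-only so that the
         * first write to each page is observable as a fault */
        region = mmap(NULL, NPAGES * PGSIZE, PROT_READ,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    }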
So, e.g., compiles are still fairly reasonable. I haven't done a full
buildworld test, but compile overhead seems to be only about 30% more.
Long-running services whose main interaction with the system is
through fairly optimal network and file I/O calls seem to do the best.
Virtual kernels won't be winning any awards, but they sure can be
convenient. Most of my kernel development is now done in virtual
kernels. It also makes kernel development more approachable for people
who are not traditionally kernel coders. The synergy is very good.
--
In any case, as usual I rattle on. If FreeBSD is interested, I recommend
simply looking at the cool features I added to DragonFly's kernel to
make virtual kernels possible. It's really just three major items:
Signal mailboxes, a new MAP_VPAGETABLE for mmap, and the new vmspace_*()
system calls for managing VM spaces. Once those features were in place
it didn't take long for me to create a 'vkernel' platform that linked
against libc and used the new system calls.
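To give a flavor of the mmap side of this (a minimal sketch; a real
vkernel's setup is considerably more involved, and error handling here
is mostly elided):

    /* Sketch: backing guest 'physical memory' with an anonymous
     * mapping whose translations are driven by a user-managed
     * page table via DragonFly's MAP_VPAGETABLE. Illustrative
     * only. */
    #include <sys/mman.h>
    #include <stddef.h>

    static void *
    guest_memory_create(size_t bytes)
    {
        void *base;

        base = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                    MAP_ANON | MAP_PRIVATE | MAP_VPAGETABLE, -1, 0);
        return (base == MAP_FAILED) ? NULL : base;
    }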
-Matt
Matthew Dillon
<dillon at backplane.com>