QEMU 0.8.1 and -kernel-kqemu: stalls with "npxdna: fpcurthread
== curthread"
Juergen Lock
nox at jelal.kn-bremen.de
Fri May 12 21:29:31 UTC 2006
In article <20060512101754.K65309 at delplex.bde.org> you write:
>On Fri, 12 May 2006, Georg-W. Koltermann wrote:
>
>> I installed QEMU 0.8.1 and kqemu-kmod kqemu-kmod-1.3.0.p7 today on
>> 6.1-PRERELEASE from 6 March. I tried running it with the
>> "-kernel-kqemu" option for increased performance.
>>
>> It did boot up my WinXP guest rather quickly, but then it soon got into
>> some kind of extreme busy state. The load went up, the X display was
>> not refreshed any more, just the mouse still moved. I could escape to
>> the console and found lots of
>>
>> May 11 13:04:44 hunter kernel: npxdna: fpcurthread == curthread
>43 times
>> ...
>>
>> messages. I then had to kill qemu.
>>
>> It does run ok without the "-kernel-kqemu" option. Any idea?
>
>1. This error should cause a panic instead of a printf. An invariant has
> been violated. The panic was broken in rev.1.131 of npx.c.
>2. This error has been implemented before. It was in the amd64
> linux32_sysvec.c until rev.1.9 of that. There it was caused by dubious
> setting of CR0_TS. The fix is dubious too:
>
>% Index: linux32_sysvec.c
>% ===================================================================
>% RCS file: /home/ncvs/src/sys/amd64/linux32/linux32_sysvec.c,v
>% retrieving revision 1.8
>% retrieving revision 1.9
>% diff -u -2 -r1.8 -r1.9
>% --- linux32_sysvec.c 29 Jul 2005 19:40:38 -0000 1.8
>% +++ linux32_sysvec.c 22 Sep 2005 15:46:21 -0000 1.9
>% @@ -807,4 +807,5 @@
>% regs->tf_rbx = ps_strings;
>% load_cr0(rcr0() | CR0_MP | CR0_TS);
>
>I don't know why exec_linux_setregs() hacks on cr0 directly. The
>native setregs function (exec_setregs()) just needs to set CR0_TS, and
>it calls fpstate_drop() to do this, as it should since only fpstate_drop()
>knows exactly what needs to be done. I think CR0_MP is already set
>here (it's set at boot time and should remain set).
>
>% + fpstate_drop(td);
>
>The correct fix seems to be to add this and remove the direct hacking
>on cr0. fpstate_drop() sets CR0_TS again, so setting it earlier has
>no effects except to waste time and implement a race. It gives a
>window in which the invariant (CR0_TS set if and only if fpucurthread
>== NULL) is violated. CR0_TS should be set together with the other
>things that fpstate_drop() does, with interrupts disabled like
>fpstate_drop() or at least in a critical section (*). I think the
>race is harmless. Suppose the above is interrupted in the race window.
>Then, since the kernel doesn't really use the FPU, and lazy context
>switching is not implemented, and only fpudna() checks the invariant,
>interrupt handling will at most just save the FPU state, clearing
>CR0_TS again in the process, at which point the invariant is restored.
>
>npxdna/fpudna() itself has a race checking the invariant. The invariant
>is checked before disabling interrupts, so it may change underneath us.
>Then various bad things happen, but at worst they result in a panic not
>occuring or printing a wrong message. My fix for this just disables
>interrupts while checking. An interrupt gate could be used to avoid the
>(only some (?) of the harmless (?) race window between the trap occuring
>and reaching npxdna()/fpudna().
>
>A comment says that exec_linux_setregs() is copied from from ia32_setregs().
>The latter has the same dubious code as the above. Somehow it was never
>missing the fpstate_drop().
>
>%
>% /* Return via doreti so that we can change to a different %cs */
>
>This bogus comment is also in ia32_setregs(). It is missing in the
>native setregs(). C code cannot return via doreti and shouldn't know
>that the syscall will return via doreti; it's not just %cs that will
>be changed by doreti -- doreti will load all the registers that we've
>just set up.
>
>(*) The npx/fpu code hard-disables Interrupts mainly for historical
>reasons -- it was necessary or at least simplest for the old i386 code to
>hard-disable interrupts so as to avoid complications in the
>non-exception16 case).
>
>Maybe other emulators get this wrong similarly.
So you think kqemu is doing something wrong? The problem is _k_qemu is
closed source and afaik the author doesnt use freebsd, the inner
workings of it are in a binary blob that gets linked into a kld and it
runs guest code (including kernel code with -kernel-kqemu) in kernel
space on the host cpu. You can see the freebsd-specific parts in
/usr/ports/emulators/kqemu-kmod/work/kqemu-1.3.0pre7/kqemu-freebsd.c
(after doing make in the port's dir) - could this be patched there?
Btw, kqemu on amd64 also causes lots of
fpudna in kernel mode!
messages even when not using -kernel-kqemu (so that kqemu only runs
guest userland code in kernel space.)
More information about the freebsd-emulation
mailing list