seems I finally found what upset kqemu on amd64 SMP... shared gdt! (please test patch :)
John Baldwin
jhb at freebsd.org
Thu May 1 14:36:14 UTC 2008
On Thursday 01 May 2008 06:19:51 am Juergen Lock wrote:
> On Wed, Apr 30, 2008 at 12:24:58AM +0200, Juergen Lock wrote:
> > Yeah, the amd64 kernel reuses the same gdt to set up all cpus, causing
> > kqemu to end up restoring the interrupt stack pointer (after running
> > guest code using its own cpu state) from the tss of the last cpu,
> > regardless of which cpu it happened to run on. And that then causes the
> > last cpu's (usually) idle thread's stack to get smashed and the host
> > to panic repeatedly... (Which also explains why pinning qemu onto cpu
> > 1 worked on a 2-way host.)
>
> Hmm, maybe the following is a little clearer: kqemu sets up its own
> cpu state and has to save and restore the original state because of that,
> so among other things it does an str insn (store task register), and later
> an ltr insn (load task register) using the value it got from the first
> str insn. That ltr insn loads the selector for the tss which is stored
> in the gdt, and that entry in the gdt is different for each cpu, but since
> a single gdt was reused to set up the cpus at boot (in init_secondary() in
> /sys/amd64/amd64/mp_machdep.c), it still points to the tss for the last
> cpu, instead of to the right one for the cpu the ltr insn gets executed on.
> That is what the kqemu_tss_workaround() in the patch `fixes'...
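As a rough user-space illustration of the mechanism described above (a toy
model only, not kernel code): one shared descriptor slot is rewritten by each
cpu at boot, so a later "ltr" on any cpu picks up the last cpu's tss unless the
slot is repointed first. All names here (toy_tss, toy_gdt, toy_boot, toy_ltr,
toy_ltr_with_workaround, common_tss_toy) are hypothetical stand-ins, and the
two-cpu count is an assumption made for the example.

```c
#include <assert.h>

#define NCPU 2  /* assumed cpu count for this sketch */

/* Stand-in for a per-cpu tss; only the interrupt stack pointer matters. */
struct toy_tss {
    unsigned long rsp0;
};

/* Stand-in for the tss descriptor slot in the gdt: it only records a base. */
struct toy_gdt {
    struct toy_tss *tss_base;
};

static struct toy_tss common_tss_toy[NCPU];
static struct toy_gdt shared_gdt;   /* one gdt reused for every cpu */

/* Boot: each cpu in turn rewrites the same slot, so the last one wins. */
static void toy_boot(void)
{
    for (int cpu = 0; cpu < NCPU; cpu++) {
        common_tss_toy[cpu].rsp0 = 0x1000UL * (unsigned long)(cpu + 1);
        shared_gdt.tss_base = &common_tss_toy[cpu];
    }
}

/* "ltr": the cpu reads whatever base the descriptor holds right now. */
static struct toy_tss *toy_ltr(void)
{
    return shared_gdt.tss_base;
}

/* Workaround: repoint the slot at the current cpu's tss, then reload. */
static struct toy_tss *toy_ltr_with_workaround(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    shared_gdt.tss_base = &common_tss_toy[cpu];
    return toy_ltr();
}
```

After toy_boot(), a plain toy_ltr() returns the last cpu's tss no matter which
cpu "runs" it, while toy_ltr_with_workaround(0) returns cpu 0's own tss. In the
model, as in the patch, the repoint and the reload have to happen without
another cpu touching the shared slot in between.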
Perhaps, instead, kqemu shouldn't be doing str/ltr on amd64? The thing i386
uses a separate tss for in the kernel (a separate stack for double faults) is
handled differently on amd64 (there we make the double fault handler use
one of the IST stacks).
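To make the IST point concrete, here is a small sketch of how an x86-64 cpu
chooses the stack when it delivers an interrupt or exception through a gate.
It is a simplified user-space model, not FreeBSD code: toy_tss64 and
toy_pick_stack are hypothetical names, and the structure keeps only the fields
relevant here.

```c
#include <assert.h>

/* Simplified view of the 64-bit tss: rsp0 plus the seven IST slots. */
struct toy_tss64 {
    unsigned long rsp0;
    unsigned long ist[7];
};

/*
 * Pick the stack pointer for a handler.
 *   ist_index   - IST field of the gate (0 means "no IST", 1..7 select a slot)
 *   from_user   - nonzero if the event arrived while running at user privilege
 *   current_rsp - stack pointer in use when the event arrived
 */
static unsigned long toy_pick_stack(const struct toy_tss64 *tss,
                                    unsigned ist_index, int from_user,
                                    unsigned long current_rsp)
{
    assert(ist_index <= 7);
    if (ist_index != 0)
        return tss->ist[ist_index - 1];  /* known-good stack, always switched */
    return from_user ? tss->rsp0 : current_rsp;
}
```

With a nonzero IST index on the double fault gate, the handler gets a fresh
stack even when the fault hit in kernel mode on a corrupted stack, which is the
job a separate tss does on i386.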
> > Here's the patch I just tested, of course you'd want to disable this
> > once the gdt is no longer shared, so assuming someone wants to fix this,
> > please also do an OSVERSION bump...
>
> The patch applied with offsets (I still had debug code in when I made it),
> here is a rebased version:
>
> Index: kqemu-freebsd.c
> @@ -33,6 +33,11 @@
>
> #include <machine/vmparam.h>
> #include <machine/stdarg.h>
> +#ifdef __x86_64__
> +#include <sys/pcpu.h>
> +#include <machine/segments.h>
> +#include <machine/tss.h>
> +#endif
>
> #include "kqemu-kernel.h"
>
> @@ -234,6 +239,19 @@
> va_end(ap);
> }
>
> +#ifdef __x86_64__
> +/* called with interrupts disabled */
> +void CDECL kqemu_tss_workaround(void)
> +{
> + int gsel_tss = GSEL(GPROC0_SEL, SEL_KPL);
> +
> + gdt_segs[GPROC0_SEL].ssd_base = (long) &common_tss[PCPU_GET(cpuid)];
> + ssdtosyssd(&gdt_segs[GPROC0_SEL],
> + (struct system_segment_descriptor *)&gdt[GPROC0_SEL]);
> + ltr(gsel_tss);
> +}
> +#endif
> +
> struct kqemu_instance {
> #if __FreeBSD_version >= 500000
> TAILQ_ENTRY(kqemu_instance) kqemu_ent;
> Index: common/kernel.c
> @@ -1025,6 +1025,9 @@
> #ifdef __x86_64__
> uint16_t saved_ds, saved_es;
> unsigned long fs_base, gs_base;
> +#ifdef __FreeBSD__
> + struct kqemu_global_state *g = s->global_state;
> +#endif
> #endif
>
> #ifdef PROFILE
> @@ -1188,6 +1191,13 @@
> apic_restore_nmi(s, apic_nmi_mask);
> }
> profile_record(s);
> +#ifdef __FreeBSD__
> +#ifdef __x86_64__
> + spin_lock(&g->lock);
> + kqemu_tss_workaround();
> + spin_unlock(&g->lock);
> +#endif
> +#endif
>
> if (s->mon_req == MON_REQ_IRQ) {
> struct kqemu_exception_regs *r;
> Index: kqemu-kernel.h
> @@ -44,4 +44,10 @@
>
> void CDECL kqemu_log(const char *fmt, ...);
>
> +#ifdef __FreeBSD__
> +#ifdef __x86_64__
> +void CDECL kqemu_tss_workaround(void);
> +#endif
> +#endif
> +
> #endif /* KQEMU_KERNEL_H */
--
John Baldwin