bhyve: svm (amd-v) update
Willem Jan Withagen
wjw at digiware.nl
Tue May 20 20:11:53 UTC 2014
On 18-5-2014 16:44, Anish wrote:
> Thanks for testing it.
>> Your patch applied cleanly to the working copy of the "bhyve_svm"-project.
>> I was then able to merge with HEAD
>> (using "theirs-full" on one file) and compile the kernel. So, to me it
>> looks OK to commit.
> Yes, that's correct. You have to retain the changes in sys/amd64/vmm/amd/amdv.c
> from the bhyve_svm branch.
>
>> Unfortunately, I am still not able to boot CentOS 6.5 using my Phenom
>> 1055T. It produces 200% load on the
>> host CPU, and the emulated machine generates endlessly:
> It's 200% load because the guest has 2 vcpus. It gets stuck in a loop even
> with a single processor (1 vcpu) after PCI probing [debug messages with linux
> .....earlyprintk=serial debug]
>
> [ 3.505631] pnp: PnP ACPI: found 5 devices
> [ 3.506417] ACPI: bus type PNP unregistered
> [ 3.635781] pci 0000:00:06.0: no compatible bridge window for [mem 0xfe440000-0xfe45ffff pref]
> [ 3.637555] pci 0000:00:06.0: BAR 6: assigned [mem 0x80000000-0x8001ffff pref]
> [ 3.638986] pci 0000:00:01.0: BAR 6: assigned [mem 0x80020000-0x800207ff pref]
> [ 3.640416] pci 0000:00:04.0: BAR 6: assigned [mem 0x80020800-0x80020fff pref]
> [ 3.641864] pci 0000:00:05.0: BAR 6: assigned [mem 0x80021000-0x800217ff pref]
> [ 3.643259] pci 0000:00:00.0: not setting up bridge for bus 0000:01
> [ 3.644550] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7]
> [ 3.645670] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff]
> [ 3.646795] pci_bus 0000:00: resource 6 [mem 0x80000000-0xdfffffff]
> [ 3.648031] pci_bus 0000:00: resource 7 [mem 0xd000000000-0xfcffffffff]
> [ 3.650970] NET: Registered protocol family 2
> [ 3.661491] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
> [ 3.671854] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
> [ 3.681116] TCP: Hash tables configured (established 16384 bind 16384)
> [ 3.683335] TCP: reno registered
> [ 3.684243] UDP hash table entries: 1024 (order: 3, 32768 bytes)
> [ 3.686484] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
> [ 3.691987] NET: Registered protocol family 1
> [ 3.693382] pci 0000:00:01.0: Activating ISA DMA hang workarounds
> [ 3.695214] PCI: CLS 64 bytes, default 64
> [ 3.698176] Trying to unpack rootfs image as initramfs...
> [ 30.595279] BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
> [ 30.596366] Modules linked in:
>> Additionally, it produces a lot of MSR requests:
> Yes, on AMD, Linux touches more MSRs (AMD-specific, addresses 0xC00XXXX)
> compared to Intel.
>
> Thanks and regards,
> Anish
>
>
> On Fri, May 16, 2014 at 2:17 PM, Nils Beyer <nbe at renzel.net> wrote:
>
>> Hi Anish,
>>
>> Anish wrote:
>>> If the patches look good to you, we can submit them. I have been testing on
>>> a Phenom box which lacks some of the newer SVM features.
>>
>> Your patch applied cleanly to the working copy of the "bhyve_svm"-project.
>> I was then able to merge with HEAD
>> (using "theirs-full" on one file) and compile the kernel. So, to me it
>> looks OK to commit.
>>
>> Unfortunately, I am still not able to boot CentOS 6.5 using my Phenom
>> 1055T. It produces 200% load on the
>> host CPU, and the emulated machine generates endlessly:
>>
>> =======================================================================================
>> BUG: soft lockup - CPU#0 stuck for 67s! [swapper:1]
>> Modules linked in:
>> CPU 0
>> Modules linked in:
>>
>> Pid: 1, comm: swapper Not tainted 2.6.32-431.el6.x86_64 #1 BHYVE
And more...
>> I'd love to see CentOS perfectly running on my Phenom as it runs perfectly
>> on an Intel i3.
>>
>> If you need any further information/debug, please let me know...
I've been trying to get Ubuntu, CentOS and the like to run on AMDs, and
currently I'm compiling a kernel, but it goes dirt slow.
Attached is a patch I use to debug more of the MSRs; it also contains what I do
to get the TSC running.... It helps, but things are still like molasses.
For Ubuntu I also needed to fix part of the AHCI code since it bails out
on ATA FLUSH.
I'm going to take a look at the recently posted diff which should get
bhyve_svm in line with head, and see if that speeds up my Ubuntu kernels.
--WjW
-------------- next part --------------
Index: sys/amd64/vmm/amd/svm.c
===================================================================
--- sys/amd64/vmm/amd/svm.c (revision 264582)
+++ sys/amd64/vmm/amd/svm.c (working copy)
@@ -82,6 +82,8 @@
static bool svm_vmexit(struct svm_softc *svm_sc, int vcpu,
struct vm_exit *vmexit);
static int svm_msr_rw_ok(uint8_t *btmap, uint64_t msr);
+static int svm_msr_ro_ok(uint8_t *btmap, uint64_t msr);
+static int svm_msr_rw_ro_ok(uint8_t *btmap, uint64_t msr, int mask);
static int svm_msr_index(uint64_t msr, int *index, int *bit);
static uint32_t svm_feature; /* AMD SVM features. */
@@ -315,9 +317,24 @@
/*
* Give virtual cpu the complete access to MSR(read & write).
*/
+#define MSR_RO 1
+#define MSR_RW 3
+
static int
svm_msr_rw_ok(uint8_t *perm_bitmap, uint64_t msr)
{
+ return svm_msr_rw_ro_ok(perm_bitmap, msr, MSR_RW);
+}
+
+static int
+svm_msr_ro_ok(uint8_t *perm_bitmap, uint64_t msr)
+{
+ return svm_msr_rw_ro_ok(perm_bitmap, msr, MSR_RO);
+}
+
+static int
+svm_msr_rw_ro_ok(uint8_t *perm_bitmap, uint64_t msr, int mask)
+{
int index, bit, err;
err = svm_msr_index(msr, &index, &bit);
@@ -336,8 +353,12 @@
}
/* Disable intercept for read and write. */
- perm_bitmap[index] &= ~(3 << bit);
- CTR1(KTR_VMM, "Guest has full control on SVM:MSR(0x%lx).\n", msr);
+ perm_bitmap[index] &= ~(mask << bit);
+ if (mask==MSR_RW) {
+ CTR1(KTR_VMM, "Guest has Read/Write control on SVM:MSR(0x%lx).\n", msr );
+ } else {
+ CTR1(KTR_VMM, "Guest has Read-only control on SVM:MSR(0x%lx).\n", msr );
+ }
return (0);
}
@@ -415,10 +436,26 @@
svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_CS_MSR);
svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_ESP_MSR);
svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_EIP_MSR);
-
+
+#define AMD_MSR_TSEG_BASE 0xc0010112
+#define AMD_MSR_OSVW_ID_LENGTH 0xc0010140 /* read */
+#define AMD_MSR_OSVW_STATUS 0xc0010141 /* read */
+#define AMD_MSR_MC4_CTL_MASK 0xc0010048
+
/* For Nested Paging/RVI only. */
svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_PAT);
+ svm_msr_rw_ok(svm_sc->msr_bitmap, AMD_MSR_OSVW_ID_LENGTH);
+ svm_msr_rw_ok(svm_sc->msr_bitmap, AMD_MSR_OSVW_STATUS);
+ /*
+ * MSRs that are allowed to be read.
+ * most obvious one is the TSC read which could be time critical
+ */
+ svm_msr_ro_ok(svm_sc->msr_bitmap, MSR_TSC);
+ svm_msr_ro_ok(svm_sc->msr_bitmap, MSR_HWCR);
+ svm_msr_ro_ok(svm_sc->msr_bitmap, AMD_MSR_TSEG_BASE);
+ svm_msr_ro_ok(svm_sc->msr_bitmap, AMD_MSR_MC4_CTL_MASK);
+
/* Intercept access to all I/O ports. */
memset(svm_sc->iopm_bitmap, 0xFF, sizeof(svm_sc->iopm_bitmap));
@@ -566,6 +603,13 @@
svm_efer(svm_sc, vcpu, info1);
break;
}
+ if (ecx == MSR_TSC) {
+ uint64_t tscval = rdtsc();
+ VCPU_CTR0(svm_sc->vm, vcpu,"VMEXIT TSC MSR\n");
+ state->rax = tscval & 0xffffffff;
+ ctx->e.g.sctx_rdx = tscval >> 32;
+ break;
+ }
retu = false;
if (info1) {
Index: sys/amd64/vmm/intel/vmx.c
===================================================================
--- sys/amd64/vmm/intel/vmx.c (revision 264582)
+++ sys/amd64/vmm/intel/vmx.c (working copy)
@@ -109,6 +109,9 @@
#define guest_msr_rw(vmx, msr) \
msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_RW)
+#define guest_msr_ro(vmx, msr) \
+ msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_READ)
+
#define HANDLED 1
#define UNHANDLED 0
@@ -786,6 +789,11 @@
* MSR_EFER is saved and restored in the guest VMCS area on a
* VM exit and entry respectively. It is also restored from the
* host VMCS area on a VM exit.
+ *
+ * The TSC MSR is exposed read-only. Writes are disallowed as that
+ * will impact the host TSC.
+ * XXX Writes would be implemented with a wrmsr trap, and
+ * then modifying the TSC offset in the VMCS.
*/
if (guest_msr_rw(vmx, MSR_GSBASE) ||
guest_msr_rw(vmx, MSR_FSBASE) ||
@@ -793,7 +801,8 @@
guest_msr_rw(vmx, MSR_SYSENTER_ESP_MSR) ||
guest_msr_rw(vmx, MSR_SYSENTER_EIP_MSR) ||
guest_msr_rw(vmx, MSR_KGSBASE) ||
- guest_msr_rw(vmx, MSR_EFER))
+ guest_msr_rw(vmx, MSR_EFER) ||
+ guest_msr_ro(vmx, MSR_TSC))
panic("vmx_vminit: error setting guest msr access");
/*
Index: sys/amd64/vmm/io/vlapic.c
===================================================================
--- sys/amd64/vmm/io/vlapic.c (revision 264582)
+++ sys/amd64/vmm/io/vlapic.c (working copy)
@@ -143,7 +143,7 @@
#define VLAPIC_TIMER_UNLOCK(vlapic) mtx_unlock_spin(&((vlapic)->timer_mtx))
#define VLAPIC_TIMER_LOCKED(vlapic) mtx_owned(&((vlapic)->timer_mtx))
-#define VLAPIC_BUS_FREQ tsc_freq
+#define VLAPIC_BUS_FREQ (128*1024*1024)
static __inline uint32_t
vlapic_get_id(struct vlapic *vlapic)
Index: sys/amd64/vmm/vmm_msr.c
===================================================================
--- sys/amd64/vmm/vmm_msr.c (revision 264582)
+++ sys/amd64/vmm/vmm_msr.c (working copy)
@@ -113,6 +113,9 @@
case MSR_MCG_CAP:
guest_msrs[i] = 0;
break;
+ case MSR_TSC:
+ guest_msrs[i] = rdtsc();
+ break;
case MSR_PAT:
guest_msrs[i] = PAT_VALUE(0, PAT_WRITE_BACK) |
PAT_VALUE(1, PAT_WRITE_THROUGH) |
Index: sys/amd64/vmm/vmm_msr.h
===================================================================
--- sys/amd64/vmm/vmm_msr.h (revision 264582)
+++ sys/amd64/vmm/vmm_msr.h (working copy)
@@ -29,7 +29,7 @@
#ifndef _VMM_MSR_H_
#define _VMM_MSR_H_
-#define VMM_MSR_NUM 16
+#define VMM_MSR_NUM 17
struct vm;
void vmm_msr_init(void);
Index: usr.sbin/bhyve/bhyverun.c
===================================================================
--- usr.sbin/bhyve/bhyverun.c (revision 264582)
+++ usr.sbin/bhyve/bhyverun.c (working copy)
@@ -52,6 +52,7 @@
#include <vmmapi.h>
#include "bhyverun.h"
+#include "compiledate.h"
#include "acpi.h"
#include "inout.h"
#include "dbgport.h"
@@ -75,6 +76,8 @@
#define MB (1024UL * 1024)
#define GB (1024UL * MB)
+#define FALSE 0
+#define TRUE (!FALSE)
typedef int (*vmexit_handler_t)(struct vmctx *, struct vm_exit *, int *vcpu);
@@ -139,8 +142,8 @@
" -S: <slot,driver,configinfo> legacy PCI slot config\n"
" -l: LPC device configuration\n"
" -m: memory size in MB\n"
- " -w: ignore unimplemented MSRs\n",
- progname, (int)strlen(progname), "");
+ " -w: ignore unimplemented MSRs\n"
+ ,progname, (int)strlen(progname), "");
exit(code);
}
@@ -287,10 +290,6 @@
if (vme->u.inout.string || vme->u.inout.rep)
return (VMEXIT_ABORT);
- /* Special case of guest reset */
- if (out && port == 0x64 && (uint8_t)eax == 0xFE)
- return (vmexit_catch_reset());
-
/* Extra-special case of host notifications */
if (out && port == GUEST_NIO_PORT)
return (vmexit_handle_notify(ctx, vme, pvcpu, eax));
@@ -315,16 +314,16 @@
uint64_t val;
uint32_t eax, edx;
int error;
+ val = 0;
- val = 0;
error = emulate_rdmsr(ctx, *pvcpu, vme->u.msr.code, &val);
+
if (error != 0) {
- fprintf(stderr, "rdmsr to register %#x on vcpu %d\n",
+ fprintf(stderr, "rdmsr to register %#x ignored on vcpu %d\n\r",
vme->u.msr.code, *pvcpu);
if (strictmsr)
return (VMEXIT_ABORT);
}
-
eax = val;
error = vm_set_register(ctx, *pvcpu, VM_REG_GUEST_RAX, eax);
assert(error == 0);
@@ -332,7 +331,6 @@
edx = val >> 32;
error = vm_set_register(ctx, *pvcpu, VM_REG_GUEST_RDX, edx);
assert(error == 0);
-
return (VMEXIT_CONTINUE);
}
@@ -343,7 +341,7 @@
error = emulate_wrmsr(ctx, *pvcpu, vme->u.msr.code, vme->u.msr.wval);
if (error != 0) {
- fprintf(stderr, "wrmsr to register %#x(%#lx) on vcpu %d\n",
+ fprintf(stderr, "wrmsr to register %#x(%#lx) ignored on vcpu %d\n\r",
vme->u.msr.code, vme->u.msr.wval, *pvcpu);
if (strictmsr)
return (VMEXIT_ABORT);
@@ -676,6 +674,7 @@
argc -= optind;
argv += optind;
+ printf("BHyve compiled: %s \n\r\n\r", compiledate );
if (argc != 1)
usage(1);
Index: usr.sbin/bhyve/xmsr.c
===================================================================
--- usr.sbin/bhyve/xmsr.c (revision 264582)
+++ usr.sbin/bhyve/xmsr.c (working copy)
@@ -38,24 +38,72 @@
#include <stdlib.h>
#include "xmsr.h"
+#include "xmsr-info.h"
+#define BIT(b) (1<<b)
+#define FALSE 0
+#define TRUE (!FALSE)
+
int
emulate_wrmsr(struct vmctx *ctx, int vcpu, uint32_t code, uint64_t val)
{
+ long retval = -1;
- switch (code) {
+ switch (code) {
case 0xd04: /* Sandy Bridge uncore PMC MSRs */
case 0xc24:
- return (0);
+ /* simulate that these registers are written */
+ retval=(0);
+ break;
default:
break;
}
- return (-1);
+ fprintf(stderr,"wrmsr: %#x, %s, val: %li(%#lx).\n\r",
+ code, xmsr_info_mnemonic(code), val, val);
+ return retval;
}
+/*
+ * Return: error value
+ * 0 = instruction emulated
+ * !0 = instruction ignored
+ */
int
emulate_rdmsr(struct vmctx *ctx, int vcpu, uint32_t code, uint64_t *val)
{
+ int retval = 0;
- return (-1);
+ switch (code) {
+ case 0xd04: /* Sandy Bridge uncore PMC MSRs */
+// *val = (0);
+ break;
+ case 0xc24:
+// *val = (0);
+ break;
+ case AMD_MSR_TSEG_BASE:
+// *val = 0xcfe00000;
+ break;
+ case AMD_MSR_HWCR:
+ *val = (BIT(24)|BIT(4));
+ break;
+ case AMD_MSR_OSVW_ID_LENGTH:
+ *val = (4);
+ break;
+ case AMD_MSR_OSVW_STATUS:
+ *val = (BIT(3)|BIT(2));
+ break;
+// case AMD_MSR_IBSCTL:
+// *val = BIT(8);
+// break;
+ default:
+ retval = 1;
+ break;
+ }
+ fprintf(stderr,"rdmsr(%i:%s): %#x, %s, val: %li(%#lx).\n\r",
+ retval, (retval==0?"ok":"err"),
+ code, xmsr_info_mnemonic(code), *val, *val);
+ return retval;
+
}
+
+