Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

Rodney W. Grimes freebsd-rwg at pdx.rh.CN85.dnsmgr.net
Wed Jan 11 10:41:59 UTC 2017


IIRC, the 367.44 version of the NVIDIA drivers does NOT support the
Quadro 2000; you need to use the 340.xx version instead.  I ran
into problems on native hardware.

Also, before you attempt to get VGA passthrough working, it is best
to make sure you can run natively.  Have you tried running your guest
on the host in a native configuration?

I have fought this on other platforms many times, only to find out
that what I was trying would never run natively, let alone in a
virtualized environment.

> > The problem appears to be that bhyve assigns the VGA card's
> > memory-mapped I/O ranges to a region outside the CPU's addressable
> > space. That is, bhyve does not check the AL value returned by CPUID
> > leaf 0x80000008 (0x27 for my CPU, i.e. 39 physical address bits),
> > yet it assigns 0xd000000000 and above to the large prefetchable
> > memory chunks, which requires 40 address bits. At least, this is my
> > understanding of why VGA passthrough does not work.
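
A minimal C sketch of the check described above, assuming a GCC or Clang toolchain on x86 (for `<cpuid.h>` and `__get_cpuid`). The helper names `host_phys_addr_bits`, `addr_bits_needed` and `range_fits` are hypothetical, not bhyve functions; the sketch only illustrates why a base of 0xd000000000 is out of reach for a CPU reporting 39 physical address bits.

```c
#include <stdint.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Physical address width from CPUID leaf 0x80000008, EAX bits 7:0
 * (the "AL" value). Returns 0 if the leaf is unavailable or the
 * build is not for x86. Hypothetical helper, not part of bhyve. */
static unsigned host_phys_addr_bits(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return eax & 0xffu;
#endif
    return 0;
}

/* Number of bits needed to express addr: index of highest set bit + 1. */
static unsigned addr_bits_needed(uint64_t addr)
{
    unsigned bits = 0;
    while (addr != 0) {
        bits++;
        addr >>= 1;
    }
    return bits;
}

/* True if every byte of [base, base + size) is below 2^phys_bits. */
static bool range_fits(uint64_t base, uint64_t size, unsigned phys_bits)
{
    if (size == 0)
        return true;
    uint64_t last = base + size - 1;
    if (last < base)            /* range wraps past 2^64 */
        return false;
    return addr_bits_needed(last) <= phys_bits;
}
```

On the CPU described here, `host_phys_addr_bits()` (run on the host) would return 39, while `addr_bits_needed(0xd000000000)` is 40, so `range_fits()` rejects the old base; 0x3400000000 needs only 38 bits and passes.
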
> 
> To test this, I tried writing to the PCI BARs in a FreeBSD guest
> using `pciconf -w`. Not much use that was: I could read back the
> values written to the registers (e.g., `pciconf -r pci0:0:4:0 0x14:48`),
> but `pciconf -lvb` still showed the same huge base addresses; they
> refused to change.
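
For reference, a minimal C sketch of how a raw BAR value decodes under the standard PCI configuration-space layout: bit 0 selects I/O (1) or memory (0) space; for memory BARs, bits 2:1 give the type (0b10 means 64-bit) and bit 3 is the prefetchable flag. A 64-bit BAR occupies two consecutive dwords, which is why `pciconf -lvb` lists bar [14] and bar [1c] but no [18] or [20]. The struct and the name `decode_bar` are hypothetical, chosen for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Decoded view of one PCI base address register (BAR). */
struct bar_info {
    bool     is_io;        /* bit 0: 1 = I/O space, 0 = memory space */
    bool     is_64bit;     /* memory type field (bits 2:1) == 0b10 */
    bool     prefetchable; /* memory BAR bit 3 */
    uint64_t base;         /* base address with the flag bits masked off */
};

/* Decode a BAR from its low dword and, for 64-bit memory BARs, the
 * dword at the next config offset (e.g. 0x14 then 0x18).
 * 'hi' is ignored for I/O and 32-bit memory BARs. */
static struct bar_info decode_bar(uint32_t lo, uint32_t hi)
{
    struct bar_info b = {0};
    if (lo & 0x1u) {
        b.is_io = true;
        b.base = lo & ~0x3u;        /* I/O BARs: low 2 bits are flags */
        return b;
    }
    b.is_64bit = ((lo >> 1) & 0x3u) == 0x2u;
    b.prefetchable = (lo & 0x8u) != 0;
    b.base = lo & ~0xfu;            /* memory BARs: low 4 bits are flags */
    if (b.is_64bit)
        b.base |= (uint64_t)hi << 32;
    return b;
}
```

For example, a low dword of 0x0000000c paired with a high dword of 0x00000034 decodes as a 64-bit prefetchable memory BAR at 0x3400000000, the same base reported for bar [14] after the patch. These register values are illustrative, not read from the card.
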
> 
> OK, I'd had enough of that. So I dug into the source and changed
> "#define PCI_EMUL_MEMBASE64" from '0xD000000000UL' to
> '0x3400000000UL' in src/usr.sbin/bhyve/pci_emul.c, recompiled
> bhyve, booted up FreeBSD, and:
>   # pciconf -lvb
>   [...]
>   vgapci0 at pci0:0:4:0:     class=0x030000 card=0x084a10de chip=0x0dd810de rev=0xa1 hdr=0x00
>       vendor     = 'NVIDIA Corporation'
>       device     = 'GF106GL [Quadro 2000]'
>       class      = display
>       subclass   = VGA
>       bar   [10] = type Memory, range 32, base 0xc2000000, size 33554432, enabled
>       bar   [14] = type Prefetchable Memory, range 64, base 0x3400000000, size 134217728, enabled
>       bar   [1c] = type Prefetchable Memory, range 64, base 0x3408000000, size 67108864, enabled
>       bar   [24] = type I/O Port, range 32, base 0x2080, size 128, enabled
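
The new addresses follow from natural alignment: each memory BAR has a power-of-two size and must start on a multiple of that size. Below is a simplified allocator in C, assuming BARs are placed in ascending order starting from PCI_EMUL_MEMBASE64. It is a sketch, not bhyve's actual pci_emul.c code; `round_up_pow2` and `alloc_bar` are hypothetical names, and the 2^39 limit used with it models the 39-bit CPU in this thread rather than anything bhyve is known to enforce.

```c
#include <stdint.h>
#include <stdbool.h>

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical simplified allocator: place a power-of-two-sized BAR at
 * the next naturally aligned address at or above *next, ending at or
 * below 'limit'. On success, store the address in *addr, advance *next
 * past the BAR, and return true. */
static bool alloc_bar(uint64_t *next, uint64_t limit, uint64_t size,
                      uint64_t *addr)
{
    if (size == 0 || (size & (size - 1)) != 0)
        return false;                     /* not a power of two */
    uint64_t base = round_up_pow2(*next, size);
    if (base < *next)                     /* rounding wrapped past 2^64 */
        return false;
    if (base + size < base || base + size > limit)
        return false;                     /* overflow or beyond limit */
    *addr = base;
    *next = base + size;
    return true;
}
```

Starting from 0x3400000000 with a 2^39 limit, a 128 MiB BAR lands at 0x3400000000 and a following 64 MiB BAR at 0x3408000000, matching bar [14] and bar [1c] above. Starting from 0xd000000000, the first allocation already fails the limit check.
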
> 
> ...a-a-and:
>   # kldload nvidia-modeset
>   Linux ELF exec handler installed
>   nvidia0: <Quadro 2000> on vgapci0
>   vgapci0: child nvidia0 requested pci_enable_io
>   vgapci0: attempting to allocate 1 MSI vectors (1 supported)
>   msi: routing MSI IRQ 269 to local APIC 3 vector 51
>   vgapci0: using IRQ 269 for MSI
>   vgapci0: child nvidia0 requested pci_enable_io
>   random: harvesting attach, 8 bytes (4 bits) from nvidia0
>   # nvidia-smi
>   acquiring duplicate lock of same type: "os.lock_sx"
>    1st os.lock_sx @ nvidia_os.c:599
>    2nd os.lock_sx @ nvidia_os.c:599
>   stack backtrace:
>   #0 0xffffffff80aa6780 at witness_debugger+0x70
>   #1 0xffffffff80aa6683 at witness_checkorder+0xde3
>   #2 0xffffffff80a4fac2 at _sx_xlock+0x72
>   #3 0xffffffff82a515c2 at os_acquire_mutex+0x32
>   #4 0xffffffff82a21068 at _nv016673rm+0x18
>   Tue Jan 10 17:06:48 2017       
>   +-----------------------------------------------------------------------------+
>   | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
>   |-------------------------------+----------------------+----------------------+
>   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>   |===============================+======================+======================|
>   |   0  Quadro 2000         Off  | 0000:00:04.0     Off |                  N/A |
>   | 30%   35C    P8    N/A /  N/A |      0MiB /   963MiB |      0%      Default |
>   +-------------------------------+----------------------+----------------------+
>                                                                                
>   +-----------------------------------------------------------------------------+
>   | Processes:                                                       GPU Memory |
>   |  GPU       PID  Type  Process name                               Usage      |
>   |=============================================================================|
>   |  No running processes found                                                 |
>   +-----------------------------------------------------------------------------+
> 
> Beauty! It's very slow to execute, though. And Xorg is not in a hurry
> to start working:
>   [   204.724] (--) PCI:*(0:0:4:0) 10de:0dd8:10de:084a rev 161, Mem @ 0xc2000000/33554432, 0x3400000000/134217728, 0x3408000000/67108864, I/O @ 0x00002080/128, BIOS @ 0x????????/65536
>   [...]
>   [   204.736] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
>   [   204.736] (==) NVIDIA(0): RGB weight 888
>   [   204.736] (==) NVIDIA(0): Default visual is TrueColor
>   [   204.736] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
>   [   204.738] (**) NVIDIA(0): Enabling 2D acceleration
>   [   213.674] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:0:4:0
>   [   213.674] (--) NVIDIA(0):     CRT-0
>   [   213.674] (--) NVIDIA(0):     DFP-0 (boot)
>   [   213.674] (--) NVIDIA(0):     DFP-1
>   [   213.674] (--) NVIDIA(0):     DFP-2
>   [   213.674] (--) NVIDIA(0):     DFP-3
>   [   213.675] (--) NVIDIA(0):     DFP-4
>   [   213.698] (--) NVIDIA(0): CRT-0: disconnected
>   [   213.698] (--) NVIDIA(0): CRT-0: 400.0 MHz maximum pixel clock
>   [   213.698] (--) NVIDIA(0): 
>   [   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): connected
>   [   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): Internal TMDS
>   [   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): 330.0 MHz maximum pixel clock
>   [...]
>   [   213.747] (II) NVIDIA(0): NVIDIA GPU Quadro 2000 (GF106GL) at PCI:0:4:0 (GPU-0)
>   [   213.747] (--) NVIDIA(0): Memory: 1048576 kBytes
>   [   213.747] (--) NVIDIA(0): VideoBIOS: 70.06.0d.00.02
>   [   213.747] (II) NVIDIA(0): Detected PCI Express Link width: 16X
>   [   213.748] (**) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
>   [   213.748] (**) NVIDIA(0):     device DELL 2007FP (DFP-0) (Using EDID frequencies has
>   [   213.748] (**) NVIDIA(0):     been enabled on all display devices.)
>   [...]
>   [   213.751] (II) NVIDIA(0): Virtual screen size determined to be 1600 x 1200
>   [   213.761] (--) NVIDIA(0): DPI set to (99, 98); computed from "UseEdidDpi" X config
>   [   213.761] (--) NVIDIA(0):     option
>   [   213.761] (--) Depth 24 pixmap format is 32 bpp
>   [   213.767] (II) NVIDIA: Reserving 12288.00 MB of virtual memory for indirect memory
>   [   213.767] (II) NVIDIA:     access.
>   [   216.789] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
>   [   216.789] (EE)  *** Aborting ***
>   [   216.791] (EE) NVIDIA(0): Failed to allocate push buffer
>   [   216.839] (EE) 
>   Fatal server error:
>   [   216.839] (EE) AddScreen/ScreenInit failed for driver 0
> 
> Linux still doesn't work. (Curse Ubuntu, what a mess! It tried to
> start Xorg at boot; I managed to disable that, but no matter what I
> did, I couldn't stop it from trying to run 'nvidia-smi' at boot. And
> trust me, I tried a lot. I removed all the NVIDIA-related scripts and
> /etc/udev/ is basically empty [/etc just looks like a pile-up of crap,
> wow!], yet /usr/bin/nvidia-smi still ran by itself until I moved it
> away.)
> 
> dmesg:
>   [    1.390957] nvidia: module verification failed: signature and/or required key missing - tainting kernel
>   [    1.394715] nvidia 0000:00:04.0: can't derive routing for PCI INT A
>   [    1.395185] nvidia 0000:00:04.0: PCI INT A: no GSI
>   [    1.414173] vgaarb: device changed decodes: PCI:0000:00:04.0,olddecodes=io+mem,decodes=none:owns=io+mem
>   [    1.417062] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
>   [    1.417609] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.26  Thu Dec  8 18:36:43 PST 2016 (using threaded interrupts)
>   [    1.419820] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.26  Thu Dec  8 18:04:14 PST 2016
>   [    1.422067] [drm] [nvidia-drm] [GPU ID 0x00000004] Loading driver
>   [...]
>   [    3.904893] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 246
>   # lspci -vvn
>   00:04.0 0300: 10de:0dd8 (rev a1) (prog-if 00 [VGA controller])
>           Subsystem: 10de:084a
>           Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>           Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>           Latency: 0
>           Interrupt: pin A routed to IRQ 16
>           Region 0: Memory at c2000000 (32-bit, non-prefetchable) [size=32M]
>           Region 1: Memory at 3400000000 (64-bit, prefetchable) [size=128M]
>           Region 3: Memory at 3408000000 (64-bit, prefetchable) [size=64M]
>           Region 5: I/O ports at 2080 [size=128]
>           [virtual] Expansion ROM at c0080000 [disabled] [size=512K]
>   [...]
> But:
>   # ./nvidia-smi 
>   No devices were found
> dmesg:
>   [  173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
>   [  173.499115] NVRM: rm_init_adapter failed for device bearing minor number 0
> 
> Not sure what's happening, but I'll try with an AMD/ATI card.
> 
> -- 
> [SorAlx]  ridin' VN2000 Classic LT
> _______________________________________________
> freebsd-virtualization at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
> To unsubscribe, send any mail to "freebsd-virtualization-unsubscribe at freebsd.org"
> 

-- 
Rod Grimes                                                 rgrimes at freebsd.org

