Getting the dreaded Error Code 43 when trying to pass through an Nvidia GPU to a Windows virtual machine using QEMU/KVM on ProxMox

Continuing the discussion from Configuring a headless Linux OS installation strictly for virtualizing then managing a Windows installation?:

To summarize from the previous thread:

  1. Trying to pass through the GPU so that I can have a Linux host manage the computer while the users only see Windows.

  2. Hardware is the following:

  • AMD Athlon X4 845 CPU
  • ASRock FM2A88M PRO3+ Motherboard
  • 8GB of DDR3 Kingston RAM
  • 120GB SSD
  • Nvidia GT 710 GPU
  3. Config file for the VM on ProxMox is the following:

     bios: ovmf 
     bootdisk: virtio0
     cores: 4
     cpu: host
     efidisk0: local-lvm:vm-100-disk-2,size=128K
     hostpci0: 02:00,pcie=1,x-vga=on,romfile=GK208_BIOS_FILE.bin
     machine: q35
     memory: 7168
     name: Test-Windows
     net0: virtio=96:24:CF:B4:AA:A2,bridge=vmbr0 
     numa: 0
     ostype: win10
     scsihw: virtio-scsi-pci
     smbios1: uuid=f675c872-c390-4668-9c48-423f5b4ff239
     sockets: 1
     usb0: host=6-1.2 # Mouse & Keyboard
     usb1: host=2-4 # Other 
     usb2: host=3-1.2.3.4 # Physical 
     usb3: host=1-1.2.3.4 # USB Ports
     virtio0: local-lvm:vm-100-disk-1,cache=writeback,size=90G
    
  4. The BIOS bin file parses correctly and is UEFI capable. The ROM parsing instructions come from one of the links below:

     root@pve-001:~/rom-parser# ./rom-parser /usr/share/kvm/GK208_BIOS_FILE.bin
         Valid ROM signature found @600h, PCIR offset 190h
             PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 128b, class: 030000
             PCIR: revision 0, vendor revision: 1
         Valid ROM signature found @fc00h, PCIR offset 1ch
             PCIR: type 3 (EFI), vendor: 10de, device: 128b, class: 030000
             PCIR: revision 3, vendor revision: 0
                 EFI: Signature Valid, Subsystem: Boot, Machine: X64
         Last image
    
  5. I’ve followed the guides below, trying a mix and match of settings:

https://pve.proxmox.com/wiki/Pci_passthrough

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF

I realize this isn’t Fedora 26 or Ryzen, but it’s useful info regardless:

I could’ve easily missed a step in one of them given all the information I’m trying to combine here.

  6. I have the latest Nvidia drivers installed (version 388).

  7. IOMMU is working:

root@pve-001:~# dmesg | grep -e IOMMU 
[    0.615425] AMD-Vi: IOMMU performance counters supported 
[    0.617069] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40 
[    0.618547] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

root@pve-001:~# find /sys/kernel/iommu_groups/ -type l 
/sys/kernel/iommu_groups/7/devices/0000:00:13.2        
/sys/kernel/iommu_groups/7/devices/0000:00:13.0        
/sys/kernel/iommu_groups/5/devices/0000:00:11.0        
/sys/kernel/iommu_groups/3/devices/0000:00:09.0        
/sys/kernel/iommu_groups/11/devices/0000:00:15.2       
/sys/kernel/iommu_groups/11/devices/0000:00:15.0       
/sys/kernel/iommu_groups/11/devices/0000:05:00.0       
/sys/kernel/iommu_groups/1/devices/0000:00:03.0        
/sys/kernel/iommu_groups/1/devices/0000:02:00.1        
/sys/kernel/iommu_groups/1/devices/0000:00:03.1        
/sys/kernel/iommu_groups/1/devices/0000:02:00.0        
/sys/kernel/iommu_groups/8/devices/0000:00:14.2        
/sys/kernel/iommu_groups/8/devices/0000:00:14.0        
/sys/kernel/iommu_groups/8/devices/0000:00:14.3        
/sys/kernel/iommu_groups/6/devices/0000:00:12.2        
/sys/kernel/iommu_groups/6/devices/0000:00:12.0        
/sys/kernel/iommu_groups/4/devices/0000:00:10.0        
/sys/kernel/iommu_groups/12/devices/0000:00:18.4       
/sys/kernel/iommu_groups/12/devices/0000:00:18.2       
/sys/kernel/iommu_groups/12/devices/0000:00:18.0       
/sys/kernel/iommu_groups/12/devices/0000:00:18.5       
/sys/kernel/iommu_groups/12/devices/0000:00:18.3       
/sys/kernel/iommu_groups/12/devices/0000:00:18.1       
/sys/kernel/iommu_groups/2/devices/0000:00:08.0        
/sys/kernel/iommu_groups/10/devices/0000:00:14.5       
/sys/kernel/iommu_groups/0/devices/0000:00:02.2        
/sys/kernel/iommu_groups/0/devices/0000:00:02.0        
/sys/kernel/iommu_groups/0/devices/0000:01:00.0        
/sys/kernel/iommu_groups/9/devices/0000:00:14.4
  8. The vfio-pci driver is being used correctly:
root@pve-001:~# lspci -k
02:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1) 
Subsystem: eVga.com. Corp. GK208 [GeForce GT 710B] 
Kernel driver in use: vfio-pci 
Kernel modules: nvidiafb, nouveau

But after all that, I still get this:
(Screenshot from 2017-12-13 14-19-32: Device Manager in the Windows guest)

Note how the basic “Microsoft Display Adapter” isn’t there.

Is it because I’m using VirtIO? All the guides seem to be using SCSI without VirtIO. I’d like as much performance as possible, but if it’s just not happening with VirtIO, I can use SCSI.
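If it comes to that, I assume switching the disk to SCSI would look roughly like this in the VM config (an untested sketch; the scsihw line is already in my config):

     scsi0: local-lvm:vm-100-disk-1,cache=writeback,size=90G
     bootdisk: scsi0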


Have you done the following yet to fool Nvidia’s drivers?

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#.22Error_43:_Driver_failed_to_load.22_on_Nvidia_GPUs_passed_to_Windows_VMs
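On plain QEMU that fix boils down to something like this (a sketch; the vendor string is arbitrary, it just must not be one the driver recognizes as a hypervisor):

     -cpu host,kvm=off,hv_vendor_id=whatever123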


BTW, if you add “,x-vga=on” to the hostpci line, it should do the trick for hv_vendor, vga=none, etc.

So presumably, if I have x-vga=on, it does the hv_vendor_id thing for me?

The next post though says this:

Unfortunately, it seems that “hv_vendor_id=proxmox” – the option that is set when “x-vga=on” is specified – is probably not correct in all cases. Specifically, it may be a problem for all nvidia cards.

So if I use x-vga=on, it might actually break it?


I’m not clued up on how Proxmox does it or what problems there may be; I use Arch and Fedora.
But Nvidia drivers refuse to work without the vendor_id feature set, so “set it and see if it works” is as good advice as I can offer right now.

FWIW, in my case on Arch I simply filled it with the GPU manufacturer’s name, such as gigabyte, etc.
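In libvirt domain XML that looks roughly like this (a sketch from memory; the value is arbitrary, up to 12 characters):

     <features>
       <hyperv>
         <vendor_id state='on' value='gigabyte'/>
       </hyperv>
     </features>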

So ProxMox is a Linux distribution that uses QEMU/KVM to virtualize OSes.

Using qm showcmd <vmid>, I get the following, which is the command actually used to start the VM with my current settings:

/usr/bin/kvm 
 -id 100 
 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' 
 -mon 'chardev=qmp,mode=control' 
 -pidfile /var/run/qemu-server/100.pid 
 -daemonize 
 -smbios 'type=1,uuid=f675c872-c390-4668-9c48-423f5b4ff239' 
 -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/kvm/OVMF_CODE-pure-efi.fd' 
 -drive 'if=pflash,unit=1,id=drive-efidisk0,file=/dev/pve/vm-100-disk-2' 
 -name Test-Windows 
 -smp '4,sockets=1,cores=4,maxcpus=4' 
 -nodefaults 
 -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' 
 -vga none
 -nographic
 -no-hpet
 -cpu 'kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,enforce,kvm=off'
 -m 7168 
 -k en-us 
 -readconfig /usr/share/qemu-server/pve-q35.cfg 
 -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' 
 -device 'vfio-pci,host=02:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0' 
 -device 'usb-host,hostbus=6,hostport=1.2,id=usb0' 
 -device 'usb-host,hostbus=2,hostport=4,id=usb1' 
 -device 'usb-host,hostbus=3,hostport=1.2.3.4,id=usb2' 
 -device 'usb-host,hostbus=1,hostport=1.2.3.4,id=usb3' 
 -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' 
 -iscsi 'initiator-name=iqn.1993-08.org.debian:01:6a10a080c99' 
 -drive 'file=/dev/pve/vm-100-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' 
 -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' 
 -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' 
 -device 'virtio-net-pci,mac=96:24:CF:B4:AA:A2,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' 
 -rtc 'driftfix=slew,base=localtime' 
 -machine 'type=q35' 
 -global 'kvm-pit.lost_tick_policy=discard'

So based on all that, it’s using hv_vendor_id=proxmox. I’ll probably overwrite that one and see if that fixes it.

What version of QEMU does ProxMox currently have?

Everything that could possibly be relevant to this situation:

root@pve-001:~# pveversion -v 
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve) 
pve-manager: 5.1-35 (running version: 5.1-35/722cc488) 
pve-kernel-4.13.4-1-pve: 4.13.4-25 
libpve-http-server-perl: 2.0-6 
lvm2: 2.02.168-pve6                                                                                                                                                                            
corosync: 2.4.2-pve3 
libqb0: 1.0.1-1 
pve-cluster: 5.0-15 
qemu-server: 5.0-17 
pve-firmware: 2.0-3 
libpve-common-perl: 5.0-20 
libpve-guest-common-perl: 2.0-13 
libpve-access-control: 5.0-7 
libpve-storage-perl: 5.0-16 
pve-libspice-server1: 0.12.8-3 
vncterm: 1.5-2 
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17 
pve-firewall: 3.0-3 
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2 
glusterfs-client: 3.8.8-1 
lxc-pve: 2.1.0-2 
lxcfs: 2.0.7-pve4 
criu: 2.11.1-1~bpo90 
novnc-pve: 0.6-4 
smartmontools: 6.5+svn4324-1 
zfsutils-linux: 0.7.2-pve1~bpo90

So to answer your question, 5.0-17 for the server, and 2.9.1-2 for qemu-kvm.

Hmm well then spoofing vendor_id for the hypervisor should be supported and work.

The only other important part is -cpu 'kvm=off'
which is equivalent to hidden state='on' in libvirt.
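i.e., in a libvirt domain XML, roughly:

     <features>
       <kvm>
         <hidden state='on'/>
       </kvm>
     </features>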

OK, so the fix may be to just set x-vga=on on the hostpci setting, look at the resulting command with qm showcmd, and copy the -cpu part to args.
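Something like this in the VM config (a sketch only: the flag list should come from your own showcmd output, with just the vendor id changed):

     args: -cpu 'kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=whatever,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,enforce,kvm=off'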

Hope that helps.

Yep, that’s what I’m trying next. Weird how hv_vendor_id=proxmox doesn’t work when seemingly anything else does. I guess Nvidia added “proxmox” to their list of “yep, it’s virtualized” vendor_id checks.

Edit: @catsay

root@pve-001:~# qm showcmd 100
vm 100 - unable to parse value of 'cpu' - format error
cputype: property is missing and it is not optional
hv_vendor_id: property is not defined in schema and the schema does not allow additional properties

The command it shows contains both the arguments I posted and the defaults it uses, so -cpu is specified twice.

I had figured it would know to override the settings, but I guess I’ll have to figure that out or just manually enter the command in a shell.

Edit 2:

So I’m manually running the command rather than using ProxMox’s GUI for now to test things. I’m switching hv_vendor_id=proxmox to hv_vendor_id=Nvidia43FIX.

I get this error when I do this and have been for a while:

WARNING: Image format was not specified for '/dev/pve/vm-100-disk-2' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
kvm: -device vfio-pci,host=02:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0,romfile=/usr/share/kvm/GK208_BIOS_FILE.bin: Failed to mmap 0000:02:00.0 BAR 3. Performance may be slow

It’s failing to memory-map BAR 3 (I presume), but I’m not sure if that matters.

Still getting Error 43 in Windows 10 Pro.

Seems like your GPU is not being released by vfio-pci. For the record, all I’ve needed to do is kvm=off and a random hardware vendor ID, you do not need a ROM dump, in fact those could make the situation worse.

If all else fails, switch the card from being stubbed by vfio-pci to being stubbed by pci-stub via a kernel argument in GRUB.
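A sketch of that (10de:128b matches the rom-parser output above; confirm with lspci -nn, and include the audio function’s ID as well if you pass it through):

     # /etc/default/grub
     GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pci-stub.ids=10de:128b"

Then run update-grub and reboot.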

I did that because of this:

https://pve.proxmox.com/wiki/Pci_passthrough#romfile

romfile
http://lime-technology.com/forum/index.php?topic=43644.msg482110#msg482110

Some motherboards can’t do GPU passthrough on the first PCI slot by default, because its vBIOS is shadowed during bootup. So we need to capture its BIOS while it’s working “normally”, and then when we move the card to slot 1 we can start the VM using the dumped vBIOS.

I confirmed this motherboard is doing that: when I try to dump a copy of the BIOS ROM from within ProxMox and parse it, the parse fails. There is no other PCIe x16 slot on this board (just an x1 slot), so I had to boot into Windows and use GPU-Z, which somehow bypasses what the motherboard provides and gets a clean copy of the ROM. I verified this by parsing the file it produced, which worked fine, as I show in the OP.

So this is a requirement as far as I can tell?
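For reference, the in-host dump I attempted (which fails here, presumably because of the shadowing) was the usual sysfs method:

     cd /sys/bus/pci/devices/0000:02:00.0/
     echo 1 > rom
     cat rom > /usr/share/kvm/GK208_BIOS_FILE.bin
     echo 0 > rom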

That’s the older way to handle it, and I’ve read it carries a lot of baggage in terms of issues to be aware of.

EDIT I forgot to add:

I have disabled the Framebuffer in the kernel settings:

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on video=vesafb:off,efifb:off"

But I am still seeing BIOS messages, up to a point, on the display.
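(One thing I’m not sure about, and it’s an assumption on my part: I’ve seen the kernel’s video= parameter written with one instance per framebuffer driver, so the combined form above may not actually turn efifb off. The split form would be the following, followed by update-grub:)

     GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on video=vesafb:off video=efifb:off"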

As long as the display modules don’t load, you should be fine if you’re running without a desktop environment. To be safe, add “nomodeset” and update your initial ramdisk and grub at the same time.

Not sure what you mean here. I believe they aren’t loading? They’re listed under lspci -k though.

02:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
Subsystem: eVga.com. Corp. GK208 [GeForce GT 710B]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau

Also, ProxMox doesn’t have a DE installed to start with and I haven’t installed one.

To be safe, add “nomodeset” and update your initial ramdisk and grub at the same time.

Should I still do this given the above information?

You don’t want the kernel modules for display drivers to load at all. lspci will still detect the card, but the kernel modules won’t be loaded.

I’d still go ahead with nomodeset. But I’d blacklist Nouveau as well.
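Roughly, on Debian/Proxmox (the modprobe file name here is my own choice):

     # keep the display modules from ever binding to the card
     echo "blacklist nouveau"  >> /etc/modprobe.d/gpu-passthrough.conf
     echo "blacklist nvidiafb" >> /etc/modprobe.d/gpu-passthrough.conf

     # add nomodeset to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
     update-grub
     update-initramfs -u -k all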

Same here. You don’t have this in the kvm command you pasted.

I don’t know how to get ProxMox to set it, but I’d be willing to bet that’s all you need.

Well, I made the following changes:

  • Allowed unsafe interrupt mapping.
  • Switched from PCIe to plain PCI by removing the pcie=1 flag.
  • Switched x-vga=on to x-vga=1, as that’s what the wiki uses. (The resulting lines are sketched below.)
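Concretely, the relevant lines now look like this (the modprobe file name is my own choice; the hostpci line is just my edited config):

     options vfio_iommu_type1 allow_unsafe_interrupts=1   # in a file under /etc/modprobe.d/
     hostpci0: 02:00,x-vga=1,romfile=GK208_BIOS_FILE.bin  # in /etc/pve/qemu-server/100.conf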

I now get a new error (so kinda progress I guess):

Your computer’s system firmware does not include enough information to properly configure and use this device. To use this device, contact your computer manufacturer to obtain a firmware or BIOS update. (Code 35)

And I’m still getting this error when starting the VM:

kvm: -device vfio-pci,host=02:00.0,id=hostpci0.0,bus=pci.0,addr=0x10.0,multifunction=on,romfile=/usr/share/kvm/GK208_BIOS_FILE.bin: Failed to mmap 0000:02:00.0 BAR 3. Performance may be slow

Doing cat /proc/iomem gives me this:

f0000000-fed3ffff : PCI Bus 0000:00
  f0000000-f9ffffff : PCI Bus 0000:02
    f0000000-f7ffffff : 0000:02:00.0
      f0000000-f7ffffff : vfio-pci
    f8000000-f9ffffff : 0000:02:00.0
      f9000000-f92fffff : efifb
  fa000000-fa0fffff : PCI Bus 0000:01
    fa000000-fa003fff : 0000:01:00.0
      fa000000-fa003fff : r8169
  fa100000-fa11ffff : 0000:00:08.0
  fd000000-fe0fffff : PCI Bus 0000:02
    fd000000-fdffffff : 0000:02:00.0
      fd000000-fdffffff : vfio-pci
    fe080000-fe083fff : 0000:02:00.1
      fe080000-fe083fff : vfio-pci

And after uninstalling and reinstalling the drivers, I’m back to Code 43. sigh

Try a driver before 33X.xx; those have no Nvidia check. And you need x-vga=on if you boot in BIOS mode.

:thinking: Perhaps that will work. Will definitely try the drivers next.

x-vga=on was changed to x-vga=1 in my config because the wiki states it’s a boolean value and defaults to 0. I’m using UEFI via OVMF.

I didn’t read the whole thread, but maybe your GPU doesn’t support UEFI mode, if you haven’t checked.

It definitely does. All GT 710s come with a UEFI-capable ROM. Parsing the ROM I got via GPU-Z returns Type 3 (EFI).