Single GPU passthrough with Proxmox

Lately, I’ve been taking a journey through the various methods of GPU passthrough on a Linux host. For some reason, success eludes me on everything except for Proxmox, so I’ve settled on that for now. I’ve been posting updates around, but I figured it was time to stop crowding the other threads, and make my own. I’ll post updates here, as I make progress.

I’m going to jump ahead and assume you’ve at least brushed over the numerous guides on how to identify your GPU with lspci and bind it to vfio-pci.
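If you haven’t done that part yet, the short version is: find the GPU (and its audio function) with lspci, then tell vfio-pci to claim those IDs at boot. A rough sketch, where the 10de:xxxx IDs are placeholders you’d replace with your own card’s:

# Find the GPU and its HDMI audio function; the [vendor:device] IDs are at the end of each line
lspci -nn | grep -i nvidia

# Bind both functions to vfio-pci at boot (IDs below are placeholders)
echo "options vfio-pci ids=10de:17c8,10de:0fb0" > /etc/modprobe.d/vfio.conf
update-initramfs -u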

While you can use the web interface to edit VMs, you still need to manually edit the .conf files for GPU passthrough.

Here’s a bash script I hacked together to edit .conf files in Proxmox.

#!/bin/bash
# Lists Proxmox VMs by VMID and name, then opens the chosen .conf in nano.
cd /etc/pve/qemu-server || exit 1
for filename in *.conf; do
    # Turn "name: foo" inside 100.conf into "VMID: 100      Name: foo"
    grep name "$filename" | sed "s/name:/VMID: $filename/" | sed "s/.conf/      Name:/" | sed "s/^/     /g"
done
echo "Enter the VMID you wish to edit:"
read -r vmid
nano "/etc/pve/qemu-server/$vmid.conf"
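Run as root on the Proxmox host, it prints something like the following (VMIDs and names here are made up) and then opens whichever conf you pick in nano:

     VMID: 100      Name: win10
     VMID: 101      Name: antergos
Enter the VMID you wish to edit: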

I needed to use the “romfile” parameter when passing a single GPU, but it only worked after installing this package found on the Proxmox forums:

wget http://odisoweb1.odiso.net/qemu-server_4.0-103_amd64.deb && dpkg -i qemu-server_4.0-103_amd64.deb

After installing that, edit the vmid.conf file and add (replace X and Y with your needed variables):

machine: q35
bios: ovmf
hostpci0: 0X:00,pcie=1,x-vga=on,romfile=Y.rom

That allowed me to pass my sole GPU (X99 build, so no iGPU) to a VM. I’m still working on getting Windows to pick it up, but Antergos works very well. With two GPUs in the system, however, Windows takes it just fine. Another challenge is getting any VM with a GPU passed through to boot with more than 8 logical CPUs: Proxmox says it’s booting the VM, gives no VNC console output (it never does when a physical GPU is attached), but nothing shows up on the display either. If anyone has a solution, please let me know.

You can download a GPU rom from https://www.techpowerup.com/vgabios/
or grab your own with:

cd /sys/bus/pci/devices/0000:0X:00.0/
echo 1 > rom                       # enable reading the ROM through sysfs
cat rom > /usr/share/kvm/Y.rom     # dump it where Proxmox looks for romfiles
echo 0 > rom                       # disable it again

I had to get mine from GPU-Z in Windows (dual-booted, not a VM), since Proxmox constantly gave me a “ROM contents invalid” error. Move it to /usr/share/kvm/Y.rom.
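If you want to sanity-check a dumped or downloaded ROM before handing it to the VM, rom-parser (mentioned again later in this thread) can inspect it. A rough sketch, assuming you build it from the upstream repo:

git clone https://github.com/awilliam/rom-parser && cd rom-parser && make
./rom-parser /usr/share/kvm/Y.rom
# For OVMF you want to see a "type 3 (EFI)" image listed, not just the legacy "type 0 (x86 PC-AT)" one.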

If that doesn’t work, add the GRUB boot flag video=efifb:off.
It didn’t seem to make a difference for me, but it is key for others.
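On a stock GRUB-booted install, that means appending it to the default kernel command line and regenerating the config. A minimal sketch (the IOMMU flags are the usual ones for an Intel board and may differ on yours):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=efifb:off"

# apply and reboot
update-grub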

One more thing to note is that in my case, my connected display shows “no signal” and shuts off for 10 or so seconds before lighting back up and displaying the VM I just started. It looks like it didn’t work, but then suddenly does work, so if it doesn’t work right away, just give it a minute before you kill the VM. In my case, I’m using a GTX 980ti.

I think I’ll soon be revising this post into a guide, but for now, I wanted to get the commands and script out to those that may need it, since I’ve seen a lot of posts looking to pass a GPU to a VM. I struggled with this for a number of weeks, and am still working to troubleshoot the remaining issues. Hope it helps for now.

I’m circling back around and have some updates.

Now I’m on an X470 Taichi Ultimate board with a Ryzen 5 2600X, still passing through the Strix 980 Ti, which remains the only GPU in the system.

The key thing in Proxmox that makes this work is removing and re-adding the GPU, per the Arch wiki.

I saved the script below as prepgpu and added it to my PATH:

#!/bin/bash
# Remove the PCIe bridge the GPU sits behind, then rescan the bus to re-add it.
echo 1 > /sys/bus/pci/devices/0000\:00\:03.1/remove
echo 1 > /sys/bus/pci/rescan

I have to do this before I start the Windows 10 VM with the GPU, every time. Neither video=efifb:off, nor video=vesafb:off video=efifb:off, nor pci=realloc negates this requirement. I went ahead and spun up an Alpine container with SSH key access to the host (yes, it’s a security issue…) and added the following to /etc/local.d/reset_gpu_and_poweroff.start in the container:

#!/bin/sh
ssh [email protected] prepgpu && poweroff

Then I ran rc-update add local default so it runs at boot.

Running the container resets the GPU and powers itself down, all within roughly 15-20 seconds and without having to touch the host CLI. Then the Windows VM can be powered on without the BAR 3: cannot reserve [mem 0xf0000000-0xf1ffffff 64bit pref] errors appearing in dmesg.

The other thing needed to make this work was a few settings in vmid.conf:

agent: 1,fstrim_cloned_disks=1
args: -no-hpet -rtc driftfix=slew -global kvm-pit.lost_tick_policy=discard -cpu host,hv_time,kvm=off,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_vendor_id=proxmox -machine kernel_irqchip=on
cpu: host,flags=+ibpb;+hv-tlbflush
hostpci0: 0e:00,pcie=1,romfile=980ti_dumped.rom,x-vga=1
machine: q35

Setting the card to pci instead of pcie yielded a Code 43 error. Setting machine to pc-q35-3.1 didn’t seem to make a difference for me.

The unsatisfying part is that the IOMMU groups are suboptimal. Since I have one more PCIe card to pass through to another VM, the GPU runs at x8: the second x16 slot is the only other slot I can pass through successfully. Everything works when I leave the PCIe gen settings in UEFI on Auto, but forcing Gen3 (per another thread) disables the 10GbE NIC, which is the same thing that happens when using slots other than the 2nd x16 slot. And when both x16 slots are populated, both drop to x8 speed. So, no matter what I do, the GPU is bound to x8. I get this in dmesg:

pci 0000:0e:00.0: 63.008 Gb/s available PCIe bandwidth, limited by 8 GT/s x8 link at 0000:00:03.1 (capable of 126.016 Gb/s with 8 GT/s x16 link)
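For anyone checking their own IOMMU grouping, the usual way to see which devices share a group is a small loop over sysfs; nothing here is board-specific:

#!/bin/bash
# Print every IOMMU group and the PCI devices it contains.
for group in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${group##*/}:"
    for device in "$group"/devices/*; do
        echo -e "\t$(lspci -nns "${device##*/}")"
    done
done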

Either way, it’s better than nothing, and I’m quite happy with pve 6 overall.

Ah. Well. Turns out, all I had to do was disable SR-IOV and enable Above 4G Decoding in UEFI, and now all my dmesg errors have gone away; no need to reset the pcie bridge between win VM boots. Nice.

Oh but wait, minute-to-minute update… I fired up the Time Spy benchmark to see how it performed. It froze, I smelled that terrible burning smell, and now the GPU is dead. :man_shrugging:

It’s the Asus Strix model (fans turn off below ~65ºC), so maybe passing it through screwed it up… /sigh.

Not good… I hope it is not really dead.

I took it apart today. The GPU chip itself reeks of death. ;_;

At least the kids got to see the innards of the card and learn a little about thermodynamics. I’m hoping nvidia releases a new consumer line in that stream tomorrow so I can replace the dead card with a cheaper, older one. I’d go AMD, but I quite like gamestream, especially since it’s a VM in a server.

Oh jeepers dude, RIP

I thought those GPUs had a HARDCODED safe-temp shutoff. Maybe it was the 10 series that did that?

I might try this at some point once I get an 8-core AMD APU. Fortunately, I have a watercooling kit on my 1080 Ti Mini that does not run off the GPU fan controller (I broke that somehow and now it’s stuck on 100%).

Is it possible to pass through USB sound cards like my Omni 5.1, which has custom Creative software?

As far as I understand, USB is easier to pass through than PCIe, and there is more than one method to do so.
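For instance, Proxmox can hand a whole USB device to a VM by its vendor:product ID from lsusb. A sketch with placeholder IDs and VMID:

# Find the device's vendor:product ID
lsusb

# Attach it to VM 101 as usb0 (the ID below is a placeholder; use your sound card's)
qm set 101 -usb0 host=041e:3247

# or, equivalently, add this line to /etc/pve/qemu-server/101.conf
usb0: host=041e:3247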

Yeah… I wonder the same thing. I want to pick up another card and try it out, but I’m kind of afraid of killing another one, so it’ll have to be new from a vendor that has painless returns. I would think all graphics cards would have some kind of safety precautions, which is what makes me think that the vfio stuff changed something.

Care to elaborate?

I ordered a Tesla K40; it should be here Tuesday. I’m going to try and do passthrough. If it works well, it might be an option for a replacement, since it’s a Titan Black-level GPU without the GeForce virtualization restrictions, for only $100.

Huh, I didn’t think the K40 had outputs. Would it still render video out? Would it need looking glass?

I mean, the card is for compute, so could do headless rendering or whatever? I didn’t think that benefitted from passthrough though

Alright, new nvidia gpu installed, right before a product refresh, yippee.

Now getting BAR 1 error messages. Hopefully futzing with bios and boot args can solve it. If not, I may have to spin back up that container.

The thing that unsettles me is seeing the GPU fans stop when it is passed through. I installed afterburner in the windows VM, which works to control fan speed, but I’m unsure what to do with Linux containers using it for ML, and whether I’ll have to install some kind of manual fan control in every guest OS or not. Hopefully it’s just a fan profile in the vbios or whatever and not an actual bug.

Solved. It turns out, the problem was an air gap between the ears.

I have PVE installed on top of ZFS, which uses systemd-boot (by default when using ZFS) instead of GRUB, so none of my changes in /etc/default/grub were working. I had to put them in /etc/kernel/cmdline. What’s more, it now takes my pcie_acs_override arg, so testing different slots for my firewire card (different VM) should be easier. Up to this point, I’ve had to use the GPU in the 1st x16 slot and the FW card in the 2nd x16 slot, which sets them both to x8. It’s probably fine, but it’d be nice to move it to an x1 slot and make some room for future expansion, like a SAS HBA or something.

tl;dr if you found yourself here on a search:
If you’re using Proxmox and ZFS, your kernel boot flags go in /etc/kernel/cmdline, and you apply them with pve-efiboot-tool refresh, NOT with update-grub.
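As a sketch of what that file looks like (the ZFS root dataset here is the installer default, and the IOMMU/ACS flags are just the ones discussed in this thread; adjust for your system):

# /etc/kernel/cmdline -- everything on one line
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=efifb:off

# write it out to the EFI system partition(s), then reboot
pve-efiboot-tool refresh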

Here’s what I now have in the VM conf:

agent: 1,fstrim_cloned_disks=1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_tlbflush,hv_ipi,kvm=off'
balloon: 1024
bios: ovmf
boot: c
bootdisk: scsi0
cores: 8
cpu: host,hidden=1,flags=+ibpb;+pdpe1gb
hostpci0: 0e:00,pcie=1,romfile=patched.rom,x-vga=1
machine: pc-q35-3.1
memory: 8192
numa: 0
ostype: win10
sockets: 1
vga: none

I went ahead and patched the vBIOS with Marvo2011’s fork of the NVIDIA-vBIOS-VFIO-Patcher. I’m unsure if it matters at this point, but it’s working, so I’ll leave it be. Curiously, the ROM I dumped appears to be valid according to rom-parser, but only the one I pulled from TechPowerUp works with the patcher.
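For reference, the patcher is a single Python script; usage is roughly the following (file names are placeholders, and I’m going from the upstream README rather than anything Proxmox-specific):

# strip the NVIDIA header portion so the ROM loads cleanly under OVMF
python nvidia_vbios_vfio_patcher.py -i 980ti_techpowerup.rom -o patched.rom

# put the result where the hostpci0 romfile= option looks for it
cp patched.rom /usr/share/kvm/patched.rom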

I was fiddling with this stupid fan controller where you plug the PWM in and it has a control method for multiple fans, but for some reason it wasn’t working correctly with my GPU and the wiring was weird.

It would keep throttling the fan down to low with no software control, so I tried a direct-wire method and, well, the GPU didn’t like that and something broke as far as I can see.

It’s very easy to send voltage back down the fan speed-sense wire, basically (forgot which color it was).

Not a major issue; the 1080 Ti is going to be sold with the block, and people can install their own external temperature-based fan control, which is the BEST way to do it. ATM I just have my motherboard controlling the 2 fans on the 240mm rad.