Threadripper GPU Passthrough working with Vega

Hi there, I have another GPU passthrough setup up and running perfectly fine. I've had less luck with the RX 480 so far, but I will try to get that Polaris card working next. I'm a developer and work collaboratively with AMD, so I can adapt kernel code or drivers if necessary.

Here’s the setup for those that are interested:
CPU: 1950X
Mainboard: ASRock X399 Taichi @ BIOS 1.70

  • Do enable SVM, ACS, PCIe ARI, IOMMU, SR-IOV and all the other virtualization bits you can find!

GPUs: GTX 1080 @ Slot 1 @ Host, RX 480 @ Slot 3 @ Host, AMD Vega FE @ Slot 4 @ Guest

Now to the configuration:

  • Fedora 26, kernel 4.11+ (ROCm kernel by AMD, but any other kernel >= 4.11 should work as well, given it has all the virtualization bits and pieces enabled. I compiled this one myself.)
    You can download my kernel configuration here:
    https://pastebin.com/2i0gL8yV
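
For reference, here is a quick way to sanity-check that a kernel already carries the relevant options before compiling your own (the config path is the usual Fedora location; adjust if yours differs):

grep -E 'CONFIG_(VFIO|KVM_AMD|AMD_IOMMU)' /boot/config-$(uname -r)
dmesg | grep -i -e 'AMD-Vi' -e 'IOMMU'    # confirm the IOMMU actually came up at boot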

  • /etc/modprobe.d/vfio.conf:

options vfio-pci ids=1002:6863,1002:aaf8
options kvm_amd avic=1
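
The two IDs above are the Vega FE and (presumably) its HDMI audio function as they appear on my system. If you are adapting this, you can look up your own card's vendor:device IDs like so (bus numbers and the exact output will of course differ on your machine):

lspci -nn | grep -Ei 'vga|audio'
# illustrative output:
# 0a:00.0 VGA compatible controller [0300]: ... [1002:6863]
# 0a:00.1 Audio device [0403]: ... [1002:aaf8]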

  • /etc/modprobe.d/kvm.conf:

#options kvm_intel nested=1
#options kvm_amd nested=1

  • virsh edit win10 (note: does NOT work with SeaBIOS; UEFI (TianoCore/OVMF) is required. If necessary, convert your Windows installation with mbr2gpt.exe before attempting passthrough):

<domain type='kvm'>
  <name>win10</name>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.7'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
    <feature policy='disable' name='smep'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/sdb'/>
      <target dev='sda' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/sdd'/>
      <target dev='sdb' bus='sata'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='pci' index='1' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:4a:5c:f7'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <input type='tablet' bus='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </input>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <image compression='off'/>
    </graphics>
    <sound model='ich9'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='65536' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <!-- GPU PASSTHROUGH FROM HERE -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x05' function='0x0'/>
    </hostdev>
    <!-- GPU PASSTHROUGH TO HERE -->
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='3'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='4'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='5'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </memballoon>
  </devices>
</domain>
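
One note on the two hostdev source addresses (0a:00.0 / 0a:00.1): that is simply where the Vega FE and its audio function sit on my board. A rough sketch of how to confirm your own card's bus address and check that it is isolated in its own IOMMU group before wiring it into the XML:

# list every device per IOMMU group; ideally the GPU and its audio
# function are the only members of their group
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=$(basename "$(dirname "$(dirname "$d")")")
    echo "IOMMU group $g: $(lspci -nns "${d##*/}")"
done | sort -V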

  • /etc/default/grub:

GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau rd.lvm.lv=fedora/swap rd.lvm.lv=fedora/root amd_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream amdgpu.exp_hw_support=1 rd.driver.pre=vfio-pci"
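
After editing /etc/default/grub, remember to regenerate the GRUB configuration so the new kernel command line actually takes effect. On Fedora that is typically:

grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg    # UEFI installs
# grub2-mkconfig -o /boot/grub2/grub.cfg           # legacy BIOS installs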

Works like a charm. Hope this can help somebody.

Thanks for sharing! Does passthrough work after rebooting the guest, without having to reboot the host?

I’m a bit curious whether you made any interesting discoveries along the way. How far did you get with the RX 480, and do you have a hypothesis about why the Vega FE works?

I look forward to hearing about your experiences with Polaris.

No, the Vega FE can be initialized exactly once; after that the VM halts when loading TianoCore because the card's firmware cannot be reloaded. It's not a huge issue for me since I will just boot Linux, run the VM and that's it (until I reboot the whole system). Neither FLR (function-level reset, I believe) nor power state cycling will bring the card back. It's still unknown why this happens, but the bug seems to have been carried over from AMD's Fury chip. I believe it could be fixed with a BIOS update on the Vega FE's end.
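
For anyone who wants to poke at the reset behaviour themselves, the usual knobs live in sysfs; a rough sketch (the bus address is from my setup, adjust to yours; none of this brings the Vega FE back here):

# attempt a function-level reset (the reset attribute only exists if the device supports one)
echo 1 > /sys/bus/pci/devices/0000:0a:00.0/reset

# or remove the device and rescan the bus to force re-enumeration
echo 1 > /sys/bus/pci/devices/0000:0a:00.0/remove
echo 1 > /sys/bus/pci/rescan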

The RX 480 used to work in passthrough on my Z97 + 4790K with the ACS override patch, even when binding the card to the amdgpu driver at boot instead of VFIO, and even across VM reboots. Now on Threadripper the VM immediately goes into the "paused" state after starting with an RX 480 passed through, regardless of whether the card was already bound to a driver or to VFIO on the host. The card then remains in a sleep state until the host is rebooted. I believe it's some kernel and UEFI issue on Threadripper with power cycling the PCIe bus, and I think it will require AGESA and BIOS updates to get fixed. So that still looks very bad.
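
If anyone wants to look at the same failure on their machine, the paused state and the libvirt/QEMU logs are the first places to check; roughly:

virsh domstate --reason win10                 # state plus the reason libvirt recorded
tail -n 50 /var/log/libvirt/qemu/win10.log    # QEMU's own log for the domain
journalctl -b -u libvirtd | tail -n 50        # libvirtd messages from this boot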

It might be that Vega works on Threadripper precisely because it has bugs in power cycling / state switching. Something along the lines of: even though Threadripper does PCIe bus power cycling wrongly, the Vega card, due to bugs on its end, miraculously power cycles correctly when initialized by the TianoCore UEFI (resetting the card through SeaBIOS, however, doesn't work! Windows will halt while loading the driver / initializing the card).

I'm more of a CUDA/OpenCL developer, but I am fluent in C and C++, so after a bit more debugging I could try some kernel hacking if I have time. At least I'll get to that if AMD doesn't move forward with fixing this in the next 1-2 months.

Hey, have you tried passing through a USB controller (not individual USB devices; that works fine)? I have the same board as you and could not get that to work.

See my thread here.

I see. Your explanation is in line with what I have seen in Reddit discussions: it is the reset bug affecting some AMD cards that, in this case, enables them to be passed through. In the past I was able to work around the reset bug by "ejecting" the GPU via the "safely remove hardware" icon in Windows before rebooting the guest (this was with a 7790 or a 6850, I don't remember which card was affected), but it does not seem to work nowadays. Perhaps it required SeaBIOS, but if SeaBIOS won't work then it's no use. I hope that you or AMD can find a solution; being able to reboot the guest without rebooting the host is currently my dealbreaker when deciding whether to buy Threadripper (NPT is of less importance to me).

There is some stuff about Vega GPU resets here and here:
https://lists.freedesktop.org/archives/dri-devel/2017-September/153443.html
https://lists.freedesktop.org/archives/amd-gfx/2017-October/014788.html

Could be some good news.
Something for @wendell to make sense of, I’m not smart enough… :disappointed_relieved:

This bug went unfixed for 10 years and the fix was literally a one-liner…

I don’t even know what to say.

I don’t think this is a fix that’s acceptable to be merged upstream.

Still, we can definitely make this change to address the NPT issue for the time being on an individual level.

Just curious, has anyone checked to see if the 4.15 RC kernels have made any progress in supporting Polaris?

I have the 1950X… If I want to use closer to the full 16 cores: the vcpu setting that everyone seems to suggest as "8", does that number directly reflect the number of cores used? And what about the cpu mode --> topology sockets setting?
So confused… Any help would be appreciated…