xwraith
November 5, 2018, 11:19pm
1
So I’ve run into something a bit strange. I’m booting into Linux via GRUB, and the motherboard (or GRUB) is favoring my Vega graphics card over my WX3100 card, even though the WX3100 is in the first PCIe slot.
Naturally I can’t find a configuration in the BIOS that says, hey use just this one card at boot.
What happens is that my Vega card, which is intended for passthrough, initializes at boot and keeps displaying the “Loading Linux…” and “Loading initial ramdisk…” messages from the boot process even after Linux has booted. I’ve tried telling GRUB not to use a graphical interface in the hope that it would stop this behavior, but no luck.
Any idea how to troubleshoot/proceed?
If I try to pass through the video card to the VM, nothing loads and the video card just seems to reboot endlessly.
Thanks!
Does the behavior mirror PCI enumeration as shown by lspci?
What happens if you swap the slots for the graphics cards?
Hmm didn’t realize that the PCI ids were in a different order. After swapping the cards:
➜ ~ lspci | grep -i VGA
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa XT [Radeon PRO WX 3100]
45:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1)
I haven’t been able to do a VM test yet, but should get a chance shortly.
And nothing is really different in the behavior, other than that the BIOS now prefers my secondary GPU.
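One way to check which card the firmware actually picked as the boot display, rather than inferring it from slot order, is the `boot_vga` attribute that the kernel exposes in sysfs for VGA-class devices. The following is a minimal sketch; the function name `list_boot_vga` and its optional directory argument (there only so the function can be pointed at a test tree) are assumptions for illustration, not something from this thread.

```shell
# List every PCI device that has a boot_vga attribute, with its value.
# boot_vga=1 marks the device the firmware used as the primary display.
# Usage: list_boot_vga [DEVICES_DIR]   (defaults to the real sysfs tree)
list_boot_vga() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Only VGA-class devices carry this attribute; skip everything else.
        [ -f "$dev/boot_vga" ] || continue
        printf '%s boot_vga=%s\n' "$(basename "$dev")" "$(cat "$dev/boot_vga")"
    done
}
```

Calling `list_boot_vga` with no arguments reads the live system; the addresses it prints are in the same format `lspci -D` uses, so they can be matched against the `lspci | grep -i VGA` output.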
My card still won’t pass through, though; it still goes into what looks like an endless reboot loop.
Can anyone spot anything wrong in my libvirt config?
➜ vm-config sudo virsh dumpxml win10-personal > win10-personal-passthrough.xml
➜ vm-config cat win10-personal-passthrough.xml
<domain type='kvm'>
  <name>win10-personal</name>
  <uuid>dbe1a2ce-e940-4a10-9152-5ad37cac3518</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10-personal_VARS.fd</nvram>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough' io='threads' discard='unmap' detect_zeroes='on'/>
      <source dev='/dev/FastArrayAlpha/win-10-boot-personal'/>
      <target dev='sdb' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <interface type='direct'>
      <mac address='52:54:00:ce:ae:04'/>
      <source dev='enp4s0' mode='bridge'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x45' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x45' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>
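Since the cards were swapped between slots, it is worth confirming that the `<hostdev>` source addresses in the dump still point at the Vega (`45:00.0` and `45:00.1` in the `lspci` output above). The sketch below prints those source addresses in `lspci` notation. The function name `hostdev_addrs` is hypothetical, and the awk parsing is an assumption that only holds for the one-element-per-line layout that `virsh dumpxml` produces; it is not a general XML parser.

```shell
# Print the host-side PCI address (lspci style, BB:SS.F) of every
# <hostdev> <source> in a libvirt domain XML file.
# Usage: hostdev_addrs FILE
hostdev_addrs() {
    awk '
        # Return the hex digits of attribute NAME on LINE, or ?? if absent.
        # \047 is a single quote, written as an escape to keep shell quoting simple.
        function attr(line, name,    s) {
            if (!match(line, name "=\0470x[0-9a-fA-F]+\047")) return "??"
            s = substr(line, RSTART, RLENGTH)
            sub(/^[^x]*x/, "", s)               # drop everything through the 0x prefix
            return substr(s, 1, length(s) - 1)  # drop the closing quote
        }
        /<hostdev /   { in_hostdev = 1 }
        /<\/hostdev>/ { in_hostdev = 0 }
        /<source>/    { in_source = 1 }
        /<\/source>/  { in_source = 0 }
        # Only the address inside <source> is the host device; the other
        # <address> in a hostdev is where the device appears in the guest.
        in_hostdev && in_source && /<address / {
            printf "%s:%s.%s\n", attr($0, "bus"), attr($0, "slot"), attr($0, "function")
        }
    ' "$1"
}
```

Run against the file saved above, for example `hostdev_addrs win10-personal-passthrough.xml`, and compare each printed address with the `lspci` listing.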
xwraith
November 10, 2018, 9:01pm
5
I stumbled across this post here and that allowed me to reset my Vega card and let passthrough work!
I cannot believe this, but I just found a workaround. (Ignore sudo if you have root.)
Step 1:
Shut down the VM.
Step 2:
Run these two commands (edit for your PCI topology!):
echo "1" | sudo tee -a /sys/bus/pci/devices/0000\:0a\:00.0/remove  # GPU
echo "1" | sudo tee -a /sys/bus/pci/devices/0000\:0a\:00.1/remove  # HDMI audio
Step 3 (I’m not sure this is needed):
Suspend to RAM
Step 4:
Run this command:
echo "1" | sudo tee -a /sys/bus/pci/rescan
Your VM will start normally if you have the same kind of bug I do. I’ve spent a solid 20 hours on this, can’t believe I didn’t try device removal. Rescan shouldn’t work with a device with no known reset method BUT it does, weird. My kernel is stock 4.15 Ubuntu, no patches or anything.
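The remove-and-rescan steps above can be wrapped in a small function so the addresses are passed in rather than hard-coded. This is only a sketch of the quoted workaround, not a tested fix: the name `reset_gpu` and the optional third argument (a sysfs root override, there so the function can be exercised against a scratch directory) are assumptions, and the optional suspend-to-RAM step is left out because the quoted post was unsure whether it is needed.

```shell
# Remove a GPU and its audio function from the PCI bus, then rescan.
# Must run as root against the real sysfs tree, with the VM shut down.
# Usage: reset_gpu GPU_ADDR AUDIO_ADDR [PCI_SYSFS_ROOT]
reset_gpu() {
    gpu="$1"
    audio="$2"
    pci_root="${3:-/sys/bus/pci}"
    for addr in "$gpu" "$audio"; do
        node="$pci_root/devices/$addr/remove"
        if [ -e "$node" ]; then
            # Writing 1 asks the kernel to detach the device from the bus.
            echo 1 > "$node"
        else
            echo "skipping $addr: no remove node found" >&2
        fi
    done
    # Writing 1 here makes the kernel re-enumerate the bus, which brings
    # the removed devices back.
    echo 1 > "$pci_root/rescan"
}
```

With the topology from the `lspci` output earlier in the thread, the call would be `reset_gpu 0000:45:00.0 0000:45:00.1`; the quoted post used `0000:0a:00.0` and `0000:0a:00.1` for its own system.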