The Pragmatic Neckbeard 4: KVM and libvirt

  • Slightly off topic, but I wanted to ask: is there a way to isolate, say, 2 cores from the host and pin them to the guest, so they act like passed-through cores without the overhead caused by the vCPU threads wandering all over the host's cores for "reasons", even when pinned?

This may be possible if you can force the host to boot the graphics card in UEFI mode (I may not have the terminology correct here). My motherboard has a feature that allows setting each PCIe device to boot in UEFI or legacy mode.
Setting a non-primary graphics card to boot in UEFI mode allows me to still get pre-OS and pre-driver graphics output from the device.

Maybe you're right; my mobo has the same feature (or something similar), but it doesn't cause any issues on Arch Linux with the same setup :/ (BTW, I found out that Turbo Boost doesn't work on Arch with my A8-5600K, and thread/app priority is buggy with QEMU, so it doesn't fully utilize the cores when needed. After setting a fixed multiplier with no step controls or vcore lowering, I'm getting pretty good performance now :D)

In your BIOS, you should be able to specify something along the lines of "default video device".

That should help you when it comes to which card the BIOS grabs.

Right, there's an option for vcpu pinning. I can give you info on that tomorrow, I'm currently on mobile, so my responses are limited.

I've found out how to pin vCPUs, but is there any way to make the host not use, let's say, cores 2-3 at all?
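
For reference, pinning lives under <cputune> in the libvirt domain XML (edit it with virsh edit); a minimal sketch, where the guest gets 4 vCPUs and the host core numbers are just an example:

    <vcpu placement='static'>4</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='2'/>
      <vcpupin vcpu='1' cpuset='3'/>
      <vcpupin vcpu='2' cpuset='6'/>
      <vcpupin vcpu='3' cpuset='7'/>
    </cputune>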

Dropping in for an update.

I went ahead and installed Antergos and fooled around. I had to nuke it twice when I screwed things up, but I ended up following the instructions exactly, and still haven't gotten any output from either GPU.

I have an HD 7750 in PCIe slot 1 and the 980 Ti I want to pass through in slot 3.

The 980 Ti will NOT work, no matter what I do. Any combination of OVMF, SeaBIOS, q35, rombar off, an alternate ROM file, etc. yields nothing. The only driver lspci -nnk shows as loaded is vfio-pci.

Also, I reversed it all and tried passing the ATI card. Same problem: blank screen, nothing. I even tried it with pci-stub instead, and that didn't work either. Additionally, the NVIDIA card now shows output during boot, but it stops after "[OK] Started Hostname Service". SSH still works. I guess I screwed up the nouveau file or something. Not a huge deal for now.

Presently, I'm going back to VFIO, then swapping the cards again to try to pass the 7750 from slot 3. I don't anticipate it'll work, but at least it'll show me that it's gotta be the board. I even checked and double-checked that both VT-x and VT-d were enabled.
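
(Two cheap sanity checks worth running before swapping hardware around again; a sketch, nothing distro-specific:)

$ dmesg | grep -e DMAR -e IOMMU                     # VT-d should report its DMAR tables / "IOMMU enabled" here
$ find /sys/kernel/iommu_groups/ -type l | wc -l    # a non-zero count means IOMMU groups actually exist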

One weird issue is that whenever I attempt to start a VM with just 8 of the available 16 threads assigned to it, the entire system SLOWS down to a crawl, even AFTER destroying the VM instance via virsh. Updating GRUB takes like 5 minutes before I can reboot and try again.

Next step after the card is to patch the kernel, even though each card is already in its own IOMMU group. If that doesn't work, I'm gonna give Xen a go.

Oh, and I did try passing all the PLX bridges to a VM... the system froze up and I lost the SSH connection I had open, so I won't be trying that again. In the meantime, I opened a text document with most of the commands and modifications needed, all in a row.

$ sudo pacman -Syyu && sudo pacman -S openssh && sudo nano /etc/ssh/sshd_config
    Port 22
    AddressFamily any
    ListenAddress 0.0.0.0
    ListenAddress ::
$ sudo systemctl enable sshd && sudo systemctl start sshd
$ sudo nano /usr/sbin/update-grub
    #!/bin/sh
    set -e
    exec grub-mkconfig -o /boot/grub/grub.cfg "$@"
$ sudo chown root:root /usr/sbin/update-grub && sudo chmod 755 /usr/sbin/update-grub
$ sudo nano /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on vfio-pci.ids=10de:17c8,10de:0fb0 video=efifb:off"
$ mkdir -p ~/VM/ISO && mkdir -p ~/VM/ROM
$ sudo nano ~/VM/print_iommu
    #!/bin/bash
    # Print every PCI device, grouped by IOMMU group
    for d in /sys/kernel/iommu_groups/*/devices/*; do
        n=${d#*/iommu_groups/*}; n=${n%%/*}
        printf 'IOMMU Group %s ' "$n"
        lspci -nns "${d##*/}"
    done
$ sudo chmod +x ~/VM/print_iommu
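$ ~/VM/print_iommu    # quick check: the GPU and its HDMI audio function should sit in a group by themselves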
$ sudo nano /etc/modprobe.d/nouveau.conf
    blacklist nouveau
$ sudo nano /etc/modprobe.d/vfio.conf
    options vfio-pci ids=10de:17c8,10de:0fb0
$ sudo nano /etc/mkinitcpio.conf
    MODULES="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"
$ sudo mkinitcpio -p linux && sudo update-grub
$ sudo pacman -S libvirt virt-manager qemu yaourt
$ sudo reboot
$ wget https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso && mv virtio-win.iso ~/VM/ISO/
$ yaourt -S ovmf-git
    n,n,y,y,y,y
$ sudo nano /etc/libvirt/qemu.conf
    nvram = [
        "/usr/share/ovmf/x64/ovmf_x64.bin:/usr/share/ovmf/x64/ovmf_vars_x64.bin"
    ]
$ sudo systemctl enable libvirtd
$ sudo systemctl start libvirtd
$ sudo pacman -S firewalld iptables ebtables dnsmasq
$ sudo systemctl enable firewalld && sudo systemctl enable iptables && sudo systemctl enable dnsmasq
$ sudo systemctl restart libvirtd
$ virt-manager
$ sudo virsh destroy win10-1
$ sudo virsh edit win10-1
    <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

    <qemu:commandline>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,hv_time,kvm=off,hv_vendor_id=null'/>
    </qemu:commandline>
$ sudo virsh start win10-1

At this point, I've accepted that it ultimately may not work. I'm thinking it's just a motherboard issue, since both cards are failing. Hope the above text helps someone. Of course, some items need to be replaced if you're using a different card or distro; this one's for Antergos/Arch.

Not really. The host will not use that core when QEMU is under load, but that's about all I can give you.
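
If you want to experiment beyond that, the usual blunt instruments are the isolcpus kernel parameter (so the host scheduler never touches those cores) plus pinning the emulator threads elsewhere; a rough sketch on top of the vcpupin setup, with cores 2-3 as the example. Keep in mind that isolcpus leaves those cores idle whenever the VM isn't running.

$ sudo nano /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on vfio-pci.ids=10de:17c8,10de:0fb0 video=efifb:off isolcpus=2,3"
$ sudo update-grub

    <cputune>
      <vcpupin vcpu='0' cpuset='2'/>
      <vcpupin vcpu='1' cpuset='3'/>
      <emulatorpin cpuset='0-1'/>
    </cputune>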

Yeah, that's not supported. The bridges are needed by Linux.

I'm stumped as well. Your profile says you've got an X99-E WS. That board shouldn't have any issues with passthrough, but there does appear to be something odd with the hardware.

You said that you were able to pass the 980 Ti to Ubuntu on Proxmox, right? That indicates that you've set up your BIOS correctly; otherwise, it would fail during assignment to QEMU.

Have you tried putting the 980 Ti in the bottom few slots? I had issues with the middle PCIe slot on my Z170-A board and was only able to get my GPU passed through in either the bottom or second-to-bottom slot.

I know it's silly, but it may be worth a try.

Very interesting. I'll give it a go when I get a chance. There's not enough room in the case to stick it in the last slot, but I'll definitely try the second to last.

I think I've found what was causing the CPU performance issues in the guest OS: PowerNow/SpeedStep drops the CPU to a low frequency even when it shouldn't, so I've simply set a constant x39 multiplier and it works really well (the only issue is that my motherboard is so cheap it has no vcore settings in the BIOS, so I can't even properly overclock it :S).
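
If you'd rather not lock the multiplier in firmware, forcing the performance governor on the host is a software-side alternative worth trying; a sketch, assuming an Arch host (the package that provides cpupower differs on other distros):

$ sudo pacman -S cpupower
$ sudo cpupower frequency-set -g performance    # keep all cores at their maximum frequency
$ cpupower frequency-info                       # confirm the active governor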

Ok, I've had some success. I gave XenServer a try. It was cool, but it didn't work out, so I'm now using Proxmox again.

Either GPU (7750 or 980 Ti) can be successfully passed through, but only the secondary card will work; otherwise, it throws "invalid ROM contents." I'm okay-ish with this compromise, but I know there are people out there who have successfully passed the sole GPU in their system; it's just that none of their solutions have worked for me. I'm pretty happy that it actually works, though it's puzzling why Proxmox is the only one that does for me. Nevertheless, I've added a few more boot flags and so on, so I'll try to find some time later tonight to swap the cards again. Also, I need to figure out how the QEMU args that hide KVM translate to Proxmox's implementation. Windows boots, but I haven't installed it yet, so we'll see about error codes once I get it going.

There's a Discord channel (not related to L1T) for help with VFIO-related things.

https://discord.gg/qHK3bDc

They may be able to help more than I can.

At this point, I'm 100% clueless when it comes to why it's only working with Proxmox.

Great, I will check that out and see if anyone has succeeded.

For anyone else going the Proxmox approach, here's a helpful little script I whipped up with some Google-fu on bash scripting:

#!/bin/bash
#
# Opens the Proxmox config for the VM ID entered by the user.
echo "Enter the VM ID you wish to edit:"
read -r vmid
nano "/etc/pve/qemu-server/${vmid}.conf"

Answer 101, 102, etc. for the associated VM ID.
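
For what it's worth, the passthrough-related lines in /etc/pve/qemu-server/$vmid.conf might end up looking roughly like this (a sketch; the PCI address is an example, and the exact option syntax varies a bit between Proxmox versions):

    bios: ovmf
    machine: q35
    cpu: host,hidden=1
    hostpci0: 02:00,pcie=1,x-vga=1

On Proxmox builds that lack the hidden=1 flag, the same KVM-hiding trick from the libvirt XML earlier can reportedly be passed via a raw "args:" line instead.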

Great series… thank you very much for taking the time and effort to produce such an informative, detailed guide!

The clipboard… can you expand on that topic just a little? Another tutorial would be nice but isn't necessary for me; just explaining how to share the clipboard between host and guest, without using other methods, would be a help to a lot of folks.
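
(In case it helps anyone searching later: clipboard sharing is usually handled by the SPICE agent channel; a sketch, assuming the VM already has a SPICE graphics device and that you install spice-vdagent (or the SPICE guest tools on Windows) inside the guest. Note this only works when you're viewing the guest through a SPICE client, not on a monitor attached to a passed-through GPU.)

    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
    </channel>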

Last update (in case anyone was wondering how it turned out) on this thread; I may start a new one since I'm now using Proxmox (still KVM).

I still have 2 GPUs in the machine, which is problematic when dual-booting macOS. Proxmox, however, passes everything through perfectly. I ran Unigine Heaven in both a virtualized instance of Windows and a dual-booted bare-metal installation. They performed exactly the same, as far as I could tell, though I didn't actually run the benchmark and compare the scores; I just eyeballed the frame rate.

The VMs that have a physical GPU passed to them don't boot with more than 8 CPU cores assigned. I have an 8c/16t CPU, so I don't understand why that is. Does anyone else here using libvirt/QEMU/KVM on top of any other GNU/Linux distro have this issue?

Now the goal is to get macOS 10.12 running on this sucker, so I don't have to dual boot.

I do have more questions about VM images on top of ZFS, but I'll save those for another thread.

@SgtAwesomesauce: Great series… in particular, the conceptual explanations in the beginning are very illuminating and help tie the whole thing together for newbies. A few questions, though:

  1. How does this tie in with ZFS, especially the part where you mention creating a zvol?
  2. Is it better to pass entire NVMe/SATA SSDs to the VM, or to somehow use the previously created ZFS pool for the VMs (you mentioned many advantages of doing so, such as snapshots, etc.)?

PS: Is the guide still applicable today? A few of the to-do steps differ from the Arch Wiki and from the comparatively recent distro-specific guides here on Level1.

I’ll start here.

Yes, the guide should still work. Newer instructions may be better though, since I haven’t updated it.

The conceptual discussion absolutely still applies though.

Right, I kinda fell short there. You can create a zvol and use that as your VM’s C drive. Just set the path in your XML to /dev/zvol/path/to/zvol.

That’s about the extent of it.
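
A minimal sketch of that, with the pool and volume names as placeholders (adjust the size and the target bus to taste):

$ sudo zfs create -V 100G tank/win10    # the block device then appears at /dev/zvol/tank/win10

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/zvol/tank/win10'/>
      <target dev='vda' bus='virtio'/>
    </disk>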

It really depends. You’ll get better raw performance out of an SSD, but you have better flexibility with a zpool.

Thanks. So it is similar to the example case you used in part 3 :slight_smile:

Okay… need to educate myself more on this. Is it possible to combine both raw performance and flexibility via, say:

  1. root on ZFS (I was reading more about ZFS after your part 3, and this article says it’s not advisable: https://passthroughpo.st/zfs-configuration-linux-setup-basics/)

OR

  1. OS on SSDs (host/VMs) and data on zpools as zvols? (I wouldn’t need higher-capacity SSDs then, say a 1 TB NVMe, in this case.)

Somewhat.

ZFS is getting more performant, and how much you can eke out of it is very dependent on the tuning flags you apply. I’m not the best person to answer tuning questions, but if you’re really interested in that, I’d recommend making a dedicated thread for it. I can’t think of one that exists, and I feel the topic would be better served as its own thread, so people can find it more easily. :stuck_out_tongue:

You can do either of your cases; however, I’d recommend number 2. I sort of do 2 at the moment: I have an LVM array of SSDs for my “hot data” that I use for my VM OS disks, then I use my ZFS array for less speed-sensitive data, in both datasets and zvols. At the moment, my zvols don’t have any data in them, though.

Exactly.

Thanks for this. I’ve been wanting to VMify my Windows 10 box but haven’t done so because, even though my real gamer days are well behind me, every now and then things appear from the ether like Outer Worlds or Mechwarrior 5 (did they just use the MW4 graphics engine?) and I gotta get some. Not to mention my ever-growing need for all things to be on ZFS.

That said, I still just have an old GTX 1070, so even though they’re still shiny and new, has anyone had any experience doing this process with the new Radeon Navi cards? If so, is it even worth it for me to get one, or will the passthrough performance of the 1070 be fine so I can just wait it out until I have an actual need for something new? Not counting the joy of dealing with NVIDIA’s deep love and support of its customers on anything other than bare-metal Windows, of course.
