Host crashes when using libvirt (works fine without it)

Hi,

I have been using VMs (mostly Windows 10, for gaming) for a while now, by running qemu manually and passing through a GPU (NVIDIA) without any issues.

I recently tried to start using libvirt (and virt-manager), but after a few minutes of running, the entire system crashes: the image freezes on both GPUs, and no errors are displayed or logged.
I even tried to run qemu with (almost) the same arguments from /var/log/libvirt/qemu/XX.log, and it still works fine when run manually.

Assuming that it’s not a very random hardware issue, and that I’m not an idiot, what could be the issue?
What does libvirt do (besides running qemu with the logged arguments) that could cause the crash?

Any tips/ideas would be appreciated.

Thanks.

PS: I’m using a 1900X + RX580 (host) + GTX1080 (guest) on Arch. (also tried Ubuntu 19.04, with the same results)

Do you have NVIDIA drivers installed on Arch? I had a similar issue to this, when I removed the NVIDIA drivers with pacman (which I no longer needed) it then worked from virt-manager.

Have you tried without X running? When I was experiencing the issue I was able to virsh start so long as X was not running. If X was running it would hang.

No, I don’t have the NVIDIA drivers installed.
I tried to start the VM without X and it crashed after 5 minutes. (as usual)

I also tried running the same VM (with the same settings and image) on a fresh and up-to-date Ubuntu 18.04 and it did not crash after 3+ hours of playing/idling in Diablo3. (this is what I use to test/replicate the crash)

But after upgrading the Ubuntu installation to 18.10 (do-release-upgrade), it began crashing again, even with the leftover kernel 4.15.0 from 18.04.

Here is the VM config file

I tried the same GPU, VM and host OS on a different machine (Ryzen 1700, Biostar X370) and I did not have the same issue.
So I guess it’s something related to Threadripper or my X399 Taichi.

The issue seems to disappear if I pin the vcpus to anything except core 0.
(So auto/no pinning is bad, and pinning to core 0 is bad, at least for me. I have had no crashes since I made the change.)
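
For anyone wanting to try the same workaround, here is a rough sketch of how the pinning could be generated. It builds a libvirt <cputune> block with one <vcpupin> entry per vCPU, skipping host core 0. The helper name build_cputune, the vCPU count (4) and the host CPU range (0-15) are made up for illustration and are not taken from the config above; adjust them to your own topology and paste the result into the domain XML (e.g. via virsh edit).

```python
import xml.etree.ElementTree as ET


def build_cputune(vcpu_count, host_cpus, excluded=(0,)):
    """Build a libvirt <cputune> element that pins each vCPU to one host CPU.

    Host CPUs listed in `excluded` (core 0 by default) are never used.
    This helper is hypothetical and only illustrates the workaround.
    """
    allowed = [cpu for cpu in host_cpus if cpu not in excluded]
    if vcpu_count > len(allowed):
        raise ValueError("not enough host CPUs left after excluding some")

    cputune = ET.Element("cputune")
    for vcpu, host_cpu in enumerate(allowed[:vcpu_count]):
        # One <vcpupin vcpu="N" cpuset="M" /> element per guest vCPU.
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=str(host_cpu))
    return cputune


if __name__ == "__main__":
    # Assumed example values: 4 vCPUs on a host exposing CPUs 0..15.
    element = build_cputune(4, range(16))
    print(ET.tostring(element, encoding="unicode"))
```

This only avoids core 0 itself. Whether its SMT sibling should also be excluded is not something this thread tested.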

@loadrunner I created an account just to thank you.
I had the same problem and, just like you said, not pinning to cpu0 fixed it. Thanks so much. Hope you have a great day :)