B6 - Linux client repeatedly gets disconnected from Windows host and is unable to reconnect

Debian client, Windows host, B6 on both. NVIDIA GPU passthrough (RTX 4000 SFF). I’ve tried different hosts (guests, in libvirt/QEMU parlance) and different versions of the NVIDIA driver.
Looking Glass works fine for a while, then the client gets disconnected and is unable to reconnect.
The host is still responsive though, as I’m able to connect to it over SSH and reboot it.
Any idea why this happens, and whether there is a way to keep the host service alive?

Please post both the client and host app logs.

Apologies, I should have done that in the first post:

Host: pastebin com r0r8jass
Client: pastebin com UCETCbC4
(can’t include links? not sure why)

This is one instance on a newly spun-up VM (Windows LG host, Linux LG client).

Your VM seems to be misconfigured. You don’t have an EPYC threadripper with 8 single-threaded cores. You should always try to replicate the actual CPU in your host at the VM level. Deviating from that can have dramatic effects on VFIO. Also, single-threaded cores are especially problematic for multi-threaded applications such as LG. If you plan to give half your compute resources to the VM, your XML CPU config tag should look something like:

  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' clusters='1' cores='8' threads='2'/>
    <feature policy='require' name='topoext'/>
  </cpu>

I haven’t seen the rest of your XML, but based on the above, you might want to visit the VFIO Discord server and get some help configuring your VM properly before even considering running LG.
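
To sanity-check a topology like the one above, here is a minimal Python sketch. It is not part of libvirt or Looking Glass, and the function name summarize_cpu is made up for illustration. It assumes you have copied the <cpu> element out of `virsh dumpxml` for your domain into a string. It multiplies the topology attributes to get the number of logical CPUs the guest will see, which as far as I know has to match the <vcpu> count in the same XML.

```python
import math
import xml.etree.ElementTree as ET

# <cpu> element copied from the post above. In practice you would paste
# the one from your own domain XML here.
CPU_XML = """
<cpu mode='host-passthrough' check='none' migratable='on'>
  <topology sockets='1' dies='1' clusters='1' cores='8' threads='2'/>
  <feature policy='require' name='topoext'/>
</cpu>
"""

TOPOLOGY_KEYS = ("sockets", "dies", "clusters", "cores", "threads")


def summarize_cpu(cpu_xml: str) -> dict:
    """Return the CPU mode, logical CPU count and whether SMT is exposed."""
    cpu = ET.fromstring(cpu_xml.strip())
    topology = cpu.find("topology")
    if topology is None:
        raise ValueError("no <topology> element found in <cpu>")
    # Attributes that are absent are treated as 1 (an assumption of this sketch).
    counts = {key: int(topology.get(key, "1")) for key in TOPOLOGY_KEYS}
    return {
        "mode": cpu.get("mode"),
        "logical_cpus": math.prod(counts.values()),
        "smt": counts["threads"] > 1,
    }


if __name__ == "__main__":
    print(summarize_cpu(CPU_XML))
```

For the snippet above this reports host-passthrough, 16 logical CPUs and SMT enabled, i.e. 8 cores with 2 threads each.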

I was set up with passthrough before but moved away from it, as it would sometimes bring down the host (Ryzen 7950X 16C/32T). I also had LG disconnections with the passthrough setup. I’ll try again…

Also, I don’t think EPYC threadripper is a valid option in my version of libvirt; I’ve been using EPYC Rome.

Confirmed: I had a similar disconnection with the passthrough setup after a couple of hours of it working. I would add that it happens while I’m doing something else, so it could be related to some kind of inactivity timeout (purely guessing).

The logs for the new disconnects using host passthrough with the CPU features above:

host pastebin com uNCG9G1E
client pastebin com YRbiuiGJ

Do you have memory ballooning disabled? That might be something that takes an inconsistent amount of time to break.

My setup in that regard is as follows:

<memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</memballoon>

I should add that the last disconnect I experienced occurred quickly (within one hour of starting, with nothing running on the VM, purely as a test), while earlier ones took much longer.

This guest has 32 GB allocated on a host with 192 GB physical memory + 256 GB swap.

Ah, there you go then. Memory ballooning needs to be disabled with GPU passthrough.
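
If you want to confirm the change actually landed in the domain definition, a rough Python sketch along these lines could help. The function name balloon_disabled is hypothetical, and it assumes you feed it the full XML from `virsh dumpxml`. It treats a missing <memballoon> element as "not disabled", on the assumption that libvirt adds a balloon device by default for QEMU/KVM guests.

```python
import xml.etree.ElementTree as ET


def balloon_disabled(domain_xml: str) -> bool:
    """Return True only if the domain explicitly sets memballoon model='none'."""
    root = ET.fromstring(domain_xml.strip())
    balloon = root.find("./devices/memballoon")
    # No element at all is counted as enabled, since libvirt is assumed
    # to add a default balloon device in that case.
    return balloon is not None and balloon.get("model") == "none"


if __name__ == "__main__":
    example = "<domain><devices><memballoon model='none'/></devices></domain>"
    print(balloon_disabled(example))
```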

I disabled it by setting <memballoon model="none"/> but still experienced a disconnection after a while.