Looking Glass client hangs if host GPU can't keep up

Hi there!

I already wrote to you in the release thread.
This is my setup:
Guest: RX 570 (Primary GPU)
Host: 1070 Ti (Secondary GPU)

Resolution: 2560x1440 @ 144Hz

When I'm playing games, my host GPU tanks hard and stays at 100% utilization. After a while the client hangs.

The client log itself only says that frames may be dropped because the host can't keep up.
The client's debug mode does not give any more insight.

It always hangs after a few seconds, especially when I switch into the container with my mouse and keyboard using the evdev switch.

I'm actually rather surprised that my 1070 Ti tanks so much. It even tanks when watching a 1440p 60 FPS video on YouTube in Firefox.

When I switched my Windows VM to 1080p, the hang did not appear again.

Hope this information suffices :open_mouth:

Cheers!

I don’t really know anything about it but I wonder about your PCIe setup.

Is one of your GPUs stuck on a PCIe x4 or maybe a PCIe v2 x8 slot? I think some of the older boards did bad tricks like putting PCIe slots behind 4 lanes of slow PCIe v2. It would be better to use an x8 / x8 SLI split on two GPU slots than to use one of those crippled slots.

I just think one of the things that could cause this is a bottleneck on transferring the frames card to card. 2560x1440 @144 Hz is a lot of bits.
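
To put a rough number on "a lot of bits", here is a back-of-the-envelope sketch. It assumes uncompressed frames at 4 bytes per pixel (an assumption, the pixel format is not stated in this thread), and the function name is made up for illustration.

```python
def frame_stream_bandwidth(width, height, refresh_hz, bytes_per_pixel=4):
    """Return (bytes per frame, bytes per second) for an uncompressed stream.

    bytes_per_pixel=4 is an assumption (e.g. 8-bit BGRA); other formats differ.
    """
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes, frame_bytes * refresh_hz


modes = {
    "1440p @ 144 Hz": (2560, 1440, 144),
    "1440p @ 60 Hz": (2560, 1440, 60),
    "1080p @ 144 Hz": (1920, 1080, 144),
}

for label, (width, height, hz) in modes.items():
    frame_bytes, per_second = frame_stream_bandwidth(width, height, hz)
    print(f"{label}: {frame_bytes / 1e6:.1f} MB per frame, "
          f"{per_second / 1e9:.2f} GB/s")
```

Under that assumption, 1440p at 144 Hz works out to about 14.7 MB per frame and about 2.12 GB/s, before any protocol overhead.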

Please show the client output on the command line.

No idea. The motherboard is the ASUS Prime B350. Unfortunately, no other setup is possible because of my IOMMU groups, and I couldn't get the error 43 bug fixed in the VM.

Edit: In nvidia-settings it says:
PCIe Link Width: x2
PCIe Link Speed: 5.0 GT/s

not sure what that means.

Edit 2:
According to the product page:
16 PCI Express® 3.0/2.0 lanes
But that should only limit my UPS, right? Currently my FPS is lower than my UPS, which is frankly odd with this much graphics power.

It can only produce 52 FPS, while UPS sits at 60
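
For reference, the x2 / 5.0 GT/s figures can be turned into a rough throughput ceiling. The sketch below is only an estimate: it accounts for the line encoding (8b/10b for PCIe 1.x and 2.0, 128b/130b for PCIe 3.0) but ignores packet overhead, and it assumes every frame crosses the host GPU's link once, uncompressed, at 4 bytes per pixel. The function names are hypothetical.

```python
# Line rate in GT/s -> (payload bits, total bits) of the line encoding.
ENCODING = {
    2.5: (8, 10),     # PCIe 1.x, 8b/10b
    5.0: (8, 10),     # PCIe 2.0, 8b/10b
    8.0: (128, 130),  # PCIe 3.0, 128b/130b
}


def link_bytes_per_second(speed_gt_s, lanes):
    """Upper bound on one-direction link throughput, in bytes per second."""
    payload_bits, total_bits = ENCODING[speed_gt_s]
    return speed_gt_s * 1e9 * payload_bits / total_bits / 8 * lanes


def max_frame_rate(speed_gt_s, lanes, width, height, bytes_per_pixel=4):
    """Frames per second the link could carry if it moved nothing else.

    bytes_per_pixel=4 is an assumption about the frame format.
    """
    frame_bytes = width * height * bytes_per_pixel
    return link_bytes_per_second(speed_gt_s, lanes) / frame_bytes


for speed, lanes in [(5.0, 2), (8.0, 8), (8.0, 16)]:
    throughput = link_bytes_per_second(speed, lanes)
    fps_1440p = max_frame_rate(speed, lanes, 2560, 1440)
    print(f"{speed} GT/s x{lanes}: {throughput / 1e9:.2f} GB/s, "
          f"at most {fps_1440p:.1f} FPS at 2560x1440")
```

With these assumptions, a 5.0 GT/s x2 link tops out at about 1.00 GB/s, which is roughly 67.8 frames per second at 2560x1440 and roughly 120.6 at 1920x1080. The 52 FPS reported above sits below that ceiling, which would be consistent with the link being the limit, although this estimate alone does not prove it.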

I tried to reproduce it this morning but could not get it to hang, which is odd. But I played around with the graphics settings a lot yesterday. I'll report back with the exact output.

But the output usually only says this:

looking-glass-client -k yes
   191509071 [I]               main.c:1646 | main                           | Looking Glass (B2-rc4-0-g969effedde)
   191509097 [I]               main.c:1647 | main                           | Locking Method: Atomic
   191509382 [W]             option.c:288  | option_parse                   | Ignored invalid argument: yes
   191609403 [I]            ivshmem.c:180  | ivshmemOpenDev                 | KVMFR Device     : /dev/shm/looking-glass
   191661476 [I]                egl.c:188  | egl_initialize                 | Double buffering is off
   191664713 [I]                egl.c:202  | egl_initialize                 | Multisampling enabled, max samples: 4
   191664720 [I]               main.c:1074 | try_renderer                   | Using Renderer: EGL
   191706968 [I]               main.c:1399 | lg_run                         | Using Clipboard: X11
   191714243 [I]                egl.c:405  | egl_render_startup             | Supported extensions: EGL_EXT_platform_base EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_KHR_client_get_all_proc_addresses EGL_EXT_client_extensions EGL_KHR_debug EGL_KHR_platform_x11 EGL_EXT_platform_x11 EGL_EXT_platform_device EGL_KHR_platform_wayland EGL_EXT_platform_wayland EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless
   191714259 [I]                egl.c:411  | egl_render_startup             | use native: false
   191766205 [I]                egl.c:500  | egl_render_startup             | Vendor  : NVIDIA Corporation
   191766219 [I]                egl.c:501  | egl_render_startup             | Renderer: GeForce GTX 1070 Ti/PCIe/SSE2
   191766223 [I]                egl.c:502  | egl_render_startup             | Version : OpenGL ES 3.2 NVIDIA 455.23.04
   191968921 [I]               main.c:1530 | lg_run                         | Host ready, reported version: B2-rc4-0-g969effedde
   191968931 [I]               main.c:1531 | lg_run                         | Starting session
   269954852 [ ]            texture.c:308  | egl_warn_slow                  | ================================================================================
   269954883 [W]            texture.c:309  | egl_warn_slow                  | The guest is providing updates faster then your computer can display them
   269954896 [W]            texture.c:310  | egl_warn_slow                  | This is a hardware limitation, expect microstutters & frame skips
   269954901 [ ]            texture.c:311  | egl_warn_slow                  | ================================================================================

Edit: I see it uses OpenGL. Is there also Vulkan support?

I didn’t realize it could be as bad as that. I thought maybe x4, not a two-lane limit. Now I’m surprised that it works at all.

I ran a 7950 off a single PCIe 3.0 lane with little to no bottleneck, but that might be a little different: there’s little communication over the bus when you run a GPU natively, and CPU instructions need far less bandwidth than 144 1440p frames every second.

The x2 might happen because a secondary M.2 slot (not the primary one) is in use and shares its lanes with the GPU slot.

Can you take a picture of which slot you have occupied, if any?
If possible, use the primary M.2 slot so your GPU can run at a higher lane count.

So this causes the bad performance?

You can think of it like a network connection: if your internet connection isn’t fast enough, you can’t stream high-resolution content without it hitching.

Since it’s effectively sending the rendered frames from one GPU to another (I think that’s how this software works, anyway), your PCIe lanes determine how much bandwidth you have.

The guest uses the primary GPU slot because of the poor IOMMU groups.

The secondary GPU slot is used by the 1070 Ti for the host, and I'm using the M.2 NVMe slot. I'm not sure if it shares lanes with the secondary GPU slot.

Judging by lstopo, the GPU does not seem to share lanes with the NVMe drive.
Here is the picture of it:

07:00.0 is the 1070 Ti (Host)
08:00.0 is the RX 570 (Guest)
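
The link a card has actually negotiated can also be read straight from sysfs on Linux, which exposes `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` for each PCI device. Below is a minimal sketch. It assumes the 1070 Ti sits at `0000:07:00.0` (the `0000` domain prefix is a guess based on the address above), and the helper names are made up. The reported speed may drop while the GPU is idle, so it is probably worth checking under load.

```python
import re
from pathlib import Path


def parse_link_speed(text):
    """Extract the GT/s figure from strings such as '5.0 GT/s PCIe' or '8 GT/s'."""
    match = re.match(r"\s*([\d.]+)\s*GT/s", text)
    if match is None:
        raise ValueError(f"unrecognised link speed: {text!r}")
    return float(match.group(1))


def read_link_status(address, sysfs_root="/sys/bus/pci/devices"):
    """Read the current and maximum PCIe link parameters of one device."""
    device = Path(sysfs_root) / address

    def read(name):
        return (device / name).read_text().strip()

    return {
        "current_speed_gt_s": parse_link_speed(read("current_link_speed")),
        "current_width": int(read("current_link_width")),
        "max_speed_gt_s": parse_link_speed(read("max_link_speed")),
        "max_width": int(read("max_link_width")),
    }


if __name__ == "__main__":
    # Assumed address of the host GPU; adjust it to match your own system.
    try:
        print(read_link_status("0000:07:00.0"))
    except (OSError, ValueError) as error:
        print(f"could not read link status: {error}")
```

If `current_width` is well below `max_width` while the card is busy, the slot wiring (or lane sharing on the board) is the likely limit rather than the card itself.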

Hmm, I thought that this might only affect the UPS. Do you reckon I should get a better AM4 motherboard? If so, which one would you recommend with good IOMMU groups and enough PCIe lanes?

Wendell almost exclusively covers Linux and IOMMU on the motherboards he reviews these days.

But before we chalk this up to a bad board layout, there might be some other issue causing this.

I am by no means a Linux expert, but the other two guys probably are. In fact, Gnif created Looking Glass (I’m pretty sure, anyway).

Ay okay, hope the other two can give me more insight about this then :slight_smile:

I've looked up some motherboards, and this one seems to have 2x PCIe 3.0 x16 slots plus several other nice features. The question is, how are the IOMMU groups? :confused:

Okay. I have no idea why my NVIDIA card sucks as a host GPU, but after solving the error 43 problem on my 1070 Ti using this tool (https://github.com/Matoking/NVIDIA-vBIOS-VFIO-Patcher), it suddenly began working without a problem.
The UPS is still limited to 60, though, and so is the FPS; it's probably bottlenecked by my bad motherboard.
The host card sits comfortably at 4% utilization.