It looks fixed. Certificate expires on 10-Jun-18.
I was pondering the memory copy issue, and I thought: what about dividing the image into 4 quadrants with 4 shared memory files, so that the block size per quadrant is smaller for resolutions higher than 1080p? Is it an overall QEMU hypervisor performance issue or a block-size-per-shared-memory-file issue?
That could be worth a shot to improve memory copy performance in 4K.
Does anybody know what exactly 100% PCIe Bandwidth Utilization means (NVIDIA CPL)?
Does 100% mean both directions saturated together, or only one of them?
Screenshot 4K with NVIDIA CPL
Typically, a bus is only half-duplex.
Depends on if it’s a serial or a parallel bus. Serial buses most often are full duplex and parallel buses mostly are half duplex. PCI-E is a full-duplex serial bus and can send in both directions at the same time.
My HostOS GPU (GTX 960 OC) shows 14% PCIe utilization when Looking Glass at 4K reports 112 UPS.
If we assume 100% bandwidth of PCIe 3.0 x16 = 2 x 16 x 985 MB/s = 31.5 GB/s, then 31.5 x 0.14 = 4.41 GB/s (probably enough for 3840 x 2160 x 4 x 112 ~ 3.7 GB/s)?
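For anyone who wants to rerun the arithmetic above, here is a minimal Python sketch. The 985 MB/s per lane figure and the 14% reading come from the post; counting 100% as both directions combined is an assumption, and it is exactly the open question in this thread. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the PCIe figures quoted in the post.
# Assumption: "100%" in the NVIDIA control panel covers both directions together.

def pcie_total_gb_s(lanes=16, mb_per_lane=985, directions=2):
    """Aggregate PCIe bandwidth in GB/s (decimal) for the given link."""
    return lanes * mb_per_lane * directions / 1000

def frame_stream_gb_s(width, height, bytes_per_pixel, updates_per_second):
    """Data rate in GB/s (decimal) of uncompressed frames at a given update rate."""
    return width * height * bytes_per_pixel * updates_per_second / 1e9

total = pcie_total_gb_s()                       # PCIe 3.0 x16, both directions
used = total * 0.14                             # 14% utilization reading
needed = frame_stream_gb_s(3840, 2160, 4, 112)  # 4K, 4 bytes per pixel, 112 UPS

print(f"link total: {total:.2f} GB/s")
print(f"14% of it:  {used:.2f} GB/s")
print(f"4K stream:  {needed:.2f} GB/s")
```

If 100% turned out to mean one direction only, the same 14% would be about 2.2 GB/s, which is below what a full 4K stream at 112 UPS would need.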
The problem with Looking Glass at the moment is not PCIe bandwidth but memory copy speed.
I’m trying to find some confirmation that the indicated 112 UPS are real (whole?) frames per second (the GuestOS is running a lightweight 4K benchmark).
Which is what I just gave an idea about how to optimize. If it’s memory copy per shared memory file, you can divide the image into quadrants. If it’s total, there’s nothing we can do.
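To make the quadrant idea concrete, here is a rough Python sketch of the split itself. It is not Looking Glass code: plain bytearrays stand in for the four shared memory files, the frame is assumed to be tightly packed with even width and height, and both function names are hypothetical. Whether smaller blocks actually copy faster is the unanswered question above.

```python
# Sketch: split one packed frame into four quadrant buffers and join them again.
# Assumptions: tightly packed rows, even width and height, bytearrays in place
# of real shared memory files.

def split_into_quadrants(frame, width, height, bpp=4):
    """Return [top-left, top-right, bottom-left, bottom-right] quadrant buffers."""
    assert width % 2 == 0 and height % 2 == 0, "sketch assumes even dimensions"
    half_w, half_h = width // 2, height // 2
    stride = width * bpp          # bytes per full row
    row_bytes = half_w * bpp      # bytes per quadrant row
    src = memoryview(frame)
    quads = [bytearray(row_bytes * half_h) for _ in range(4)]
    for qy in (0, 1):
        for qx in (0, 1):
            dst = quads[qy * 2 + qx]
            for row in range(half_h):
                src_off = (qy * half_h + row) * stride + qx * row_bytes
                dst[row * row_bytes:(row + 1) * row_bytes] = src[src_off:src_off + row_bytes]
    return quads

def join_quadrants(quads, width, height, bpp=4):
    """Inverse of split_into_quadrants: rebuild the full packed frame."""
    half_w, half_h = width // 2, height // 2
    stride = width * bpp
    row_bytes = half_w * bpp
    out = bytearray(stride * height)
    for qy in (0, 1):
        for qx in (0, 1):
            src = memoryview(quads[qy * 2 + qx])
            for row in range(half_h):
                dst_off = (qy * half_h + row) * stride + qx * row_bytes
                out[dst_off:dst_off + row_bytes] = src[row * row_bytes:(row + 1) * row_bytes]
    return out
```

Note that the split copies every pixel exactly once, so the total number of bytes moved is unchanged; only the size of each destination block shrinks.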
IIRC it’s the total memory copy speed, but @gnif can tell you for sure.
Also, I had an idea: why not store just the differences between frames to save on space and memory copy time? Sure, you might have higher CPU usage, but something like 1440p or even 4K might be possible. Most frames won’t change in a very significant way unless you’re playing a really action-intensive FPS game, @gnif.
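A very simple version of the difference idea could look like the Python sketch below. It is only an illustration: it compares whole rows rather than doing any real compression, all function names are invented, and it has nothing to do with how Looking Glass actually transfers frames.

```python
# Sketch: send only the rows that changed since the previous frame.
# Assumptions: both frames are tightly packed and have the same size.

def encode_delta(prev, curr, width, height, bpp=4):
    """Return a list of (row_index, row_bytes) for rows that differ from prev."""
    stride = width * bpp
    p, c = memoryview(prev), memoryview(curr)
    delta = []
    for y in range(height):
        start, end = y * stride, (y + 1) * stride
        if p[start:end] != c[start:end]:      # row content changed
            delta.append((y, bytes(c[start:end])))
    return delta

def apply_delta(prev, delta, width, bpp=4):
    """Rebuild the current frame from the previous frame plus the changed rows."""
    stride = width * bpp
    out = bytearray(prev)
    for y, row in delta:
        out[y * stride:(y + 1) * stride] = row
    return out
```

The comparison still has to read both frames in full, so this trades extra reads and CPU time for fewer bytes written, which is the trade-off mentioned above.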
If it’s total, it’s QEMU that needs to optimize their hypervisor.
A difference-based solution would mean the host program in the guest has to convert to a lossless compressed image format that supports difference frames and GOPs (Groups of Pictures), which is essentially what H.264 does, though not losslessly unless you specify that type of encoding.
Wait… Losslessly encoded H.264 might work actually. You can save quite a bit of bandwidth when you compress to perceptually lossless, although the CPU cost would be tremendous. (ProRes HQ and DNxHD 220x are perceptually lossless, but are still lower in bitrate than uncompressed frames)
As an example, Blackmagic uses a hardware FPGA IP to compress a 4K 60p image 4:1 to send down a single 3G-SDI cable. It’s perceptually lossless, but it does throw away information to get the 4:1 compression done. The tech is called TICO: http://www.intopix.com/products/index/index/id/31/lang/en#.U5BaOGdZqpo
(NAB 2014 BTW, 4 years ago)
TICO seems to be VERY light on the CPU, so it may be worth looking into. A TICO binary blob in Looking Glass could be worth exploring. Remember, this is ULTRA low latency compared to other perceptually lossless solutions.
Similarly, DisplayPort 1.4 has something called DSC (Display Stream Compression). It’s the exact same concept, except it is part of the DisplayPort standard and included with every DisplayPort TX and RX chip in the spec: https://www.vesa.org/news/vesa-updates-display-stream-compression-standard-to-support-new-applications-and-richer-display-content/
This 1 hour webinar on DSC and DisplayPort might be very interesting to watch, @gnif:
Rightmark Memory Analyzer gives similar results for native and KVM/QEMU run Windows 10.
Four-threads SSE2_128bit operations over 4x 32MB data blocks: ~44GB/s for Read, ~14GB/s for Write (~44GB/s for Write NT ?)
Reads seem to benefit like RAID, but writes are where the hypervisor is starting to really struggle.
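As a rough way to relate those throughput numbers to frame rates, here is a small Python sketch. The ~14 GB/s write figure is taken from the measurement above; the number of copies a frame goes through on its way to the screen is a pure assumption and is left as a parameter. The function names are hypothetical.

```python
# Sketch: how long does one frame copy take at a given memory throughput?
# Assumption: throughput is sustained and expressed in decimal GB/s.

def frame_bytes(width, height, bpp=4):
    """Size of one uncompressed, tightly packed frame in bytes."""
    return width * height * bpp

def copy_time_ms(nbytes, gb_per_s):
    """Time in milliseconds to move nbytes at gb_per_s."""
    return nbytes / (gb_per_s * 1e9) * 1000

def max_ups(nbytes, gb_per_s, copies=1):
    """Upper bound on updates per second if each frame is copied `copies` times."""
    return gb_per_s * 1e9 / (nbytes * copies)

size_4k = frame_bytes(3840, 2160)
print(f"4K frame: {size_4k} bytes")
print(f"one copy at 14 GB/s: {copy_time_ms(size_4k, 14):.2f} ms")
print(f"upper bound with 2 copies: {max_ups(size_4k, 14, copies=2):.0f} UPS")
```

These are ceilings only; cache effects, synchronization, and anything else sharing the memory bus will pull the real numbers down.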
Seems like if QEMU cannot improve their performance, we have to look at visually lossless compression solutions with a light CPU load.
I have problems with low UPS. When moving a window, everything is fine. When playing, UPS drops to 30, sometimes higher. Why is this, and how do I start analysing it?
I would try lower quality settings in the game (V-sync on?). Maybe the GuestGPU (RX480?) does not have enough performance for both tasks: 3D rendering + LG (grabbing).
Try some lightweight 3D load in the GuestOS (for example, the Lightsmark 2008 benchmark). How high does the UPS get?
250 fps in the benchmark, 137 UPS. Sooooo… why is my graphics card failing at grabbing a 2005 game smoothly with vsync on?
Could it be an issue with LTSB?
I don’t think so.
I think there is no priority for the DXGI (LG frame grabbing) task.
When the GPU load (related to 3D rendering) is too high, UPS goes down.
Try the lowest quality/postprocessing settings in your game.
Vsync sometimes helps. I tried a couple of games. In one game I had to turn vsync off.
Is there a way to set the priority for the task? Do I have to recompile?
Simply run the Looking Glass Host in the guest as Administrator. That will grant it realtime priority.
I did and it helped a bit. Well that pretty much says that my PC is just crap. Thank you all for helping.