Hi Gnif,
Just registered in the forum to confirm that the dmabuf-test branch gets DMA Buffer fully working with the Linux Nvidia driver series 535 (525, for some reason, still bailed out with BAD PARM).
Testbed:
DOM0 : Zen1 Server running Proxmox 8.0
GPUs : Quadro M6000, Quadro P4000, Quadro RTX6000 (this latter tested both as VFIO/PT and vGPU)
3 x VM guests w/ LG Host (B6) : Win10 LTSC 1809 with Nvidia drivers (Quadro or GRID version) 536.25
1 x VM guest w/ LG Client (dmabuf-test branch) : Ubuntu 22 with Nvidia driver 535.86
Additional QEMU args on VM1/2/3 (hosts) :
VM1 → args: -device ivshmem-plain,id=shmem0,memdev=looking-glass -object memory-backend-file,id=looking-glass,mem-path=/dev/shm/looking-glass,size=128M,share=yes -spice port=5901,addr=[server-ip],disable-ticketing=on,image-compression=off
VM2 → args: -device ivshmem-plain,id=shmem2,memdev=looking-glass-2 -object memory-backend-file,id=looking-glass-2,mem-path=/dev/shm/looking-glass-2,size=32M,share=yes -spice port=5902,addr=[server-ip],disable-ticketing=on,image-compression=off
VM3 → args: -device ivshmem-plain,id=shmem3,memdev=looking-glass-3 -object memory-backend-file,id=looking-glass-3,mem-path=/dev/shm/looking-glass-3,size=32M,share=yes -spice port=5903,addr=[server-ip],disable-ticketing=on,image-compression=off
Note: VM1 is rendering at 4K, thus the larger ivshmem size. VM2/3 render at 1080p.
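For anyone sizing their own setup: the upstream docs use a similar formula (width × height × 4 bytes, double buffered, plus some headroom, rounded up to a power of two MiB). This is a rough sketch of that calculation, not the exact upstream code:

```shell
#!/bin/sh
# Rough ivshmem size estimate for Looking Glass (sketch, not the exact
# upstream calculation). 4 bytes/pixel, double buffered, ~10 MiB headroom,
# rounded up to the next power-of-two MiB.
width=3840; height=2160
bytes=$(( width * height * 4 * 2 ))
mib=$(( (bytes + 1048575) / 1048576 + 10 ))
size=1
while [ "$size" -lt "$mib" ]; do size=$(( size * 2 )); done
echo "${size}M"
```

For 3840x2160 this lands on 128M, and the same math gives 32M for 1920x1080, matching the sizes in the args above.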
Additional QEMU args on VM4 (client) :
VM4 → args: -device ivshmem-plain,id=shmem0,memdev=looking-glass -object memory-backend-file,id=looking-glass,mem-path=/dev/shm/looking-glass,size=128M,share=yes -device ivshmem-plain,id=shmem1,memdev=looking-glass-2 -object memory-backend-file,id=looking-glass-2,mem-path=/dev/shm/looking-glass-2,size=32M,share=yes -device ivshmem-plain,id=shmem3,memdev=looking-glass-3 -object memory-backend-file,id=looking-glass-3,mem-path=/dev/shm/looking-glass-3,size=32M,share=yes
Compiled and loaded kvmfr on VM4 (which automatically creates /dev/kvmfr0,1,2 for the three ivshmem devices).
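For anyone following along, building and loading the module went roughly like this (paths assume the module/ directory of the Looking Glass source tree; adjust to your checkout):

```sh
# Sketch: build kvmfr against the running kernel and load it.
cd looking-glass/module
make
sudo insmod kvmfr.ko
ls /dev/kvmfr*   # one node per attached ivshmem device
```

No static_size_mb parameter was needed here, since the devices are backed by the ivshmem PCI devices attached to VM4.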
Connected as:
VM4 → VM1 : __GL_YIELD=usleep ./looking-glass-client.dma-test -f /dev/kvmfr0 -p 5901 -c [server-ip] -m KEY_RIGHTCTRL spice:audio=no egl:vsync=on
VM4 → VM2 : __GL_YIELD=usleep ./looking-glass-client.dma-test -f /dev/kvmfr1 -p 5902 -c [server-ip] -m KEY_RIGHTCTRL spice:audio=no egl:vsync=on
VM4 → VM3 : __GL_YIELD=usleep ./looking-glass-client.dma-test -f /dev/kvmfr2 -p 5903 -c [server-ip] -m KEY_RIGHTCTRL spice:audio=no egl:vsync=on
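Since the three invocations differ only in the kvmfr device and SPICE port, a trivial wrapper makes the mapping explicit (sketch only; it prints the commands rather than launching them, and [server-ip] stays a placeholder):

```shell
#!/bin/sh
# Prints the three client invocations used above; swap echo for eval
# (or drop it) to actually launch the clients.
for i in 0 1 2; do
  dev="/dev/kvmfr${i}"     # kvmfr device for host VM $((i+1))
  port=$(( 5901 + i ))     # matching SPICE port
  echo "__GL_YIELD=usleep ./looking-glass-client.dma-test -f ${dev} -p ${port} -c [server-ip] -m KEY_RIGHTCTRL spice:audio=no egl:vsync=on"
done
```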
Performance improvement is significant:
- On the client, CPU usage for each LG client instance went down from 50-70% (no DMA Bufs) to 5-8% (with DMA Bufs) (measured with top).
- On the host, I got better numbers with NvFBC than with DXGI (maybe because the GPUs are Quadros): LG host CPU usage was down to 1% with NvFBC vs 5% with DXGI in the same scenario (measured with Windows Task Manager).
I had 3 VMs with LG host pushing high-FPS rendering to 3 LG clients with DMA Buf on VM4, all at the same time for 8 hours, and it was completely stable.
I tested the LG client on each Quadro generation (as there were rumors DMA buf might only work on Turing+) and am happy to report that, with driver 535, DMA buf works flawlessly on Maxwell and Pascal too. I could not test on Kepler, since the Nvidia driver for it stops at 470, but I think you might be able to use Nouveau there to get DMA bufs.
Kudos to Gnif for this milestone, and good job Nvidia on the driver.
Thanks,
-max