Poor nested virtualization performance on WRX80 platform - is it normal?

I’m running OpenSUSE 15.3 on top of KVM on a headless type 1 hypervisor host. The OpenSUSE VM uses VGA passthrough and performs reasonably well on its own; I haven’t run any benchmarks, but it feels like bare metal.

I’m trying, however, to run VirtualBox lab VMs inside this OpenSUSE VDI workstation and… it’s not so hot.

Before 15.3 I tried running 13.1, but that’s a very old OpenSUSE with a very old VirtualBox (5.0), which turned out to not really support nested virtualization, and performance was quite terrible. So I switched to 15.3, which ships VirtualBox 6.1.26 with proper nested virtualization support, and now performance is…

still terrible tbf.

I mean, yeah, it’s a bit better, but not a night-and-day difference. Running Chrome inside the nested Windows 10 VM is a fairly bad experience: I’m getting maybe 20 fps on 720p YouTube video, and overall it’s very stuttery and unpleasant.

The host is a Threadripper PRO 5965WX with 256GB RAM. The level 1 VM uses host CPU passthrough, has nested virtualization enabled, 1GB huge pages, memlock, statically pinned CPU cores, and PCIe passthrough (GPU + NVMe).

The VirtualBox VM images reside on NVMe SSDs passed through as PCIe devices.
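
Before digging further, a quick sanity check worth running on the host (this is a generic check, not specific to my setup): the standard sysfs path below reports whether the kvm_amd module has nesting enabled at all. Without it, the level 1 guest can’t expose SVM to VirtualBox in the first place.

```shell
# On an AMD host, confirm the kvm_amd module has nesting enabled.
# Prints 1 or Y when nested virtualization is on; falls back to
# "unknown" if the module isn't loaded (e.g. Intel host: check
# /sys/module/kvm_intel/parameters/nested instead).
nested=$(cat /sys/module/kvm_amd/parameters/nested 2>/dev/null || echo "unknown")
echo "kvm_amd nested: $nested"
```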

Specs of the chain:
host:
Arch Linux
24/48 cores/threads
256GB RAM (128GB locked as hugepages)
4x NVMe RAID10 for VM images

level 1 guest (libvirt):
OpenSUSE 15.3
12/24 cores/threads (host-passthrough, emulatorpin, iothread, topoext, tsc-deadline)
64GB RAM (hugepages, nosharepages, memlock)
Quadro RTX 4000, PCIe passthrough
root disk as a qcow2 image on host storage
VBox VM image storage on 2x NVMe RAID1, PCIe passthrough

level 2 guest (VirtualBox 6.1.26):
Windows 10
12 vCPUs, PAE/NX, Nested VT-x/AMD-V, Nested Paging, paravirtualization: KVM
16GB RAM
VMSVGA, 3D acceleration enabled, 128MB VRAM (max supported)
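
For reference, the level 2 settings above map to VBoxManage roughly like this. This is purely a configuration sketch (the VM name "Win10" is a placeholder), using the stock 6.1 option names:

```shell
# Hypothetical VM name; substitute your own. These mirror the GUI
# settings listed above: nested hardware virt, nested paging, KVM
# paravirtualization provider, VMSVGA with 3D, and the 128MB VRAM cap.
VBoxManage modifyvm "Win10" \
  --nested-hw-virt on \
  --nestedpaging on \
  --paravirtprovider kvm \
  --graphicscontroller vmsvga \
  --accelerate3d on \
  --vram 128 \
  --cpus 12 --memory 16384
```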

An OpenSUSE 15.5 guest performs slightly better than Windows 10, but it’s still terrible. The worst offender, though, is an Android 8 VM, which runs abysmally: maybe 2 fps on the desktop. Barely functional.

Isn’t your issue mostly due to the video rendering being done in software? Subpar performance is expected in that case. Try running your OpenSUSE VM without the GPU passthrough and you should see similar behavior.

Any way you can run some tests that are CPU-bound rather than graphics-related, to see if there’s indeed a performance penalty from the nested virtualization?
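
As a sketch of how to do that: time the same fixed, CPU-only workload at each level (bare metal, level 1 guest, level 2 guest) and compare wall-clock times. Any deterministic loop works; here’s a portable awk one (N is an arbitrary choice):

```shell
# Portable CPU-bound micro-benchmark: count primes below N by trial
# division. Run the same script on bare metal, in the level-1 guest,
# and in the level-2 guest under `time`; the prime count is only there
# to keep the loop honest.
N=20000
count=$(awk -v n="$N" 'BEGIN {
  c = 0
  for (i = 2; i <= n; i++) {
    p = 1
    for (j = 2; j * j <= i; j++) if (i % j == 0) { p = 0; break }
    c += p
  }
  print c
}')
echo "$count primes below $N"
```

If nested CPU virtualization is the real bottleneck, the level 2 run should be disproportionately slower than the level 1 run; if the times are close, the problem is more likely graphics.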

If you RDP into the nested Win 10 VM from another workstation, is there a performance difference?

Kind of, yes, but I’m not comparing to bare metal; I’m comparing to non-nested virtualization, which I believe should behave the same way in terms of graphics, right? The level 1 VM has the VGA passed through, so it has a hardware GPU in it, and I’m using the standard proprietary NVIDIA driver in it, so from a graphics point of view it should be the same as running VirtualBox on bare metal. Only the CPU virtualization is actually nested in this case, I guess.
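
One generic check worth doing inside the level 1 guest (nothing here is specific to this setup): VirtualBox can only hardware-accelerate the nested VM if the SVM/VT-x CPU flag actually made it through the first virtualization layer.

```shell
# Inside the level-1 guest: is AMD-V (svm) or VT-x (vmx) exposed to
# the guest CPU? If absent, VirtualBox falls back to much slower
# software-assisted CPU virtualization for the nested VM.
if grep -qE '\b(svm|vmx)\b' /proc/cpuinfo 2>/dev/null; then
  flags=present
else
  flags=absent
fi
echo "hardware virt flags: $flags"
```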

That might be a good idea. When I turned on the fps counter in Compiz in that nested VM, it showed 50-57 fps, but looking at it, there’s no way it’s actually 50 fps. It feels like 20-25 at best.


I guess there are some confusing things here. From what I understand so far, you have a bare-metal host, which is pretty much irrelevant to what’s going on.

You have your main VM with GPU passthrough; this has no issues either and works flawlessly, as it should.

You then have the nested VM, which is using software rendering since it has no GPU of its own, and thus the graphical performance is lacking, which is expected.
If you remove the GPU passthrough from your main VM, it should behave similarly to your nested VM, i.e. awful graphical performance.

If you pass through another GPU to the nested VM, I believe you should get good performance as well.

Yes. However, when I run it e.g. on my laptop (which only has VirtualBox on bare metal) it’s not that terrible; it performs noticeably better. I’m using VMSVGA with 3D accel enabled in VBox, and VBoxSVGA (also with 3D accel enabled) for the Windows guest, which as far as I understand should not be 100% software rendering.

And just to remove software differences from the equation: it’s 1:1 the same OS. I literally dumped the laptop disk image into a KVM VM, and that’s the level 1 VM I’m using, so it’s the same OpenSUSE and the same VirtualBox as on the laptop. The only difference is that I installed the proprietary NVIDIA drivers in the VM, since the laptop had an AMD GPU.

I tried maxing the vGPU memory at 128MB, but it doesn’t help; it actually performs worse and introduces some weird stuttering in the Linux guest. In the Windows guest it just makes no difference.

I have plenty of hardware GPUs; I even have Thunderbolt docks with PCIe attached to this machine, so I could pass through some potato GPU to it, but I’m not entirely sure how I’d then connect a display to that OpenSUSE VM. As far as I understand, if my host were an OpenSUSE desktop I could use Looking Glass, but I don’t think it can pass the framebuffer to another VM instead of the host system? My knowledge may not be up to date, though. VNC / RDP will probably have quite poor performance as well, since there’s a whole encoding/decoding process in the middle.

In order to minimize misunderstanding, here are two videos of exactly the same VM, running in exactly the same VirtualBox on exactly the same OpenSUSE (disk image dumped into the VM). One runs on the bare-metal laptop, the other in the level 1 VM with VGA passthrough:

Level 1 VM

Laptop

You can compare the shadow lag and the cursor framerate inside the VM (it does not have cursor integration, so it’s showing the actual in-VM cursor framerate).

The Threadripper setup performs so badly it’s barely usable. And this VM on the TR actually has more cores than the bare-metal laptop: the laptop has an 8-core/16-thread CPU, while the VM has 24 vCPUs shared from the 24/48 TR 5965WX.

To be honest, I believe in both cases this Android VM runs on fully software VGA, because Android-x86 only supports the old VBoxVGA adapter, which has already dropped 3D acceleration support, so… yeah. It seems to be a CPU-vs-CPU software rendering performance difference.