WRX80 VFIO-based multi-VM workstation

I don’t like cliffhangers, so this isn’t really going to be a full build log, since the project is already 90% complete - but I’ll try to recreate the journey that brought this machine to where it is now, as I think it’s quite a unique approach to what one could consider a “workstation”.

I work in ITSec, so I’m often creating various emulated customer networks as a “virtual lab”. I also try to keep my “main” operational OS always offline, just in case my lab experiments go terribly wrong. In my free time I sometimes do some amateur 3D modelling in Blender. My daily OS of choice is OpenSUSE Linux. Occasionally I also play some Windows games, so I wanted to address that as well.

This machine was built to replace my good old workstation-for-everything, which was so severely overloaded and I/O-starved that it just couldn’t handle it anymore. It was a P67-based workstation and I had run out of PCIe lanes a looong time ago, as soon as I started upgrading to 10G networking and a storage RAID (while still needing a GPU).

So this new WRX80-based workstation was supposed to replace the workstation-for-everything in a bit more modern way. Instead of using one monolithic OS for everything, where every part of the OS collided with everything else and after some time installing software updates became impossible, I decided to implement it in a microservice manner - using VFIO-based VMs for the various tasks, with PCIe passthrough for native performance.
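For anyone unfamiliar with the mechanics: the heart of this approach is binding the passthrough devices to the vfio-pci stub driver on the host, so libvirt/QEMU can hand them straight to a guest. A minimal sketch of how that typically looks on a Linux host - the device IDs below are placeholders, not my actual hardware:

```
# /etc/modprobe.d/vfio.conf - let vfio-pci claim the passthrough devices at boot.
# The vendor:device IDs are placeholders; find yours with `lspci -nn`.
options vfio-pci ids=10de:2204,10de:1aef,8086:1572

# Arch: make sure vfio-pci is in the initramfs before the regular GPU/NIC drivers,
# e.g. in /etc/mkinitcpio.conf:
#   MODULES=(vfio_pci vfio vfio_iommu_type1)

# Kernel command line usually just needs the IOMMU in passthrough mode:
#   iommu=pt

# Verify the devices actually ended up behind vfio-pci:
lspci -nnk | grep -B3 'vfio-pci'
```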

I chose the WRX80 platform because it has lots of I/O, and I knew it was going to serve me well for many years, since realistically the performance of the i7-2600K wasn’t my main problem in the first place - it was the lack of I/O.

The project was ambitious but well defined, and the vision was relatively clear. I wanted this machine to:

  • run up to 3 OSes with native GPU performance (in most cases 2)
  • use virtualized drives for easy snapshotting and VM management
  • stay as close as possible to a minimal type-1 hypervisor model - all “usable” OSes as VMs
  • give all VMs, including the smaller headless LAB ones, access to hardware-backed 10G networking
  • behave as close to multiple bare-metal workstations as possible

The motherboard choice was relatively easy - the WRX80 platform didn’t get a lot of support from OEM vendors, so realistically it was ASUS or ASRock, and I really wanted a mid-tower build in a Fractal Define R6 chassis. So:

  • ASRock WRX80 Creator, because it has Thunderbolt and really was the only viable option for me.
  • Threadripper PRO 5965WX, as I wanted decent single-core performance as well.
  • Quadro RTX 4000, as I already had one, plus 2x RTX 3090, because at the time a dual-slot 4090 was unobtainium in the EU and I didn’t want to waste my precious PCIe slots.
  • 256 GB of ECC RAM.
  • SSDs suitable for more write-oriented workloads, but not enterprise ones - I chose 2x FireCuda 530 4TB for VM storage due to their great sustained-write performance, 2x Samsung 990 Pro for home storage in the main OpenSUSE VM (NVMe PCIe passthrough) since it’s the all-round performance king, and 2x junkyard laptop-salvaged 512 GB NVMe drives for the host OS, since it’s barely used.
  • 2x Intel X710-DA4 for lots of 10G networking in the VMs (and because I use a MikroTik CRS317, which doesn’t have real VEPA support, so I can only have one VM per VLAN per port - meaning that if I wanted up to 8 VMs in a single VLAN I had to get 8x 10G SFP+ ports; a sketch of how a single port gets handed to a VM follows right after this list).
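As a footnote on that networking bullet: every 10G port on the X710 is its own PCI function, so giving a VM “hardware backed” networking is just a matter of attaching that function as a hostdev. A hedged sketch with a made-up PCI address and VM name:

```
# Hand one X710 port (a single PCI function) to a running VM.
# 0000:41:00.1 and lab-vm1 are placeholders - check your addresses with `lspci -nn`.
cat > /tmp/x710-port.xml <<'EOF'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x41' slot='0x00' function='0x1'/>
  </source>
</hostdev>
EOF
virsh attach-device lab-vm1 /tmp/x710-port.xml --live --config
```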

Due to space constraints and the heat management of the other hardware, I decided to go with air cooling. That was quite a questionable choice and I really didn’t know at the time whether I was going to pull it off - but I decided to try anyway.

And I failed immediately. Even in an “open bench” configuration like the one in the photo, the CPU was throttling - and it wasn’t even placed under the desk yet, where access to cold air is a bit more obstructed. But that was just the beginning of the problems. Since I didn’t want to lose PCIe slots to the 3090s, I needed a PCIe gen 4 riser, and no such riser was available for my chassis, so I had to improvise.

And failed again xD But after a few trips to the local Home Depot equivalent I found some washers that let me align the height of the riser so it properly supports the PCIe slot. Then, unfortunately, it turned out that I couldn’t fit one of the X710-DA4s in the lowest PCIe slot because the riser is too close… I found an alternative, but I’ll describe that later because it arrived with a MASSIVE delay. Soon v1 was somewhat ready, with Noctua industrialPPC 3000 rpm 140 mm fans:

It worked. Well. Very well. It maintains a solid boost clock for a very long time, and while yes, it’s unbearable under full stress, I rarely have 100% load on all cores, so in real life it’s not that bad. And when it is - it’s rendering something anyway, so I can just go touch grass during that period.

Once the storage arrived, I put it in the carrier board.

And I repurposed the PCIe slot lost to the riser to add an RS232 bracket, which comes in handy for router management.

The second hole was repurposed to route an additional USB-C controller, to make USB passthrough a bit more flexible:

Since Optane is already quite difficult to find and I couldn’t find any good deal, I decided to abandon the idea of using Optane and instead repurpose the U.2 connection on the motherboard for the USB-C controller, using U.2 → M.2 → USB adapters.

After some further struggle with fan connections, LEDs and some final touches, v2 was complete - but the BIOS said no.

Luckily, thanks to community help and ASRock support, I managed to obtain a patched BIOS. But the machine was still missing one crucial piece:

The MikroTik CCR2004-PCIe - a 2x 25G RouterOS-based router that also functions as a network card. You can just log into it over SSH and it behaves like a regular router, with virtual ports facing the PCIe slot.

And it didn’t fit.

But I broke the USB header cables and managed to make it fit. With that, the current, final v3 was ready and I could start struggling with the OS setup.

And with storage space limitations, which led to obtaining two more Kingston DC600M SSDs.

Final result:


And yes, everything is connected, which is pretty hilarious xD

To switch between VMs I’m using a physical HDMI KVM switch, just as if they were separate physical PCs. The OpenSUSE VM connects to the host (which runs headless Arch) using virt-manager and manages all the VMs (including itself).
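For reference, “manages all VMs (including itself)” is nothing more exotic than pointing virt-manager / virsh at the host’s libvirtd over SSH - roughly like this (user and hostname are placeholders):

```
# From inside the OpenSUSE guest, talk to libvirt on the headless Arch host.
virsh -c 'qemu+ssh://user@arch-host/system' list --all

# The same URI works as a remote connection in virt-manager:
# File -> Add Connection -> connect to remote host over SSH
```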

Since the OpenSUSE VM also uses the JACK audio framework for podcasting/streaming, it’s set up in an RT fashion with statically bound cores, FIFO scheduling and other RT KVM practices.
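Concretely, that means pinning the guest’s vCPUs to fixed host cores and giving the vCPU threads a FIFO real-time scheduling class. A hedged sketch of the usual KVM way of doing it (domain name and core numbers are made up for illustration):

```
# Pin the audio VM's vCPUs and its emulator threads to dedicated host cores.
virsh vcpupin opensuse 0 8
virsh vcpupin opensuse 1 9
virsh emulatorpin opensuse 4-5

# FIFO scheduling for the vCPU threads goes into the domain XML (virsh edit opensuse):
#   <cputune>
#     <vcpupin vcpu='0' cpuset='8'/>
#     <vcpupin vcpu='1' cpuset='9'/>
#     <vcpusched vcpus='0-1' scheduler='fifo' priority='1'/>
#   </cputune>
```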

The other VMs that have USB controllers are also connected to the audio stack in the OpenSUSE VM physically (plugged into the mixer and interfaces), which emphasizes the separation between them even more.

At this point one could ask: why even bother building such a Frankenstein when you could just have 3 separate PCs, which would be much simpler and cheaper? Well yes, I probably could. But they would take up more space (physically), plus I wouldn’t be able to easily combine the resources from all of them into one machine. Here I’m able to either split the resources or, when I need to, just assign all the cores, all the GPUs and all the goodies to one VM.

It draws 1600 W under full load XD


Well, in the picture there are at least two cables not connected, which I find hilarious :wink:

Anyways, thanks for your detailed write-up of your journey.


It’s because the cables are just-right-length short, so I can’t plug them in until the PC is fully shoved into the cutout in the desk - but then I wouldn’t be able to take the photo xD Currently they are indeed connected.


What was your thought process in choosing Arch as your host OS instead of a hypervisor like Proxmox?

There were plenty of reasons at the beginning, and a few issues I noticed later down the road that could have potentially bitten me if I hadn’t gone the general-purpose Linux route. For starters, I often use Arch on servers because it’s very small. When I’m creating lab environments, a 2 GB virtual disk is enough to set up Arch on btrfs with zlib compression, so it’s fairly compact for headless setups but at the same time very flexible. That also means I know this distro fairly well and I’m able to set it up even in quite complicated scenarios.
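To give an idea of how compact that gets, a lab guest is basically just btrfs mounted with zlib compression before pacstrapping a minimal system - roughly along these lines (device names are placeholders, bootloader steps omitted):

```
# Minimal headless Arch guest on a small virtual disk, btrfs + zlib compression.
mkfs.btrfs /dev/vda1
mount -o compress=zlib /dev/vda1 /mnt
pacstrap /mnt base linux
genfstab -U /mnt >> /mnt/etc/fstab
```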

It’s one of the very few distros that I know how to set up with a LUKS-encrypted /boot partition and kernel (the only other distro supporting such a setup that I know of is OpenSUSE, which I also considered in its minimal server variant).
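For context, “encrypted /boot” means GRUB itself has to unlock the LUKS container before it can even load the kernel. On Arch that boils down to something like this (partition paths are placeholders, and GRUB is most comfortable with LUKS1 for the /boot container):

```
# The partition holding / and /boot is a LUKS container GRUB can open itself.
cryptsetup luksFormat --type luks1 /dev/nvme0n1p2

# Build the crypto modules into GRUB and reinstall it:
echo 'GRUB_ENABLE_CRYPTODISK=y' >> /etc/default/grub
grub-install --target=x86_64-efi --efi-directory=/efi --bootloader-id=GRUB
grub-mkconfig -o /boot/grub/grub.cfg
```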

Then there’s support for weird tweaks. I tried ESXi first, but it didn’t work out (it couldn’t properly reset the USB controllers on this motherboard, so I could only start a VM once).

I’m also using a fairly complex btrfs setup for everything, plus LUKS with a detached header stored on an external thumb drive. All of this goes well beyond the typically supported setup options of dedicated, specialized systems like Proxmox / ESXi - I’d most likely have to tamper with the shell quite seriously to adjust them to my requirements, and at that point it kind of misses the whole point of using a specialized OS if you have to “generify” it back into normal Linux.
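The detached-header part is stock cryptsetup functionality - the LUKS header lives only on the thumb drive, so the disk by itself is just unidentifiable noise. A rough sketch (paths are placeholders):

```
# Create a LUKS volume whose header is stored on the external thumb drive.
cryptsetup luksFormat --header /mnt/usb/vmstore.header /dev/nvme2n1

# Opening it later requires that same header file to be present:
cryptsetup open --header /mnt/usb/vmstore.header /dev/nvme2n1 vmstore
```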

Then there are things I noticed later on. It turns out I need the NVIDIA drivers installed and loaded on the host OS, even though the host doesn’t actually use those cards, and I have to put the GPUs on the host into persistence mode. Otherwise the RTX 3090s have terrible power management and idle at 110 W each when not attached to any VM. That’s not only inefficient but also loud in my case, because 2x 3090 + the Quadro RTX 4000 without persistence mode result in a 300 W idle power draw from the GPUs alone, doing nothing.
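In practice “persistence mode” is just the NVIDIA persistence daemon (or the older nvidia-smi switch) keeping the driver initialized on the otherwise idle cards so they can actually power down:

```
# Keep the NVIDIA driver state alive on the host so unassigned GPUs idle properly.
systemctl enable --now nvidia-persistenced.service

# Legacy alternative, same effect:
nvidia-smi -pm 1
```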

Arch is also a relatively bleeding-edge distro, so it always gets patches fairly early. As such, there’s a lower risk of waiting years for new hardware support or bugfixes (and as of now I am actually affected by a VMware-on-KVM nested virtualization bug on AMD whose patch was merged into the mainline kernel in December, so fairly recently).

And the final reason is simply the convenience of file checksumming, backups, automated btrfs snapshots and so on. Sometimes it’s just convenient to manage your VM files the same way you work with VirtualBox / VMware Workstation VMs.
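That convenience is mostly plain btrfs tooling pointed at the subvolume holding the VM images - something like this (paths are placeholders):

```
# Read-only snapshot of the VM image subvolume before risky changes.
btrfs subvolume snapshot -r /var/lib/libvirt/images /srv/snapshots/images-pre-upgrade

# Ship it to a backup disk.
btrfs send /srv/snapshots/images-pre-upgrade | btrfs receive /mnt/backup/
```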

Also, Arch has a fairly decent tutorial for VFIO, so that was a nice-to-have.


TL;DR: it’s a relatively exotic setup overall, and using an off-the-shelf hypervisor would most likely require more tweaking than it’s worth.
