I’m planning to build a server with the following specs:
-ASRock X399 Taichi
-AMD Threadripper 2950X (16 cores, 32 threads)
-4x GTX 1080 Ti (8 PCIe lanes per card; 2 of the cards connected via bifurcation)
-1x 10 Gbit NIC
-2x M.2 SSD, 1 per CPU die
-The system won’t have a GPU for the host.
-I don’t need dynamic CPU or disk allocation.
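As a sanity check on the spec list, here is a rough PCIe lane budget in Python. The lane counts are assumptions rather than figures taken from the ASRock or AMD documentation: 64 CPU lanes on the 2950X with 4 of them feeding the chipset, x4 per M.2 SSD, and an x4 NIC (10 GbE cards come in x4 and x8 variants, so check yours). Per-lane throughput uses the PCIe 3.0 rate of 8 GT/s with 128b/130b encoding.

```python
"""Rough PCIe 3.0 lane budget for the planned build (all lane counts are assumptions)."""

PCIE3_GT_PER_LANE = 8.0      # PCIe 3.0 signalling rate, GT/s per lane
PCIE3_ENCODING = 128 / 130   # 128b/130b line-encoding efficiency

# Assumption: 64 CPU lanes on the 2950X, 4 of which go to the X399 chipset.
CPU_LANES_FOR_DEVICES = 64 - 4

# Assumed lane widths per device group; adjust to the actual slots and cards.
DEVICES = {
    "4x GTX 1080 Ti @ x8": 4 * 8,
    "2x M.2 SSD @ x4": 2 * 4,
    "10 GbE NIC @ x4": 4,
}


def pcie3_gbytes_per_s(lanes: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s for a link with `lanes` lanes."""
    return lanes * PCIE3_GT_PER_LANE * PCIE3_ENCODING / 8


def lane_budget(devices: dict[str, int], available: int) -> tuple[int, int]:
    """Return (lanes used, lanes left over)."""
    used = sum(devices.values())
    return used, available - used


if __name__ == "__main__":
    used, spare = lane_budget(DEVICES, CPU_LANES_FOR_DEVICES)
    print(f"lanes used: {used}, spare: {spare}")
    print(f"x8 link: {pcie3_gbytes_per_s(8):.2f} GB/s, x16 link: {pcie3_gbytes_per_s(16):.2f} GB/s")
```

Under these assumptions the devices use 44 of 60 lanes, and an x8 link tops out at roughly 7.88 GB/s per direction in theory (about 15.75 GB/s for x16).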
I’m planning to run four headless Linux VMs for machine learning, accessible over the local network and the internet. I’m not very familiar with virtualisation but eager to get started. For what it’s worth, I’ve used Unraid in the past. Passing through all the GPUs on Unraid was a bit of a pain, but it worked.
I do have some questions about my planned setup, though:
-I’m torn between XenServer and KVM (I’ll probably end up testing both). Which would be best for my scenario? Performance is the priority, and I’m looking for a free solution.
-Is Threadripper a good choice for this? I think it should be fine, since I can assign each VM to a specific CPU die. However, I’ve seen people recommend against Threadripper because of virtualisation issues. I would also consider an EPYC 7301, but single-core performance matters too.
-Where can I find a layout showing how the PCIe and M.2 slots on the X399 Taichi are divided between the dies, so I can avoid slowdowns in communication between devices?
-Can I easily run the host without any GPU?
-Do I need to reserve CPU cores and RAM for the host, or can every VM get 1/4 of the cores and RAM? How much does the host need?
-Out of curiosity: can the host prevent abuse of a VM’s GPU (for example, excessive overclocking)?
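On Linux, the slot-to-die question can also be answered empirically: the kernel reports a NUMA node for each PCI device in `/sys/bus/pci/devices/<address>/numa_node`. Below is a minimal sketch that groups GPUs, storage controllers and NICs by that node. It assumes the BIOS exposes the two dies as separate NUMA nodes (otherwise everything reports the same node, or -1 for "unknown"); the `root` parameter exists only so the function can be pointed at a test directory. The `lstopo` tool from the hwloc package shows the same information graphically.

```python
"""List PCI devices grouped by the NUMA node the Linux kernel reports for them."""
from collections import defaultdict
from pathlib import Path

# Top byte of the 24-bit PCI class code for the device types of interest.
CLASS_NAMES = {0x01: "storage", 0x02: "network", 0x03: "display/GPU"}


def pci_numa_map(root: str = "/sys/bus/pci/devices") -> dict[int, list[tuple[str, str]]]:
    """Return {numa_node: [(pci_address, kind), ...]} for storage, network and GPU devices."""
    base = Path(root)
    if not base.is_dir():
        return {}
    by_node: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for dev in sorted(base.iterdir()):
        try:
            base_class = int((dev / "class").read_text(), 16) >> 16  # e.g. "0x030000" -> 0x03
            node = int((dev / "numa_node").read_text())  # -1 means "no NUMA information"
        except (OSError, ValueError):
            continue  # entry without the expected sysfs attributes
        if base_class in CLASS_NAMES:
            by_node[node].append((dev.name, CLASS_NAMES[base_class]))
    return dict(by_node)


if __name__ == "__main__":
    for node, devices in sorted(pci_numa_map().items()):
        print(f"NUMA node {node}:")
        for address, kind in devices:
            print(f"  {address}  {kind}")
```

Bridges, USB controllers and audio functions are filtered out by class code, so the output stays focused on the devices whose placement matters for this build.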
One hypervisor you might want to try is VMware ESXi. If you create an account with VMware, you should be able to get a free permanent license for the host software, with some limitations, such as a limit of 8 vCPUs per virtual machine. I’ll have to do some digging to see whether they block PCIe passthrough on the free license; I don’t THINK they do, but they might.
VMware ESXi is pretty much THE name in enterprise virtualization; nothing really comes close, not even Microsoft’s Hyper-V (they try hard, but it still can’t really compare). The ESXi host can run from a flash drive smaller than 2 GB. When the hypervisor boots, it loads itself from the boot device into a small in-memory cache and runs from there, hardly touching the “OS” drive again apart from maybe writing some logs. Because of this, the hypervisor itself uses very few resources, and you certainly don’t need to reserve a GPU or anything else for it.
I’ve only installed and run ESXi on old full enterprise-class servers (like old Dell PowerEdge machines), and it has always run like a dream. VMware publishes a list of supported hardware. I don’t know offhand whether it covers consumer AMD Ryzen chips (I know EPYC is supported), but just because something isn’t on the list doesn’t mean it won’t run.
Since it’s free (you should be able to run a 60-day trial if you don’t want to sign up for the free permanent license), it can’t hurt to at least try.
I know this doesn’t answer all your questions, but I hope it helps at least a bit.
Thanks for your answers! @Dragon6687 The reason I only considered KVM and XenServer is that AWS used to use Xen and switched to KVM, so I figure it’s good to get experience with what is used at massive scale: not only for performance, but also because it would look good on my portfolio for future work opportunities.
I’d like to be able to automate the process of setting up such machines in the future.
I’m not planning to use RAID, just one drive per 2 VMs. The host OS would be some flavor of Linux, and the VMs would run Linux as well. The host doesn’t need to do anything besides managing those 4 VMs.
Are setups with no GPU for the host generally ill-advised, or is it just a matter of changing some configuration after setup (which can be automated)?
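On the question of splitting cores between the four VMs and the host, here is a sketch of a pinning plan that keeps each VM on one die and gives it whole physical cores (both SMT threads). It is a hypothetical helper, not a libvirt or Unraid feature. It assumes each die appears as one NUMA node and reads the topology from the Linux sysfs files `/sys/devices/system/node/nodeN/cpulist` and `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`; the resulting CPU lists would then be fed into whatever pinning mechanism the hypervisor offers (libvirt’s `<vcpupin>`, for instance).

```python
"""Sketch: pin VMs to whole physical cores on one die each, optionally reserving host cores."""
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Parse the kernel's cpulist format, e.g. '0-7,16-23' or '0,16'."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def read_node_cores(root: str = "/sys/devices/system") -> dict[int, list[tuple[int, ...]]]:
    """Return {numa_node: [physical core as a tuple of its SMT thread ids, ...]}."""
    base = Path(root)
    node_cores: dict[int, list[tuple[int, ...]]] = {}
    for node_dir in base.glob("node/node[0-9]*"):
        cores: list[tuple[int, ...]] = []
        for cpu in parse_cpulist((node_dir / "cpulist").read_text()):
            siblings = base / "cpu" / f"cpu{cpu}" / "topology" / "thread_siblings_list"
            core = tuple(parse_cpulist(siblings.read_text()))
            if core not in cores:  # each physical core is listed once per SMT thread
                cores.append(core)
        node_cores[int(node_dir.name[len("node"):])] = cores
    return node_cores


def plan_pinning(
    node_cores: dict[int, list[tuple[int, ...]]],
    vms_per_node: int,
    host_cores_per_node: int,
) -> dict[str, list[int]]:
    """Split each node's cores evenly among its VMs; reserved and leftover cores go to the host.

    Returns {"host": [...], "vm0": [...], ...} with sorted CPU ids.
    """
    plan: dict[str, list[int]] = {"host": []}
    vm_index = 0
    for node in sorted(node_cores):
        cores = node_cores[node]
        host = cores[:host_cores_per_node]
        usable = cores[host_cores_per_node:]
        share = len(usable) // vms_per_node
        if share == 0:
            raise ValueError(f"node {node}: not enough cores for {vms_per_node} VMs")
        for i in range(vms_per_node):
            mine = usable[i * share:(i + 1) * share]
            plan[f"vm{vm_index}"] = sorted(cpu for core in mine for cpu in core)
            vm_index += 1
        host += usable[vms_per_node * share:]  # cores left over after an even split
        plan["host"].extend(cpu for core in host for cpu in core)
    plan["host"].sort()
    return plan


if __name__ == "__main__":
    try:
        plan = plan_pinning(read_node_cores(), vms_per_node=2, host_cores_per_node=1)
    except (OSError, ValueError) as err:
        print(f"could not build a plan on this machine: {err}")
    else:
        for name, cpus in plan.items():
            print(name, ",".join(map(str, cpus)))
```

With the 2950X’s 16 cores split across two dies and nothing reserved, each VM gets 4 cores (8 threads); reserving one core per die for the host leaves 3 cores per VM, and the odd leftover core on each die goes back to the host. The sketch only covers CPUs, not RAM.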
It depends on how good you are at not breaking SSH access. If you are running headless and you break networking or SSH, it’s time to reinstall the host OS. You could also check whether you have a serial connection available; that would work fine. Linux runs fine without a GPU; it’s just hard to fix things when you have no way of controlling the machine.
Passthrough on ESXi is pretty easy, on par with Unraid in that you do it all from a GUI rather than the command line. I have it working great on an X79 board (Intel Xeon). I couldn’t get it working on my AMD X399, but I suspect a bad motherboard, BIOS, or user error.
With ESXi 6.7 you’re limited to 16 per VM. The tricky part in my testing was finding enough USB controllers to pass through to the guests so I could plug and unplug keyboards and mice as needed.
Unlike Linux distros, ESXi had no trouble disconnecting the boot GPU and passing it through to a guest. During bootup, the console just stops updating, and eventually a guest appears.
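For comparison with the ESXi GUI: on a KVM host, the usual command-line check before a VFIO GPU passthrough is which IOMMU group each card sits in, because devices sharing a group generally have to be assigned to the same guest together. Here is a minimal sketch that lists the groups from `/sys/kernel/iommu_groups`, assuming the IOMMU (AMD-Vi) is enabled in the BIOS and by the kernel; `shared_groups` is a hypothetical helper name, and the `root` parameter is only there for testing.

```python
"""List IOMMU groups and their PCI devices, as the Linux kernel exposes them in sysfs."""
from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Return {group_number: [pci_address, ...]}; empty if the IOMMU is off."""
    base = Path(root)
    if not base.is_dir():
        return {}
    groups: dict[int, list[str]] = {}
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return dict(sorted(groups.items()))


def shared_groups(groups: dict[int, list[str]], wanted: list[str]) -> dict[str, list[str]]:
    """For each wanted device, list the other devices in its group (they travel with it)."""
    owner = {dev: g for g, devs in groups.items() for dev in devs}
    return {dev: [d for d in groups[owner[dev]] if d != dev] for dev in wanted if dev in owner}


if __name__ == "__main__":
    for group, devices in iommu_groups().items():
        print(f"group {group}: {' '.join(devices)}")
```

On a typical NVIDIA card the HDMI audio function (`.1`) sits in the same group as the GPU (`.0`) and is simply passed through alongside it; unrelated devices showing up in a GPU’s group are a common source of passthrough trouble.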
The machine has a total network bandwidth of 30 Gbit/s. Cards 1-4 are the 1080 Tis; cards 1 and 2 are on the same NUMA node but seem to have the lowest P2P bandwidth (10 vs. ~15 GB/s). The P2P latencies range from 10 to 15 µs.
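P2P bandwidth and latency tables like this are what NVIDIA’s `p2pBandwidthLatencyTest` CUDA sample reports. As a rough cross-check, here is a sketch that times a GPU-to-GPU copy with PyTorch, assuming PyTorch with CUDA and at least two visible GPUs. It measures whatever path the driver takes: if `torch.cuda.can_device_access_peer` is False for a pair, the copy may be staged through host memory, so the figure is not a pure P2P number. It also measures one direction only and does not replicate the CUDA sample’s methodology.

```python
"""Time a GPU-to-GPU copy with PyTorch (needs CUDA and at least two GPUs)."""
import time


def gbytes_per_s(nbytes: int, seconds: float) -> float:
    """Convert a transfer size and duration into GB/s (10**9 bytes per second)."""
    return nbytes / seconds / 1e9


def measure_copy(src: int, dst: int, nbytes: int = 256 * 1024**2, iters: int = 20) -> float:
    """Return the average one-direction copy bandwidth from GPU `src` to GPU `dst` in GB/s."""
    import torch  # imported lazily so gbytes_per_s works without PyTorch installed

    a = torch.empty(nbytes, dtype=torch.uint8, device=f"cuda:{src}")
    b = torch.empty(nbytes, dtype=torch.uint8, device=f"cuda:{dst}")
    b.copy_(a)  # warm-up copy so one-time setup costs are not timed
    torch.cuda.synchronize(src)
    torch.cuda.synchronize(dst)

    start = time.perf_counter()
    for _ in range(iters):
        b.copy_(a)
    torch.cuda.synchronize(src)  # wait until every queued copy has finished
    torch.cuda.synchronize(dst)
    return gbytes_per_s(nbytes * iters, time.perf_counter() - start)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is None or torch.cuda.device_count() < 2:
        print("needs PyTorch with CUDA and at least two GPUs")
    else:
        n = torch.cuda.device_count()
        for i in range(n):
            for j in range(n):
                if i != j:
                    peer = torch.cuda.can_device_access_peer(i, j)
                    print(f"GPU{i} -> GPU{j}: {measure_copy(i, j):6.2f} GB/s (peer access: {peer})")
```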