So I figured I'd put up a quick topic to discuss VMware ESXi and braindump the issues I'm encountering:
ESXi 6.7u2 boots and runs VMs just fine; the problems start when passing through devices, mainly on my Windows gaming VM. The USB devices stutter a lot when the VM is basically idle. If a workload or game is running, the stutter mostly goes away. While running the FF14 Shadowbringers benchmark, the game and audio stutter in the same place every time as well.
I tried updating the Nvidia driver, but this gave a fatal DirectX 11 error and now the GPU is locked up and needs a host reboot. I'm currently seeing whether an upgrade from 1709 to 1903, or a fresh 1903 VM, will solve anything (don't worry, my VMs are backed up).
How is ESXi on a Ryzen 3700X under typical ESXi usage, such as headless servers?
If there's no passthrough, everything has been working great. pfSense, FreeNAS, Ubuntu Server 18.04 and Windows 10 1903 all work. FreeNAS has my 9211-8i passed through and works great; I actually saw a 25% improvement in read performance over my previous setup with stock settings. I wanted to get stock results before overclocking memory and setting a more reasonable voltage.
Try explicitly setting PCIe 3.0 mode instead of PCIe 4.0 and see what happens?
I will double-check, but I've seen no option for setting PCIe to 3.0 or 4.0, just 'Auto, Gen2, Gen1'.
I went ahead and looked through all the options available to me, and I see nothing indicating PCIe 3.0 or 4.0 anywhere.
So after updating the VM to 1903, the stuttering got worse, to the point of crashing the Nvidia driver. Most likely it's not necessarily the USB ports but something to do with how the Nvidia card interacts with the VM.
I ran LatencyMon and the latency is really, REALLY bad. I'm thinking it has something to do with how the card is being passed through, so I'm looking into a few things.
Pretty stumped atm. I really don't know what's going on or why the GPU kernel on Windows just isn't cooperating. I may end up loading a Linux distro and trying KVM/QEMU.
And now FreeNAS is throwing a fit with my drive pool.
Going to try passing the GPU through to a Linux VM and see how it handles it.
I've managed to boot into and restart Linux Mint 19.1 successfully, and everything's working. Hopefully this is a blessing in disguise: I've run Mint before, and this could be a good push to stick with Linux more.
Time to see how I can set up a Wine instance to run the Shadowbringers benchmark.
The true casualty of the absurd number of reboots is FreeNAS. Unfortunately my pool was corrupted. Thankfully I have backups of the important stuff; it just means I'm going to have to spend a few days re-importing my massive Plex library -.-
So I may be onto something. I decided to install Windows 10 onto a spare SSD for testing purposes, and it started throwing WHEA_UNCORRECTABLE_ERROR errors, which point to a problem with a PCIe device. After pulling my PERC H200 and setting the CPU slots from 2x8 back to 1x16, it appears to be stable, or at least more so than it was (I can install the Nvidia drivers now without a crash).
I'm installing the latest 1903 updates, then going to try the benchmark again. If it passes without issue, then I know it is SOMETHING to do with how the Ryzen 3000 series interacts with the CPU-connected PCIe slots. I also noticed that the BIOS firmware is only at AGESA 184.108.40.206. That makes me think the X470 boards are an afterthought in terms of keeping up with microcode compared to the X570 launch, so this may be an issue that's already been fixed but hasn't trickled down yet.
I've verified the setting. 2x8 with Gen2 or Auto causes the instability. Either 1x16 mode or running the 2x8 at Gen1 is required for better stability. Pretty sure it's a motherboard or AGESA issue that'll be remediated eventually.
ASRock released a BIOS update for the X470 Taichi Ultimate (BIOS P3.30), which adds the PCIe Gen3 option that @wendell mentioned. I've retested 2x8 with Gen3 for the CPU slots, and while the top slot still reports PCIe 1.0, it is using the full x16 width. I've loaded up ESXi, passthrough is working for all the devices I was passing through before, and I've had no WHEA errors or Nvidia driver crashes.
So the important bits are now working, though RAM overclocking is still an issue. Hoping future AGESA updates help address this.