Hey all,
I’ve had an interesting past week trying to get the 3970x working on VMware ESXi and I thought I’d post my thoughts here, both to ask for help from smart people and to provide information for others who are looking to go down this route.
My goal is to build a system, virtualized on the ESXi platform, that will allow me to game, monitor data, develop, test, basically do everything I want. I’ve chosen the AMD 3970x platform to do this.
My specs are as follows:
- AMD 3970x 32 Core CPU
- 128GB Corsair RAM CMT64GX4M4K3600C18
- Asus ROG Zenith 2 Extreme motherboard w/0702 bios
- H740p (Dell/LSI) raid controller w/8GB NVFlash + BBU
- 8x 2TB Crucial MX500 SSD in SFF8643 capable enclosure
- 3x 1TB Rocket PCIe v4 NVME
- 4x 12TB Seagate Ironwolf NAS HDD
- RTX 2080 Ti GPU#1
- GTX 1060 GPU#2
- 2x GT910 ancillary GPU
- beQuiet TR4 cooler
- Boatload of Noctua 120mm/140mm fans
- Fractal Design XL R2 Case
- 4x 1080p BenQ 27" IPS monitors (blue light reduction models)
- 1x Asus ultrawide giant monitor (forgot model)
- 2x 2K 32" Dell/Acer monitor
There are a bunch of other misc parts as well, such as multiple USB pcie cards, x16 ribbon extenders, bluetooth USB adapters, and the like.
STATUS SO FAR:
Raid: The H740p raid controller came out of a Dell server. I love these controllers. They are configurable directly in the Asus UEFI, which is amazing.
I’ve used ESXi for years, as well as Dell servers, and am fairly comfortable with these platforms. That said, I make no claims to be an absolute guru.
I set up 8x 2TB SSD in a RAID6 with roughly a 30% overprovision. RAID6 was chosen for double parity as well as crazy fast read speeds. Note: The 30% overprovision (or underprovision if you’d like) has a substantial impact on the total drive writes (TDW) a drive can endure. I’ve purchased 2x other 2TB drives and have done over 3000 drive writes @ 30% OP and they still work great. The Crucial MX500 was chosen due to having enough capacitor charge to write out its buffer in the event of a power failure. Speeds w/o drive cache and raid cache are @ 4GB/s and w/raid/drive cache @ 7GB/s (extended beyond 8GB write/read).
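For anyone checking the capacity math, here is a rough sketch in Python. The function names are made up for illustration, and I am assuming the 30% overprovision is simply space left unallocated; whether it is set per drive or on the RAID virtual disk, the arithmetic comes out the same.

```python
# Rough capacity math for the 8x 2TB RAID6 array described above.
# Assumption: "30% overprovision" means 30% of the space is left unallocated.

def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """RAID6 spends two drives' worth of space on parity: usable = (n - 2) * size."""
    if drive_count < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (drive_count - 2) * drive_tb


def after_overprovision_tb(usable_tb: float, op_fraction: float) -> float:
    """Space left for data once op_fraction of it is left unallocated."""
    if not 0.0 <= op_fraction < 1.0:
        raise ValueError("op_fraction must be in [0, 1)")
    return usable_tb * (1.0 - op_fraction)


raw = raid6_usable_tb(8, 2.0)
provisioned = after_overprovision_tb(raw, 0.30)
print(f"RAID6 usable: {raw:.1f} TB, after 30% OP: {provisioned:.1f} TB")
```

So roughly 12 TB usable from RAID6 alone, and about 8.4 TB actually provisioned for the datastore under that assumption.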
There was an initial issue with the raid controller being in PCIe x4 mode by default, which I corrected via the Asus UEFI.
VMware ESXi 6.7u3 was installed on one of the PCIe 4.0 NVMe drives. I decided to put ESXi on this drive because it will double as a staging ground for doing vmkfstools disk conversions, so high disk IO will be nice for that.
VMWare’s datastore is on the RAID6 array.
THE ISSUE:
I’ve created 1x VM so far, a Windows 10 LTSB 1603 to test things out. 16GB RAM, 250GB HDD, 6 core, thick eager.
I’ve passed through the GTX 1060 + GTX 1060 audio through to the Win10 VM and it boots up directly to a monitor. Yay! The only tweak needed was the following:
hypervisor.cpuid.v0 = FALSE
Which enabled the Nvidia GPU to act as normal.
FIRST ISSUE: The audio has severe issues. It is extremely choppy, and playing a video via Youtube chops/skips/crackles. Yuck. I’m not sure how to correct this. I’m on the latest Nvidia driver and audio goes directly through the monitor, so it effectively is GPU -> HDMI audio to monitor.
SECOND ISSUE: No keyboard or mouse
ESXi appears to filter out all HID USB mapped devices. I can’t seem to find a way around this. Does anybody know?
In theory, what you can do instead is pass an entire USB controller through to the VM and then plug whatever you want, such as a keyboard/mouse, into that controller, and it’ll show up. But it doesn’t.
I have enabled all the USB devices to be pass-through and tried mapping them to the VM. From what I can tell, there are 2x USB controllers easily seen: ASMedia + AMD.
The AMD usb controller simply won’t “enable” for pass-through. It keeps on saying the ESXi host needs a reboot. So no go there.
The ASMedia one allows me to add it as a pcie-passthru to the VM, however, nothing in the VM works. No USB devices are recognized. No keyboards, no mice, no USB drives, nothing.
I then added a USB controller card to the system (StarTech USB 3.1 PCIe Card - PEXUS313AC2V) and mapped that. However, nothing shows up there either. No keyboard/mice/drives, nothing.
THIRD ISSUE: Disk Response/Active Time: 100%
For some reason, the VM has a crazy high Average Response Time / Disk IO which is slowing things down, despite only pushing under 1 MB/s. An esxtop u shows the following:
11:19:53am up 1:41, 797 worlds, 1 VMs, 6 vCPUs; CPU load average: 0.04, 0.02, 0.01
DEVICE PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD LOAD CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd Q
naa.6d0946608826be0025b455c6ea6eabe0 - 192 - 0 0 0 0.00 250.40 80.13 170.27 0.17 1.84 0.11 0.00 0.12
t10.ATA_____ST12000VN00072D2GS116___ - 31 - 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Notice the 250 CMDS/s. This is a crazy amount of commands for doing virtually nothing. I’ve done some googling and it led me to doing the following:
Run this command in an elevated command prompt inside the Windows guest: fsutil behavior set DisableDeleteNotify 1
This did not change anything. The VM host is doing nothing and all counters reflect massive capacity.
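To put the esxtop numbers above in perspective, here is a quick sketch that pulls the counters out of the naa.* row and works out the average IO size and how many commands are in flight. I am assuming the values line up with the header in the order shown, that the MB columns are MiB, and that GAVG/cmd is in milliseconds; the variable names are my own.

```python
# Values copied from the naa.* row of the esxtop output above.
# Assumption: fields follow the header order, ending with
# CMDS/s, READS/s, WRITES/s, MBREAD/s, MBWRTN/s, DAVG/cmd, KAVG/cmd, GAVG/cmd.
row = (
    "naa.6d0946608826be0025b455c6ea6eabe0 - 192 - 0 0 0 0.00 "
    "250.40 80.13 170.27 0.17 1.84 0.11 0.00 0.12"
)

fields = row.split()
(cmds_s, reads_s, writes_s, mb_read_s,
 mb_written_s, davg_ms, kavg_ms, gavg_ms) = (float(x) for x in fields[8:16])

# Average size of one command, assuming the MB columns are MiB.
avg_io_kib = (mb_read_s + mb_written_s) * 1024 / cmds_s

# Little's law: average commands in flight = arrival rate * time in system.
in_flight = cmds_s * (gavg_ms / 1000.0)

print(f"average IO size: {avg_io_kib:.1f} KiB per command")
print(f"average commands in flight (as seen by ESXi): {in_flight:.3f}")
```

If those assumptions hold, the host is seeing lots of small IOs that each complete in a fraction of a millisecond, which does not match the 100% active time the guest reports.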
IN CLOSING:
3 issues:
- How do I pass through USB controllers and get them to work? I just need a local keyboard/mouse.
- How do I get audio to work correctly on a VM w/pass-thru GPU?
- How do I resolve the crazy disk queue times?
I’m working on this around the clock to get it working. I’m firmly committed to getting this to work. I have different USB hubs coming in the mail, different USB controllers, multiple bluetooth USB devices, even NVMe bifurcation + PCIe x4 breakout cables to PCIE x16 slots coming in. I’m going to make this work.
I’m going to consolidate all my efforts into this forum thread as I work on this. I’ll probably post updates, and then counter updates, and then back-tracking updates, so you’ll see the entire thought process as it goes and likely watch me make a ton of mistakes.
If you have any ideas on how to help, or things to chime in with, PLEASE DO SO!
This is my first post on L1Techs and I’m excited to be here!
waves to Wendell