ZERO - Dual Socket Linux Virtualization Workstation

This is my current main machine. After many years of sifting through components to find a setup that meets my needs, I believe I have found one that is nearly perfect. I use my main workstation for a wide variety of tasks: some require lots of threads and raw computational power, while others require low-latency I/O and high clock speeds. Audio production, game development, electrical engineering, graphic design, and programming are a few of the things I use my machine for. I may, for example, be playing/testing a game in one VM while the next version is compiling in another VM to save time. Within my current budget, I believe this machine manages the best of both worlds.

The Hardware
RAM = 32 GB Samsung ECC DDR3 1600MHz
CPU0 = Intel XEON E5 2667 v2
CPU1 = Intel XEON E5 2667 v2
GPU0 = AMD HD 7750 (guest)
GPU1 = NVIDIA GTX 1070 (guest)
GPU2 = NVIDIA Quadro FX 1700 (guest)
GPU3 = NVIDIA Quadro FX 1700 (guest)
GPU4 = Onboard generic VGA out (for managing hypervisor in case networking is broken)
SSD = Samsung 840 EVO 250GB
NIC0 = Intel 82574L Gigabit Ethernet Controller
NIC1 = Intel 82574L Gigabit Ethernet Controller
NIC2 = Intel 82576 Gigabit Ethernet Controller
NIC3 = Intel 82576 Gigabit Ethernet Controller
PSU = Seasonic 1000W PRIME Ultra 80+ Platinum
CASE = Phanteks Eclipse P400

Case Modification
Normally the Phanteks P400 does not support EEB motherboards. Luckily there is nothing in the way, and adding support is as easy as drilling a few holes and installing standoffs. I would have gotten a proper case for this system, but I literally ran into a stranger after work who offered it to me for free, brand new, as long as I used it for something cool. I could not pass up the opportunity.

GTX 1070 Modification
Historically I have mostly used AMD cards because of my hate for the state of Nvidia drivers on Linux. I purchased the GTX 1070 because at the time it was nearly impossible to find an AMD GPU like the RX 480/580 in stock, and after months of waiting I gave in. I noticed that the GPU clock fluctuations caused by Nvidia's low power limit were producing large variations in the frame rates of certain power-hungry games. I soldered various lengths of wire to the current measurement resistors to make the card think it is only reaching 9% of its power limit when overclocked at full load. The core clock never drops below 2.05GHz now.

CPU and Memory Overclock
I have overclocked the CPUs by increasing the BCLK from 100MHz to 104MHz, which nets a small +160MHz increase to the CPU frequency. Even though this increase is small, it is still noticeable in low-latency applications and is reflected in benchmarks. The memory is 1033MHz memory overclocked to 1600MHz with no timing adjustment, which also makes a noticeable improvement in low-latency applications. So far there have been no problems with stability.
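The BCLK arithmetic above can be checked with a short sketch. The multiplier of 40 is an assumption, inferred only from the numbers given (a 4MHz BCLK bump producing +160MHz implies 160 / 4 = 40). The function and variable names are purely illustrative.

```python
# Core clock is the base clock (BCLK) times the CPU multiplier.
# MULTIPLIER is an assumption derived from the figures in the text:
# +160 MHz from a +4 MHz BCLK change implies a multiplier of 40.
MULTIPLIER = 40


def core_clock_mhz(bclk_mhz: float, multiplier: int) -> float:
    """Return the core clock in MHz for a given BCLK and multiplier."""
    return bclk_mhz * multiplier


stock_mhz = core_clock_mhz(100.0, MULTIPLIER)
overclocked_mhz = core_clock_mhz(104.0, MULTIPLIER)
gain_mhz = overclocked_mhz - stock_mhz
gain_percent = 100.0 * gain_mhz / stock_mhz

print(f"stock: {stock_mhz:.0f} MHz, overclocked: {overclocked_mhz:.0f} MHz")
print(f"gain: {gain_mhz:.0f} MHz ({gain_percent:.1f}%)")
```

Since BCLK also feeds other clock domains on many platforms, a 4% BCLK increase usually raises more than just the core clock, which is one reason small BCLK steps are the norm on locked CPUs.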

Each task on my system is handled in a QEMU VM: one for Windows game testing, another for testing OS X applications, one for Android, a Linux development environment, etc. I also spin up VMs to test mirrors of remote servers before pushing changes to them. These VMs can have resources allocated dynamically through a set of management scripts running on the hypervisor. When I need the absolute lowest latency for a specific VM, I can move all other processes off of the second CPU and NUMA node so that nothing else can interfere (other than low-level Linux kernel tasks), almost as if it were running on actual hardware. Doing this nets a few extra percent of performance over just giving the process the highest real-time priority.
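As a rough illustration of the isolation step described above (not my actual management scripts), here is a minimal Python sketch that reads which CPUs belong to a NUMA node from sysfs and pins every thread of a process to them. The helper names are hypothetical, and the example cpulist string is made up; the real layout depends on the machine. Changing the affinity of other users' processes requires root.

```python
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '8-15,24-31' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node: int) -> set[int]:
    """Read the CPUs that belong to a NUMA node from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def pin_all_threads(pid: int, cpus: set[int]) -> None:
    """Pin every thread of a process to the given CPUs.

    sched_setaffinity acts on a single thread id, so each entry under
    /proc/<pid>/task is pinned individually.
    """
    for tid in os.listdir(f"/proc/{pid}/task"):
        os.sched_setaffinity(int(tid), cpus)


# Hypothetical example string; the real topology may differ.
example = parse_cpulist("8-15,24-31")
print(sorted(example))
```

To isolate a VM on node 1, something like `pin_all_threads(qemu_pid, node_cpus(1))` would pin the guest, and the same helper with `node_cpus(0)` could move everything else away. Here `qemu_pid` stands in for whatever PID the VM process has.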

Other than the hypervisor and some scratch space for low-latency disk access on an internal SSD, everything is hosted by one of my local servers. This server has an 800MBps array of 15k RPM SAS drives to host the VM disk images, among other applications. It also has two slower, larger arrays for archiving, backups, and general file storage. In addition to file hosting, the server runs email servers, web servers, game servers, and git and apt repositories, among other things. The workstation connects to my network via four bonded gigabit Ethernet interfaces. I wish to upgrade to 10Gb once I can afford to upgrade all my networking gear. In the near future I might just add a direct 10Gb link to the server instead of upgrading switches.
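To put the networking numbers above side by side, here is a back-of-the-envelope sketch. It assumes "800MBps" means megabytes per second and ignores protocol overhead. Depending on the bonding mode, a single connection may also be limited to one link, so the bonded figure is a best case.

```python
def link_mb_per_s(gigabits_per_s: float) -> float:
    """Convert a raw link rate in Gb/s to MB/s (decimal units, no overhead)."""
    return gigabits_per_s * 1000 / 8


array_mb_per_s = 800                     # storage array figure from the text
bonded_mb_per_s = 4 * link_mb_per_s(1)   # four bonded gigabit links, best case
ten_gig_mb_per_s = link_mb_per_s(10)     # a single 10Gb link

print(f"4x1Gb bond: {bonded_mb_per_s:.0f} MB/s")
print(f"10Gb link:  {ten_gig_mb_per_s:.0f} MB/s")
print(f"array:      {array_mb_per_s} MB/s")
```

Under these assumptions the bond tops out below what the array can deliver, while a single 10Gb link would leave headroom, which is the motivation for the direct server link.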

I do not have a traditional dedicated KVM for switching I/O, as they are expensive. Instead I have modified each of my monitors with connectors that hook up to the input-switching buttons. I use Synergy for mouse/keyboard and a USB switch as a fallback. Audio is taken care of by an audio mixer. The USB switch and monitor input connectors connect to my keyboard via a USB cable. Inside there is a set of transistors which take the place of physical switches and are connected to extra I/O pins of my keyboard's microcontroller (Atmel AT90USB1286). I can then program the keyboard to toggle the transistors the correct number of times to switch to each input, triggered by specific key combinations on the keyboard (I use FN+1, 2, 3, etc.). This keyboard is one of my own creation, the [null] SaikouType ST110, and is fully programmable in the sense that you can upload your own firmware. I have tried Looking Glass and it seems promising, but the latency across NUMA nodes on my system is too high to be practical at this point for certain applications.
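The "toggle the correct number of times" logic mentioned above comes down to modular arithmetic. The sketch below is a Python model of the idea, not the keyboard firmware itself. It assumes each monitor cycles through its inputs in a fixed order, one step per button press, and that the firmware tracks the currently selected input. All names are hypothetical.

```python
def presses_needed(current: int, target: int, num_inputs: int) -> int:
    """Button presses needed to cycle from `current` to `target`.

    Assumes the monitor advances one input per press and wraps around
    after the last input.
    """
    return (target - current) % num_inputs


class MonitorModel:
    """Minimal model of a monitor whose input button cycles through inputs."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        self.current = 0

    def press(self) -> None:
        # One transistor pulse stands in for one physical button press.
        self.current = (self.current + 1) % self.num_inputs

    def switch_to(self, target: int) -> int:
        """Pulse the button until `target` is selected; return the pulse count."""
        count = presses_needed(self.current, target, self.num_inputs)
        for _ in range(count):
            self.press()
        return count


monitor = MonitorModel(num_inputs=4)
pulses_to_2 = monitor.switch_to(2)   # from input 0 to input 2
pulses_to_1 = monitor.switch_to(1)   # wraps around: 2 -> 3 -> 0 -> 1
print(pulses_to_2, pulses_to_1, monitor.current)
```

Real firmware would also need a delay between pulses, and a way to resynchronize if someone presses the monitor's physical button.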

Cinebench R15
The left result shows the multi-core score of a Linux guest with access to all 32 threads; the right shows a Windows guest's single-core score when isolated to its own CPU and NUMA node.

Phoronix Timed Linux Kernel Compilation
This is the result of a Debian 9 guest using all 32 threads, nearly as fast as a Threadripper 1950X in this test.

Tested over 7 minutes with AIDA64 stressing the system in the background. This is the best result I have ever managed from a virtual machine on any system.