VDI Environment Questions

Hi Guys,

I thought I would give this a shot and see if anyone else runs VDI.
What bottlenecks have you discovered? Are they related to how Windows operates? Or is Windows 10 just not an ideal OS for VDI?
I recently moved our entire staff onto PowerEdge R720 hosts running vSphere 7.0 as the hypervisor.

We are getting complaints that the virtual desktops are so slow and unresponsive that they feel anchored to the ground.

So far I have run the VMware optimization tool and created a GPO based on its results. I also switched my test VM’s storage over to the Paravirtual controller, without encryption.

What I don’t want is for my colleague and me to have to rebuild these servers with XCP-ng if we don’t have to. We have other staff running on that hypervisor on Dell Precision servers (which have some hardware rendering from the CPU).


Whew, this is over my head, but the subject interests me. Subscribed to see further posts.

In my limited time working with Citrix, it was a bit of everything: storage, networking, configuration/optimization. VMware wasn’t the limiting factor, though some configs had to be changed occasionally.

My suggestion is to hire a consultant who’s familiar with your type of deployment.

I’m not a VDI expert by any stretch but I can point out some performance pitfalls I’ve encountered with virtual desktops. Some of these may be obvious to you or irrelevant to your environment; hard to say without more details, especially since “slow” is relative to whatever your staff were using before. I’m making an assumption that the users were migrated from desktop or laptop computers and not from a previous VDI system.

CPU

  • The single-threaded performance of your server CPUs might be worse than what your users are used to getting with a desktop CPU. Especially if you’re reusing hardware that wasn’t purchased with VDI in mind, you might be dropping people from 3-4 GHz cores onto something more in the range of 2-3 GHz.
  • IIRC, the Sandy Bridge / Ivy Bridge processors in the R720 are hit harder by Spectre mitigations than newer hardware.
  • You might not have enough physical cores for the number of vCPUs allocated to your VMs. This can negatively impact CPU performance even if your CPU utilization is less than 100%.
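A quick way to sanity-check the vCPU point is to add up the vCPUs allocated to the desktops on one host and divide by that host’s physical cores (not hyperthreads). This is a minimal sketch; the helper name and the example VM sizes are hypothetical, and the ratio is only a rough guide. In vSphere, the CPU Ready metric is what actually shows VMs waiting for a physical core.

```python
def vcpu_per_core(vm_vcpus: list[int], physical_cores: int) -> float:
    """Allocated vCPUs per physical core on one host (hypothetical helper)."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vm_vcpus) / physical_cores


if __name__ == "__main__":
    # Hypothetical host: ten desktops at 4 vCPU each on 16 physical cores.
    ratio = vcpu_per_core([4] * 10, 16)
    print(f"{ratio:.2f} vCPU per physical core")
    if ratio > 1.0:
        # More vCPUs than physical cores: VMs may queue even below 100% usage.
        print("Oversubscribed: check CPU Ready for these VMs")
```

A ratio above 1.0 is not automatically a problem, but it is where latency-sensitive desktop work starts to feel sluggish first.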

GPU

  • It’s not clear if you have a GPU solution for your VDI. Even a weak GPU makes a big difference to some applications, compared to a VM with no hardware acceleration. This applies even if your users don’t do any 3D work, since the GPU can also accelerate 2D rendering and video decode.

Storage

  • If the backing storage of the virtual desktops is HDD-based, then Windows will at times perform as though it were installed on an HDD, even if that storage is a RAID array of “fast” HDDs like 10k SAS. This is of course true for server VMs as well, but it may be more noticeable with desktop software. An all-flash or tiered datastore can make an enormous difference, especially for people who are used to having an SSD in their desktop computer.

Network

  • If users are working from file shares, it may be worthwhile to check if the VDI migration introduced, e.g., a firewall or congested link between the VMs and the NAS that wasn’t in the path before.

Thanks for the details; it helps me brainstorm whether this needs a reconfiguration or is just outdated hardware (but we are bound by the budget).

The core applications my users run every day are mostly Teams, QuickBooks, Chrome, and other single-threaded applications.

CPU

  • We have about 32 logical cores per host, which comes to about 4 cores per person (unless we bump them up to 8). We try to avoid oversubscribing the hosts. I think you’re spot on about single-threaded performance, since most of the applications probably aren’t optimized for multiple cores and the CPU is a bit dated.
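Spelling out the arithmetic behind “about 4 cores per person” makes the logical-versus-physical distinction visible. This assumes the 32 logical cores are 16 physical cores with Hyper-Threading enabled (the post does not name the CPU model) and eight desktops per host:

$$
\frac{32\ \text{logical cores}}{4\ \text{vCPU per desktop}} = 8\ \text{desktops},
\qquad
\frac{8 \times 4\ \text{vCPU}}{16\ \text{physical cores}} = 2\ \text{vCPU per physical core}
$$

Under that assumption the host is 1:1 against hardware threads but 2:1 against physical cores, and a hyperthread does not deliver a full core’s worth of throughput. Bumping the same eight desktops to 8 vCPU would double both ratios.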

GPU

  • We do not have any. Alas, I have tried convincing my higher-ups to try it out, but they won’t budge. I believe Teams and web browsers, along with the other applications, use a lot of 2D rendering.

Storage

  • We have an array of Samsung SSDs; I forget which RAID level my colleague set them to. We filled all of the bays, and today we are bringing an NFS NAS storage solution online for our devs.

Network

  • To my knowledge we are good on the network.

One of my users keeps throwing Chrome benchmarks at me to measure I/O; however, I showed them numbers from the following utility.

In the following image, I have two VMs in the same OU, rebooted and left idle for a while so Teams could close and Windows could settle down. The left is on a Paravirtual storage controller and the right on the LSI storage controller. The Paravirtual controller shows improved CPU usage and improved I/O.

Here is the source where I found my guide. Maybe someone wants to give it a shot in the future.

I’m waiting to see whether we switch to the XCP-ng hypervisor and whether it makes a difference. At this point I think the older hardware is causing our problems. Our devs who are vouching for XCP-ng are on the newest Dell hardware.

When you say they are using Teams, are they using it for video?

A further question: what protocol are the clients using?

Ultimately, if your users are accessing any kind of rich content, e.g. even a website with an autoplaying video or some moving graphics using WebGL, that is hard for a CPU to render.

And the CPU also needs to encode what the client is seeing, probably as some kind of video stream, which is a fair amount of load by itself. All GPUs nowadays include some kind of encoder block to offload this, whether it is Quick Sync, NVENC, or the AMD equivalent.

And it’s having to do all that in software.

My VDI experience is limited, but I am running a couple of Windows session hosts at work with thin clients. I did a simple test: load our homepage in Internet Explorer (it has some flashy graphics and a background video) on each thin client until the host is saturated.

With no graphics acceleration it choked on just four instances; with a Quadro P2200 assigned, it sustained 11 instances on an otherwise identical host.

I have trained them on using Teams in their VMs, and they know not to use it for calls or video. I imagine a couple of them tried anyway.

I wanted to provide an update for future sysadmins, or anyone looking into a VDI solution without handing over lots and lots of money to NVIDIA GRID and VMware, or finding out that AMD’s version requires a different hypervisor and hardware.

A few months ago my staff had a horrible experience with the VDI on vSphere 7; if you have been following this thread, you know the story.

At some point we had too many cooks in the kitchen saying it’s the hypervisor, it’s the hardware, or xyz. The main issue turned out to be a lack of graphics processing. My suggestion: buy two Precision 3930 Rack Workstations with 128 GB of RAM (non-ECC, but that doesn’t matter for this edge case).

In this configuration we have redundant power and went with an Intel Xeon E-2288G (8 cores, 16 MB cache, 3.7 GHz base, 5 GHz turbo, with UHD Graphics 630). For the uninitiated, the G in the model number means there is onboard UHD Graphics, similar to the i7s and i9s. So why not get an i9? They are essentially the same; I just prefer Xeons for servers ;).

Anyway, let’s not debate the CPU but rather ask: “Was this small UHD enough to fix desktop lag?” The answer is a huge YES. We went with the XCP-ng hypervisor, made a master pool with a large storage solution, and used all of the NICs to transfer the VMs from the older servers to the new servers over the network. This was a great achievement and ultimately works for our case: you can watch videos and do Photoshop work. I think we need a third host, as two still feels like a single point of failure. Of course, now I have to consider GPO adjustments, such as not allowing people to shut down their computers. It’s also best to turn on “Power on at boot” in XOA in case the admin falls victim to the same fate.
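For anyone who prefers the command line to XOA, XCP-ng (like XenServer) reads auto power-on from `other-config:auto_poweron`, which has to be set on both the pool and each VM. This sketch only builds the `xe` commands and prints them unless `dry_run=False`; it assumes it runs in dom0 on the pool master, and the UUIDs in the example are placeholders.

```python
import subprocess

AUTO_POWERON = "other-config:auto_poweron=true"


def auto_poweron_commands(pool_uuid: str, vm_uuids: list[str]) -> list[list[str]]:
    """Build the xe commands that enable auto power-on for a pool and its VMs."""
    commands = [["xe", "pool-param-set", f"uuid={pool_uuid}", AUTO_POWERON]]
    for vm_uuid in vm_uuids:
        commands.append(["xe", "vm-param-set", f"uuid={vm_uuid}", AUTO_POWERON])
    return commands


def apply(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or run them with xe when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder UUIDs: replace with values from `xe pool-list` and `xe vm-list`.
    apply(auto_poweron_commands("POOL-UUID", ["VM-UUID-1", "VM-UUID-2"]))
```

Keeping the dry run as the default makes it easy to eyeball the list before touching the pool.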

Don’t expect to play games on the Intel UHD Graphics alone, but it can at least handle everyone from programmers to accounting with lots of Excel spreadsheets.
I don’t recommend this as everyone’s solution, but this small UHD change made the difference for us.


Are you saying the UHD Graphics 630 in the Xeon can function as a shared GPU? This is not PCI passthrough, correct? That would help me a lot, as our VDI has had some of the same performance issues as yours. I have only been able to mitigate graphics lag by getting servers with higher single-thread performance, but that has limited effectiveness.

Has anyone tried these processors with Proxmox?

Hi @thetrick,

I’m currently not using the GPU as a passthrough; the resource is shared among the VMs. In case you’re wondering, I didn’t have to jump through licensing hoops to get it working, and no extra drivers were needed besides the XCP-ng tools.

I have not tried Proxmox on it.

