I watched Wendell’s video about 15 million IOPS and it got me thinking.
We (my work) are currently migrating to a new on-prem datacenter, moving from servers with a Xeon E5-2640 v4 @ 2.40 GHz to a Xeon Gold 6246 @ 3.30 GHz.
We run a VMware environment, have done some performance testing, and even brought in external consultants to optimize some of the database software we use.
But we are not quite happy with the performance of the new CPUs.
I mean, going from a Q1 2016 CPU to a Q2 2019 one, shouldn’t that make Tomcat and Cassandra faster almost out of the box?
Or is the CPU from 2019 so “fast” that the apps we are using can’t keep up? (Just like Wendell said in his video about the 8380.)
Our new CPU is 900 MHz faster on base clock; that should give a significant increase in speed, shouldn’t it?
You need at least 10% to really notice without benchmarking the thing. And that’s assuming you were 100% CPU-bound (and not IO- or network-throughput-bound) before.
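The math above is easy to sanity-check. This is a rough sketch, not a benchmark: the clock figures come from the thread, while the 30% CPU-bound fraction is a made-up assumption you'd want to replace with a measured value from your own workload.

```python
# Back-of-envelope: per-core uplift you'd expect from base clock alone.
old_base_ghz = 2.40   # Xeon E5-2640 v4
new_base_ghz = 3.30   # Xeon Gold 6246

clock_uplift = new_base_ghz / old_base_ghz - 1
print(f"Base-clock uplift: {clock_uplift:.0%}")   # ~38%

# But that only applies to the fraction of each transaction spent on CPU.
# Amdahl-style estimate: if only 30% of a request is CPU-bound (assumption,
# measure yours), the end-to-end speedup is far smaller than 38%.
cpu_fraction = 0.30
speedup = 1 / ((1 - cpu_fraction) + cpu_fraction / (1 + clock_uplift))
print(f"End-to-end speedup at {cpu_fraction:.0%} CPU-bound: {speedup:.2f}x")
```

So a ~38% faster clock can easily turn into a single-digit-percent gain once the IO and network portions of a transaction are accounted for.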
I don’t know your specific use case, but the classic case for VMware deployments (including mine at the moment) is that you run out of disk IO performance and get bottlenecked there because the tools for measuring it are crap, and the promised performance metrics for a lot of NAS/SAN boxes are very hand-wavy because storage is such a “well, it depends!” black art.
Also be aware that VMware performance graphs are… of questionable accuracy.
You can throw more and more cores at a VM, and the default in-guest CPU graph will show you more and more free CPU, but that can simply mean the host was unable to service those vCPUs.
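A better signal than the in-guest graph is the host-side CPU Ready metric, which vCenter reports as a summation in milliseconds per sampling interval. A minimal sketch of the usual conversion to a percentage (the 20 s real-time interval is VMware's default; the ~5% rule of thumb is folklore, not a hard limit):

```python
# Convert vCenter's "CPU Ready" summation (ms of ready time per sampling
# interval) into a per-vCPU percentage. Real-time charts sample every 20 s.
# Sustained ready above roughly 5% per vCPU usually means the host cannot
# schedule the VM when it wants to run.

def cpu_ready_pct(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Percent of the interval the VM spent ready-but-not-running, per vCPU."""
    return ready_ms / (interval_s * 1000 * vcpus) * 100

# Example: an 8-vCPU VM reporting 4000 ms of ready time in one 20 s sample.
print(cpu_ready_pct(4000, vcpus=8))  # 2.5 (% per vCPU)
```

High ready time with "free CPU" inside the guest is exactly the over-provisioning symptom described above.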
Also check that if you upgraded CPUs and are using Resource Pools that you’ve expanded them appropriately.
VMware did this thing with 7 where they decided everyone has a flash-based SAN and host-based caching no longer makes sense. Is that the case with your app?
If you can add some local flash storage to a host, create a datastore on it and use that as a baseline for testing. If the numbers jump, you may suddenly be able to justify a flash-based SAN.
Work toward understanding where your bottlenecks are.
Database usually means transactional, which means confirmed writes before returning. In a SAN environment that’s at least 4 round-trips to the SAN; on a spinning-rust SAN in this day and age, that will be glacial.
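To put numbers on "glacial", here's a rough sketch of per-commit latency. The 4-round-trip count comes from the post above; the per-hop fabric latency and media service times are illustrative assumptions, not measurements from any particular array.

```python
# Back-of-envelope commit latency for a synchronous (confirmed) DB write.
def commit_latency_ms(media_ms: float, fabric_rtt_ms: float, round_trips: int = 4) -> float:
    """Rough time for one confirmed write: fabric round-trips plus media service time."""
    return round_trips * fabric_rtt_ms + media_ms

# Assumed figures: ~8 ms seek+rotate for 7200 rpm disk, ~0.2 ms for flash,
# ~0.5 ms per fabric round-trip.
spinning_rust = commit_latency_ms(media_ms=8.0, fabric_rtt_ms=0.5)
flash = commit_latency_ms(media_ms=0.2, fabric_rtt_ms=0.5)

print(f"HDD SAN:   ~{spinning_rust:.1f} ms/commit -> ~{1000/spinning_rust:.0f} serial commits/s")
print(f"Flash SAN: ~{flash:.1f} ms/commit -> ~{1000/flash:.0f} serial commits/s")
```

Under these assumptions a single serialized writer tops out around 100 commits/s on spinning rust versus several hundred on flash, regardless of how fast the CPU is.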
You can try a 3rd-party I/O filter for ESXi 7 that adds back host caching, if you have flash to cache on in each host, but that makes admins and even consultants nervous because bugs will lead to datastore corruption. It’s safer to sanity-check by creating a local flash datastore before heading down that path. Or just put the $ toward a small flash-based SAN.
Oh really. I’m on 6.7 at the moment and have host-based caching.
It’s read-only and largely ineffective as far as I can see, but nice of them to add it in vSphere 6 (IIRC) and then kill it the next major release.
Been a vSphere admin since 3.5, and it seems VMware are pretty determined to piss off their customer base with the license shenanigans, forced upgrades to Enterprise Plus, being dicks to deal with when porting licenses between our different subsidiaries, etc.
(e.g., we bought some licenses for a subsidiary; now that they’re unused we can’t port them back to the parent company, even though we host our child company’s stuff, so we’d need to buy again… ffs… going Hyper-V out of spite as much as anything else).
As an aside, at least with the limited testing I’ve done so far, Hyper-V seems more performant too (and given we’re paying for System Center for SCCM anyway… SCVMM is free…). What we’d spend on vSphere licenses pays for 2x as many hosts.
Give XCP-ng a try. You might be surprised at some of the features and migration tooling.
It’s crazy how much faster vSAN is… but then again, maybe not: VMware still leads in features and performance, often by 10%+ even over Hyper-V. Out of the box, Hyper-V allows dangerous write-back caching in a lot of common iSCSI scenarios, so make sure that isn’t actually where your perf gains are coming from.
So many awesome replies, thanks, would be hard for me to quote so many things here.
But we are also on v6.7, as far as I remember.
The hardware is specifically Dell VxRail, i.e. servers with built-in storage, in a configuration that uses NVMe for caching and SSDs as capacity storage.
I’m not sure what our old VMware cluster runs on, other than that it’s LUN-based storage attached to PowerEdge servers.
We are just surprised that our guest machines don’t seem to leverage the CPU(s) properly; it’s like VMware isn’t giving them enough.
We are trying to gather more stats from our vendor and then taking action from there with our database consultants.
I was just curious whether these CPUs from 2019 would make any difference.
If your application inside the guests can make use of advanced CPU features, make sure you have raised the cluster’s EVC level to a point where it exposes them.
EVC is a compatibility feature so you can mix generations of processor in a cluster, but it hobbles all your cores down to the lowest common feature set, possibly cutting off advanced CPU features that could help.
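The masking effect can be sketched as a simple set intersection. The feature lists and baseline names below are simplified illustrations, not VMware's actual EVC masks; the point is only that a low baseline hides instructions (like AVX-512) the Gold 6246 physically has.

```python
# Illustrative only: how an EVC baseline masks newer CPU features.
# What a Cascade Lake host (e.g. Xeon Gold 6246) could offer (simplified):
host_cpu_features = {"sse4.2", "avx", "avx2", "fma", "avx512f", "avx512vnni"}

# Hypothetical simplified baselines, oldest to newest:
evc_baselines = {
    "sandybridge": {"sse4.2", "avx"},
    "haswell":     {"sse4.2", "avx", "avx2", "fma"},
    "cascadelake": {"sse4.2", "avx", "avx2", "fma", "avx512f", "avx512vnni"},
}

for level, baseline in evc_baselines.items():
    exposed = host_cpu_features & baseline    # what guests actually see
    hidden = host_cpu_features - baseline     # silicon the baseline masks off
    print(f"{level:12s} exposes {sorted(exposed)}, hides {sorted(hidden)}")
```

So a cluster still on a Broadwell-era EVC level would hide everything the 2019 CPUs added, even after the hardware swap.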
Also raise the VM hardware compatibility level to the one for 6.7 if the VMs are older or were created at a lesser compatibility level.