I watched Wendell’s video about 15 million IOPS and it got me thinking.
We (my work) are currently migrating to a new on-prem datacenter, moving from servers with a Xeon E5-2640 v4 @ 2.40 GHz to a Xeon Gold 6246 @ 3.30 GHz.
We run a VMware environment, have done some performance testing, and even brought in external consultants to optimize some of the database software we use.
But we are not quite happy with the performance of the new CPUs.
I mean, going from a Q1 2016 CPU to a Q2 2019 one, shouldn’t that make Tomcat and Cassandra faster almost out of the box?
Or is the CPU from 2019 so “fast” that the apps we are using can’t keep up? (Just like Wendell said in his video about the 8380.)
Our new CPU is 900 MHz faster on base clock; that should give a significant increase in speed, shouldn’t it?
You need at least 10% to really notice without benchmarking the thing. And that’s assuming you were 100% CPU-bound (and not IO- or network-throughput-bound) before.
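The math above is easy to sanity-check. This is a rough sketch, not a benchmark: the clock figures come from the thread, while the 30% CPU-bound fraction is a made-up assumption you'd want to replace with a measured value from your own workload.

```python
# Back-of-envelope: per-core uplift you'd expect from base clock alone.
old_base_ghz = 2.40   # Xeon E5-2640 v4
new_base_ghz = 3.30   # Xeon Gold 6246

clock_uplift = new_base_ghz / old_base_ghz - 1
print(f"Base-clock uplift: {clock_uplift:.0%}")   # ~38%

# But that only applies to the fraction of each transaction spent on CPU.
# Amdahl-style estimate: if only 30% of a request is CPU-bound (assumption,
# measure yours), the end-to-end speedup is far smaller than 38%.
cpu_fraction = 0.30
speedup = 1 / ((1 - cpu_fraction) + cpu_fraction / (1 + clock_uplift))
print(f"End-to-end speedup at {cpu_fraction:.0%} CPU-bound: {speedup:.2f}x")
```

So a ~38% faster clock can easily turn into a single-digit-percent gain once the IO and network portions of a transaction are accounted for.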
I don’t know your specific use case, but the classic case for VMware deployments (including mine at the moment) is that you run out of disk IO performance and get bottlenecked there because the tools for measuring it are crap, and the promised performance metrics for a lot of NAS/SAN boxes are very hand-wavy because storage is such a “well, it depends!” black art.
Also be aware that VMware performance graphs are… of questionable accuracy.
You can throw more and more cores at a VM, and the default in-guest CPU graph will show you more and more free CPU, but that can simply mean the host was unable to service those vCPUs.
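A better signal than the in-guest graph is the host-side CPU Ready metric, which vCenter reports as a summation in milliseconds per sampling interval. A minimal sketch of the usual conversion to a percentage (the 20 s real-time interval is VMware's default; the ~5% rule of thumb is folklore, not a hard limit):

```python
# Convert vCenter's "CPU Ready" summation (ms of ready time per sampling
# interval) into a per-vCPU percentage. Real-time charts sample every 20 s.
# Sustained ready above roughly 5% per vCPU usually means the host cannot
# schedule the VM when it wants to run.

def cpu_ready_pct(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Percent of the interval the VM spent ready-but-not-running, per vCPU."""
    return ready_ms / (interval_s * 1000 * vcpus) * 100

# Example: an 8-vCPU VM reporting 4000 ms of ready time in one 20 s sample.
print(cpu_ready_pct(4000, vcpus=8))  # 2.5 (% per vCPU)
```

High ready time with "free CPU" inside the guest is exactly the over-provisioning symptom described above.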
Also check that if you upgraded CPUs and are using Resource Pools that you’ve expanded them appropriately.
VMware did this thing with 7 where they decided everyone has a flash-based SAN and host-based caching no longer makes sense. Is that the case with your app?
If you can add some local flash storage to a host, create a datastore on it and use that as a baseline for testing. If the numbers jump, you may suddenly be able to justify a flash-based SAN.
Work toward understanding where your bottlenecks are.
Database usually means transactional, which means confirmed writes before returning. In a SAN environment that’s at least 4 round-trips to the SAN; on a spinning-rust SAN in this day and age, that will be glacial.
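To put numbers on "glacial", here's a rough sketch of per-commit latency. The 4-round-trip count comes from the post above; the per-hop fabric latency and media service times are illustrative assumptions, not measurements from any particular array.

```python
# Back-of-envelope commit latency for a synchronous (confirmed) DB write.
def commit_latency_ms(media_ms: float, fabric_rtt_ms: float, round_trips: int = 4) -> float:
    """Rough time for one confirmed write: fabric round-trips plus media service time."""
    return round_trips * fabric_rtt_ms + media_ms

# Assumed figures: ~8 ms seek+rotate for 7200 rpm disk, ~0.2 ms for flash,
# ~0.5 ms per fabric round-trip.
spinning_rust = commit_latency_ms(media_ms=8.0, fabric_rtt_ms=0.5)
flash = commit_latency_ms(media_ms=0.2, fabric_rtt_ms=0.5)

print(f"HDD SAN:   ~{spinning_rust:.1f} ms/commit -> ~{1000/spinning_rust:.0f} serial commits/s")
print(f"Flash SAN: ~{flash:.1f} ms/commit -> ~{1000/flash:.0f} serial commits/s")
```

Under these assumptions a single serialized writer tops out around 100 commits/s on spinning rust versus several hundred on flash, regardless of how fast the CPU is.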
You can try a 3rd-party I/O filter for ESXi 7 that adds back host caching, if you have flash to cache on in each host, but that makes admins and even consultants nervous because bugs will lead to datastore corruption. It’s safer to sanity-check by creating a local flash datastore before heading down that path. Or just put the $ toward a small flash-based SAN.
Oh really. I’m on 6.7 at the moment and have host-based caching.
It’s read-only and largely ineffective as far as I can see, but nice of them to add it in vSphere 6 (IIRC) and then kill it the next major release.
Been a vSphere admin since 3.5, and it seems VMware are pretty determined to piss off their customer base with the license shenanigans, forced upgrades to Enterprise Plus, being dicks to deal with when porting licenses between our different subsidiaries, etc.
(e.g., we bought some licenses for a subsidiary; now that they’re unused we can’t port them back to the parent company, even though we host our child company’s stuff, so we’d need to buy again… ffs… going Hyper-V out of spite as much as anything else).
As an aside, at least with the limited testing I’ve done so far, Hyper-V seems more performant too (and given we’re paying for System Center for SCCM anyway… SCVMM is free…). What we’d spend on vSphere licenses pays for 2x as many hosts.
Give XCP-ng a try. You might be surprised at some of the features and migration tooling.
It’s crazy how much faster vSAN is… but then again, maybe not: VMware still leads in features and performance, often by 10%+ even over Hyper-V. Out of the box, Hyper-V allows dangerous write-back caching in a lot of common iSCSI scenarios, so make sure that isn’t actually where your perf gains are coming from.
So many awesome replies, thanks, would be hard for me to quote so many things here.
But we are also on v6.7, as far as I remember.
The hardware is specifically Dell VxRail, i.e. servers with built-in storage, in a configuration that uses NVMe for caching and SSDs as capacity storage.
I’m not sure what our old VMware cluster runs on, other than that it’s LUN-based storage attached to PowerEdge servers.
We are just surprised that our guest machines don’t seem to leverage the CPU(s) properly; it’s like VMware isn’t giving them enough.
We are trying to gather more stats from our vendor and then taking action from there with our database consultants.
I was just curious whether these CPUs from 2019 would make any difference.
If your application inside the guests can make use of advanced CPU features, make sure you have raised the cluster’s EVC level to a point where it exposes them.
EVC is a compatibility feature so you can mix generations of processor in a cluster, but it hobbles all your cores down to the lowest common feature set, possibly cutting off advanced CPU features that could help.
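The masking effect can be sketched as a simple set intersection. The feature lists and baseline names below are simplified illustrations, not VMware's actual EVC masks; the point is only that a low baseline hides instructions (like AVX-512) the Gold 6246 physically has.

```python
# Illustrative only: how an EVC baseline masks newer CPU features.
# What a Cascade Lake host (e.g. Xeon Gold 6246) could offer (simplified):
host_cpu_features = {"sse4.2", "avx", "avx2", "fma", "avx512f", "avx512vnni"}

# Hypothetical simplified baselines, oldest to newest:
evc_baselines = {
    "sandybridge": {"sse4.2", "avx"},
    "haswell":     {"sse4.2", "avx", "avx2", "fma"},
    "cascadelake": {"sse4.2", "avx", "avx2", "fma", "avx512f", "avx512vnni"},
}

for level, baseline in evc_baselines.items():
    exposed = host_cpu_features & baseline    # what guests actually see
    hidden = host_cpu_features - baseline     # silicon the baseline masks off
    print(f"{level:12s} exposes {sorted(exposed)}, hides {sorted(hidden)}")
```

So a cluster still on a Broadwell-era EVC level would hide everything the 2019 CPUs added, even after the hardware swap.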
Also raise the VM hardware compatibility level to the one for 6.7 if the VMs are older or were created at a lesser compatibility level.