Hey everyone,
My company is moving ahead with standing up a multi-node cluster at a colocation site and using VMware’s Site Replication product so that, if our main office cluster goes down, it can fail over more or less seamlessly to the off-site cluster without much downtime. Right now our SAN is replicated to a second office 250 miles from the first, but if the main cluster goes down, restoring everything from the SAN is completely manual, so it’s not really viable.
I’ve been made the lead on this project, and for the first time since taking this job I feel like I’m really out of my depth. For context, at this time two years ago I was working in MSP hell fixing label printers and un-fucking Outlook PSTs for the billionth time. Now I’m one of two tier 3 server admins at my current company. I’ve been able to hit the ground running with most things, but trying to come up with specifications for a server cluster that’s probably going to cost over $80k is… phew.
Our current main VMware cluster is made up of HPE DL360 hosts (Gen10, I think) totaling 216 cores and 2.75 TB of memory, along with a pair of Pure Storage SAN arrays. The hosts have no internal storage. Right now metrics show CPU usage across the hosts never goes above 30%, even during peak times, but memory usage easily reaches 60-70% during peak loads. IOPS at the 95th percentile is right around 6k, excluding the spikes when the SANs and Veeam run their backup jobs.
For the colocation site we’re looking at getting two-thirds to three-quarters of the performance of our current cluster, and we want to use internal storage pooled together with vSAN rather than purchase another SAN array. Does anyone have any tips for how to spec out what would be needed to cover those needs? Maybe sites that let you compare and contrast differently configured systems and show potential max throughput and such?
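For what it’s worth, here’s the back-of-the-napkin math I have so far, as a rough Python sketch. It only uses the numbers above. The assumptions are mine: I take 70% as the peak memory figure, I scale the 6k IOPS number by the same fraction as everything else, and I assume every workload would have to run at the colo during a failover. The function and variable names are just made up for this post.

```python
from fractions import Fraction

# Figures from the current cluster (from the post above).
CURRENT_CORES = 216
CURRENT_MEM_TB = 2.75
PEAK_CPU_UTIL = 0.30   # CPU never goes above 30%
PEAK_MEM_UTIL = 0.70   # assumption: upper end of the 60-70% peak range
P95_IOPS = 6000        # excludes backup-job spikes

# What the current workloads actually consume at peak.
peak_cores = CURRENT_CORES * PEAK_CPU_UTIL
peak_mem_tb = CURRENT_MEM_TB * PEAK_MEM_UTIL


def size_target(fraction):
    """Scale the current cluster's raw capacity by the given fraction."""
    return {
        "cores": float(CURRENT_CORES * fraction),
        "mem_tb": CURRENT_MEM_TB * float(fraction),
        # Assumption: IOPS target scales linearly with the same fraction.
        "iops": float(P95_IOPS * fraction),
    }


for label, fraction in (("2/3", Fraction(2, 3)), ("3/4", Fraction(3, 4))):
    target = size_target(fraction)
    mem_headroom = target["mem_tb"] - peak_mem_tb
    cpu_headroom = target["cores"] - peak_cores
    print(
        f"{label}: {target['cores']:.0f} cores, "
        f"{target['mem_tb']:.2f} TB RAM, {target['iops']:.0f} IOPS | "
        f"headroom vs peak: {cpu_headroom:.1f} cores, {mem_headroom:+.2f} TB"
    )
```

If I’m reading my own numbers right, CPU has plenty of room at either size, but memory looks like the constraint: at two-thirds capacity the target RAM comes in under what we use at a 70% peak today, so that’s the part I’m least sure how to spec.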
Any help is appreciated.