So for vSAN, writes are replicated synchronously to the other cluster members. You can disable that, but the wait for the other vSAN members to acknowledge the write is perhaps the source of the slowdown?
Do you have a non-vSAN datastore you can test to confirm?
As far as testing write sync goes, would that be the "Disable object checksum" option within the storage policy? I did run a test with that disabled and still saw similarly low write speeds.
We are currently evaluating a two-node vSAN ESA setup with two HPE DL385 Gen11 servers, both populated with dual EPYC 9274F CPUs and 24x 64 GB DIMMs.
Each node has seven 7.6 TB Samsung PM1733 drives and four 25 Gbps NICs, and we run a stretched cluster with an external witness.
We are seeing about 750 MB/s write performance in CrystalDiskMark (CDM) with ROBO RAID 1.
I am also observing higher-than-expected latency. Disabling the redundancy in the policy and forcing the data to be stored on the preferred node does give us a 20% performance boost on sync writes, but that is still way short of single-disk performance.
I will be running some tests with HCIBench this week and looking into the disk topology and potential tuning options.
So far I’ve observed the following:
Data seems to be striped and mirrored within a node, and then also mirrored across the nodes for redundancy.
The disk topology causes a single write to land on multiple mirrors, both within and across nodes.
Performance scales with the sequential write speed of the individual disks.
9K MTU does not impact sequential write performance on our setup; we seem to be hitting a hardware bottleneck (see the jumbo-frame check after this list).
The platform does feel responsive despite the lacklustre benchmark scores. We need to validate the behaviour with some more concurrent workloads.
Two-node (and stretched?) clusters cannot enable RDMA, which does not help matters.
A write test with HCIBench generated 3 GB/s of throughput on the cluster while peaking at 13k IOPS.
Reads peak at 700k IOPS.
RAID 5 with more nodes should perform better and scale well. (Still need to validate this claim.)
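On the MTU point above: before ruling jumbo frames in or out as a factor, it is worth confirming that the 9K path actually works end to end between the vSAN vmkernel ports. A minimal check from an ESXi shell, assuming vmk2 is the vSAN interface and 192.168.100.2 is the peer node (both placeholders):

```
# Send an 8972-byte payload (9000 minus IP/ICMP headers) with the don't-fragment bit set
# over the vSAN vmkernel port. If this fails while a plain vmkping succeeds, something in
# the path is not passing 9000 MTU.
vmkping -I vmk2 -d -s 8972 192.168.100.2
```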
Are you able to run HCIBench on the platform?
Could you see if performance improves with one node powered off?
Are you able to test with direct-attached connections between the nodes?
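If you do try direct-attached links, it is also worth confirming which vmkernel interfaces are actually tagged for vSAN and what the physical uplinks negotiate; the standard commands from an ESXi shell:

```
# Which vmkernel interfaces carry vSAN traffic
esxcli vsan network list

# Negotiated speed, duplex and MTU of the physical uplinks
esxcli network nic list

# MTU and IP configuration of the vmkernel interfaces themselves
esxcli network ip interface list
```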
Small update:
I’ve been able to run HCIBench against our two-node cluster with six disks per node assigned to vSAN ESA.
The results are largely okay, but the throughput is disappointing for us.
Our current policy:
Site disaster tolerance: Site mirroring - stretched cluster
Failures to tolerate: No data redundancy
Number of disk stripes per object: 1
IOPS limit for object: 0
Object space reservation: Thick provisioning
Flash read cache reservation: 0%
Disable object checksum: No
Force provisioning: No
Encryption services: No preference
Space efficiency: Compression only
Storage tier: All flash
The result:
Datastore = VSAN
JOB_NAME = job0
VMs = 6
IOPS = 251148.12 IO/S
THROUGHPUT = 981.00 MB/s
R_LATENCY = 0.62 ms
W_LATENCY = 1.09 ms
95%tile_R_LAT = 1.00 ms
95%tile_W_LAT = 1.00 ms
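For context on the throughput number: if that run used a 4 KiB block size (which the numbers suggest), the throughput figure is simply IOPS times block size, 251,148 IO/s x 4 KiB ≈ 981 MiB/s, which matches exactly. So the modest MB/s value is mostly a function of the small I/O size; a large-block sequential profile is the better test for raw throughput.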
So you are using 25 Gbps, which should allow the setup to reach 3+ GB/s of throughput in the vSAN backend. You could try connecting the hosts with a DAC cable to eliminate buffer or flow-control issues in the switch. I would, however, update vSAN to 8.0 Update 2 first; it improved our write performance.
It also includes some additional insights and monitoring which might prove helpful.
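If it helps, the build a host is currently on can be checked from an ESXi shell before and after the update (the vSAN version is tied to the ESXi build):

```
# Report the ESXi version and build number of the host
esxcli system version get
```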
We’ve updated vSAN to 8.0 U2 and this has cut our CrystalDiskMark reads in half :S. We still see 4.5 GB/s for SEQ1M Q8T1, but previously we saw about 10 GB/s there.
The sequential writes have improved: throughput with HCIBench is now between 1500 and 1600 MB/s, and this is also the case in CDM, where we previously hit about 900 MB/s.
Performance for parallel workloads seems fine, however.
I suspect our dual 25 Gbps links for vSAN and the 50 Gbps inter-site link are the limiting factor for the two-node setup, since write latency shoots up as IOPS and cluster throughput increase. I think Wendell was spot on thinking it seemed like a latency issue, so testing with a DAC cable would be wise.
You can configure this without downtime if you’ve got two uplinks for vSAN traffic.
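For anyone doing the same, a rough sketch of moving vSAN traffic to a DAC-connected port one step at a time from an ESXi shell (vmk2/vmk3 are placeholders, and vmk3 is assumed to already exist on a portgroup backed by the DAC uplink; do one host at a time and check vSAN health in between):

```
# Tag the new vmkernel port (on the DAC-connected uplink) for vSAN traffic
esxcli vsan network ip add -i vmk3

# Confirm both interfaces are now listed for vSAN
esxcli vsan network list

# Once the cluster is healthy again, untag the old interface
esxcli vsan network remove -i vmk2
```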
Based on this Dell writeup and our own experience, stretched clusters really benefit from more bandwidth and lower latency.
vSAN ESA uses only one active(!) vmkernel interface for vSAN traffic, so configuring 2x 25 GbE still seems to result in a maximum throughput of 1500 MB/s for two nodes in RAID 1, because replication traffic between the nodes eats up half the bandwidth. We see the vSAN backend hit 3+ GB/s of throughput when running workloads, and this cluster throughput should scale with more nodes, since your highway has more lanes available when more nodes are participating.
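Back-of-the-envelope: a single active 25 Gbps link is roughly 3.1 GB/s on the wire, so if node-to-node replication consumes about half of it, that leaves around 1.5 GB/s for guest writes, which lines up with the 1500-1600 MB/s we measured.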
random rant:
VMware is really doubling down on the marketing about ESA being awesome, but for me it was a bit of a letdown. VMware is also typically tight-lipped and evasive about performance: we’ve had no official response on our initial results despite engineers mentioning they were lower than expected.
VMware keeps pointing to HCIBench; there is, however, virtually no reference material available to validate your results. Parts of HCIBench are also not updated for ESA, so the potential problems reported by the tool can be misleading. vSAN ESA is not a bad product, but if I had the freedom to buy “non-enterprise” stuff, as our manager puts it, I would’ve gone with a TrueNAS system, since we would be able to optimize it for our workloads.