vSAN 8.0 ESA Kioxia CM6-V Write speeds in VM

Hello all,

I'm running a Gigabyte R283-Z92 rev. AAE1 with an AMD EPYC 9374F, 768 GB RAM, and 10 Kioxia CM6-V 3.2 TB (KCM61VUL3T20) drives.

I have two of these units running in a 2-node cluster.

When I run CrystalDiskMark in my VM I get around 4,500 MB/s reads, which feels in line, but my writes are around 350 MB/s.

Any thoughts as to why the writes are so bad?

I'll gladly add information as needed; I'm trying to keep this simple to start.
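For reference, the CrystalDiskMark sequential test I'm running is roughly equivalent to a fio job like this, in case anyone wants to reproduce it in a Linux guest (a sketch only; the file path and size are placeholders):

```
# Roughly CDM SEQ1M Q8T1: sequential 1 MiB I/O, queue depth 8, direct I/O
fio --name=seqwrite --filename=/mnt/test/fio.bin --size=8G \
    --rw=write --bs=1M --iodepth=8 --ioengine=libaio --direct=1

fio --name=seqread --filename=/mnt/test/fio.bin --size=8G \
    --rw=read --bs=1M --iodepth=8 --ioengine=libaio --direct=1
```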

vSAN? Sync writes? Maybe it's replicating the writes according to your HA/storage policy?

I have the storage policy set to RAID 1 with failures to tolerate = 1.

I tried changing the policy to other settings and always got around the same write speed.

So for vSAN, writes are replicated synchronously to other cluster members. You can relax that in the policy, but that wait for other vSAN members to confirm the write is perhaps the source of the slowdown?

Do you have a non-vSAN datastore you can test against to confirm?

[Screenshots: CrystalDiskMark results — "Datastore on Disk" vs. "vSAN"]

OK, so I removed a disk from the vSAN datastore and created a datastore directly on that one disk.

I moved the VM over to it, ran CrystalDiskMark, and my writes are what I would expect.

The second pic is with the VM on vSAN.

I'm assuming I have a bottleneck within my setup, or is that the expected write performance on vSAN?

Additional information: the NICs being used for vSAN are Intel E810-XXVDA4s, with four links from each server to my core switch (Dell S5248F-ON).

I really appreciate you taking time to assist.

As far as testing write sync goes, would that be the "disable object checksum" option within the storage policy? I did run a test with that disabled and I still saw similarly low write speeds.

After looking things over during my CrystalDiskMark tests, I noticed that I'm getting high write latency, which I now think is the source of my issue.

Yeah, Just vSAN Things ™

Are the E810s and the switch set up for jumbo frames?

https://kb.vmware.com/s/article/2053145

Some tweaks there allow more data to be in flight.
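For what it's worth, the host-side version of those tweaks looks roughly like this (a sketch; vmnic0/eth0 and the 4096 values are placeholders — check the preset maximums for your NIC first):

```
# Check current and maximum ring sizes on the physical uplink
esxcli network nic ring current get -n vmnic0
esxcli network nic ring preset get -n vmnic0

# Raise RX/TX ring sizes toward the preset maximum
esxcli network nic ring current set -n vmnic0 -r 4096 -t 4096

# Inside a Linux guest on VMXNET3, the equivalent is ethtool
ethtool -G eth0 rx 4096 tx 4096
```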

Are the E810s 25 G or 100 G? vSAN really does benefit from 100 G…

But this sounds like a latency/ring-buffer type issue.

Is vSAN running alone on those NICs as well?

The E810s have 4x 25 G links to the switch per server, all used for vSAN alone on the VMkernel.

I have jumbo frames on with 9000 MTU on all devices.
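In case it helps others reading along, this is roughly how I verified it end to end (vmk1 and the peer IP are placeholders for your vSAN vmkernel port and the other node's vSAN address):

```
# Check the MTU configured on each vmkernel interface
esxcli network ip interface list

# Verify jumbo frames actually pass end to end:
# 8972-byte payload + IP/ICMP headers = 9000, with don't-fragment set
vmkping -I vmk1 -d -s 8972 <peer-vsan-ip>
```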

We are currently evaluating a two-node vSAN ESA setup with two HPE DL385 Gen11 servers, both populated with dual EPYC 9274F CPUs and 24x 64 GB DIMMs.
The nodes each have seven Samsung 7.6 TB PM1733 drives and four 25 Gbps NICs, and we run a stretched cluster with an external witness.

We are seeing about 750 MB/s write performance in CDM with ROBO RAID 1.
I am also observing higher-than-expected latency. Disabling the redundancy in the policy and forcing the data to be stored on the preferred node does give us a 20% performance boost on sync writes, but this is still way short of single-disk performance.

I will be running some tests with HCIBench this week and looking into the disk topologies and potential tuning options.

So far I’ve observed the following:

  • Data seems to be striped and mirrored within a node, and then also mirrored across the nodes for redundancy.
  • The disk topology causes a single write to land on multiple mirrors within and across nodes.
  • Performance scales with the sequential write speed of the individual disks.
  • 9K MTU does not impact sequential write performance on our setup; we seem to be hitting a hardware bottleneck.
  • The platform does feel responsive despite the lacklustre benchmark scores. We need to validate the behaviour with some more concurrent workloads.
  • Two-node (and stretched?) clusters cannot enable RDMA, which does not help matters.
  • A write test with HCIBench generated 3 GB/s throughput on the cluster while peaking at 13k IOPS; reads peak at 700k IOPS.

RAID 5 with more nodes should perform better and scale well. (Still need to validate this claim.)
Are you able to run HCIBench on the platform?
Could you see if performance improves with one node powered off?
Are you able to test with direct-attached connections between the nodes?

Small update:
I've been able to run HCIBench against our two-node cluster with six disks per node assigned to vSAN ESA.

The results are largely okay, but the throughput is disappointing for us.

Our current policy:

Site disaster tolerance: Site mirroring - stretched cluster
Failures to tolerate: No data redundancy
Number of disk stripes per object: 1
IOPS limit for object: 0
Object space reservation: Thick provisioning
Flash read cache reservation: 0%
Disable object checksum: No
Force provisioning: No
Encryption services: No preference
Space efficiency: Compression only
Storage tier: All flash

The result:

Datastore = VSAN

JOB_NAME = job0
VMs = 6
IOPS = 251148.12 IO/S
THROUGHPUT = 981.00 MB/s
R_LATENCY = 0.62 ms
W_LATENCY = 1.09 ms
95%tile_R_LAT = 1.00 ms
95%tile_W_LAT = 1.00 ms

Resource Usage:
+--------------+------------+------------------+------------+
|                   Resource Utilization                     |
+--------------+------------+------------------+------------+
| vSAN Cluster | cpu.usage% | cpu.utilization% | mem.usage% |
+--------------+------------+------------------+------------+
| vsphere01    | 77.01      | 61.58            | 13.2       |
| vsphere02    | 76.41      | 61.05            | 18.94      |
+--------------+------------+------------------+------------+
| Average      | 76.71      | 61.32            | 16.07      |
+--------------+------------+------------------+------------+

We will collect some additional data with fewer and more disks assigned to vSAN, to see how performance scales with the number of drives.

I also want to look into the NUMA topology, but I am not sure about the topology of the HPE platform and the wiring of the drive cages.

What we found via a Reddit post is that VMware does not support NVMe behind tri-mode controllers.
https://kb.vmware.com/s/article/88722

So you are using 25 Gbps, which should allow the setup to reach 3+ GB/s throughput on the vSAN backend. You could try connecting the hosts with a DAC cable to eliminate buffer or flow-control issues in the switch. I would, however, update vSAN to 8.0 Update 2 first; it improved our write performance.
It also includes some additional insight and monitoring which might prove helpful.
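If you want to confirm the running build on each host before and after the update, a quick check would be something like this (standard version queries, nothing vSAN-specific):

```
# Report the installed ESXi version and build number
vmware -vl
esxcli system version get
```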

We've updated vSAN to 8.0 U2 and this has cut our CrystalDiskMark reads in half :S. We still see 4.5 GB/s for SEQ1M Q8T1, but previously we saw about 10 GB/s there :exploding_head:.
The sequential writes have improved: throughput with HCIBench is now between 1,500 and 1,600 MB/s, which is also the case in CDM, where we previously hit about 900 MB/s.

Performance for parallel workloads seems fine, however.
I suspect our dual 25 Gbps links for vSAN and the 50 Gbps inter-site link are the limiting factor for the two-node setup, since write latency shoots up when IOPS and cluster throughput increase. I think Wendell was spot on in thinking it seemed like a latency issue, so testing with a DAC cable would be wise.
You can configure this without downtime if you've got two uplinks for vSAN traffic.

https://infohub.delltechnologies.com/p/100-gbe-networking-harness-the-performance-of-vsan-express-storage-architecture/

Based on this Dell write-up and our own experience, stretched clusters really benefit from more bandwidth and lower latency.
vSAN ESA uses only one active (!) VMkernel interface for vSAN traffic, so configuring 2x 25 GbE still seems to result in a maximum throughput of about 1,500 MB/s for two nodes in RAID 1; this is due to replication traffic between the nodes eating up half the bandwidth. We do see the vSAN back end hit 3+ GB/s throughput when running workloads, and this cluster throughput should scale with more nodes, since your highway has more lanes available when more nodes are participating.
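To see which vmkernel interface is actually carrying vSAN traffic on your hosts, this is the check I used (interface names will differ per setup; the esxtop network panel is just for watching per-uplink throughput live):

```
# List vmkernel interfaces tagged for vSAN traffic
esxcli vsan network list

# Watch per-uplink throughput interactively: run esxtop, then press 'n' for the network view
esxtop
```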


Random rant:
VMware is really doubling down on the marketing about ESA being awesome, but for me it was a bit of a letdown. VMware is also typically tight-lipped and evasive about performance; we've had no official response on our initial results despite engineers mentioning they were lower than expected.
VMware keeps pointing to HCIBench, yet there is virtually no reference material available to validate your results. Parts of HCIBench are also not updated for ESA, so the potential problems reported by the tool can be misleading. vSAN ESA is not a bad product, but if I had the freedom to buy "non-enterprise" stuff, as our manager puts it, I would've gone with a TrueNAS system, since we would be able to optimize it for our workloads.


Hi, did you solve your case? I'm running vSAN 8 U2 on the OSA architecture and have the very same issue, with a Kioxia CM6 on board as the cache drive.
