I am running a homelab on a multipurpose Dell PowerEdge server. It runs my network (Untangle), and as I mentioned in the title, the server is running the Dell EMC customized version of ESXi 7.0 Update 2.
So, here is the problem:
ESXi boots off my Silicon Power Ace A55 1TB SATA SSD and it’s working great. VMs are nice and responsive until I do something like unzip a large file. It will start off great, and then the transfer speed will drop to 0 and come to a screeching halt. If I then look at the ESXi monitor for that VM, the disk latency is in the THOUSANDS; I have seen it as high as 8,000ms! When I stop the transfer, I HAVE to wait about 1-2 minutes while the rest of ESXi slows to a crawl; even the web UI is slow during this time. After it “clears itself” it’s perfectly fine until the SSD faceplants again.
The VM I was talking about is a fresh install of Windows Server 2019.
Server hardware specs:
Dell PowerEdge R320
48GB of RAM
Dell PERC H710P Mini
The Silicon Power Ace SSD I mentioned, as well as a 2TB Seagate SAS 7200RPM drive.
The other VM that is running is the Untangle VM, used as a router.
The SSD was purchased in May 2021 (Newegg). I will note I had no issues with this drive back when it was only running Windows Server 2019 (bare metal, no VMs).
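For anyone wanting to reproduce this kind of stall in a controlled way instead of waiting for a big unzip, here is a minimal sketch in Python. It is not something from the original setup: the function names `measure_sequential_write` and `find_stalls`, the chunk sizes, and the 1-second stall threshold are all assumptions chosen for illustration. It writes incompressible data in fixed-size chunks, forces each chunk to disk, and records how long each one took, so a drive that starts fast and then collapses should show up as a run of very slow chunks.

```python
import os
import time


def measure_sequential_write(path, total_mib=64, chunk_mib=4):
    """Write total_mib of random data to path in chunk_mib pieces.

    Returns a list of (chunk_index, seconds, mib_per_second) tuples,
    one entry per chunk written.
    """
    # Random bytes are incompressible, so the drive cannot shortcut the write.
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    results = []
    with open(path, "wb") as f:
        for i in range(total_mib // chunk_mib):
            start = time.perf_counter()
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
            elapsed = time.perf_counter() - start
            rate = chunk_mib / elapsed if elapsed > 0 else float("inf")
            results.append((i, elapsed, rate))
    return results


def find_stalls(results, max_seconds=1.0):
    """Return the chunks that took longer than max_seconds to write.

    The 1-second default is an arbitrary threshold (an assumption);
    the 8,000ms latency described above would be well past it.
    """
    return [r for r in results if r[1] > max_seconds]
```

Run inside the VM, something like `find_stalls(measure_sequential_write("test.bin", total_mib=8192))` writes about 8 GiB and lists the slow chunks. The total size needs to be large enough to get past whatever the drive can absorb quickly, and the test file should be deleted afterwards.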
If you haven’t already, check to see if there are firmware updates available for the SSD and the motherboard.
I had an older AM2 motherboard that I used primarily as a file server (and for Plex), and it had the same issue with a Samsung EVO SSD (even though other SSDs worked fine with it). It was exactly as you described: the system would boot and run fine, but under heavy file copies, transfers would stall.
What fixed it in my case was to update the firmware both on the motherboard, and on the SSD. I’m not sure which one fixed it exactly, as I did both around the same time.
Whilst it looks like your case is dead hardware… I’m not sure if ESXi supports things like TRIM properly, especially on consumer SSDs. In my experience this sort of gear fares far better with Hyper-V in terms of both performance and stability (and yes, I’ve run both back to back on the same hardware; Hyper-V does far better in terms of thermal management, etc., at least compared with ESXi 6.7 when I tested).
If this is something you want to run stuff on reliably for home use, I’d personally suggest not using ESXi. Consider Hyper-V, Xen, Linux+KVM, etc.
The VMware HCL exists for a reason… whilst stuff not on it MAY work… it may also only “sort of” work.
That’s why I tried flashing the PERC card back to the Dell PERC H710P firmware.
Reinstalling ESXi this way did help, as the Dell card should be handling TRIM now. Performance improved, but it was still tanking.
Then my server started blinking the fault light on my SSD… that’s when I took it out and found the issue.
I actually think I helped kill it. I completely forgot until now that I ran DBAN on it (3 pass). Oops, my bad.