WD SN750 reset bug?

Hi folks,

I have a WD SN750 PCIe NVMe SSD that I believe has a bit of a reset bug. It works great when passed through to my Windows guest. I can reboot the guest, but if I shut down the guest and try to start it again, it hangs.

It could be specific to my platform, which is an ancient Asus P9X79 Pro motherboard and Xeon E5-1680 v2 processor. (It’s my office “beater.”) I should bring it home and try it in one of my newer systems…

Anyway, I swapped it out for an Inland Premium drive and it works fine. Curiously, I had a WD SN530 in this system previously and that worked fine, too. I wonder what’s so different about the SN750?

Anyone else ever experience this issue or something similar?

Mike

Interesting. I have the same drive on a Z590 board and I came across your post coincidentally. I have two VMs, one with W10 and one with W11. The W11 one I can reboot just fine; the W10 one only reboots fine when the stars align right.
The difference between them: The W10 VM uses the 750 drive, while the 11 VM does not.
Must be a weird controller bug or something.

This isn’t really an answer to your question, but FWIW.

I have an SN750 in my current system, which originally ran on an Asus X99-WS/IPMI. Windows would throw a BSOD caused by the PCIe root complex once every couple of hours or so, and the event log showed several hundred WHEA Event ID 17 warnings per second:

A corrected hardware error has occurred.

Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)

Bus:Device:Function: 0x0:0x2:0x0
Vendor ID:Device ID: 0x8086:0x6F02
Class Code: 0x30400

TL;DR, the SN750 was causing the root port to report a lot of PCIe link errors, most of them correctable, but some would crash the system. I’ve upgraded to a Z590 Maximus XII Apex, and it still throws a warning every now and then, but I guess the better PCIe integrity gave it enough margin to not fail completely. It’s been limping along for a year and a half now without further ‘issue’.

I know it’s basically unrelated, but given the problems I’ve had and the things I keep hearing, like the issues you’re having now, I’m not sure I would trust the controller on the SN750 drives in anything I even remotely care about.

An update of sorts. I recently encountered the same exact bug with the original SN530 drive in an H670 (LGA1700) board. So the passthrough/reset bug doesn’t seem limited to the SN750 drive and older platforms.

Another update. I’ve done some more testing with a variety of SSDs. Lo and behold, I have the same issue with a Samsung 980 (non-Pro). What do all these drives have in common? HMB (Host Memory Buffer). So if you’re going to pass through a PCIe NVMe SSD, make sure it doesn’t use HMB. Most drives that come with DRAM should be fine.
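
For anyone who wants to check a drive before passing it through, here is a rough sketch of one way to do it. It assumes a Linux host with nvme-cli installed and the drive still bound to the host's nvme driver (so before handing it to the guest). It reads the `hmpre` (HMB preferred size) and `hmmin` (HMB minimum size) fields from `nvme id-ctrl`; a non-zero `hmpre` means the controller is asking for a host memory buffer. The function names and the `/dev/nvme0` path are just placeholders for illustration, and this only tells you whether the drive requests HMB, not whether it will actually hit the reset hang.

```python
import re
import subprocess


def parse_hmb_fields(id_ctrl_text):
    """Pull hmpre/hmmin out of the human-readable `nvme id-ctrl` output."""
    fields = {}
    for line in id_ctrl_text.splitlines():
        m = re.match(r"^\s*(hmpre|hmmin)\s*:\s*(\S+)", line)
        if m:
            # base 0 accepts both decimal and 0x-prefixed values
            fields[m.group(1)] = int(m.group(2), 0)
    return fields


def uses_hmb(id_ctrl_text):
    """True if the controller reports a non-zero preferred HMB size."""
    return parse_hmb_fields(id_ctrl_text).get("hmpre", 0) > 0


def check_device(dev="/dev/nvme0"):
    """Run nvme-cli against a device (placeholder path; usually needs root)."""
    out = subprocess.run(
        ["nvme", "id-ctrl", dev],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return uses_hmb(out)


if __name__ == "__main__":
    print("requests HMB:", check_device())
```

If it reports that the drive requests HMB, that matches the pattern I saw with the drives above, so I’d test shutdown/start cycles carefully before relying on it.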