NVMe Issues On X570

I put a 970 Pro in a cheap PCIe expansion card. The system immediately became unstable (cursor lag in Windows -> monitor went blank, LEDs on GPU turned off). I assumed it was the expansion card, so I swapped that 970 Pro with the other 970 Pro in the mobo’s M.2 slot. Same issue.

Ran the NVMe extended self-test in the Gigabyte BIOS. Passed after 15 mins, but the GPU shut off before I could exit BIOS.

All evidence points to a bad 970 Pro, but in the back of my mind, I’m wondering if that cheap expansion card did something to a beautiful piece of engineering.

I used a generic one for years, originally on my x79 system before upgrading to a x399 with onboard slots. Never had a problem. I think I paid $20 for it from Amazon, the key feature I bought it for was the aluminum heatsink.

They’re really simple, and passive as far as I know. I mean, there’s not a lot to screw up, just some traces connecting one type of slot to another. It’s certainly possible you got a bad one, I suppose. The check, I guess, would be to test both drives in the onboard slot?

Yeah, the 970 Pro showed the same behavior in the board’s slot. Plus it showed same behavior in BIOS, so that pretty much rules everything else out.

Just wary of sticking another expensive drive in this expansion card. For $10, it should be just some simple traces. The only thing on the board is a small cap on the power lanes and 2 status LEDs. Not much to screw up.

So why did you put it in there in the first place, and was the system 100% stable before?

And then you mention that the GPU shut off?

I don’t know if the NVME self test is a good test by the way.

I had a cheap M2 expansion card for both my wifi card and M2 SSD. My Samsung 960PRO still works great.

I ordered a Fenvi M2 wifi card expansion once, that was badly designed. Didn’t kill it though.

can you link the expansion card

For reference, here’s the one I’m using

I had a 970 Pro in the board’s M.2 and wanted more 970 Pro goodness so I added the expansion card.

It’s a new build. I can’t say 100% stable, but the second I added the expansion card the mouse cursor began to stutter and the GPU would shut off.

I’m not sure exactly what the NVMe self-test does, but I gave it a shot to see if the issue persisted outside of Windows (which it did).

It doesn’t have any voltage regulation chips that could go wrong, just a capacitor to smooth out the voltage, so I don’t think it’s the card.

try reseating the CPU
you can also try running the PCI-E lanes in forced 3.0 mode

EDIT: Skip to post #12

Three days later…it was not a problem with the expansion card or the 970 Pro.

TL;DR
My X570 Aorus Elite was not stable with an NVMe connected to the chipset while a powered USB hub was connected to the CPU lanes. Moving the powered USB hub to a port on the chipset resolved the problem.

Actual Symptom
I did a fair amount of CPU/Mem stability testing when OCing cheap Micron E die to 3666MHz CL16. At the time, there was one NVMe connected to the CPU lanes. For this issue, I loaded some dummy files onto the drives and tested with Aida64’s System Stability Test:

image

It was not immediately apparent that one NVMe on the chipset lanes was an issue, but the Aida64 test showed that it was. It never made it past the 5 min mark before the GPU shut down.

It was very obvious that a second NVMe to the chipset was a problem. The mouse cursor would lag, and I couldn’t even load Aida64.
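For anyone without Aida64, here is a rough stand-in for the "load dummy files and hammer the drive" idea. It is a minimal sketch, not what Aida64 actually does: it re-reads every file in a folder, checks each read against a first-pass SHA-256, and syncs a timestamped heartbeat to a log after every pass so the last line survives a hard shutdown. The names `read_stress`, `data_dir` and `log_path` are made up for illustration, and the OS may serve repeat reads from RAM cache, so use files larger than memory if you want the drive itself to do the work.

```python
import hashlib
import os
import time
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def read_stress(data_dir, log_path, passes=3):
    """Re-read every file in data_dir `passes` times.

    Returns a list of (pass_number, file_name) for reads whose digest
    differed from the first read. Keep log_path outside data_dir so the
    log itself is not treated as test data.
    """
    files = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    baseline = {p: file_digest(p) for p in files}
    mismatches = []
    with open(log_path, "a") as log:
        for n in range(1, passes + 1):
            for p in files:
                if file_digest(p) != baseline[p]:
                    mismatches.append((n, p.name))
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} pass {n} done, {len(mismatches)} mismatches\n")
            # Force the heartbeat to disk so it survives a sudden power loss.
            log.flush()
            os.fsync(log.fileno())
    return mismatches
```

If the machine dies mid-test, the last heartbeat in the log tells you roughly how long it lasted, which is the same information as the "never made it past the 5 min mark" observation.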

Things I Tried

  • Stock settings by clearing the BIOS
  • Forcing PCIe 3.0
  • Three different PCIe 3.0 drives (970 Plus, 970 Pro, Sabrent Rocket)
  • The board’s M.2 slot vs PCIe x4 expansion card
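If you want to confirm that a forced PCIe 3.0 setting actually took effect, one option is to boot a Linux live USB and read the negotiated link speed from sysfs. This is a minimal sketch under that assumption (my testing above was all on Windows); `nvme_links` is a made-up helper name, and the `root` parameter exists only so the function can be pointed at a different directory.

```python
from pathlib import Path

# PCIe per-lane signalling rate (GT/s) mapped to generation.
GENERATION_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}

# PCI class code: mass storage / non-volatile memory / NVMe interface.
NVME_CLASS = "0x010802"


def read_attr(dev, name):
    """Read one sysfs attribute, or return None if it is missing."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return None


def nvme_links(root="/sys/bus/pci/devices"):
    """List negotiated link speed and width for each NVMe controller."""
    root_path = Path(root)
    results = []
    if not root_path.is_dir():
        return results
    for dev in sorted(root_path.iterdir()):
        if read_attr(dev, "class") != NVME_CLASS:
            continue
        speed = read_attr(dev, "current_link_speed")  # e.g. "8.0 GT/s PCIe"
        width = read_attr(dev, "current_link_width")  # e.g. "4"
        generation = None
        if speed:
            try:
                generation = GENERATION_BY_RATE.get(float(speed.split()[0]))
            except ValueError:
                generation = None  # kernel reported something non-numeric
        results.append({
            "address": dev.name,
            "speed": speed,
            "width": width,
            "generation": generation,
        })
    return results


if __name__ == "__main__":
    for link in nvme_links():
        print(link)
```

A drive that reports generation 3 at width 4 is running at the full PCIe 3.0 x4 link these drives are specified for; a lower width or generation would point at the slot or adapter rather than the drive.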

The Clue
After the GPU shut off, the mobo continued to run the fans and RGB. Pushing the power button turned off the fans, but the RGB only dimmed to a faint glow.

I thought it entered a weird power state that was affecting the tests, so I flipped the PSU switch. It continued to glow. I pulled the power cord. It continued to glow. I pulled the entire power strip. It died.

I have a number of USB devices with auxiliary power - BenQ monitor hub, Ohaus scale, multi-function printer, 3D scanner, but the RGB was somehow drawing power from a TP-Link 9-Port USB 3.0 Hub.

The Solution
The hub was connected to one of the four blue USB ports that run to the CPU. Moving it to a red USB port on the chipset solved the problem. I’m over 1:30 into the test with the USB hub and all three NVMe drives playing nice together.

image

CrystalDiskMark reports that all the drives run at their spec. I was also curious about the transfer speed between the drives. This is what robocopy reported from loading the dummy files onto the drives for Aida64’s read tests:

image
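If you want a similar drive-to-drive number without robocopy, a timed copy gets you in the ballpark. This is a minimal sketch, not a benchmark: `copy_throughput` is a made-up helper, and write caching can inflate the result for small files, so use a file of several GB.

```python
import shutil
import time
from pathlib import Path


def copy_throughput(src, dst):
    """Copy src to dst and return the average rate in MB/s (1 MB = 10^6 bytes)."""
    size_bytes = Path(src).stat().st_size
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")  # copy finished faster than the timer resolution
    return (size_bytes / 1_000_000) / elapsed
```

Call it with a source file on one drive and a destination path on another, e.g. `copy_throughput("D:/dummy.bin", "E:/dummy.bin")` (example paths only).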

PS
The powered USB hub solved another issue, namely the poor circuitry on my Blue Yeti.

EDIT: Skip to post #12

Replaced the 970 Evo Plus 1TB on the CPU lanes with an Adata SX8200 2TB. Dozens of restarts and stability testing on my bench. Everything’s fine until I transfer to my desk. “No operating system found.”

BIOS won’t show that the SX8200 is even installed. The only difference between the bench and desk is peripherals, so I started unplugging things from the USB ports on the CPU lanes (blue, back panel).

Taking out the webcam made the drive reappear. Stability test went okay. Two restarts later and no operating system found. It’s an intermittent problem even with one, simple HID device in the blue ports.

Now I have everything running through the hub on the Chipset (red, back panel). Restarted a dozen times and ran a stability test. So far so good.

I’m guessing it’s the IO die on my CPU.

@wendell is always warning us about cheap display port cables…

Turns out the NVMe issues were from a dp to dp cable with 3.3V power on the 20th pin. After a soft power down, the monitor was backdriving power to the GPU leaving the system in a dirty power state. That caused all the weirdness with the drives, and the powered USB hub just made it worse.

I would have never thought of the dp cable, but there are a lot of Reddit posts where a dp cable was reported to cause issues with NVMe drives (at least on Gigabyte mobos).

image

On a dp to dp cable, the 20th pin should be disconnected. Don’t use a cable if a multimeter shows continuity between the 20th pins on each end. The contacts are hard to reach with a probe, so it helps to place a piece of 30g (0.25 mm Ø) wire (i.e. one strand from a stranded copper wire) in one end. One in four of my dp cables has 20th pin continuity.
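If you only have wire of a different gauge lying around, the standard AWG formula gives the diameter, so you can check whether a strand is thin enough. A quick sketch (the function name is mine):

```python
def awg_diameter_mm(gauge):
    """Diameter in mm of a solid round wire of the given AWG size.

    Standard AWG definition: 36 AWG is 0.127 mm, and the diameter
    grows by a factor of 92 over the 39 gauge steps up to 0000 AWG.
    """
    return 0.127 * 92 ** ((36 - gauge) / 39)


if __name__ == "__main__":
    for gauge in (26, 28, 30, 32):
        print(f"{gauge} AWG: {awg_diameter_mm(gauge):.3f} mm")
```

For 30 AWG this comes out to about 0.255 mm, which matches the 0.25 mm figure above.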

http://monitorinsider.com/displayport/dp_pin20_controversy.html

This was a very difficult problem to work on. There wasn’t an issue on my bench with a DVI cable, or from a hard power on (everything physically unplugged -> plugged in), or from a restart. The issues only appeared after a second power on with the bad dp cable causing a dirty power state.

A powered USB hub made the instability worse, but seems fine with a different dp cable. I also realized the power supply’s physical switch is useless with powered peripherals. You have to unplug everything.


I just realized… This was probably also the weird phantom issue GeraldUndone was having the other day.

My good dp cables are too short. I think I’ll try to salvage this one by removing that 20th pin.

image

Should be easy with my laser welder. Well, turning thin gauge wire into a ball of plasma is easier than actually welding it.

I’ve always wanted to try fixing a broken CPU pin, but I don’t have one that needs fixin (hint hint).
