NVMe + Linux stability issues

@Mastic_Warrior thanks, but it doesn’t appear to be relevant
@cekim the firmware is up to date and I tried multiple secure erases :expressionless:

That was a question, not a suggestion… I’m in the same boat… can’t reboot my machine without a hard reset.

What are the temps of the drive? As a last-ditch effort, keeping temps down might help. Something like the Cryorig heatsink, or a water block. Personally, I keep my 960 Pro under water just so I don’t have to deal with edge cases that might cause throttling.
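If you want a quick way to check, here is a minimal sketch that reads NVMe temperatures from sysfs. It assumes a kernel with the NVMe hwmon driver (5.5 or newer, so not the stock Ubuntu 18.04 kernel), where each drive shows up under /sys/class/hwmon with the name "nvme" and reports its composite temperature in `temp1_input` as millidegrees Celsius. The function name and the `hwmon_root` parameter are just for illustration. On older kernels, `nvme smart-log /dev/nvme0` from nvme-cli should give you the same reading.

```python
from pathlib import Path


def nvme_temps_celsius(hwmon_root="/sys/class/hwmon"):
    """Return {hwmon entry name: temperature in degrees C} for NVMe drives.

    Assumes the kernel exposes NVMe drives through hwmon with name "nvme".
    Returns an empty dict if nothing matching is found.
    """
    temps = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return temps
    for dev in sorted(root.iterdir()):
        try:
            # Skip sensors that are not NVMe drives (coretemp, GPU, etc.)
            if (dev / "name").read_text().strip() != "nvme":
                continue
            # hwmon reports temperatures in millidegrees Celsius
            raw = (dev / "temp1_input").read_text().strip()
            temps[dev.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Unreadable or malformed entry: ignore it and move on
            continue
    return temps


if __name__ == "__main__":
    for dev, temp in nvme_temps_celsius().items():
        print(f"{dev}: {temp:.1f} C")
```

Run it while the drive is under load and again at idle; if the numbers climb toward the throttling range, cooling is worth a try.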

@Zerophase I am going on a slight tangent here, but I hope you are only cooling the controller? Lower operating temperatures are bad for NAND endurance. Ref: https://www.anandtech.com/show/9248/the-truth-about-ssd-data-retention

So, I’ve had a breakthrough with my stability issues.
I took my computer to the shop I bought it from and we tried swapping out parts.
Changing the motherboard solved the problem :slight_smile:

Unfortunately we didn’t have another motherboard of the same model around, but we tested it with an Asus Z370-P and it was all good.
So, I returned my Gigabyte board and purchased the Asus board.

I have a 960 Pro 512 GB M.2 with Ubuntu 18.04 installed. I previously had an 850 Pro 256 GB SATA SSD with 16.04 installed.
Both had no issues. If you want me to verify any software, firmware or configs let me know.

It’s like 50 °C under load and I’m sure I’ll be replacing this drive within 5 years.

Was it a used motherboard? Sounds like a broken pin.

@SudoSaibot: thanks but this is clearly motherboard related.

@Zerophase: it was a brand new motherboard and a broken pin doesn’t make sense because it was stable on Windows.

Glad you got it fixed :slight_smile:
We tend to trust that hardware is good and often don’t have drop-in replacements to test with. It has only happened to me once, when the motherboard in my Q6600 system started corrupting the boot SSD. I needed a platform update since it was all EOL, and it was well overdue for one anyway.

@thoughtlessruvi I usually avoid Gigabyte; they’ve always seemed like a reliable bargain brand to me. I usually just stick to MSI, EVGA, and ASUS. I’ve had a great experience with ASUS boards and cards so far, but I hear their customer service is a bit lacking. I might switch to EVGA in the future, since I hear they treat you very well.

Update: updating firmware alone was not enough to address the “failed to unmount” issue. Had to also run secure erase. So far so good after that.

2x 960 Pro in RAID 0 on an Asus Hyper x16 in CentOS 7.4.