Looking for new m.2 SSD

Hello Forum,

After two Samsung Pro drives died on me after just over 2 years, with a third one in the process of dying, I am looking for something new. Video editing and gaming are the main workloads on this machine, with some llama shenanigans here and there.
What is the current consensus on the “best” drives? I am looking for 2TB total. It can be one or two drives, depending on what makes more sense.

What does the SMART data look like on those drives? Are you just exceeding their write life? If that's the case, then one of Samsung's datacenter SSDs would at least have enough write endurance.

If it's some other failure, a hardware fault rather than a SMART-related one, you could try installing two drives, one on the board and one on a riser card, to see if perhaps the board has some sort of electrical fault.

If you are just looking for a cheap drive, Micro Center's Inland drives are usually on sale and are quite reliable. I've not had any trouble with an SK hynix drive I've been running for the last year or so, but it's almost exclusively games and school work on that machine. Nothing heavy.

Just too many I/O errors; the SMART test from the useless Samsung tool does not even complete before throwing errors ^^

On average, any enterprise-grade U.2 will outlast a 990 Pro tenfold (from our after-action reports).

What’s crazy is that the 990 Pro will outlast WD, Silicon Power, and most other consumer drives threefold on average. Yet a U.2 from Samsung, Kioxia, or Micron will be 10x better than the 990 Pro for read/write-intensive workloads.

IIRC Samsung even has some datacenter-class SSDs with an M.2 interface.

However, they just reported too many I/O errors, which makes me think there is something other than simply exceeding the drive's write endurance. That feels more like some sort of fault where data between the host and the drive is getting dropped. If this were a SATA drive I would try swapping the SATA cable; things get a bit more difficult with M.2.

@akarin, does the drive still fail SMART tests in another machine?

The drives show bad sectors on multiple machines; they are just garbage.

This is the SMART data from one of the drives:

Model name, Samsung SSD 980 PRO 1TB
Serial number, S5GXNX0RB24131K
Drive Type, NVMe
Result,Byte End,Byte Start,Description,Raw Data,Status
,0,0,Critical Warning,0,OK
,2,1,Temperature (K),324,OK
,3,3,Available Spare,98,OK
,4,4,Available Spare Threshold,10,OK
,5,5,Percentage Used,5,OK
,47,32,Data Units Read,53619034,OK
,63,48,Data Units Written,75555641,OK
,79,64,Host Read Commands,551749733,OK
,95,80,Host Write Commands,1142717970,OK
,111,96,Controller Busy Time,4085,OK
,127,112,Power Cycles,2020,OK
,143,128,Power On Hours,10175,OK
,159,144,Unsafe Shutdowns,57,OK
,175,160,Media Errors,4855,OK
,191,176,Number of Error Information Log Entries,4851,OK
,195,192,Warning Composite Temperature Time,0,OK
,199,196,Critical Composite Temperature Time,0,OK
,201,200,Temperature Sensor 1,324,OK
,203,202,Temperature Sensor 2,331,OK
,205,204,Temperature Sensor 3,0,OK
,207,206,Temperature Sensor 4,0,OK
,209,208,Temperature Sensor 5,0,OK
,211,210,Temperature Sensor 6,0,OK
,213,212,Temperature Sensor 7,0,OK
,215,214,Temperature Sensor 8,0,OK
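
For anyone reading the dump above, here is a rough Python sketch of how the raw values translate into more familiar units. It assumes the NVMe convention that one data unit is 1000 blocks of 512 bytes and that temperatures are reported in whole Kelvin. The 600 TBW rating is an assumption based on the endurance figure quoted later in this thread for the 1TB model, and the names `smart`, `RATED_TBW`, and `summarize` are made up for illustration.

```python
# Raw values copied from the SMART dump above (Samsung SSD 980 PRO 1TB).
smart = {
    "temperature_k": 324,
    "temp_sensor_2_k": 331,
    "data_units_read": 53_619_034,
    "data_units_written": 75_555_641,
    "power_on_hours": 10_175,
    "percentage_used": 5,
    "media_errors": 4_855,
}

# NVMe counts data units in thousands of 512-byte blocks.
BYTES_PER_DATA_UNIT = 1000 * 512

# Assumption: rated endurance in TB written for this 1TB drive.
RATED_TBW = 600


def kelvin_to_celsius(kelvin):
    """Convert whole Kelvin to whole degrees Celsius (offset of 273)."""
    return kelvin - 273


def summarize(values):
    """Turn raw SMART counters into human-readable figures."""
    tb_written = values["data_units_written"] * BYTES_PER_DATA_UNIT / 1e12
    tb_read = values["data_units_read"] * BYTES_PER_DATA_UNIT / 1e12
    return {
        "temp_c": kelvin_to_celsius(values["temperature_k"]),
        "temp_sensor_2_c": kelvin_to_celsius(values["temp_sensor_2_k"]),
        "tb_written": tb_written,
        "tb_read": tb_read,
        "pct_of_rated_writes": 100 * tb_written / RATED_TBW,
        "power_on_days": values["power_on_hours"] / 24,
        "media_errors": values["media_errors"],
    }


if __name__ == "__main__":
    for key, value in summarize(smart).items():
        print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

By this estimate the drive has written roughly 38.7 TB, about 6% of the assumed rating, which lines up with the reported Percentage Used of 5. The nonzero Media Errors count is the value that stands out.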

but bad sectors are bad news and cause problems all day lol

It's pretty much SN850X versus 990 Pro for TLC M.2 (NGFF) drives with DRAM at the moment. There are lots of HMB drives with good performance as well; the SN770 and a few others are TLC rather than QLC.

I’d expect any drive to have errors if the link’s unstable, wear life’s exceeded, or the thermals are problematic, though.

Source? Samsung specs the usual 600 writes for TLC.

I'd also like to see their source, but the spec sheet is only what they base the warranty on.

It's not all that uncommon to exceed the write rating.

As you can see, the usage is nothing out of the ordinary. Maybe this batch was faulty, I have no idea. Thermals are a non-issue as well… So I will try to RMA, but since this is Samsung nothing will come of it.

That’s anecdotal from our roughly 500 machines in active deployment.

Western Digital makes a fantastic gaming drive that is definitely faster than the Samsungs, but they die under enterprise workloads (we suspect it is heat related).

Silicon Power drives last about a year in 24/7 deployments.

Samsung has had a notable increase in drive failures since 2020.

The Pro line is better, but as stated: enterprise is the way to go.

I recommended a U.2 as the heat dissipation is grossly superior.

Heat kills M.2s at a wild rate.

I agree on heat, but in my PCs that should not be an issue. But yeah, I would love to see better consumer motherboard designs that take M.2 cooling into account.

If it’s a 2020 or 2021 drive, it’s likely just trashed and done.

But we’ve seen an increase in Samsung failures.

Drives are cheap, data is not.

If a drive is suspect, we pull it and I will use it in one of my media PCs at the house.

Some are great and we never see another problem for a couple of years, but most are no longer recognized within 1 year of first acting up.

Samsung killed off their enterprise-SKU M.2s.

Kioxia and Micron still make them if you’re married to the form factor.

Not gonna happen; enterprise has abandoned M.2 in favor of E1.S, E3.S, and E1.L, in that order.

Even U.2 is being phased out for E3.S.

Interesting. In our workloads I mostly see 2 and 4 TB SN850Xes bench a bit worse than 2 and 4 TB 990 Pros but run a bit cooler under real workloads while maintaining slightly higher average throughput. Since WD’s using 85 instead of 70 °C NAND, total thermal margin’s quite a bit wider, particularly as the Samsungs tend to go regularly to ~67 °C unless they’re under finstacks with local fans. We’re pretty read heavy, so we haven’t written enough to any of them yet to assess wear life.

With enterprise drives running ~3x the power, it rather has to be. Non-rackmount installs tend not to have the LFM U.2 expects, though, which I expect might be the situation here.

Our busy NGFFs are mostly under Cogage H2 or HR-09 Pros, which are probably lower thermal resistance than U.2 shells, but most of what I’ve got suggests lack of IHSes and level surfaces results in controller and NAND packages being the limiting factor.

Logging NGFF temperatures is a good check but, yeah, desktop and workstation type workloads typically don’t run the drives hard enough for thermals to be an issue.
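
One way to log M.2 temperatures on Linux is through the kernel's hwmon interface, where NVMe drives usually show up as a device named `nvme` with `temp*_input` files in millidegrees Celsius. This is a minimal sketch, assuming a Linux kernel with NVMe hwmon support. The function names, the sample count, and the interval are arbitrary choices for illustration, not something from this thread.

```python
import time
from pathlib import Path


def read_nvme_temps_c(hwmon_root="/sys/class/hwmon"):
    """Return {"hwmonN/label": degrees_C} for every NVMe hwmon sensor found."""
    temps = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return temps  # not Linux, or hwmon not available
    for dev in sorted(root.iterdir()):
        name_file = dev / "name"
        # Skip anything that is not an NVMe drive (CPU, GPU, chipset sensors).
        if not name_file.is_file() or name_file.read_text().strip() != "nvme":
            continue
        for temp_file in sorted(dev.glob("temp*_input")):
            label_file = dev / temp_file.name.replace("_input", "_label")
            if label_file.is_file():
                label = label_file.read_text().strip()
            else:
                label = temp_file.name
            # hwmon reports temperatures in millidegrees Celsius.
            temps[f"{dev.name}/{label}"] = int(temp_file.read_text()) / 1000.0
    return temps


def log_temps(samples=3, interval_s=1.0):
    """Print a timestamped reading every interval_s seconds, samples times."""
    for i in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, read_nvme_temps_c())
        if i < samples - 1:
            time.sleep(interval_s)


if __name__ == "__main__":
    log_temps()
```

Running this during an export or a long game session and comparing the peaks against the drive's throttle point should show whether thermals are worth worrying about on a given machine.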

lol I wrote “in just over 2 years”; the drives are all from March 2022. I had older Samsung drives that ran 5 years without any issues, so it is what it is.

Yeah, I want off this fucking ride where PC workstations just need parts replaced every 2 years, so what do?
There are no motherboards in this form factor that support those drives, no?

Yeah, even servers are getting replaced every 3 years now.

Operating systems are increasingly demanding dedicated TPM, so in the next year probably 80% of all machines in Enterprise environments will be deprecated.

Not because the CPU is no longer supported, or legacy components.

No dedicated TPM module: no AD security GPOs can be applied and no Windows 11 24H2.

I want off this wild ride. Honestly, since I do not need a lot of power, I will just say fuck it and ride it out on old hardware. Do you know if those PCIe to U.2 cards are any good?

Lol, at work our oldest servers are 12 years old and still going strong…
Yeah, and fuck running anything Microsoft in your datacenter.

Lots of great contributions.

One resource I have found very helpful is this guide:

It is sourced from a Reddit user who became well known for making buying guides and transitioned the information to his own website.
https://www.reddit.com/r/NewMaxx/comments/dhvrdm/ssd_guides_resources/

Works great, because it’s dead simple.

Nice, I will source some Kioxia CD8s from work and see if they work with adapters on my mainboard.

Add a fan, those bastards get hot in most machines without ducting