Intel Optane 800p Clear Write Protection After Reaching Lifetime Writes

So, I have an Intel Optane 800p 118GB drive that started exhibiting some interesting behavior. The SMART output is:

Model Number:                       INTEL SSDPEK1W120GAH
Serial Number:                      REDACTEDREDACTEDREDA
Firmware Version:                   K4110410
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          118,410,444,800 [118 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            5cd2e4 66c2140100
Local Time is:                      Sun Feb 25 20:23:00 2024 PST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0046):     Wr_Unc DS_Mngmt Timestmp
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.60W       -        -    0  0  0  0  1000000   50000
 1 +     2.50W       -        -    0  1  0  1  1000000   50000
 2 +     1.80W       -        -    0  2  0  2  1000000   50000
 3 -   0.0080W       -        -    0  0  0  0  1150000   50000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- media has been placed in read only mode

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x08
Temperature:                        51 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    105%
Data Units Read:                    290,395 [148 GB]
Data Units Written:                 749,243,228 [383 TB]
Host Read Commands:                 3,969,933
Host Write Commands:                9,567,877,299
Controller Busy Time:               0
Power Cycles:                       39
Power On Hours:                     8,416
Unsafe Shutdowns:                   11
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-tests not supported
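For anyone else decoding a dump like this: Critical Warning is a bit field, and the NVMe spec defines the Data Units counters in thousands of 512-byte units. Below is a minimal Python sketch that decodes both using the values above. The helper names are made up for illustration and are not part of smartctl or nvme-cli.

```python
# Bit meanings of the NVMe SMART/Health "Critical Warning" byte (log page 0x02).
# Bits 0-4 come from the NVMe base spec; bit 5 was added in NVMe 1.4.
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature outside threshold",
    2: "NVM subsystem reliability degraded",
    3: "media placed in read-only mode",
    4: "volatile memory backup device failed",
    5: "persistent memory region read-only",
}

# One NVMe "data unit" is 1000 x 512 bytes, independent of the LBA format.
BYTES_PER_DATA_UNIT = 512 * 1000


def decode_critical_warning(value: int) -> list[str]:
    """Return a description for every bit set in the Critical Warning byte."""
    return [desc for bit, desc in CRITICAL_WARNING_BITS.items() if value & (1 << bit)]


def data_units_to_bytes(units: int) -> int:
    """Convert a Data Units Read/Written counter into bytes."""
    return units * BYTES_PER_DATA_UNIT


if __name__ == "__main__":
    # Values copied from the smartctl output above.
    print(decode_critical_warning(0x08))
    written = data_units_to_bytes(749_243_228)
    read = data_units_to_bytes(290_395)
    print(f"written: {written / 1e12:.1f} TB, read: {read / 1e9:.1f} GB")
    # Average payload per host write command.
    print(f"avg bytes per write command: {written / 9_567_877_299:,.0f}")
```

With these numbers, 0x08 decodes to bit 3 alone (the read-only flag smartctl reports). Writes come to about 383.6 TB against about 148.7 GB read, and each write command carried roughly 40 KB on average.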

The strange behavior: it was working fine, and then all of a sudden performance dropped to <5MB/s. The SMART data says there is a critical warning and the drive is in read-only mode. Well, it's not actually read-only; writes are just very slow. I don't know how much (if any) spare capacity is actually left on the media, but SMART reports 100% available spare and no recorded media errors. So I did a ton of searching and found this Intel doc, which I think is technically for different Optane drives, but it mentions that at 105% used life the drive goes into a write-protect mode that throttles performance to <30MB/s:

ā€œPercentage Usedā€ Indicator
This is IntelĀ® Optaneā„¢ SSD endurance indictor. It will be zero when the SSD is first installed, and may
remain at zero for a while. For more details on why this occurs, see this Technical Advisory:
Intel Support
center-ssds.html
As the SSD endurance is consumed, the indicated value will increase.
A value of 100 indicates that the estimated endurance of the device has been consumed, but may not
indicate a device failure, as the value is allowed to exceed 100. Once the value reaches or exceeds
105, the drive will enter write protect mode, in which write bandwidth maxes out at <30MB/sec.
Any value from 95-100 should be interpreted as a warning that endurance consumption is
approaching its maximum; once it reaches 105 performance may begin to suffer.
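The bands in that Intel note can be written down as a small Python sketch and applied to this drive. Assumptions: the thresholds are the ones quoted above and may differ on the 800p. The "implied rating" also assumes Percentage Used scales linearly with host bytes written. The firmware might track media wear instead, and the value is reported as an integer, so treat the result as a rough back-of-the-envelope figure. Function names are hypothetical.

```python
# Band boundaries taken from the quoted Intel note; other Optane models
# (or firmware versions) may use different thresholds.
WRITE_PROTECT_AT = 105
WARNING_FROM = 95


def endurance_state(percentage_used: int) -> str:
    """Classify a SMART 'Percentage Used' value using the Intel note's bands."""
    if percentage_used >= WRITE_PROTECT_AT:
        return "write-protect: write bandwidth throttled below 30 MB/s"
    if percentage_used >= 100:
        return "rated endurance consumed, writes not yet throttled"
    if percentage_used >= WARNING_FROM:
        return "warning: endurance consumption approaching maximum"
    return "normal"


def implied_rated_bytes(bytes_written: int, percentage_used: int) -> float:
    """Back out the endurance figure the counter appears to measure against.

    Assumes Percentage Used grows linearly with host bytes written.
    """
    return bytes_written / (percentage_used / 100)


def hours_to_write(num_bytes: int, bytes_per_second: float) -> float:
    """Hours needed to write num_bytes at a sustained rate."""
    return num_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    written = 749_243_228 * 512_000   # Data Units Written, converted to bytes
    capacity = 118_410_444_800        # Namespace 1 size in bytes
    print(endurance_state(105))
    print(f"implied rating: {implied_rated_bytes(written, 105) / 1e12:.1f} TB")
    # One full-capacity write at the documented cap vs. the observed rate.
    print(f"at 30 MB/s: {hours_to_write(capacity, 30e6):.1f} h")
    print(f"at 5 MB/s:  {hours_to_write(capacity, 5e6):.1f} h")
```

For this drive the implied figure lands around 365 TB. Filling the 118GB namespace once would take about 1.1 hours at 30MB/s, versus about 6.6 hours at the ~5MB/s actually observed. Since the cap is "<30MB/sec", those are best-case times.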

Now, this drive has been used and abused in various workload tests and as a ZFS log device in a write-intensive NFS environment. Apparently it was used inefficiently, since it was hardly ever read from in comparison, but I'm still a relative ZFS novice, so please withhold your shame fingers. I digress. It has since been repurposed (the first 'R' is "reuse", right?), and for what I need now, I'm not concerned about whether the drive starts experiencing errors or the 3D XPoint starts failing. I just want it to keep as much performance as it can (basically transient OS drives).

So, to the point: I have been trying to find a way to, I don't know, clear that critical/write-protection flag, or even just the SMART data. Obviously, I have basically come up empty, since I don't think you're supposed to be able to do that. I guess since the drive is basically useless, maybe I could try using nvme-cli to dump the firmware and modify it to not care about the written data? Or maybe there is some magic register I can write to with nvme-cli to set "Data Units Written" back to 0? I guess I'm posting just to open the floor for ideas, suggestions, guidance, etc., so let me know if you have any! Thanks in advance!


Interesting… I wonder which Optane SKUs this behavior applies to, because I've heard of some being written well past (more than double) the warranted amount of data without any slowdowns. I wonder if this is actually an artificial limitation. And if so, this is the first time I've seen anyone encounter it in the wild.

Was a common ā€œfeatureā€ on Intelā€™s NAND flash SSDs, too. Once they hit the TBW limit they would go read-only and you couldnā€™t even write to them at all. Iā€™ve never heard of one of them being reverted back into an usable state afterwards, but thatā€™s just me.

This is pretty concerning for anyone who bought Optane for a thrash/scratch drive. I think we need a solution.
I wonder if anyone at Solidigm can shed any light on what's going on? @wendell seems to have some contacts there, and would probably also be interested in the longevity of Optane storage? I hope.

Bumping this to theorize: does it blow a fuse, or just store a value somewhere on the drive?
I wonder if it's possible to hot-flash the chips back to zero/unprogrammed? I would think it needs to store that state somewhere, so unless the SMART data lives in storage on the controller itself, hot-flashing the chips might make it possible to reset the counter, as long as it's not a physical change, i.e. a blown fuse that permanently sets the read-only flag…

Even if it is a permanent state change, resetting the counter when it hits 75% used or so might be a viable option.

:croissant: But I am a slug and do not understand such complexities, so maybe I'm just making a fool of myself. :yay:

It would actually be a really great test of the old Optane media to keep pushing a drive like this until it's unusable, too. Intel's ratings are quite conservative for their early Optane drives, and I don't think they were based on an expectation of media failure so much as on not wanting to risk warrantying a new and still untested technology for such a heavy thrashing scenario until its characteristics in the market were better understood, or so it seems to me.
I, and we, would very much like to see how far this media can be pushed.

If it's stored on the 3D XPoint media like the rest of the data and metadata, and Intel is telling the truth about its unpowered data retention period (3 months), then plug it in again in 3 years' time (assuming engineers tend to engineer for ten times the warranted characteristic) and see if it still remembers that it's supposed to be read-only, or that it's been written to death once before. :slightly_smiling_face:

Once the P4800X has reached its write endurance rating, it is specified for unpowered data retention of 3 months, which is standard for enterprise SSDs. Intel doesn't say anything directly about data retention when the drive is new, but they do caution that the drive will perform background data refreshing. When the drive is powered on, it will devote more time than usual to background data refreshing for a period of about three hours, to clean up any data degradation that may have occurred while the drive was off for an unknowable period of time.

Ugh. I also saw this from the corner of my left eye. I should not have bought the 905P on sale after all, and should have spent 50% more to get the P4800X.

When the P4800X exhausts the rated write endurance, it will switch to a "write protect" mode where writes are throttled to 30MB/s. This is a gentler end of life strategy than the hard read-only mode that some of Intel's previous drives have used. The mode switch happens when the "Percentage Used Estimate" SMART indicator reaches 105%.

Sauce: Intel Optane SSD DC P4800X 750GB Hands-On Review

Q: How well do Optane SSD drives perform for long term file storage? Is the performance better than NAND for this application? Is there an expected failure time for long term storage of dead files? If I save a file and don't access it for three to five years, will it still be there when I do need it?

A: Optane SSDs have the most benefits for acceleration application for dynamic data. Optane data persistence is the same as any other enterprise SSD, having data retention specifications of a minimum of 3 months in a power-off condition and end-of-endurance life. Optane endurance is significantly higher than NAND SSDs with a capability of up to 60 drive-writes-per-day, as compared to 3 drive-writes-per-day with Intel's highest production NAND SSD today. NAND or HDDs may be better aligned for storing static, cold data.

Sauce: Intel Optane Memory AMA Recap
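To put the drive-writes-per-day (DWPD) figures in context, here is a rough Python sketch that computes the 800p's own DWPD from its SMART counters. It also converts a DWPD rating into total bytes written. Assumptions: Power On Hours stands in for calendar time, so a drive that sat powered off for part of its life looks busier than it really was. The 375 GB capacity and 5-year warranty in the second example are hypothetical illustration values, not figures from the AMA. Function names are made up.

```python
DAYS_PER_YEAR = 365


def drive_writes_per_day(bytes_written: int, capacity_bytes: int, power_on_hours: int) -> float:
    """Average full-capacity writes per powered-on day.

    Uses Power On Hours rather than calendar days, which overstates DWPD
    for a drive that was not powered continuously.
    """
    powered_days = power_on_hours / 24
    return bytes_written / capacity_bytes / powered_days


def rated_bytes_written(dwpd: float, capacity_bytes: float, warranty_years: float) -> float:
    """Convert a DWPD rating into total bytes over the warranty period."""
    return dwpd * capacity_bytes * DAYS_PER_YEAR * warranty_years


if __name__ == "__main__":
    # The 800p from the first post, using its own SMART counters.
    written = 749_243_228 * 512_000
    print(f"{drive_writes_per_day(written, 118_410_444_800, 8_416):.2f} DWPD")
    # Hypothetical: a 375 GB drive rated at 60 DWPD over an assumed 5-year warranty.
    print(f"{rated_bytes_written(60, 375e9, 5) / 1e15:.2f} PB")
```

The 800p works out to about 9.2 full drive writes per powered-on day. That is well above the 3 DWPD quoted for Intel's top NAND drive, though far from 60 DWPD. The hypothetical 375 GB drive at 60 DWPD would be rated for about 41 PB over five years.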

Now for the potentially good/bad news…

What is important is that the electrical characteristics have been made public, similar to Intel's announcement regarding the first generation. The read latency is 100ns and the write latency is 500ns, the same as the first generation. The data retention period was announced for the first time: 7 years (at 40°C). This is not very long (normal non-volatile memory products have a data retention period of 10 years at 85°C). Rewrite cycle life was again not published. Power consumption was also not disclosed.

Sauce (originally Japanese): [Akira Fukuda's Semiconductor Industry Front Line] The full picture of 3D XPoint memory is finally revealed; Micron publishes a technical overview of its second generation - PC Watch

This was released just a few months ago. So if Micron is getting back into the 3D XPoint business, then you don't really care that your 800P is dead. If Micron isn't, and there are no comparable technologies coming to market, you're waiting a decade for your 800P to maybe forget that it's supposed to stay dead.