Soo, I have an Intel Optane 800p 118GB drive that started exhibiting some interesting behavior. The SMART output is:
Model Number: INTEL SSDPEK1W120GAH
Serial Number: REDACTEDREDACTEDREDA
Firmware Version: K4110410
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 0
NVMe Version: <1.2
Number of Namespaces: 1
Namespace 1 Size/Capacity: 118,410,444,800 [118 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 5cd2e4 66c2140100
Local Time is: Sun Feb 25 20:23:00 2024 PST
Firmware Updates (0x02): 1 Slot
Optional Admin Commands (0x0006): Format Frmw_DL
Optional NVM Commands (0x0046): Wr_Unc DS_Mngmt Timestmp
Log Page Attributes (0x02): Cmd_Eff_Lg
Maximum Data Transfer Size: 32 Pages
Supported Power States
St Op     Max  Active    Idle RL RT WL WT  Ent_Lat  Ex_Lat
0  +    3.60W       -       -  0  0  0  0  1000000   50000
1  +    2.50W       -       -  0  1  0  1  1000000   50000
2  +    1.80W       -       -  0  2  0  2  1000000   50000
3  -  0.0080W       -       -  0  0  0  0  1150000   50000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0  +     512       0         2
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- media has been placed in read only mode
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x08
Temperature: 51 Celsius
Available Spare: 100%
Available Spare Threshold: 0%
Percentage Used: 105%
Data Units Read: 290,395 [148 GB]
Data Units Written: 749,243,228 [383 TB]
Host Read Commands: 3,969,933
Host Write Commands: 9,567,877,299
Controller Busy Time: 0
Power Cycles: 39
Power On Hours: 8,416
Unsafe Shutdowns: 11
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
Self-tests not supported
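The bracketed sizes next to the Data Units counters are consistent with the NVMe definition of a data unit as 1000 512-byte units (512,000 bytes), independent of the formatted LBA size. A quick Python sketch of that conversion using the counters above (the function name is just for illustration):

```python
# NVMe counts "Data Units" in thousands of 512-byte units (512,000 bytes each),
# regardless of the namespace's formatted LBA size.
DATA_UNIT_BYTES = 1000 * 512


def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe SMART 'Data Units' counter to bytes."""
    return units * DATA_UNIT_BYTES


# Counter values copied from the smartctl output above.
written = data_units_to_bytes(749_243_228)
read = data_units_to_bytes(290_395)

# Decimal units (10**12 and 10**9), matching the bracketed TB/GB values.
print(f"written: {written / 10**12:.1f} TB")  # written: 383.6 TB
print(f"read:    {read / 10**9:.1f} GB")      # read:    148.7 GB
```

That works out to roughly 2,600 bytes written for every byte read over the drive's life.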
The strange behavior: it was working fine, and then all of a sudden performance dropped to <5MB/s. Obviously, the SMART data shows a critical warning and says the drive is in read-only mode. Well, it's not actually read-only; writes are just very slow. I don't know how many spare cells (if any) are actually left on the media (or however you would phrase it), but the SMART data says 100% spare remains and reports no media errors. So, I did a ton of searching around and found this Intel doc, which I think is technically for different Optane drives, but it mentions that at 105% used life the drive goes into a write protect mode that throttles performance to <30MB/s:
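For reference, Critical Warning is a bit field. Per the NVMe base specification, bit 3 means the media has been placed in read-only mode, so 0x08 is exactly what produces the "read only mode" line above. A minimal Python decoder, covering only bits 0 to 4 (higher bits were added in later spec revisions or are reserved, so this sketch just reports them by number):

```python
# Critical Warning bits (byte 0 of the SMART/Health log, Log ID 0x02),
# bits 0-4 as defined in the NVMe base specification.
CRITICAL_WARNING_BITS = {
    0: "available spare capacity has fallen below the threshold",
    1: "temperature is above or below a temperature threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read-only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value: int) -> list[str]:
    """Return a description for every bit set in the Critical Warning byte."""
    return [
        CRITICAL_WARNING_BITS.get(bit, f"bit {bit} (later spec revision or reserved)")
        for bit in range(8)
        if value & (1 << bit)
    ]


print(decode_critical_warning(0x08))
# ['media has been placed in read-only mode']
```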
"Percentage Used" Indicator
This is Intel® Optane™ SSD endurance indicator. It will be zero when the SSD is first installed, and may remain at zero for a while. For more details on why this occurs, see this Technical Advisory:
Intel Support
center-ssds.html
As the SSD endurance is consumed, the indicated value will increase.
A value of 100 indicates that the estimated endurance of the device has been consumed, but may not indicate a device failure, as the value is allowed to exceed 100. Once the value reaches or exceeds 105, the drive will enter write protect mode, in which write bandwidth maxes out at <30MB/sec.
Any value from 95-100 should be interpreted as a warning that endurance consumption is approaching its maximum; once it reaches 105 performance may begin to suffer.
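The thresholds in that quote boil down to a small classification. This is one reading of the Intel text, not code Intel publishes: the 95 and 105 boundaries come from the quote, while labeling 100 to 104 as "rated endurance consumed" is an interpretation. By this reading, the 105 reported here lands squarely in write protect.

```python
def endurance_state(percentage_used: int) -> str:
    """Classify an NVMe 'Percentage Used' value using the thresholds in the Intel text.

    The 95 and 105 boundaries come from the quoted document; treating 100-104 as
    "rated endurance consumed" is an interpretation, not Intel's wording.
    """
    if percentage_used >= 105:
        return "write protect: write bandwidth capped below 30 MB/s"
    if percentage_used >= 100:
        return "rated endurance consumed (not necessarily a failure)"
    if percentage_used >= 95:
        return "warning: endurance consumption approaching its maximum"
    return "normal"


for value in (0, 96, 100, 105):
    print(value, "->", endurance_state(value))
```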
Now, this drive has been used and abused in various workload tests and as a ZFS log device in a write-intensive NFS environment. Apparently inefficiently so, since it was hardly ever read from in comparison, but I'm still a relative ZFS novice, so please withhold your shame fingers. I digress. It has since been repurposed (the first "R" is "reuse", right?), and for what I need now, I'm not concerned about whether the drive starts experiencing errors or the 3D XPoint starts failing; I just want it to maintain as much performance as it can (basically transient OS drives).
So, to the point: I have been trying to find a way to, I don't know, clear that critical/write-protection flag, or even just the SMART data. Obviously, I have basically come up empty, since I don't think you're supposed to be able to do that. I guess since the drive is basically useless, maybe I could try using the nvme-cli tool to dump the firmware and try to modify it to not care about the data written? Or maybe there is some magic register I can write to using nvme to set "Data Units Written" back to 0? I guess I'm posting just to open the floor for ideas, suggestions, guidance, etc. So, let me know if you have any! Thanks in advance!