SSD data retention

We have all heard that SSDs can lose data after a period of time unplugged. That’s not the question.

But the question is: “What is the procedure to refresh the voltages on the cells of a NAND SSD?” Does it just need to be plugged back into a power source periodically? Obviously this would need to happen before the data degrades (for whatever value of time that turns out to be). If just plugging in the SSD and powering it on isn’t enough to refresh the cells, what is? And is the procedure different for SLC, MLC, TLC, etc.?

6 Likes

This is a very good question for which I have never found a clear answer, and the answer may be different from drive to drive.
I would also like to know: does it recharge like a battery just by being plugged into power? Does it automatically recharge when the cell is read? Does it need to be rewritten to refresh its charge?

As some general advice, I would suggest using a journaling, checksumming filesystem and regularly checking your data for integrity, just to see. Don’t trust anything important to long-term flash until you have a better idea of how it lasts. You can use something like parchive to write recovery data for the drive, then check the data against the recovery data every 6-12 months and see how long it takes for things to rot, and how badly/quickly the rot progresses after the first bad byte, etc.
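Here’s a minimal sketch of that kind of periodic integrity check, in Python rather than parchive, just to make the idea concrete: build a manifest of SHA-256 hashes once, then re-run it in verify mode every 6-12 months to see which files (if any) have rotted. The paths and filenames are hypothetical, and unlike par2 this only detects damage, it doesn’t repair it.

```python
import hashlib
import json
import os
import sys

# Minimal sketch of the "check your data for integrity" advice above.
# Build a manifest of SHA-256 hashes once, then re-run in verify mode
# every 6-12 months to spot silent corruption. parchive (par2) goes
# further and can actually repair damage; this only detects it.

def hash_file(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    manifest = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = hash_file(full)
    return manifest

def verify(root, manifest_path):
    with open(manifest_path) as f:
        stored = json.load(f)
    for rel, digest in stored.items():
        if hash_file(os.path.join(root, rel)) != digest:
            print(f"ROT DETECTED: {rel}")

if __name__ == "__main__":
    # usage: python integrity.py create|verify /mnt/archive manifest.json
    # (directory and manifest names are hypothetical)
    mode, root, manifest_file = sys.argv[1:4]
    if mode == "create":
        with open(manifest_file, "w") as f:
            json.dump(build_manifest(root), f, indent=2)
    else:
        verify(root, manifest_file)
```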

2 Likes

I wish more people would ask this question, especially since I see SSDs being recommended for NAS builds, which tend to have data at rest for years.

I’m certain that the majority of consumer SSDs won’t even refresh their cells automatically while under power (there are some exceptions to this but they are rare and undocumented).
The good news is that it takes quite a long time for actual data loss to occur on SSDs; the problem is that their read speeds decrease as the NAND cells lose charge.

As for what will actually refresh NAND cells, I know hdsentinel can do it, because I used it to increase an SSD’s average read speed by ~27x after this benchmark of the read-speed degradation:


^^ This SSD had a mix of aged data on it, the oldest being 2 years old; power had been applied to it throughout its life, and GC and TRIM had been running.
The SLC cache mechanism “running out” can be mostly dismissed here because of the pattern of slow blocks shown.

For Linux, badblocks (in non-destructive read-write mode) should be able to accomplish the same refresh operation.
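For anyone who wants to see roughly what such a refresh amounts to, here is a minimal file-level sketch in Python. To be clear, this is not what hdsentinel or badblocks actually do internally, and the mount point is hypothetical: it just reads every file back and writes the same bytes again, on the assumption that the controller programs rewritten data into freshly erased pages. badblocks -n does the equivalent at the block-device level, below the filesystem.

```python
import os

# Hypothetical sketch: refresh flash cells at the file level by reading
# every file and writing the same bytes back. Assumes the SSD's FTL
# programs rewritten data into fresh NAND pages (it never updates pages
# in place). badblocks -n does the same job at the block-device level.

def refresh_file(path, chunk_size=8 * 1024 * 1024):
    with open(path, "r+b") as f:
        offset = 0
        while True:
            f.seek(offset)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            f.seek(offset)
            f.write(chunk)          # same bytes, but lands in fresh NAND pages
            offset += len(chunk)
        f.flush()
        os.fsync(f.fileno())        # push the data out of the page cache

def refresh_tree(root):
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            refresh_file(os.path.join(dirpath, name))

if __name__ == "__main__":
    refresh_tree("/mnt/archive")    # hypothetical mount point of the SSD
```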

8 Likes

So if I am reading the image correctly, the variation in color is a variation in read speed for a block? And presumably that is due to voltage or charge degradation?

Sort of: the variation in color is a combination of the read+write+read speed of a particular block. Because writing is happening, the SLC cache comes into play, but it does not account for the drive’s very low average speed of 64.27 MB/s (a FireCuda 520 does ~600 MB/s sequential when the SLC cache is exhausted and everything is being written to TLC).

This was the post I was pulling from:

The other benchmarks in the post are actually 100% read speed, but they don’t correlate file size to read speed, which would be an important metric to consider. Also, once the speed test is done on the refreshed drive and it achieves the 2519 MB/s average read speed, the file dates become meaningless.

3 Likes

Presumably, a slow leak will affect multi-level cells more quickly than single-level cells?

Because a slow leak means the drifting charge may change the stored value?

2 Likes

Yes, that is definitely true.

NAND wear level is also a pretty big contributor to how fast the leak occurs. Temperature to a lesser degree (pun intended).

huh.

I seem to recall, back around the time MLC was coming out, that there was a whole thing about drives being supposed to last X months at room temperature.

I presumed a change in temperature did affect the degradation.

Obviously, we humans like to remain around room temperature, but equipment with drives… not so much?

Obviously, since then, QLC and suchlike have come along.

like page 26 / 27

but that was way back in the day

and consumer drives (luckily) no longer cost $2/GB

1 Like

Or, drawing from the same source, AnandTech did an article 10 years ago, with the ending paragraph suggesting:

so for new drives the data retention is considerably higher, typically over ten years for MLC NAND based SSDs. If you buy a drive today and stash it away, the drive itself will become totally obsolete quicker than it will lose its data.

(The source is about 10 years old, and the info/tech has moved on since…)

Well, this is the complicated part. SSD controllers are advanced enough to apply charge-drift offsets (and slower, higher-precision voltage sampling) to cells to roughly guesstimate what a weaker cell’s intended data should be; there are also ECC routines run on weaker cells to prevent corruption, and ECC outcomes can even be applied to calibrate the cell drift offset tables. This is where the read slowdown comes from.

Yeah, that’s where the whole “write to NAND while it’s hot and then store it while it’s cold for maximum benefit” idea came from. I think other factors like wear level and cell bit depth matter much more, but they weren’t as well captured in that spec because of when it came out.

4 Likes

I think that is a 100% realistic statement for MLC NAND (it doesn’t address the read slowdown, though).
I’ve got a Power Mac G5 with some super old 3.5" SATA SSDs that I haven’t used for like 5 years, which I’m really tempted to turn on now.

1 Like

There’s also the factor of stacked vs. non-stacked NAND. Planar flash loses data more quickly, and its retention time drops much faster with write wear compared to stacked.
I can’t find the document anymore, though; it seems like every scientific paper is being scrubbed from the internet these days.
Anecdotally, I have a planar TLC drive, and it’s the only drive I’ve ever had bitrot on that I know of.

Stacked/3D vs planar NAND is interesting because 3D NAND actually introduced a bunch of new problems, but the manufacturers mitigated some of the “old” planar-specific issues (actually issues specific to the small nodes that planar NAND required to achieve reasonable density) by going from the 15nm node for late-stage planar to the 45nm node for early-ish 3D NAND.

This paper talks in excruciating detail about it:
https://www.mdpi.com/2073-431X/6/2/16

This is a reasonable takeaway from the paper:

"The revolutionary move to 3D NAND resulted in a larger cell size, relieving the impact of traditional reliability issues, but gave rise to new and subtle reliability concerns, owed to the floating body and polysilicon channel of 3D NAND cells. "

2 Likes

The interesting part of pages 26-27 seems to be that client SSDs (consumer?) retain data better than enterprise SSDs, at the same active temp and cold-storage temp.

Which is great news for me, maybe. My initial question in creating the topic was intended to determine what is needed to keep the data on cheap SSDs readable. I have been practicing 3-2-1 data storage for a few years now, on the cheap. While spinning-rust hard drives are the cheapest per unit of data, having all spinning rust violates the two-formats part. So my other storage medium is Blu-ray. (Not ideal, but I use DVDisaster and create ECC files to help with degradation.) However, SSD prices are coming down and are occasionally very close to the price per TB of Blu-ray discs. And factoring density into the equation, 4 TB consumer NAND drives look good compared to 200-ish Blu-rays.

If just having badblocks refresh the NAND once every 6 months to a year is good enough, I might switch my new data to SSD. Or have the “2” in 3-2-1 become a 3, making it 3-3-1, with one rust drive, one NAND drive, and one optical drive.

2 Likes

I think you should run a TRIM command and that should brush over all the cells in the drive to make sure everything is as organized as possible. And, in turn, refresh them since it’s going to go over all of them.

1 Like

Oh, that’s a good idea.

Not asking you directly, more thinking out loud here, but: is there a difference between running TRIM and using badblocks (as suggested above)? Not that I have any expertise in either.

What I am wondering, specifically, is which method would be quicker, or have the least amount of data “written”.

To that last point, is it better for the endurance of the SSD to somehow “refresh” the voltage vs “rewriting” the voltage in the cell? Assuming there is a difference, which might be incorrect anyways.

My gut says that there is either no difference, or the difference is negligible. However, I was VERY surprised to see that consumer SSDs might have better data retention (pages 26 and 27 in the PDF linked up-thread).

And aren’t those the good days? When you learn something pleasantly surprising?

1 Like

Unfortunately TRIM is only a mechanism for the OS to tell the SSD which flash pages of deleted files are okay to clear/pre-erase.
TRIM is quite fast and doesn’t add much wear to SSDs, but all it really does is reclaim performance from the SSD “filling up” with deleted files.

Unless new technology comes along, the only way to refresh NAND cell voltages is to rewrite the cells, which does add P/E cycles to the NAND; I assume that is part of the reason consumer SSDs don’t implement a cell auto-refresh subroutine.
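To put a rough (and entirely assumed) number on that P/E-cycle cost: a back-of-envelope calculation with a made-up 4 TB drive and a made-up TBW rating suggests a full rewrite every 6 months barely dents the endurance budget. Both figures below are assumptions for illustration, not measurements of any particular drive.

```python
# Hypothetical back-of-envelope: how much endurance does a periodic full
# rewrite actually cost? The numbers below are assumptions, not specs.
drive_capacity_tb = 4          # assumed 4 TB consumer TLC drive
rated_endurance_tbw = 2400     # assumed TBW rating; check your drive's datasheet
refreshes_per_year = 2         # rewrite everything every ~6 months

writes_per_year_tb = drive_capacity_tb * refreshes_per_year
years_to_exhaust = rated_endurance_tbw / writes_per_year_tb
print(f"{writes_per_year_tb} TB written per year -> ~{years_to_exhaust:.0f} years to hit TBW")
# -> 8 TB written per year -> ~300 years to hit TBW
```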

That temperature and data-retention chart from the JEDEC talk came directly from Intel ~15 years ago; I strongly suspect it doesn’t reflect the reality of SSDs today.

6 Likes

Well, that last part is disappointing.

In the hypothetical of using cheap but dense flash storage for the medium-long term (not ultra long term), it seems that running TRIM might not be needed if the drive didn’t have anything written to it or deleted between cell refreshes done with a tool that rewrites the cells?

Essentially, if there is nothing to TRIM, it is a waste?

I don’t think we’ve regressed too much from what that chart shows, even though we’ve gone from SLC/MLC during the chart’s creation to TLC/QLC now.

I’d be comfortable storing data on a modern consumer SSD for 6-18 months; I’d just expect a slowdown in read speed by the end of that time.

correct

3 Likes