More videos about cold storage

The recent talk about VM-to-replica IOPS inspired me to bring up the topic of cold storage again. For a small company, a SAN can be fully active storage with no need for cold storage, but once you get into video production, reliable cold storage is a really important part of a post-production house.

I feel like Level1 should explore LTO drives, M-Disc, and plain HDDs (likely helium-sealed) as cold storage options for video. Right now, only those who already know how to archive properly do it, and there’s a whole generation of YouTubers who don’t cold-storage-archive correctly and just re-grab footage from YouTube instead of finding their source files and redoing things.

If things are going more the way of vSAN, cold storage needs equal attention: where would those projects go if active space is constantly being taken up by RAW video?

TL;DR: Large-scale cold storage is something I feel needs more coverage. Large-scale active storage gets too much attention.


If there were a way to back up the filesystem of unmounted drives and then browse them as if they were mounted, so I would know which tape/HDD to mount, that would be neat.

That’s more taken care of by external cataloging, whether through inventory or serial-number tracking. I agree there should be an easier way, but the hard work of logging contents has to be done first; then the data can be recalled by inventory/catalog number, like a library.
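As a trivial sketch of that library-style idea in Python (the media IDs and paths here are entirely made up for illustration):

```python
# Toy offline-media catalog: look up which shelved tape/HDD holds a file
# without mounting anything. Media IDs and paths are made-up examples.
catalog = {
    "LTO-0001": ["projects/2017/doc_film/raw/cam_a/", "projects/2017/doc_film/raw/cam_b/"],
    "HDD-0042": ["projects/2018/shorts/", "archive/proxies/2018/"],
}

def find_media(path_fragment):
    """Return the shelved media whose logged contents match a fragment."""
    return [media for media, paths in catalog.items()
            if any(path_fragment in p for p in paths)]

print(find_media("doc_film"))  # -> ['LTO-0001']: that's the tape to mount.
```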


Most serious backup applications do this with catalogues and/or content indexing.

But I haven’t really seen anything video-specific. Ideally, for video production, you’d want some sort of low-quality preview content kept in warm storage for browsing without going all the way to tape.

Yeah, they’re called proxies. Avid Media Composer transcodes these all the time.


I think the issue with any data being stored long term on various types of media is support for the hardware.

It would be interesting to see what ideas people come up with.

Looking at hardware support, nothing beats the backwards compatibility of USB.

You sure? RJ45 anyone? :wink:

With regards to cold storage vids, Linus did a (fair) few:
https://www.youtube.com/results?search_query=petabyte+project

HTH!

I thought the OP was talking about the storage system itself? Connectivity is something to consider, but what I was talking about was things like how old LTO tapes need a drive within about 1-2 generations to be readable. That means that in 30 years, if LTO is still around, you may end up having to fix or find an old drive to read your tapes if the one you initially bought broke.

Optical media is getting more and more rare. If it disappears from the market, you’d be in the same boat…trying to fix or find one that works.

In both cases, the media is robust but without a drive to read them, you’re going to have a problem…

Hard drives kind of have the opposite problem. There isn’t really anything special you need to read data from them but they themselves can fail with time.

My backup plan for the bulk of my data could be better, but it pretty much exclusively relies on hard drives. Many, many hard drives.

Yeah, and that’s what I’d want to see from a video on cold storage: strategies for different data types, or for large volumes of data.


HDDs are about 2x the cost per byte relative to LTO-6/LTO-7 tape ($20/TB vs. $10/TB).

An LTO-7 drive costs around $3k.

1 tape drive + a bunch of tapes becomes cheaper per byte at around 300TB of stored data (20 HDDs or 50 tapes).

If you want some kind of redundancy, you end up with 2 tape drives (break-even at 600TB of stored data / 40 HDDs).
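The arithmetic behind those break-even points, as a quick sketch (prices are the rough figures quoted above, not current market data):

```python
# Back-of-the-envelope: at what capacity does tape become cheaper than HDD?
# Prices are the rough figures from this thread, not current market data.
HDD_COST_PER_TB = 20.0    # $/TB for plain hard drives
TAPE_COST_PER_TB = 10.0   # $/TB for LTO-6/LTO-7 media
TAPE_DRIVE_COST = 3000.0  # one LTO-7 drive

def hdd_cost(tb):
    return tb * HDD_COST_PER_TB

def tape_cost(tb, drives=1):
    return drives * TAPE_DRIVE_COST + tb * TAPE_COST_PER_TB

# Break-even: tb * 20 = drives * 3000 + tb * 10  =>  tb = drives * 300
for drives in (1, 2):
    tb = drives * TAPE_DRIVE_COST / (HDD_COST_PER_TB - TAPE_COST_PER_TB)
    print(f"{drives} tape drive(s): break-even at {tb:.0f} TB "
          f"(HDD ${hdd_cost(tb):,.0f} vs tape ${tape_cost(tb, drives):,.0f})")
# -> 1 drive: 300 TB; 2 drives: 600 TB, matching the figures above.
```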

Now, what kind of raid do you do?

If you’re storing cold data and large data archives, you could afford to only store e.g. 10% redundancy.

On HDDs, if you have lots of HDDs, you can afford to spread your data and parity chunks using some kind of 2D erasure code (Ceph?) such that you probabilistically end up having to read data+parity chunks from 8 drives to reconstruct a missing chunk.
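As a toy illustration of that reconstruct-from-survivors idea (single parity only; real tools like par2, zfec, or Ceph’s erasure-coded pools use Reed-Solomon codes that survive multiple losses):

```python
# Toy k+1 erasure code: XOR parity over k data chunks.
# Losing any ONE chunk (data or parity) is recoverable by XOR-ing survivors.
# Real archival tools (par2, zfec) use Reed-Solomon to survive multiple losses.

def xor_chunks(chunks):
    """XOR equal-length byte chunks together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

k = 4
data = [bytes([i] * 8) for i in range(k)]   # 4 data chunks, 8 bytes each
parity = xor_chunks(data)                   # 1 parity chunk (25% overhead)

# Simulate losing data chunk 2: rebuild it from the other k-1 data chunks
# plus the parity chunk -- i.e. you must READ k chunks to reconstruct 1.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_chunks(survivors)
assert rebuilt == data[2]
print("chunk 2 rebuilt from", len(survivors), "surviving chunks")
```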

With tape, if you want to do some kind of raid to reconstruct something, OMG this is horribly laboriously painful.

If you have a tape library with robots and tens of drives, sure… raid on tape can work, but at that point you’re paying more per tape than you would for a tape sitting on a shelf, or in some safe somewhere.


There just aren’t many use cases where tapes end up being cheaper or more convenient than drives.

Well… no. If you have enough disk space to temporarily hold copies of several tapes’ worth of data, it’s quite easy. Tell par2archiver what level of redundancy you want, and it’ll generate it for you. Write the parity information to tape(s), and if/when you need to recover from a missing tape or damage spread across multiple tapes, you dump all the tapes to disk again, tell par2 to do its thing and repair it all for you.

Same procedure for discs (CDs, DVDs, Blu-ray) as for tapes.
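For reference, a minimal sketch of that create/repair cycle driven from Python, assuming the stock par2 command-line tool (par2cmdline) is installed; the file names and the 10% redundancy level are placeholders, not recommendations:

```python
# Minimal sketch of the par2 create/repair cycle described above.
# Assumes the standard `par2` CLI (par2cmdline) is on PATH.
import subprocess

def create_parity(archive_files, parity_base="archive.par2", redundancy=10):
    """Generate PAR2 recovery files at the given redundancy percentage."""
    subprocess.run(
        ["par2", "create", f"-r{redundancy}", parity_base, *archive_files],
        check=True,
    )

def repair(parity_base="archive.par2"):
    """After dumping all tapes back to disk, repair damaged/missing files."""
    subprocess.run(["par2", "repair", parity_base], check=True)

# Usage: before writing to tape, create parity and write it to its own tape.
# create_parity(["project_raw_part1.tar", "project_raw_part2.tar"])
# ...years later, restore every tape to disk, then:
# repair()
```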

Your hard drives sitting on a shelf will stop working in a matter of a few years. Your tapes will be perfectly fine for decades. Your figures fail to include a lot of replacement hard drives and the labor to periodically check and repair/replace them.


Stuff like this is exactly why we need a video on the topic. Some people get stuck in the hard-drive mentality and forget about the extra prep work you can do for something like LTO.

I have several HDDs (from different brands) in my file server that are approaching 70k power-on hours. FYI, that’s almost 8 years. True, I didn’t hammer them like a media production house (LMG!) or a data centre would, but they’re being used regardless. None of them has any significant error rate ATM. SSDs, yes, they lose their charge (and therefore data) over time, but HDDs are pretty data-resilient.

par2archiver or zfec or other similar solutions were what I had in mind… yes…

Ignoring temporary space requirements for a second… in either case (HDDs, or tape with e.g. par2archiver), if you want a low probability of data loss and at the same time low redundancy overhead, you end up risking having to read many tapes’/drives’ worth of data during recovery/reconstruction of a single tape’s/drive’s worth (e.g. do you want to protect against 2 tapes failing out of 5, out of 10, or out of 20? Let’s use 20+2 for 10% overhead).
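To put rough numbers on that group-size trade-off, here’s a quick binomial estimate; the 2%/year per-unit failure rate is an invented assumption purely for illustration:

```python
# Probability of unrecoverable loss (more than 2 units failing) for k+2
# groups, assuming independent failures at an INVENTED 2%/year unit rate.
from math import comb

P_FAIL = 0.02  # assumed annual failure probability per tape/drive

def p_loss(n, tolerate=2, p=P_FAIL):
    """P(more than `tolerate` of n units fail)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(tolerate + 1))

for k in (5, 10, 20):
    print(f"{k}+2: overhead {2/k:5.0%}, P(data loss) ~ {p_loss(k + 2):.2e}")
# Bigger groups mean lower overhead, but a higher chance that a third
# failure lands in the same group before you repair.
```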

How quickly you can reconstruct data becomes a function of how quickly you can read, for example, 20 tapes’ or 20 HDDs’ worth of data.

Even if your cold storage is spread across 2000 tapes, recovery is blocked until you fully read 20 of them - and how quickly that happens is a function of how many drives you have, not how big your archival storage is.

With online HDDs it is a much smaller logistical challenge. If you have 2000 online drives and your data is using 20+2 parity, you can organize your data when writing it such that you use all 2000 drives simultaneously, reading ~1% worth of data off each in parallel, to reconstruct 1 drive’s worth of data. You would store the reconstructed result spread across the ~2000 drives you already have.

Recovery is much quicker with online HDDs because of the increased read/write throughput, which scales with the size of your archive (or gets a bit slower over time, because you keep adding larger-capacity drives as bigger HDDs appear).
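A rough sketch of that difference; the throughput and capacity figures below are purely illustrative assumptions:

```python
# Rough recovery-time comparison for rebuilding 1 lost unit under 20+2
# parity. All throughput/capacity numbers are illustrative assumptions.
UNIT_TB = 6.0       # capacity of one tape or drive, TB (LTO-7-ish)
TAPE_MBPS = 300.0   # sequential read speed of one tape drive, MB/s
HDD_MBPS = 200.0    # sequential read speed of one HDD, MB/s
READ_UNITS = 20     # must read 20 surviving units to rebuild 1 (20+2)

def hours(tb, mbps, parallel):
    """Hours to read `tb` terabytes at `mbps` MB/s across `parallel` readers."""
    return tb * 1e6 / mbps / parallel / 3600

# Tape: limited by how many tape drives you own, not by archive size.
for drives in (1, 2, 10):
    print(f"tape, {drives:2d} drive(s): "
          f"{hours(READ_UNITS * UNIT_TB, TAPE_MBPS, drives):7.1f} h")

# Online HDDs: the read is spread across the whole pool (~1% from each
# of 2000 drives), so recovery throughput scales with archive size.
print(f"2000 online HDDs:    {hours(READ_UNITS * UNIT_TB, HDD_MBPS, 2000):7.1f} h")
```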

On the other hand, managing a pallet of tapes sitting around doesn’t require power/network/servers - those increase the cost of HDD capacity by around 15-50% per byte once you get into the ~1000-drive range (the bigger the scale, the lower the overhead; it takes about $10M/year to get down to 15-20% online overhead).

I said “sitting on a shelf.” As in “cold storage” (the topic), not “online,” not getting patrol reads, not having sectors rewritten, and just sitting around losing its magnetic charge.

“If it’s online, it’s not really a backup.”
