What threads should I read on File Storage?

I did a search and got very confused. @wendell knows his stuff on this topic but so much of what I have watched goes down a road that is unrelated to what I am interested in setting up for myself.

After watching the recent Linus Tech Tips video where their servers lost a bunch of data again, it got me thinking again.

If someone wanted to set up a reliable, expandable network storage system whose only role is to store files on-site long term, what would be good to read about this stuff?

It doesn't have to be something that you work off of; it just has to reliably store files that you aren't currently working on.

Also, where can I read about ways to reliably back up something like this off-site without it costing an arm and a leg?

I don't need it for VMs or anything like that any time soon, purely file storage.

The "optimal" approach varies by individual needs. This is why there's a gazillion threads and ways of doing stuff, and this is why things are confusing.

The main things to understand are that RAID isn't a backup, that backups are intended to be a copy of your data you're willing to lose because you still have your main data, and that if you don't want to pay for some kind of subscription service you have to fall back on your own knowledge and diligence.

Options for storage I'm aware of:

  • btrfs for small mirrored raid.
  • mergerfs + snapraid for data hoarding
  • zfs for large raid (often TrueNAS SCALE)
  • ceph on baremetal with one osd per disk for small cloud setups
  • ceph in rook for medium cloud setups (medium= linode, hetzner, digital ocean)
  • pay Amazon, Microsoft, Google and have their people run your own buildings air gapped from their own cloud (e.g. if you're a government of some kind) for single digit billions per year
  • do an Apple and negotiate to pay some hyperscaler double digit billions per year at a discount to store iCloud data

Then there's LizardFS, and other lesser known storage options.

Then there's Acronis, Backblaze, Tarsnap and friends that offer cloud storage.

Then there's rsync, rclone, restic, duplicity, duplicati, bup, borgbackup, ... used either with your own disks or with some cloud storage.

Then there's Storj and various other space exchanges with random people.

Then there's the option of shipping an 18T USB HDD + NanoPi to a friend.

–

Everyone needs to store data, and everyone's needs are slightly different. That's why there's so many options.


It's worth noting that in Linus's case, if he had used TrueNAS and set up email alerts, he wouldn't be in this situation. The default TrueNAS install performs a scrub every 35 days. He and/or his team set up those NASes manually and skipped some essential steps (no scrubs or alerts).
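To make the "scrub plus alert" idea concrete, here is a minimal sketch of the check an alerting job would run. The function name and the way you obtain the last scrub date are hypothetical; only the 35-day threshold comes from the post above. It is not how TrueNAS implements this.

```python
from datetime import date, timedelta


def scrub_overdue(last_scrub: date, today: date, threshold_days: int = 35) -> bool:
    """Return True when the last completed scrub is older than the threshold.

    threshold_days defaults to the 35 days mentioned above; how you find
    last_scrub (pool status output, a log file, ...) is left out on purpose.
    """
    return today - last_scrub > timedelta(days=threshold_days)


if __name__ == "__main__":
    # Hypothetical dates, purely for illustration.
    if scrub_overdue(date(2022, 1, 1), date(2022, 2, 10)):
        print("Scrub is overdue: send an alert email")
```

The point is less the code than the habit: whatever runs the pool, something should complain loudly when a scrub has not happened for a while.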


Exactly the channels/articles you have been following so far, but you need to understand that for reliability you have to pay in some way ... it can be your time and your hardware, someone else's time and your hardware, or any combination you can think of among you, someone else, your hardware and a storage provider.

The more you try to do yourself, the more the onus of reliability falls on your knowledge and experience, and also on how much you are potentially willing to lose ...
If your purpose is to learn how to make systems that run at home reliable and not have issues like the guys at LTT, you need to read about systems/networking/storage and get familiar with procuring hardware and balancing performance/consumption/spend/ease of use.
You will soon find out that the golden goose doesn't exist, otherwise we'd all have it, and that there's some content that is not on YouTube, not because it's a secret and people want to make money out of it, but because it's complex and requires a lot of time, experiments and money ... and then some. It looks to me like that is not sellable content on modern media channels, as opposed to blowing up/zapping stuff, assembling splendid-looking PCs, doing cryo overclocking and mad scientist stuff ...


I think for me I am looking for a back to basics starting point. Like for people who have never really needed raid before and are entering into it new.

My personal experience with RAID has been limited to drives you buy externally that are already set up and that you plug in the same way you would any external drive.

This is a simple but to-the-point primer:

The gist of it is: a RAID array will tolerate some amount of failure, as opposed to a single-disk solution. The more redundancy/resiliency you choose, the more you will pay in space and in the cost of additional drives.
RAID is NOT a BACKUP - you will read and hear this repeated everywhere. It refers to the fact that a RAID setup makes loss of data due to a failing disk less likely, but it will not prevent user error (files deleted, drives formatted) or catastrophic server failure. For that you need a backup, and there you will go down another rabbit hole ...
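The "pay in space" part can be put into numbers. Below is a rough sketch using the textbook capacity rules for common layouts; the function name is made up for illustration, and real arrays lose a bit more to filesystem overhead and TB-vs-TiB rounding.

```python
def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Approximate usable capacity for a few common layouts.

    Simplified rules, assuming identical drives:
      raid1          -> one drive's worth (every drive holds the same data)
      raid10         -> half of the drives
      raid5 / raidz1 -> one drive's worth lost to parity
      raid6 / raidz2 -> two drives' worth lost to parity
    """
    level = level.lower()
    if level == "raid1":
        if drives < 2:
            raise ValueError("a mirror needs at least 2 drives")
        return drive_tb
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("raid10 needs an even number of drives, 4 or more")
        return drives / 2 * drive_tb
    parity = {"raid5": 1, "raidz1": 1, "raid6": 2, "raidz2": 2}
    if level not in parity:
        raise ValueError(f"unknown level: {level}")
    if drives <= parity[level]:
        raise ValueError("not enough drives for this parity level")
    return (drives - parity[level]) * drive_tb


if __name__ == "__main__":
    for level, n, size in [("raid1", 2, 10), ("raid5", 4, 4), ("raidz2", 8, 18)]:
        print(f"{n} x {size}TB as {level}: {usable_tb(level, n, size)}TB usable")
```

None of these layouts protects against deleting the wrong folder, which is exactly why the backup question is separate.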

I completely understand RAID not being a backup. If the drives themselves weren't so expensive, LTO tape, even at a small scale, looks interesting for keeping off-site somewhere. The shelf life they advertise sounds really appealing, and it sounds safer than keeping a bunch of drives on a shelf somewhere.

What really instigated this topic is that I have just bought a video camera for some hobby stuff. It can shoot uncompressed 4K raw. You won't even get 2 hours of video out of 500GB. That is very quickly going to be a problem.
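For a feel of how quickly that adds up, here is a back-of-the-envelope sketch. It takes the "500GB in about 2 hours" figure above as a lower bound on the data rate (the real rate depends on the camera settings), and the helper names are just for illustration.

```python
def gb_per_hour(capacity_gb: float, hours: float) -> float:
    """Data rate implied by filling capacity_gb in the given number of hours."""
    return capacity_gb / hours


def mb_per_second(rate_gb_per_hour: float) -> float:
    """Same rate expressed in MB/s (decimal units: 1 GB = 1000 MB)."""
    return rate_gb_per_hour * 1000 / 3600


def hours_of_footage(capacity_tb: float, rate_gb_per_hour: float) -> float:
    """How many hours of footage fit on capacity_tb at that rate."""
    return capacity_tb * 1000 / rate_gb_per_hour


if __name__ == "__main__":
    rate = gb_per_hour(500, 2)  # at least this much, per the post above
    print(f"{rate:.0f} GB/hour, roughly {mb_per_second(rate):.0f} MB/s")
    print(f"An 18TB drive holds about {hours_of_footage(18, rate):.0f} hours")
```

At that rate a single large drive is only a few dozen shooting hours, which is why the workflow question matters so much.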

That's why, unless you have an unlimited budget, everybody establishes a workflow where they try to keep the raw/uncut/unprocessed camera footage for the least amount of time possible, and they only back up the end result ...
It really boils down to:

  • Do you need a fast, SSD-backed scratch area where you dump your files and work on them with whatever workflow? This will ideally be RAID-10 for max performance, supported by something like ZFS snapshots to avoid 'mishaps' during the post-production cycle.
  • How big are your post-produced artifacts and how long do you need to store them? This second tier can be slower and use some form of RAID5/6, RAIDZ1/2, Unraid parity or whatever.
  • Do you need a backup, and can it be local on another storage box or does it have to be remote somewhere else?

This will establish your ideal goal in terms of storage and number of appliances requirements.
From there budget and time will be your limit, and your workflow will dictate some of the choices.
If you haven't thought about that and just bought a 4K camera because it was on sale ... you'll need to decide whether you want to go YOLO and use a bunch of USB drives, establish a filing system, and pray that the drives are still working some years down the line, or whether you need a proper NAS, the cost and number of which will depend on the factors above.
Only you know your workflow, your budget and your expectations in terms of availability and reliability of your data, so you'll have to do a lot of homework to create what works best for you. Or, if you happen to have loads of money and not so much time, you can just define your goals and then have a professional design/quote/build it for you ...

I definitely didn't buy anything on sale. This is more about developing a new workflow and seeing what is going to be best for it.

Anything I'm working on at a given time will be on my PC. It is the long-term on-site storage I'm thinking about.

I'd be quite content with something I can run as a server but with expansion options for later. Say, start off with so many drives and add more as needed.

I guess what I'm picturing in my head is something I can have attached to a network, whether in a rack, a NUC, or even a PC case attached to some kind of drive bay that gives me room to add drives as time goes on.

That is your first requirement, a NAS as opposed to direct attached storage.
Now, do you want your NAS to be relatively cheap, expandable and fast, and can you compromise on space used and power consumption? If so, going with an older generation rack server from Dell or HP will give you capacity, room to grow, and relatively low prices, but you will pay with noise and space occupied ...

I know some people like to work on their files straight from their NAS, but my plan is to move whatever I'm working on from the NAS to my PC and then return it when finished.

Obviously I don't want to wait forever to save files to it, but it doesn't need to handle constant reads and writes all the time. I just want it to be storage.

That is fine, and it means that you can make do with a NAS that only supports 1Gbps as opposed to 10Gbps networking, which will save you a lot of money.
Will you be needing snapshots (i.e. point in time saves on your nas that will cover you from accidentally deleting files)?
Do you want an off the shelf product or do you want to go DIY?
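On the 1Gbps versus 10Gbps point, a quick sketch of what moving a project back and forth would cost in waiting time. This uses the theoretical line rate only; the `efficiency` parameter is an assumption you can lower to model protocol overhead or slow disks, and the function name is illustrative.

```python
def transfer_minutes(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Minutes needed to move size_gb over a link of link_gbps.

    Uses decimal units (1 GB = 1e9 bytes, 1 Gbps = 1e9 bits/s).
    efficiency=1.0 is the theoretical best case; real copies are slower.
    """
    bytes_per_second = link_gbps * 1e9 / 8 * efficiency
    return size_gb * 1e9 / bytes_per_second / 60


if __name__ == "__main__":
    for gbps in (1, 10):
        minutes = transfer_minutes(500, gbps)
        print(f"500GB over {gbps}Gbps: about {minutes:.0f} minutes at best")
```

If waiting around an hour per 500GB when pulling a project off the NAS is acceptable, 1Gbps is enough for an archive-style box.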

Snapshots sound handy. As for off the shelf or DIY, I'm happy to go DIY. But if off the shelf is going to be that much better, I'll look into that.

For now I'm trying to learn what my options are and what I am going to wind up needing to spend.

Off the shelf will give you a solution that works from day 1. You usually pay with a higher price for the same feature set, fewer options for expandability (unless you add $$$), vendor lock-in and fewer options in terms of what software you can run.

DIY gives back all that - flexibility, lower cost, you decide the pace you want to grow at and you have better options for incremental upgrades, allowing you to spend money in stages.

You pay with your time, with all the responsibility of making it work being on you and, usually, with a less 'ergonomic' experience, as you'll inevitably end up making some mistakes or having to replace hardware, with all the stress that comes with that.

So, if all you want is a quick and dirty solution that will allow you to focus on shooting film and working on your workflow, something like a DS720+ (DS720+ | Synology Inc.) will be more than enough to get you started on a relatively low budget and still leave you room for expanding later with a DX517 unit.

An equivalent model from QNAP would be the TS-253D.
You will get your initial NAS for ~600USD (plus storage) and that will give you 10-20TB of RAID-1 capacity to get you started, with all the bells and whistles of the Synology/QNAP OSes/interfaces.

For the same amount of money you will struggle to find an equivalent DIY solution that gives you new hardware, the same small form factor, networking and expandability, but you will have plenty of options if going older generation ...

For example, the same amount of money would easily net you a MicroServer Gen8 with a 4-core Xeon, 16GB of RAM, 4 drive bays, IPMI and a full PCIe slot where you could add either 10Gb networking or an HBA for an external disk shelf when you need more space ... or an old Supermicro 1U rack server with a Xeon that will be much more powerful but suck 150+ watts of power at idle ...

Thank you very much for your clear and understandable posts! I really appreciate it.


Tons of video, I assume you'd like to see more than 100MB/s streaming performance.

I'm guessing you'll need about 50 - 100T of space and a 10Gbps network.

I'd consider:

  • a local machine with snapshots + off-site backup
  • two local machines with snapshots + off-site backups.

... so either 2 or 3 machines.

In terms of drives - 8x 18T drives in each machine, configured as ZFS raidz2. Machines can run TrueNAS SCALE as an appliance distro.

Cost wise that's about:

  • 16 drives * $300 per drive == $4800
  • 2 cheap Ryzen hosts with 10Gbps NICs - ballpark about $500 per machine.

So about $6000 for about 60TB of space - or $100 / terabyte (because you'll want to upgrade before you're full), which is about 6x the cost of raw drive bytes.
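The arithmetic behind that estimate, as a small sketch you can re-run with your own prices. The numbers are the ballpark figures from this post; the function name is just for illustration.

```python
def build_cost(drives: int, drive_price: float, hosts: int, host_price: float,
               planned_tb: float) -> tuple[float, float]:
    """Return (total cost, cost per TB you actually plan to fill)."""
    total = drives * drive_price + hosts * host_price
    return total, total / planned_tb


if __name__ == "__main__":
    # 16 x 18T drives at $300, 2 hosts at $500, planning to fill ~60TB.
    total, per_tb = build_cost(16, 300, 2, 500, 60)
    raw_per_tb = 300 / 18  # price of a bare drive per TB
    print(f"Total: ${total:.0f}, ${per_tb:.0f} per planned TB")
    print(f"Raw drive cost: ${raw_per_tb:.0f} per TB, ratio {per_tb / raw_per_tb:.1f}x")
```

The gap between raw drive price and cost per usable terabyte is the price of parity, a second copy, and headroom.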


If you need about 10T, and/or if you're fine with 1Gbps (100MB/s) speeds ...

you could buy 4x 18T USB 3.0 drives, plus a pair of Odroid N2+ boards for $150 each (board + eMMC OS storage + case + goodies), and run OMV.

So that's $1500 for 10-ish T, or $150 / terabyte ... almost 10x raw byte cost.


How about this:

Put raw bytes into the desktop you use for editing video (cheap, can even do RAID0).

Back up to external USB-connected disks and pay Acronis or Backblaze for off-site backup storage.

... not sure how much this would cost, probably the cheapest option ... it relies on having decent internet upload.

Then why is LTO too expensive for you? If you are going to take the added step of transferring the data from a warm or cold storage area to a hot storage area, working on it, and then saving it back to warm/cold storage, then LTO is going to be the cheapest option in the long run. Yes, the drives are more expensive upfront, but spending 20 - 80 USD for a 1.2TiB to 12TiB (compressed) tape is cheaper than spending 140 USD every time you need to add another 8TiB. LTO also holds up better as a long-term backup solution.
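Whether the upfront cost pays off depends on how much you end up archiving. A rough break-even sketch follows: the tape and disk prices are the ones quoted above, but the $3000 tape drive price is purely a placeholder assumption (check current prices), and the function name is illustrative.

```python
def break_even_tib(drive_cost: float, tape_price: float, tape_tib: float,
                   hdd_price: float, hdd_tib: float) -> float:
    """Archive size (TiB) at which tape drive + tapes costs the same as HDDs."""
    tape_per_tib = tape_price / tape_tib
    hdd_per_tib = hdd_price / hdd_tib
    if tape_per_tib >= hdd_per_tib:
        raise ValueError("tape media is not cheaper per TiB with these prices")
    return drive_cost / (hdd_per_tib - tape_per_tib)


if __name__ == "__main__":
    # Placeholder drive price; media prices taken from the post above,
    # using the compressed tape capacity as quoted.
    tib = break_even_tib(drive_cost=3000, tape_price=80, tape_tib=12,
                         hdd_price=140, hdd_tib=8)
    print(f"Tape starts winning after roughly {tib:.0f} TiB archived")
```

Footage that does not compress well would fit less per tape than the compressed figure suggests, which pushes the break-even point further out.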

With that said, you really need to understand what your workflow is and what your requirements are. That will help as you decipher all of the storage information and philosophies out there.

I have been doing some reading about TrueNAS and I must admit I feel a lot more confident in understanding how things work.