I did a search and got very confused. @wendell knows his stuff on this topic but so much of what I have watched goes down a road that is unrelated to what I am interested in setting up for myself.
After watching the recent Linus Tech Tips video where their servers lost a bunch of data again, it got me thinking again.
If someone wanted to set up a reliable, expandable network storage system whose only role is to store files on-site long term, what would be a good thing to read about this stuff?
It doesn't have to be something that you work off of; it just has to reliably store files that you aren't currently working on.
Also, where can I read about ways to reliably back up something like this off-site without it costing an arm and a leg?
I don't need it for VMs or anything like that any time soon, purely file storage.
The "optimal" approach varies by individual needs; this is why there's a gazillion threads and ways of doing stuff, and this is why things are confusing.
The main things to understand are that RAID isn't a backup, and that a backup is a copy of your data you're willing to lose, because you still have your main data. If you don't want to pay for some kind of subscription service, you have to fall back on your own knowledge and diligence.
Options for storage I'm aware of:
btrfs for small mirrored RAID
mergerfs + SnapRAID for data hoarding
ZFS for large RAID (often TrueNAS SCALE)
Ceph on bare metal with one OSD per disk for small cloud setups
Ceph in Rook for medium cloud setups (medium = Linode, Hetzner, DigitalOcean)
pay Amazon, Microsoft or Google to have their people run your own buildings, air-gapped from their own cloud (e.g. if you're a government of some kind), for single-digit billions per year
do an Apple and negotiate to pay some hyperscaler double-digit billions per year at a discount to store iCloud data
Then there's LizardFS and other lesser-known storage options.
Then there's Acronis, Backblaze, Tarsnap and friends that offer cloud storage.
Then there's rsync, rclone, restic, duplicity, Duplicati, bup, borgbackup, ... used either with your own disks or with some cloud storage.
Then there's Storj and various other space exchanges with random people.
Then there's the option of shipping an 18T USB HDD + NanoPi to a friend.
Everyone needs to store data, and everyone's needs are slightly different; that's why there are so many options.
It's worth noting that in Linus's case, if he had used TrueNAS and set up email alerts, he wouldn't be in this situation. The default TrueNAS install performs a scrub every 35 days. He and/or his team set up those NASes manually and skipped some essential steps (no scrubs or alerts).
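For anyone rolling their own ZFS box instead of using an appliance, here is a minimal sketch of the "alert me when something is wrong" half of that advice. It assumes the OpenZFS `zpool` command is on the PATH and that `zpool status -x` prints `all pools are healthy` when there is nothing to report; the function names are made up for this example, and how you deliver the alert (mail, chat, whatever) is left out on purpose.

```python
import subprocess


def pool_needs_attention(status_output: str) -> bool:
    """True unless `zpool status -x` printed its all-clear message.

    Assumption: the all-clear text is exactly "all pools are healthy".
    """
    return status_output.strip() != "all pools are healthy"


def check_pools() -> bool:
    """Run `zpool status -x`; True means someone should look at the box."""
    try:
        result = subprocess.run(
            ["zpool", "status", "-x"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        # zpool is not installed or not on PATH: report a problem, not "healthy"
        return True
    return result.returncode != 0 or pool_needs_attention(result.stdout)
```

Something like this could be called from a daily cron job, sending a notification whenever `check_pools()` returns `True`. It does not replace scheduled scrubs; it only makes sure that whatever a scrub finds actually reaches a human.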
Exactly the channels/articles you have been following so far, but you need to understand that for reliability you have to pay in some way ... it can be your time and your hardware, someone else's time and your hardware, or any other combination you can think of: you/someone else/your hardware/a storage provider.
The more you try to do yourself, the more the onus of reliability falls on your knowledge and experience, and also on how much you are potentially willing to lose ...
If your purpose is to learn how to make systems that run at home reliable and not have issues like the guys at LTT, you need to read about systems/networking/storage and get familiar with procuring hardware and balancing performance/consumption/spend/ease of use.
You will soon find out that the golden goose doesn't exist, otherwise we'd all have it, and that there's some content that is not on YouTube, not because it's a secret and people want to make money out of it, but because it's complex and requires a lot of time, experiments and money ... and then some. It looks to me like that is not sellable content on modern media channels, as opposed to blowing up/zapping stuff, assembling splendid-looking PCs, doing cryo overclocking and mad scientist stuff ...
I think for me, I am looking for a back-to-basics starting point, like for people who have never really needed RAID before and are coming to it new.
My personal experience with RAID has been with external units you buy already set up, which you plug in the same way you would an external drive.
The gist of it is: a RAID array will tolerate some amount of drive failure, as opposed to a single-disk solution. The more redundancy/resiliency you choose, the more you will pay in lost usable space and in the cost of additional drives.
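To put rough numbers on that trade-off, here is a small sketch of usable capacity per layout. It is a simplification that assumes identical drives and ignores filesystem overhead and TB/TiB differences; the function name and the level labels are just for this example.

```python
# Number of drives' worth of space given up to parity for each layout
PARITY_DRIVES = {"raid5": 1, "raidz1": 1, "raid6": 2, "raidz2": 2}


def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Approximate usable space for an array of identical drives."""
    if level == "raid1":
        if drives < 2:
            raise ValueError("a mirror needs at least 2 drives")
        return drive_tb  # every drive holds the same copy
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("raid10 needs an even number of drives, 4 or more")
        return drives // 2 * drive_tb  # striped pairs of mirrors
    if level in PARITY_DRIVES:
        parity = PARITY_DRIVES[level]
        if drives <= parity + 1:
            raise ValueError("not enough drives for this parity level")
        return (drives - parity) * drive_tb
    raise ValueError(f"unknown level: {level}")


for level, n in [("raid1", 2), ("raid10", 4), ("raid5", 4), ("raid6", 6)]:
    print(level, n, "x 10TB ->", usable_tb(level, n, 10), "TB usable")
```

The pattern to notice is that mirrors cost half (or more) of the raw space, while parity layouts cost one or two drives regardless of how wide the array is.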
RAID is NOT a BACKUP - you will read and hear this repeated everywhere. It refers to the fact that a RAID setup makes loss of data due to a failing disk less likely, but it will not prevent user error (files deleted, drives formatted) or catastrophic server failure. For that you need a backup, and there you will go down another rabbit hole ...
I completely understand RAID not being a backup. If the drives themselves weren't so expensive, LTO tape would look interesting even at a small scale for keeping copies off-site, as the shelf life they advertise sounds really appealing and it sounds safer than keeping a bunch of drives on a shelf somewhere.
What really instigated this topic is that I have just bought a video camera for some hobby stuff. It can shoot uncompressed 4K raw. You won't even get 2 hours of video out of 500GB. That is very quickly going to be a problem.
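A quick back-of-the-envelope sketch of how fast that adds up. The only figure taken from the post is "500GB holds less than 2 hours", which gives a lower bound on the data rate; the 4 hours of shooting per month is a purely hypothetical number to make the example concrete.

```python
def min_gb_per_hour(card_gb: float, max_hours: float) -> float:
    """Lower bound on the data rate if card_gb lasts at most max_hours."""
    return card_gb / max_hours


def storage_tb(hours_of_footage: float, gb_per_hour: float) -> float:
    """Raw footage size in TB (1 TB = 1000 GB)."""
    return hours_of_footage * gb_per_hour / 1000


rate = min_gb_per_hour(500, 2)   # at least 250 GB per hour of footage
hours_per_year = 4 * 12          # hypothetical: 4 hours of shooting a month
print(f"at least {rate:.0f} GB/hour")
print(f"at least {storage_tb(hours_per_year, rate):.0f} TB of raw footage per year")
```

Since the real rate is higher than the lower bound, the real yearly figure will be too, which is why deciding what to keep matters as much as what to buy.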
That's why, unless you have an unlimited budget, everybody establishes a workflow where they keep the raw/uncut/unprocessed camera footage for the least amount of time possible and only back up the end result ...
It really boils down to:
Do you need a fast, SSD-backed scratch area where you dump your files and work on them with whatever workflow? This will ideally be RAID 10 for max performance, supported by something like ZFS snapshots to avoid "mishaps" during the post-production cycle.
How big are your post-produced artifacts and how long do you need to store them? This second tier can be slower and use some form of RAID 5/6, RAIDZ1/2, Unraid parity or whatever.
Do you need a backup, and can it be local on another storage box or remote somewhere else?
This will establish your ideal goal in terms of storage and number of appliances.
From there, budget and time will be your limit, and your workflow will dictate some of the choices.
If you haven't thought about that and just bought a 4K camera because it was on sale ... you'll need to decide whether you want to go YOLO and use a bunch of USB drives, establish a filing system, and pray that the drives are still working some years down the line, or whether you need a proper NAS, the cost and number of which will depend on the factors above.
Only you know your workflow, your budget and your expectations in terms of availability and reliability of your data, so you'll have to do a lot of homework to create what works best for you. Or, if you happen to have loads of money and not so much time, you can just define your goals and then have a professional design/quote/build it for you ...
I definitely didn't buy anything on sale. This is more about developing a new workflow and seeing what is going to be best for it.
Anything I'm working on at a given time will be on my PC. It is the long-term on-site storage I'm thinking about.
I'd be quite content with a server that has expansion options for later. Say, start off with so many drives and add more as needed.
I guess what I'm picturing in my head is something I can have attached to the network, whether in a rack, a NUC, or even a PC case attached to some kind of drive bay, that gives me room to add drives as time goes on.
That is your first requirement, a NAS as opposed to direct attached storage.
Now, do you want your NAS to be relatively cheap, expandable and fast, and can you compromise on physical space and power consumption? If so, going with an older-generation rack server from Dell or HP will give you capacity, room to grow, and relatively low prices, but you will pay with noise and space occupied ...
I know some people like to work on their files straight from their NAS, but my plan is to move whatever I'm working on from the NAS to my PC and then return it when finished.
Obviously I don't want to wait forever to save files to it, but it doesn't need to handle constant reads and writes all the time. I just want it to be storage.
That is fine, and it means that you can make do with a NAS that only supports 1Gbps as opposed to 10Gbps networking, which will save you a lot of money.
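To get a feel for what "only 1Gbps" means for the move-it-over, work on it, move-it-back workflow, here is a rough sketch. The 80% efficiency factor is an assumption (it makes 1Gbps come out at about 100 MB/s), and it pretends the disks on both ends can keep up, which is not a given.

```python
def transfer_minutes(size_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Rough time to copy size_gb over a link, ignoring disk speed limits.

    efficiency is an assumed fudge factor for protocol overhead.
    """
    mb_per_s = link_gbps * 1000 / 8 * efficiency  # 1 Gbps -> about 100 MB/s
    return size_gb * 1000 / mb_per_s / 60


for gbps in (1, 10):
    print(f"500 GB over {gbps} Gbps: about {transfer_minutes(500, gbps):.0f} minutes")
```

So a 500GB project is roughly a coffee-and-lunch wait at 1Gbps versus a few minutes at 10Gbps; whether that is worth the extra money depends on how often you shuffle projects back and forth.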
Will you be needing snapshots (i.e. point-in-time saves on your NAS that will cover you against accidentally deleting files)?
Do you want an off-the-shelf product or do you want to go DIY?
Off-the-shelf will give you a solution that works from day 1. You usually pay with a higher price for the same feature set, fewer options for expandability (unless you add $$$), vendor lock-in and fewer options in terms of what software you can run.
DIY gives back all that: flexibility, lower cost, you decide the pace you want to grow at, and you have better options for incremental upgrades, allowing you to spend money in stages.
You pay with your time, with all the responsibility of making it work being on you, and usually with a less "ergonomic" experience, as you'll inevitably end up making some mistakes or having to replace hardware, with all the stress that comes with that.
So, if all you want is a quick and dirty solution that will allow you to focus on shooting film and working on your workflow, something like a Synology DS720+ will be more than enough to get you started on a relatively low budget and still leave you room for expanding later with a DX517 unit.
An equivalent model from QNAP would be the TS-253D.
You will get your initial NAS for ~600 USD (plus storage), and that will give you 10-20TB of RAID 1 capacity to get you started, with all the bells and whistles of the Synology/QNAP OSes/interfaces.
For the same amount of money you will struggle to find an equivalent DIY solution that gives you new hardware, the same small form factor, networking and expandability, but you will have plenty of options if you go older generation ...
For example, the same amount of money would easily net you a MicroServer Gen8 with a 4-core Xeon, 16GB of RAM, 4 drive bays, IPMI and a full PCIe slot where you could add either 10Gb networking or an HBA for an external disk shelf when you need more space ... or an old Supermicro 1U rack server with a Xeon that will be much more powerful but suck 150+ watts of power at idle ...
Tons of video; I assume you'd like to see more than 100MB/s streaming performance.
I'm guessing you'll need about 50 - 100T of space and a 10Gbps network.
I'd consider:
a local machine with snapshots + off-site backup
two local machines with snapshots + off-site backups.
... so either 2 or 3 machines.
In terms of drives: 8x 18T drives in each machine, configured as ZFS raidz2. The machines can run TrueNAS SCALE as an appliance distro.
Cost-wise that's about:
16 drives * $300 per drive == $4800
2 cheap Ryzen hosts with 10Gbps NICs, ballpark about $500 per machine.
So about $6000 for about 60TB of space (because you'll want to upgrade before you're full), or $100 / terabyte, which is roughly 6x the cost of raw drive bytes.
If you need about 10T, and/or if you're fine with 1Gbps (100MB/s) speeds ...
you could buy 4x 18T USB 3.0 drives and a pair of Odroid N2+ for $150 each (board + eMMC OS storage + case + goodies), and run OMV.
So that's $1500 for 10-ish terabytes, or $150 / terabyte ... almost 10x raw byte cost.
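The two estimates above can be reproduced with a few lines, which makes it easy to plug in your own prices. The drive, host and board prices and the planned-capacity figures are the ones from this post; the function name is made up for the example. Note the exact total for the bigger build is $5800, which the post rounds up to $6000.

```python
def build_cost(drives: int, drive_price: float,
               hosts: int, host_price: float,
               planned_tb: float) -> tuple[float, float]:
    """Return (total cost, cost per TB you actually plan to fill)."""
    total = drives * drive_price + hosts * host_price
    return total, total / planned_tb


RAW_PER_TB = 300 / 18  # price per TB of a bare 18T drive at $300

# Two ZFS raidz2 machines: 16 drives, 2 hosts, planning on about 60TB of data
big_total, big_per_tb = build_cost(16, 300, 2, 500, 60)
# Two Odroid boxes: 4 USB drives, 2 boards, planning on about 10TB of data
small_total, small_per_tb = build_cost(4, 300, 2, 150, 10)

for name, total, per_tb in [("raidz2 pair", big_total, big_per_tb),
                            ("odroid pair", small_total, small_per_tb)]:
    print(f"{name}: ${total:.0f} total, ${per_tb:.0f}/TB, "
          f"{per_tb / RAW_PER_TB:.1f}x raw drive cost")
```

The multiplier is the interesting number: redundancy, a second copy, and headroom are what you are paying for on top of the bare drives.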
How about this:
Put the raw bytes on the desktop you use for editing video (cheap, and you can even do RAID 0).
Back up to externally connected USB disks + pay Acronis or Backblaze for off-site backup storage.
... not sure how much this would cost, but it is probably the cheapest option ... it relies on having decent internet upload.
Then why is LTO too expensive for you? If you are going to take the added step of transferring the data from a warm or cold storage area to a hot storage area, working on it, and then saving it back to the warm/cold storage, then LTO is going to be the cheapest option in the long run. Yes, the drives are more expensive upfront, but spending 20 - 80 USD for a 1.2TiB to 12TiB (compressed) tape is cheaper than spending 140 USD every time you need to add another 8 TiB. LTO also holds up better as a long-term backup solution.
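Whether "the long run" arrives soon enough depends on how much data you end up archiving, so here is a rough break-even sketch. The tape and hard drive prices are the ones quoted above; the 3000 USD tape drive price is a hypothetical placeholder (it is not from this thread), so swap in a real quote before drawing conclusions.

```python
def break_even_tib(lto_drive_price: float,
                   tape_price: float, tape_tib: float,
                   hdd_price: float, hdd_tib: float) -> float:
    """TiB of archived data at which tape (drive + media) matches disk cost."""
    tape_per_tib = tape_price / tape_tib
    hdd_per_tib = hdd_price / hdd_tib
    if tape_per_tib >= hdd_per_tib:
        raise ValueError("tape media must be cheaper per TiB than disk")
    # The upfront drive cost is paid back by the per-TiB saving on media
    return lto_drive_price / (hdd_per_tib - tape_per_tib)


# 80 USD per 12 TiB tape vs 140 USD per 8 TiB disk, with an ASSUMED 3000 USD tape drive
print(f"break-even at about {break_even_tib(3000, 80, 12, 140, 8):.0f} TiB")
```

With those inputs the media saving is a bit under 11 USD per TiB, so the upfront drive cost dominates until the archive gets large; a cheaper second-hand drive or more expensive disks move the break-even point down quickly.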
With that said, you really need to understand what your workflow is and what your requirements are. That will help as you decipher all of the storage information and philosophies out there.