
How do you configure 100TB+ ZFS arrays?


#1

Hello everyone,

I have a friend who owns a small video editing company.

He has about 70TB of data stored with no redundancy and no backup.

He asked me to help build a FreeNAS box to edit on and maybe store all of the data.

How in hell do you configure a single pool of 100TB usable with redundancy? How do you back up such a pool?

The first thing that came to my mind was a stripe of mirrors. Lots of mirrors. Great for editing performance. But then again, mirrors have a real weak spot: if both drives of the same mirror fail, the whole pool is lost. And there is such a thing called Murphy’s Law.

On the other hand, RAIDZx takes forever to resilver and stresses all of the drives. Obviously, a single RAIDZx vdev would not suffice. We are talking about stripes of RAIDZx vdevs. And this is a pool that you need a small fortune to expand, if you have to replace a whole vdev.

Also, how do you back up all of this data?

What do you think? Any experience with such large pools?


#2

Not sure if it’s possible, but why not set up two equal machines that both hold all the data? Essentially a basic cluster (using Gluster or similar).


#3

This is not a small task. You have just described a need for an $18,000+ system from 45Drives.

Hardware and Points of Failure
You will need 12TB hard drives to make this happen. You might be able to save some scratch by buying consumer-grade hard drives and solid-state drives (some would criticize this decision; I’m pretty lukewarm on it).

But if you’re going to do this all in one machine, you’re going to need 24x 12TB hard drives, and that will get you ~103TB of usable space in a pool of mirrors configuration.
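As a rough sanity check on that figure, here is a minimal Python sketch of the arithmetic. It assumes 2-way mirrors, converts the vendors' decimal terabytes to the binary tebibytes that ZFS tools report, and keeps 20% of the pool free. The function name and the reserve default are my own, and ZFS metadata overhead is ignored, so treat the result as an upper bound.

```python
TB = 10**12   # drive vendors sell decimal terabytes
TIB = 2**40   # ZFS tools report binary tebibytes


def mirror_pool_usable_tib(drive_count, drive_tb, mirror_width=2, free_reserve=0.20):
    """Return (raw_tib, usable_tib) for a pool of striped mirrors.

    raw_tib is the capacity after mirroring; usable_tib also subtracts
    the fraction of the pool that is deliberately kept free.
    """
    if drive_count % mirror_width != 0:
        raise ValueError("drive_count must be a multiple of mirror_width")
    vdevs = drive_count // mirror_width
    raw_tib = vdevs * drive_tb * TB / TIB
    usable_tib = raw_tib * (1 - free_reserve)
    return raw_tib, usable_tib


raw, usable = mirror_pool_usable_tib(drive_count=24, drive_tb=12)
print(f"{raw:.1f} TiB after mirroring, {usable:.1f} TiB with 20% kept free")
```

This lands a little above the ~103TB quoted above, which is what you would expect before ZFS's own overhead is subtracted.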

Pool of Mirrors
In configuring a server on 45Drives’ site, I selected two different manufacturers of hard drives. You will want your mirrors to consist of one drive from both manufacturers.

Don’t Forget the Backplane
The next single point of failure is the backplane. I’ve not used 45Drives before, so I don’t know how their backplanes are configured. I’ve set up servers where all drives from one manufacturer go on one backplane and all drives from the other manufacturer go on the other backplane. For this amount of data, you want to consider this.
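To make the pairing idea concrete, here is a small Python sketch that builds one mirror per pair of drives, one drive from each manufacturer (and therefore one from each backplane), and prints the matching `zpool create` command. The pool name `tank` and the device paths are placeholders I made up; on a real system you would substitute the actual `/dev/disk/by-id/` names.

```python
def build_mirror_pairs(vendor_a_disks, vendor_b_disks):
    """Pair the i-th disk of vendor A with the i-th disk of vendor B."""
    if len(vendor_a_disks) != len(vendor_b_disks):
        raise ValueError("need the same number of disks from each vendor")
    return list(zip(vendor_a_disks, vendor_b_disks))


def zpool_create_command(pool_name, pairs):
    """Render a `zpool create` command with one 2-way mirror vdev per pair."""
    parts = ["zpool", "create", pool_name]
    for disk_a, disk_b in pairs:
        parts += ["mirror", disk_a, disk_b]
    return " ".join(parts)


# Hypothetical device names: vendor A on backplane 1, vendor B on backplane 2.
vendor_a = [f"/dev/disk/by-id/vendorA-{i:02d}" for i in range(12)]
vendor_b = [f"/dev/disk/by-id/vendorB-{i:02d}" for i in range(12)]

pairs = build_mirror_pairs(vendor_a, vendor_b)
print(zpool_create_command("tank", pairs))
```

With this layout, losing a whole backplane or hitting a bad batch from one manufacturer takes out only one side of each mirror.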

What About Striped RAIDZ?
You may be able to get good performance by striping together 8x 3-disk RAIDZ vdevs, and that would net you more space, assuming the same hardware. But rebuilding RAIDZ* is harder on everything involved in the rebuild process, since RAIDZ* is mathematically more complicated than mirroring. It also wouldn’t provide you with any additional redundancy should something go wrong, and guarding against backplane failure is also more complicated, if not simply out of the question.

What About Striped RAIDZ2?
Along those same lines, a set of striped RAIDZ2 vdevs would give you roughly 4x the write performance of a single vdev, compared with roughly 8x for the striped RAIDZ layout and 12x for the pool of mirrors, since write performance scales roughly with the number of vdevs. You are adding an additional layer of redundancy, but as with the striped RAIDZ setup, you’ve knocked out the ability to guard against backplane failure.
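The three layouts can be put side by side with a short Python sketch. It assumes the same 24 drives throughout and reads the "4x" RAIDZ2 figure as 4 vdevs of 6 disks each, which is my interpretation rather than something stated above. The "second failure" number is the chance that one more random drive failure loses the pool when one drive is already dead.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    vdevs: int           # also the rough write-performance multiple
    disks_per_vdev: int
    parity: int          # redundant disks per vdev (a 2-way mirror has 1)

    @property
    def total_disks(self):
        return self.vdevs * self.disks_per_vdev

    @property
    def data_disks(self):
        return self.vdevs * (self.disks_per_vdev - self.parity)

    def second_failure_fatal_chance(self):
        """Chance a random second failure loses the pool, given one failed disk."""
        if self.parity >= 2:
            return 0.0  # every vdev survives any two failures
        # With single redundancy, only the failed disk's vdev-mates are fatal.
        return (self.disks_per_vdev - 1) / (self.total_disks - 1)


layouts = [
    Layout("12x 2-way mirrors", vdevs=12, disks_per_vdev=2, parity=1),
    Layout("8x 3-disk RAIDZ", vdevs=8, disks_per_vdev=3, parity=1),
    Layout("4x 6-disk RAIDZ2 (assumed)", vdevs=4, disks_per_vdev=6, parity=2),
]

for layout in layouts:
    print(
        f"{layout.name}: ~{layout.vdevs}x writes, "
        f"{layout.data_disks} disks of usable space, "
        f"second failure fatal {layout.second_failure_fatal_chance():.1%}"
    )
```

The model ignores resilver time and correlated failures, so it only shows the shape of the trade-off: mirrors give the most vdevs, the RAIDZ layouts give more space, and only RAIDZ2 survives any two failures.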

Can I Break Up The Data?
The next immediate thought is that you might be able to do something by breaking up the data into actively used vs archival/rarely used. Lord knows that was my first thought. But alas, it will not do. Actively used data will become rarely used data, and so your rarely used pool would need to be configured assuming that all of that actively used data will be coming over sooner rather than later. Your rarely used pool would offer literally the same conundrum that you’ve got right now.

Backups
Now how do you back up such a pool? Potentially an identical server with more aggressive compression? I know that video files in their final form are already encoded and compressed to all hell and back, but my understanding is that the pre-edited, raw footage is often more receptive to compression. But I could very well be wrong on that.

In the case of backups, you may be able to gain some ground by separating the data, if they can come to you with what is absolutely critical for them to back up versus data that they can live without. My thought here, not knowing the industry very well, is that completed videos may not need their raw footage backed up, as long as they can get to the finished product.
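Whatever ends up in the backup set, the first full copy takes a while, which is one more reason to trim it. A back-of-the-envelope Python sketch, where the link speeds and the 70% efficiency factor are my own assumptions rather than measurements:

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb, link_gbps, efficiency=0.7):
    """Estimate days needed to copy data_tb decimal terabytes over a network link.

    efficiency is an assumed fraction of line rate actually achieved.
    """
    total_bytes = data_tb * 10**12
    bytes_per_second = link_gbps * 10**9 / 8 * efficiency
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


for gbps in (1, 10):
    print(f"70 TB over {gbps} GbE: about {transfer_days(70, gbps):.1f} days")
```

Halving the backup set halves the time, so separating critical from expendable data pays off directly.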

Useful Tool
https://wintelguy.com/zfs-calc.pl

Use this when designing your solution. Make sure to check the 20% Free Space limit checkbox, because I can tell you from experience that when a zpool hits 80% full, performance (reads and writes) tanks.
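The 80% rule is easy to turn into a quick headroom check. This Python sketch uses made-up example numbers and a function name of my own; plug in the real pool size and usage as reported by your system.

```python
def headroom_tib(pool_size_tib, used_tib, max_fill=0.80):
    """Return how many TiB can still be written before the pool passes max_fill.

    A negative result means the pool is already past the threshold.
    """
    return pool_size_tib * max_fill - used_tib


# Example figures only: a ~131 TiB pool holding 70 TB (about 63.7 TiB) of data.
pool_size = 131.0
used = 70 * 10**12 / 2**40

remaining = headroom_tib(pool_size, used)
print(f"{used / pool_size:.0%} full, {remaining:.1f} TiB left before the 80% mark")
```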


#4

Imo, a box of mirrors backing up to a box of raidz2s (6 drives each) + hot spare is the way to go.

45 drives is good. They make a 60-bay option as well. Both are loud. If there’s no dedicated server closet, get the 30-bay lite version without dual power supplies.

Ideally, you should also have additional backups:

Live offsite
Offline onsite
Offline offsite


#5

Everything here is really really good advice. But one thing I really want to point out is this,

I wouldn’t even use the new server as a hot editing station. It should only be used for archiving data. Give each workstation a few large drives to edit on, and when the work is finished, dump it into the archive.

Just my two cents.


#6

This is an excellent point. Moving video files from the server to an editing workstation does create a point of failure where work can be lost (e.g. someone works on a video for a few hours and then their hard drive blows up; the original video is still on the server, but their work is lost), but it does greatly simplify the task of setting up a file server. It is extremely hard to design a file server that is both performant and spacious, especially if it needs to be performant for multiple people.


#7

Small note here: if it’s a Mac shop, a network time machine backup will solve this problem if they’re working on projects locally.


#8

Such an informative post! A crude first thought was actually to use a used Dell R520/R720/R720XD with one or two NetApp DS4246s and WD 8TB Reds, kind of similar to the original Level1Techs FreeNAS box. The 8TB drives are just good value for money.

Editing locally, though, relies on humans doing what they are supposed to. Also, my friend insists on editing videos on the NAS for some reason. It also means more points of failure. A stripe of 15 mirrors seems to offer way more performance, even for 5 editors (I failed to mention the 5 editors in the original post). My main concern is backing the thing up.

Another possibility would be a smaller faster pool (maybe flash based) and a large, redundancy and capacity oriented one. That is no substitute for backups though and seems unnecessary. You might as well get the performance from the original big stripe instead of an expensive flash based pool and back that up to a fewer drive, RAIDZx based pool.

It is actually all Mac and Hackintosh based because the editors use Final Cut.


#11

I would say having multiple FreeNAS boxes set up in a cluster, or two 45Drives boxes, may be what you want to go with for business data. Additionally, I’d highly recommend some form of offsite backup as well, but I’m guessing you could do something like reconfigure your buddy’s current hardware setup once all the data is moved to the new system.

Also, backups!


#12

Just to reiterate something that I said in another post but not here: redundancy is not a backup solution. Remember the 3-2-1 storage method.

3 copies of your data

2 different storage solutions

1 offsite backup
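The 3-2-1 rule is simple enough to check mechanically. A minimal Python sketch, with hypothetical copy labels and media names that stand in for whatever the shop actually ends up with:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    label: str
    medium: str    # e.g. "zfs-mirrors", "zfs-raidz2", "tape", "cloud"
    offsite: bool


def meets_3_2_1(copies):
    """True if there are 3+ copies, on 2+ kinds of storage, with 1+ offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical plan: editing NAS, onsite backup box, and an offsite copy.
plan = [
    Copy("editing NAS", medium="zfs-mirrors", offsite=False),
    Copy("backup NAS", medium="zfs-raidz2", offsite=False),
    Copy("offsite copy", medium="cloud", offsite=True),
]

print("3-2-1 satisfied:", meets_3_2_1(plan))
print("single mirrored pool only:", meets_3_2_1(plan[:1]))
```

Note that a mirrored pool counts as one copy here, no matter how many drives are in it.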


#13

If noise isn’t an issue and budget is, this is a good deal: