
Building a 500TB Server - Distributed vs Native

I’ve talked about this before in various posts, but the lab I work in at a university was recently awarded a very large grant for a brand new Cryo-Electron Microscope. A single data collection of a single sample can exceed multiple terabytes, and so one of the things we’re considering is a unified storage system, somewhere in the 500TB range.

My first instinct would be some combination of a Dell server with ZFS and RAID 6 (ZRAID 3?), with a bunch of disk shelves piped into it through a hardware RAID card (H700 or something). From what I understand, I can add disks to the ZFS pool without rebuilding, and I would have double parity across the system. (With the number of drives, I want more than single parity.)
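For a rough sense of what double versus triple parity costs in drives at this scale, here is a back-of-the-envelope Python sketch. The 10 TB drive size and 12-drive group width are assumptions picked purely for illustration, and the function name `drives_needed` is hypothetical. It ignores hot spares, filesystem overhead, and the TB/TiB difference, so treat the numbers as a lower bound rather than a sizing recommendation.

```python
import math


def drives_needed(target_tb: float, drive_tb: float, group_width: int, parity: int) -> int:
    """Drives required to reach target_tb usable, using identical parity groups.

    Each group has group_width drives, of which `parity` hold parity data.
    Overheads (spares, metadata, TB vs TiB) are deliberately ignored.
    """
    if not 0 < parity < group_width:
        raise ValueError("parity must be between 1 and group_width - 1")
    usable_per_group = (group_width - parity) * drive_tb
    groups = math.ceil(target_tb / usable_per_group)
    return groups * group_width


# Assumed inputs: 500 TB usable target, 10 TB drives, 12 drives per group.
for parity in (2, 3):
    total = drives_needed(500, 10, 12, parity)
    print(f"parity={parity}: {total} drives")
```

With these assumed inputs, moving from double to triple parity adds one more 12-drive group, which is the kind of trade-off worth pricing out before committing to a layout.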

However, other people running these microscopes have used chunkservers with software such as Gluster or MooseFS. I’ve never really worked much with distributed filesystems, and I’m nervous about the reputation they have for being unstable and generally not well supported. It also seems like they are paying more for redundant hardware that isn’t needed (many labs do things like buy off-the-shelf Supermicro servers and just stack them together with MooseFS). I’m also concerned about the lab’s ability to maintain this system once I graduate, or drop out (lol).

Would love to hear thoughts on this.


Do you have to build it yourself, or rather, is what you’re looking for the experience of building something?

500TB isn’t really that special anymore; off-the-shelf hardware can get you there no problem.

A Synology or QNAP NAS will easily handle this volume, and then connect whatever servers and workstations you want to it for accessing the data.

Start with a QNAP ES1640dc v2 and several EJ1600 v2 disk shelves. If your grant is really big, buy two of everything and set up replication. Put one in your lab and the other across campus.

I think you need to consider redundancy, budget, and future growth before making a decision.


QNAP uses mdraid, which you don’t want handling 500TB. I have had sudden unexplained data loss on a QNAP. It’s not fun. Synology is better if you want a platform with a friendlier learning curve.

It is possible to stack Gluster on top of ZFS, although I have not done it.

You can add identical zraids or mirrors to a pool. You cannot add disks to a zraid.
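To make that growth rule concrete, here is a toy Python model of a pool that only grows by whole groups matching the ones already in it. The `Vdev` and `Pool` classes are hypothetical names for this sketch and are not any real ZFS API; the drive size and group width are assumed values. It models the rule as stated in this post, not the exact checks ZFS itself performs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vdev:
    """One parity group (a 'zraid' in this thread's wording)."""
    width: int       # total drives in the group
    parity: int      # drives' worth of parity in the group
    drive_tb: float  # capacity of each drive, in TB

    @property
    def usable_tb(self) -> float:
        return (self.width - self.parity) * self.drive_tb


@dataclass
class Pool:
    vdevs: list = field(default_factory=list)

    def add_vdev(self, vdev: Vdev) -> None:
        # Toy rule from the post: grow only with groups identical to the first one.
        if self.vdevs and vdev != self.vdevs[0]:
            raise ValueError("new vdev does not match the existing vdevs")
        self.vdevs.append(vdev)

    @property
    def usable_tb(self) -> float:
        return sum(v.usable_tb for v in self.vdevs)


# Assumed layout: 12-drive double-parity groups of 10 TB drives.
pool = Pool()
group = Vdev(width=12, parity=2, drive_tb=10)
for _ in range(3):
    pool.add_vdev(group)
print(f"{len(pool.vdevs)} groups, {pool.usable_tb} TB usable")
```

The practical consequence is that capacity grows in steps of a full group of drives, so the group width chosen on day one sets the size of every later purchase.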

Don’t mix ZFS and hardware RAID.


What about Ceph?

QNAP uses ZFS (FreeBSD) on their enterprise grade gear…

https://www.qnap.com/en-us/product/es1640dc%20v2

Something else to consider, since you’re at a university, is using I2 for redundancy. Other labs might offer to warehouse your data if your director agrees to an MOU about sharing it…

https://www.45drives.com

https://www.ixsystems.com/storage/

or

Not all of it. I inherited a 3U QNAP with redundant power supplies and 2 disk shelves, and it was all mdraid.
