I’ve talked about this before in various posts, but the lab I work in at a university was recently awarded a very large grant for a brand new Cryo-Electron Microscope. A single data collection of a single sample can exceed multiple terabytes, and so one of the things we’re considering is a unified storage system, somewhere in the 500TB range.
My first instinct would be some combination of a Dell server running ZFS with something RAID 6-like (RAIDZ2, or maybe RAIDZ3?), and a bunch of disk shelves attached to it through a hardware RAID card (H700 or something). From what I understand, I can add disks to the ZFS pool without rebuilding, and I would have double parity across the system. (With this many drives, I want more than single parity.)
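To put rough numbers on that, here is a back-of-the-envelope sketch of how many drives a 500TB pool would take. The 16TB drive size and 10-disk groups are placeholder assumptions of mine, not a spec we have settled on, and it ignores ZFS metadata overhead, free-space headroom, and hot spares, so treat the result as a lower bound.

```python
import math

def drives_needed(usable_tb, drive_tb, group_width, parity):
    """Return (groups, total_drives) needed to reach usable_tb.

    Each group has group_width drives, of which `parity` hold parity.
    Ignores filesystem overhead, spares, and TB vs TiB differences.
    """
    data_tb_per_group = (group_width - parity) * drive_tb
    groups = math.ceil(usable_tb / data_tb_per_group)
    return groups, groups * group_width

# Assumed example values: 16 TB drives in 10-wide groups.
print(drives_needed(500, 16, 10, 2))  # double parity -> (4, 40)
print(drives_needed(500, 16, 10, 3))  # triple parity -> (5, 50)
```

So under those assumptions, going from double to triple parity costs one extra group of ten drives.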
However, other people running these microscopes have used chunkservers with software such as Gluster or MooseFS. I’ve never really worked much with distributed filesystems, and I’m nervous about the reputation they have for being unstable and generally not well supported. It also seems like they are paying more for redundant hardware that isn’t needed (many labs do things like buy off-the-shelf Supermicro servers and just stack them together with MooseFS). I’m also concerned about the lab’s ability to maintain this system once I graduate, or drop out (lol).
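My "paying more for redundant hardware" worry can be sketched as simple arithmetic. This assumes the distributed setup keeps two full copies of every file, which is a guess on my part and may not be how those labs are actually configured, and it compares that against an 8 data + 2 parity layout, also just an example.

```python
def raw_tb_replicated(usable_tb, copies):
    """Raw capacity needed when every file is stored `copies` times."""
    return usable_tb * copies

def raw_tb_parity(usable_tb, data_disks, parity_disks):
    """Raw capacity needed for a data+parity layout (e.g. 8+2)."""
    return usable_tb * (data_disks + parity_disks) / data_disks

# Assumed example: 500 TB usable, 2 copies vs. an 8+2 parity layout.
print(raw_tb_replicated(500, 2))   # 1000
print(raw_tb_parity(500, 8, 2))    # 625.0
```

This only counts disks. It says nothing about the extra servers, or about the fact that replication across machines survives a whole server dying, which a single ZFS box does not.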
Would love to hear thoughts on this.