Aversion to RAID - what is the reality and what are other solutions?

Hey!
Over the past few months I have run into a lot of people with quite a strong aversion to RAID in real-world use, and I have no response for them beyond an awkward, silent shrug.

They claim that RAID is “unreliable” and “unmaintainable”. What is the reality? Are there any alternatives to RAID?

You could use a distributed file system that spreads data across several machines (and therefore several drives). Basically you have all your machines set up in a network, then slap some kind of virtual file system on top.

HDFS is one example. It is a super interesting thing to read about and tinker with, but it makes zero sense for a “normal” enthusiast or a small data center.

Another example (and something I want to try out) is BeeGFS. Just like HDFS, it makes no sense to use at “home” scale.

I think the most sane way to do insane file storage is GlusterFS.

And there is of course ZFS, but we are not talking completely sane here!


In a way, when your hardware RAID controller kicks the bucket, you are done.
Software RAID comes with other problems, but it is much easier to recover if something goes wrong.

What a crock of crap.

First, you must have the mindset that RAID is not a backup solution. Say it with me… “RAID is not a backup, it is a data storage solution.” You have to think of a RAID array as one single drive with a filesystem on top of it. Once you keep this mindset of “This RAID array is just one disk”, you are going in the right direction. RAID gives your data some protection through the redundancy and parity within the array itself. It is a data storage solution that helps expand drive space and increase R/W performance on the drives.
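To make the "parity within the array" part concrete, here is a minimal sketch of XOR parity, the idea behind RAID 5 style rebuilds. The block contents and the three-drive layout are made up for illustration; real arrays work on stripes and rotate parity across drives, which this toy does not model.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per drive (contents are made up).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity
# block to get the missing block back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Note that this only covers a dead drive. If a file is deleted or overwritten, the parity is updated right along with it, which is exactly why the array is not a backup.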

Secondly, there are many different ways to implement RAID. If you go with hardware RAID, there are a few more risks. For example, if your RAID controller dies, it can be very difficult to restore the lost data. You need the exact same model of controller with the same firmware to do so, and some cheaper RAID cards don’t even support that. If you go with software RAID (mdadm, ZFS, GlusterFS, etc.), you have more flexibility when failures or errors happen. I personally use software RAID in my lab with a few RAID50 arrays, and I get very good performance out of my drives. Professionally I have used both, depending on the use case.
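For anyone wondering what RAID50 costs in drive space, here is a rough sketch of the arithmetic, assuming the usual layout of several RAID 5 groups striped together, with one drive's worth of parity per group. The function name and the drive counts and sizes are hypothetical, not the poster's actual setup, and filesystem overhead is ignored.

```python
def raid50_usable_tb(groups, drives_per_group, drive_tb):
    """Approximate usable capacity of a RAID 50 array in TB.

    Assumes each RAID 5 group gives up one drive's worth of space to
    parity and that all drives are the same size.
    """
    if groups < 2 or drives_per_group < 3:
        raise ValueError("RAID 50 needs at least 2 groups of 3 drives each")
    return groups * (drives_per_group - 1) * drive_tb


# Hypothetical example: 2 groups of 4 drives, 4 TB each (8 drives total).
usable = raid50_usable_tb(groups=2, drives_per_group=4, drive_tb=4)
raw = 2 * 4 * 4
print(f"{usable} TB usable out of {raw} TB raw")
```

With that layout the array survives one failed drive per group, but two failures inside the same group take the whole thing down.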

Here is the important part though… all of the data should be backed up remotely! I have an offsite backup solution for all my stuff, and I know a lot of workshops that have remote or tape drive backups. If they don’t have a backup, then they don’t have a “panic button” plan.

RAID is great. Backups are even better. RAID is not a backup. RAID is a data storage solution.

Well, the only way to store data reliably is to store multiple copies of it on independent devices, preferably located far apart.
You need to decide what you need: high availability, strong protection against permanent data loss, or both. Usually having both requires multiple solutions.
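A back-of-the-envelope sketch of why independent copies matter: if each copy is lost with probability p over some period, and the failures really are independent, the chance of losing every copy is p to the power of the number of copies. The 5% figure below is a made-up number, and independence is a big assumption (drives from the same batch in the same chassis do not qualify, which is why "located far apart" matters).

```python
def prob_all_copies_lost(p_single, copies):
    """Probability that every copy is lost, assuming independent failures."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_single ** copies


# Hypothetical 5% chance of losing any single copy over the period.
for n in (1, 2, 3):
    print(f"{n} copies: {prob_all_copies_lost(0.05, n):.6f}")
```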

ZFS is pretty good for data integrity purposes, especially if you forgo RAIDZ and go for straight mirrored zpools. It does not even need much RAM if you disable deduplication.

There are also a bunch of clustered file systems like Lustre, Ceph, GlusterFS, and DRBD, to name a few off the top of my head. They can achieve high availability if configured properly, with multiple head nodes.

GhostMech’s input is solid. I concur.

RAID is incredibly useful. What’s garbage is hardware RAID controllers and on-board RAID implementations.

Software RAID is flexible, well-supported, allows moving arrays between systems, provides performance benefits, and offers additional fault tolerance.

Distributed storage is great… if you have a phenomenal network dedicated to storage, and a ton of servers. It is very useful when you’ve got petabytes of storage and multiple datacenters. It adds way too much management overhead to be of real use to average consumers, or even enthusiasts. If you’ve got a multi-system homelab, it’s worth playing around with. But it’s not a replacement for RAID or backups. Its primary use is scalability.

Ceph is nice. The architecture is clear (basically microservices), and CRUSH maps are easy to understand. But for small amounts of storage, e.g. <10 machines or <20 disks, stick to ZFS. Ceph really likes racks and racks of stuff; that’s where it shines.

Most people who claim RAID is unreliable do so because they have little or no experience setting it up.
RAID arrays are just the opposite: they are quite reliable and, depending on the level you choose, can be set up for redundancy, speed, and error correction.
If anything, RAID arrays are more versatile, given that a system can be set up with or without RAID; it is up to the user.
Performance-wise, definitely RAID, with frequent off-site backups.