I have 3x3TB WD Red drives in RAID 5, and from what I've been reading, RAID 5 might have been a terrible idea. The machine has been online for over a year and the array is working fine, but I'm still slightly paranoid and have been contemplating upgrading to RAID 6. I'm looking for advice from people more knowledgeable than myself. Is my current setup just asking for trouble? Is switching array types more trouble than just leaving it be? The data on the machine is DVD and Blu-ray rips. It's just a media server, but it would take a lot of time to manually recover if the array died. I know BTRFS RAID 10 would probably be the ideal setup, but I don't have a practical way to ditch MD at this point since I don't have any backups.
RAID 5 is fine. I had ~8 servers at my old job running it and they were all fine, but you need to remember that you can only lose one disk. Moving to RAID 6 wouldn't be worth the time investment.
My other servers used RAID 15 and I became a big fan of it, but it requires 5 disks minimum. The increases in read/write performance and redundancy would make moving to that array type worthwhile.
And as always, RAID isn't a backup. Back up your stuff regardless.
Edit: You can also find a video from a while back where Wendell explains the negative aspects of RAID and why it's mostly antiquated these days. I disagree with that video for the most part, but it's still worth a watch.
Are you using software or hardware RAID? Specifically, which one?
I would recommend RAID 6 if you have the extra money. Like @Yockanookany said, RAID isn't a backup, but the theory goes as follows:
Your RAID 5 system works well for a while. Then, two years down the line, you've got a dead drive. No problem: you replace it and the array starts to recompute parity. All goes well until you hit about 80%, when another drive fails. Now the array can't be repaired.
This happens because rebuilding parity stresses the disks, and that stress can push another drive (especially one from the same batch, read: similar serial numbers) over the edge and kill it.
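To make the single-failure limit concrete, here's a toy sketch of the parity idea (this is not how MD actually lays out stripes, just the XOR math): single-parity RAID 5 stores one parity chunk per stripe, so any one missing chunk can be recomputed from the rest, but with two chunks gone there's only one parity equation and nothing to solve with.

```python
# Toy illustration of single-parity (RAID 5 style) recovery.
# Real MD striping rotates parity across disks; this only shows the XOR math.

def xor_parity(chunks):
    """Parity chunk = byte-wise XOR of all data chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def recover_missing(surviving_chunks, parity):
    """With exactly one data chunk lost, XOR of parity and survivors rebuilds it."""
    return xor_parity(list(surviving_chunks) + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe across three data disks
parity = xor_parity(data)

# Lose disk 1: the remaining chunks plus parity reconstruct it exactly.
rebuilt = recover_missing([data[0], data[2]], parity)
assert rebuilt == data[1]

# Lose two chunks, though, and a single parity equation can't recover both:
# that's the failure mode above, and why RAID 6 adds a second, independent parity.
```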
I've rebuilt two RAID 5 arrays and can tell you it's never a relaxing experience, but that's why you do proper backups. If it all goes to pot, you replace the drives and restore. Sure, you have downtime, but it's better than data loss.
Unlike @SgtAwesomesauce, I won't recommend RAID 6 at all, again pointing towards RAID 15, but RAID 6 is still better than 5 (and almost all controllers support 6; getting RAID 15 working on some controllers is a PITA). It costs more in disks, but you can lose two of them, so you can sleep a bit better.
It's also great to have a hot spare in your array. That's how both of my servers rebuilt, but it's still spooky to get that "RAID Disk Failure" e-mail.
True. I guess I'm all caught up in datacenter levels of reliability.
Lately I've been doing a lot of work with SAN engineering and I've been thinking that RAID has no place anymore.
With something like OpenStack, you've got Ceph, which is so resilient to drive failures that my boss sometimes walks through the datacenter and pulls a random drive from the active cluster. And if you don't need high availability, you've got unRAID, where if a drive fails you only ever lose that drive's data; just restore it from backup.
I don't believe in hot spares. I've always kept them cold. Call it paranoia.
I noted in my original post that it's MD software RAID. I also don't have unRAID, and I'm not sure how I feel about it since it's proprietary and costs money, but at the same time it does have some nice features.
Yeah, I know RAID isn't a backup, but I have no place to store a backup :/
Damn, how'd I miss that.
> I also don't have unRAID and I'm not sure how I feel about it since it's proprietary and costs money but at the same time it does have some nice features.
It's nice. If the GPU passthrough were a bit more robust, I would be using it on my primary workstation. I get the issues with the proprietary aspect, but the other option is to check out SnapRAID. It's the open-source variant (in fact, unRAID may be based on it).
If you can spare $60/year, check out CrashPlan. Unlimited offsite backup; Windows, Mac, Linux.
RAID 5-esque technologies are considered non-recommendable for arrays using disks larger than 2TB. The rebuild times for disks above that capacity mean the chance of a second drive dying during a rebuild is uncomfortably high. RAID 6 is considered the standard for larger-than-2TB disks right now, as it possesses multiple advantages. The added minimum drive in a RAID 6 means that during a rebuild there are at least three drives to spread the load across, reducing the chance that a second drive dies. And if a second drive does die, the remaining drives still have enough data to rebuild the entire array. This is why RAID 6 is now the go-to, and will be until capacities reach a point where triple-drive redundancy is required.
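A rough back-of-envelope for why bigger disks make single-parity rebuilds scary, assuming the commonly quoted spec-sheet rate of one unrecoverable read error (URE) per 10^14 bits for consumer drives. Real-world error rates are often far better, so treat this as an illustration of the trend, not a prediction:

```python
# Hypothetical back-of-envelope: chance of hitting at least one URE while
# reading every surviving disk during a RAID 5 rebuild.
# Assumes the consumer spec-sheet rate of 1 URE per 1e14 bits read;
# real drives frequently beat this, so the numbers are pessimistic.

URE_RATE = 1e-14      # errors per bit read (spec-sheet assumption)
BITS_PER_TB = 8e12    # 1 TB (decimal) = 8e12 bits

def p_ure_during_rebuild(disk_tb, surviving_disks):
    """Probability of at least one URE while reading all surviving disks in full."""
    bits_to_read = disk_tb * BITS_PER_TB * surviving_disks
    p_clean = (1 - URE_RATE) ** bits_to_read
    return 1 - p_clean

# OP's case: 3x3TB RAID 5 with one dead disk means reading 2 surviving 3TB disks.
print(f"3TB x 2 survivors: {p_ure_during_rebuild(3, 2):.0%}")
# A hypothetical 4x6TB RAID 5 is considerably worse:
print(f"6TB x 3 survivors: {p_ure_during_rebuild(6, 3):.0%}")
```

The trend is the point: the probability grows with both disk size and disk count, which is why dual parity becomes attractive as capacities climb.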
Traditional RAID as a technology is also not preferable anymore. It provides no protection against bit-rot, which can cause data corruption and loss. This is why there has been a movement towards technologies like BTRFS and ZFS, which both protect against bit-rot. Currently ZFS is the more mature of the two, with some known kinks still present in BTRFS. Both technologies can loosely be thought of as ECC protection, but for hard drives and SSDs. ZFS currently has fully functional RAID 5, 6, and 7 equivalents, and the same rules apply: RAID-Z1 = RAID 5, RAID-Z2 = RAID 6, and so on.
Eh, I missed that. I'm not a fan of software RAID unless you're getting into filesystem-level RAID like ZFS. I'm not sure you'd get any real benefit from moving from RAID 5 to 6 in software, but I'm not familiar enough with software RAID to be a true spokesman on this.
That said, @thecaveman's information is spot on and he's right about RAID 6 being the best available. The only reason I push RAID 15 is the large performance increase plus redundancy. All the disks I used with it were 1TB, so I never ran into drive-size limitations, but I don't believe that's an issue in 15. Maybe he can inform me on that, even though I don't deal with it directly anymore.
The implication that traditional RAID is dying is absolutely true, but I'm still going to disagree on small-scale integration. Any business that uses a bare-metal server to host a few VMs, or uses the bare metal itself as the host, probably doesn't have the resources to roll ZFS out (I didn't; I looked into it deeply), or the staff to recover from it as easily or quickly. ZFS is great, though.
@SgtAwesomesauce I had a 15-server environment and managed a 100-employee medical facility solo (and that's still what I do now). That's a far cry from your line of work, so you're completely right to push back on what I said; my knowledge of datacenters isn't a ton. That said, I kept hot spares because it was easy, basically. I don't know about OpenStack; I'll have to look into that one.
With MD, you're better off than with a PERC card, in my opinion. I've had too many problems with those cards going bad and eating all the drives. Still, go with what you're comfortable with. MD RAID is usually just easier to recover.
Yeah, I'm dealing with 1.5PB of storage in our Ceph cluster alone. Don't get me started on the bare metal that's spread throughout the DC randomly. We're going to be deploying SSD nodes soon to increase our storage to 2PB.
Small scale may be different. My scale is either my lab (a single Proxmox server and a NAS) or a datacenter, so I don't really know best practices for small scale. As far as ZFS goes, the oft-repeated rule of 1GB of RAM for every TB is total BS. I usually just recommend 2-4GB of RAM for it; anything below that is tough on ZFS.
Hot spares aren't a problem, I'm just paranoid. (also a bit crazy, but who isn't?)
OpenStack is kinda like AWS or RackSpace. It's for building a private cloud.
RAID 5 is OK for home use, but I still wouldn't recommend it even with smaller disks, especially when another disk is around $100. It doesn't make sense to me to risk important data over $100 or so. Additionally, in a home environment you are less likely to notice a failed drive right away unless you have alerts or audible alarms.
The problem with RAID 5 is that between the time your first disk fails and the time you replace it and rebuild the array, you can lose another disk. At that point you have data loss. This is even more likely if you bought all your disks at the same time and they were manufactured one after another.
RAID 6 resolves this issue with two parity disks. You can lose one disk, replace/rebuild, lose another disk during rebuild, and still be OK.
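For a rough sense of the cost side, here's a sketch using simple parity accounting, assuming equal-size disks (real arrays lose a little extra to metadata, and RAID 6 generally needs at least four drives): with n disks of size s, RAID 5 yields (n-1) x s usable and survives one failure, while RAID 6 yields (n-2) x s and survives two.

```python
# Usable capacity vs. fault tolerance for RAID 5 and RAID 6,
# ignoring metadata overhead (a simplification).

def usable_tb(level, n_disks, disk_tb):
    parity_disks = {"raid5": 1, "raid6": 2}[level]
    if n_disks <= parity_disks:
        raise ValueError("not enough disks for this level")
    return (n_disks - parity_disks) * disk_tb

# OP's 3x3TB array today: 6TB usable, survives 1 failure.
print(usable_tb("raid5", 3, 3))
# With a hypothetical 4th 3TB disk in RAID 6: same 6TB usable, survives 2 failures.
print(usable_tb("raid6", 4, 3))
```

Same usable space in both cases; the fourth drive buys the second failure's worth of protection rather than capacity.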
In the Enterprise world RAID 5 is considered dead.
That's the logic behind backup. Spend the money to get that data offsite. Not that people listen to that either.
People have started to even report second disk losses in arrays with 6TB and larger devices. Crazy.
This is why there has been a movement towards distributed spare (allowing many-to-many rebuild) and T10-PI (block-level checksums). FTFY =)
Ok. I'm aware of the advantages of BTRFS, and I know it has no stable RAID 5 or 6. BTRFS RAID 10 works fine and would work for me, but I have no easy or practical way of switching to it.
Hmmmm ok. I'll have to take a look at that. $60/year isn't bad.
Definitely worth considering if you're going to be switching RAID technologies anyway. You can't just change a RAID 5 to a RAID 6, after all; the data has to come off the drives for them to be switched to a different RAID level or technology.
Actually, not so with MD. From the mdadm man page:
Grow: Grow (or shrink) an array, or otherwise reshape it in some way. Currently supported growth options include changing the active size of component devices and changing the number of active devices in Linear and RAID levels 0/1/4/5/6, changing the RAID level between 0, 1, 5, and 6, and between 0 and 10, changing the chunk size and layout for RAID 0/4/5/6/10, as well as adding or removing a write-intent bitmap.
I did not know that was a thing; I stand corrected. Well, in that case, yeah, go straight for the RAID 6 upgrade.
That's impressive. I would back up the data somehow, though. That screams failure to me.