Homelab storage and backup systems: RAID or no?

I am in the process of upgrading my homelab, which consists of a web system, a storage system, and a backup system that backs up the other two, and I've stumbled upon a dilemma …
Currently my storage system has 3 disks, with no RAID … 1 SSD for boot, 1 SSD for storage that I need fast access to, and 1 HDD for slower storage. All of this is backed up by a separate system that has 1 SSD for boot and 1 larger HDD for the backups. I've run this setup for quite some time and nothing has failed so far.
Lately I've started to wonder: is my data really safe this way, or is it a ticking time bomb …

I'm looking for advice on whether I should add redundant disks using RAID, and if so, whether to add them to both systems (storage + backup), just the backup one, or just the storage one. Uptime is not necessarily my main priority, but not losing the data or its integrity is important to me.

Integrity jumps out at me as the key word here. Are you currently saving the data on some sort of checksummed filesystem (for example ZFS, Btrfs, or ReFS)?


Ask yourself: how comfortable would you be if any one of the drives failed? How about two of them?

What if one of the systems developed an issue that took it offline for an extended amount of time (think fried mobo, malware infection, etc.)?

How about the user error scenario: an accidentally deleted file (or folder) that you need to get back?

If you feel comfortable that you can recover from any of these scenarios, you’re good to go.

Otherwise, there are different solutions for each of these issues, but what they all have in common is that they add redundancy to your existing storage.

Both systems use regular ext4 in the current configuration.

I would suggest reading up on checksummed filesystems. Wendell has made good videos about this, so search YouTube a bit. I would think about transitioning to one (I'm a ZFS fanboy, but Btrfs is fine too, I guess). Without such a filesystem you're leaving yourself vulnerable to bit rot and other silent data corruption. If not losing your data is the most important thing, that's the first change I would make.
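If you want a feel for what "checksummed" actually buys you while you're still on ext4, here's a minimal Python sketch of the idea (my own illustration, not anything built into ZFS or Btrfs; the default path `/srv/storage` and the `manifest.json` name are just assumptions). It records a SHA-256 hash for every file under a directory, which is the poor man's version of what those filesystems do per block, automatically:

```python
# checksum_manifest.py - record a SHA-256 hash for every file under a directory.
# Illustration only: a real checksummed filesystem (ZFS/Btrfs) does this per
# block, transparently, on every write. Paths here are example assumptions.
import hashlib
import json
import sys
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each relative file path under `root` to its current SHA-256 hash."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

if __name__ == "__main__":
    root = Path(sys.argv[1] if len(sys.argv) > 1 else "/srv/storage")  # assumed path
    manifest = build_manifest(root)
    Path("manifest.json").write_text(json.dumps(manifest, indent=2))
    print(f"Hashed {len(manifest)} files under {root}")
```

Run it right after a known-good backup and keep the manifest somewhere safe; a real checksummed filesystem does the same bookkeeping on every write without you having to remember to.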


The point of RAID is to minimize downtime. Ideally, it allows you to replace a faulty disk while keeping the system online.

For integrity, you don't need RAID. You need a filesystem such as ZFS that validates checksums all the time, even with just a single disk.
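To see what that detection looks like in practice, here's a tiny health-check sketch you could cron (my own example; it assumes OpenZFS, where `zpool status -x` summarizes only pools with problems). Even a single-disk pool will surface checksum errors here, it just can't self-repair them without redundancy or copies=2:

```python
# zpool_health.py - ask ZFS whether any pool has detected problems.
# Sketch only: `zpool status -x` is a real command, but wiring the result into
# email/notifications (and the exact healthy-output string) is up to you.
import subprocess
import sys

def zfs_health_report() -> str:
    """Return zpool's own summary; '-x' only lists pools that are unhealthy."""
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    report = zfs_health_report()
    print(report)
    # On OpenZFS a healthy system prints "all pools are healthy"; anything else
    # (checksum errors, degraded vdevs) shows up here even on a single-disk pool.
    if "all pools are healthy" not in report:
        sys.exit(1)  # non-zero exit so cron/monitoring can pick it up
```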


I meant to say this as well. Ultimately, that's what hardware redundancy is for: keeping the plates spinning when stuff goes pear-shaped.

The reason checksums are important is that, right now, if a couple of bits flipped on your primary server in data you're not actively accessing, you wouldn't know. If you didn't catch it before the next backup, the corruption could get pushed onto your backup copy as well without you realizing it. With a checksummed filesystem, if the data read doesn't match the checksum, the filesystem flags that data as corrupted. Then you can restore from backup and continue on your merry way.
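To make that "catch it before the next backup" step concrete, here's a rough Python sketch (again my own illustration, reusing a `manifest.json` of previously recorded hashes like the one sketched earlier in the thread) that refuses to run a backup if any file no longer matches its last known hash:

```python
# verify_before_backup.py - compare current hashes against a previously saved
# manifest.json and report anything that changed. The root path and manifest
# format are assumptions carried over from the earlier sketch.
import hashlib
import json
import sys
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(root: Path, manifest_file: Path) -> list[str]:
    """Return the relative paths whose current hash no longer matches the manifest."""
    manifest = json.loads(manifest_file.read_text())
    mismatched = []
    for rel_path, expected in manifest.items():
        full = root / rel_path
        if not full.is_file():
            mismatched.append(rel_path)   # deleted or moved since the last run
        elif sha256_of(full) != expected:
            mismatched.append(rel_path)   # contents changed: edit or bit rot
    return mismatched

if __name__ == "__main__":
    root = Path(sys.argv[1] if len(sys.argv) > 1 else "/srv/storage")  # assumed path
    bad = verify(root, Path("manifest.json"))
    if bad:
        print("Do NOT overwrite your backup yet; investigate these files first:")
        print("\n".join(bad))
        sys.exit(1)
    print("All files match the manifest; safe to run the backup.")
```

The obvious limitation is that a script like this can't tell a deliberate edit from corruption; a checksummed filesystem updates its checksums with every write, so a mismatch there always means something went wrong underneath.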

There's a lot more detail and edge cases to consider, but that's the gist of why you want that.

An extremely valid point. I hadn't thought of it that way. Thank you.

Happy to help. You said in your first post you were worried you might have a ticking time bomb, and you do, a little, but not for the reasons you initially thought. :slight_smile:

Good luck on your descent into the madness that is data integrity. :rofl:

(j/k, but this is definitely where you’re going to start to diverge from the normal IT folks. :stuck_out_tongue: )

Another thing to add: even with a checksumming filesystem like ZFS, unless scrubs are performed regularly, unrecoverable bit rot can set in with no mitigation from ZFS.

It seems like many ZFS users either are misinformed (probably less so on this forum) into thinking ZFS will handle their file integrity automatically and so never scrub, or they don't scrub because their pools are so fragmented that scrub ETAs stretch into weeks, making them untenable.
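For what it's worth, scheduling the scrub is the easy part: a cron entry or systemd timer running `zpool scrub` once a month is a common starting point. Here's a trivial Python wrapper sketch if you'd rather keep it with your other scripts (the pool name `tank` is a placeholder assumption):

```python
# monthly_scrub.py - kick off a ZFS scrub and remind yourself how to check on it.
# A plain cron entry or systemd timer calling `zpool scrub` does the same job;
# this is only a sketch, and the pool name "tank" is a placeholder assumption.
import subprocess
import sys

POOL = "tank"  # replace with your actual pool name

def start_scrub(pool: str) -> None:
    # `zpool scrub` returns immediately; the scrub itself runs in the background.
    result = subprocess.run(
        ["zpool", "scrub", pool],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # A scrub may already be running, or the pool name may be wrong.
        sys.exit(f"zpool scrub failed: {result.stderr.strip()}")
    print(f"Scrub started on '{pool}'. Check progress with: zpool status {pool}")

if __name__ == "__main__":
    start_scrub(POOL)
```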
