BTRFS Raid Confusion

I know this is going to sound very vanilla on this site, but I just rebuilt my media server about a week ago and went with Proxmox with qty (6) used 6TB HGST drives in RAIDZ2 for 24TB of usable space. I then created an SMB share directly from Proxmox and share it with my VMs. Some folks would pass individual disks or an HBA card through to a VM, and that would have worked too, but this was easy for me and gives me some freedom, and the performance losses of SMB are inconsequential for my use case.
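Roughly, the pool side of that boils down to something like the following (pool name, dataset, and disk IDs here are just placeholders for illustration):

    # Six-disk raidz2 pool: any two drives can fail without data loss
    zpool create tank raidz2 \
        /dev/disk/by-id/ata-HGST_1 /dev/disk/by-id/ata-HGST_2 \
        /dev/disk/by-id/ata-HGST_3 /dev/disk/by-id/ata-HGST_4 \
        /dev/disk/by-id/ata-HGST_5 /dev/disk/by-id/ata-HGST_6
    # Dataset that the Proxmox host then exports over SMB
    zfs create -o compression=lz4 tank/media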
As for expansion, I MAY add another RAIDZ1 array of qty (3) 4TB disks, and if I do, I will use MergerFS to combine those two ZFS pools into one virtual pool in Proxmox and then share that MergerFS pool through SMB. The nice thing there is my clients won't care - they'll just see that the pool is bigger (or smaller) one day. MergerFS is great.
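The MergerFS layer over two pools is just one fstab line, something like this (mount points are placeholders):

    # /etc/fstab: union the two ZFS mount points into one pooled mount;
    # category.create=mfs sends new files to the branch with the most free space
    /tank:/tank2  /mnt/merged  fuse.mergerfs  defaults,allow_other,category.create=mfs  0  0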
Anyway probably not what you’re looking for but I thought I’d share what worked for me.


I only quoted the devs; I would love it if the project got to the point where they feel their RAID56 implementation is suitable for production.

I've been using btrfs raid 1 on one of my Unraid NVMe cache pools for almost two years, with zero issues.
As you said, it is generally considered best not to choose btrfs for raid 5, but rules are meant to be broken sometimes. There can be use cases where a non-standard design is the preferred option.
Love tech, 10 ways to skin the cat.


Sokay. I appreciate it. I keep forgetting about Proxmox.

So it looks like the ultimate decider here is basically going to be what kind of raid level I’m going to set up.

BTRFS seems to have a "lock" on 0/1/10, and btrfs raid 1 (and thus 10) is interesting enough to play with given how it does raid 1 - where apparently data is mirrored in pairs onto different drives (??) rather than every drive being a full mirror of the others?

For ZFS / TrueNAS on raid 5/6, I'll need to do some more research on how it handles expansion. Make the zpools small enough and it gets "easy", minimizing the "tax". But will that let me do what I want later? Not sure. More research is required.
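For what it's worth, OpenZFS 2.3 added raidz expansion, where you grow a raidz vdev one disk at a time. A minimal sketch, assuming a pool named tank with a vdev called raidz1-0 and a new disk /dev/sdX:

    # Check the current layout and vdev names
    zpool status tank
    # Attach one more disk to the existing raidz vdev (OpenZFS 2.3+ raidz expansion)
    zpool attach tank raidz1-0 /dev/sdX
    # The expansion runs in the background; watch its progress here
    zpool status tank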

Something tells me that, regardless of what I decide next weekend when I build this, I’ll be revisiting it in the future.

Thanks folks

Wrote a guide here a while back…

… needs some updates, but all the broad strokes stuff should still be fine.

Practice setting it up in a VM, because why not, … and go from there.

Both my NAS and my router are based on that setup and auto-update in the background on the testing track, and I'm not going to lie - about once a year one of them won't reboot and I need to come in with a flash stick and fix something.

Haven’t lost data with btrfs since … about 2015 iirc.


I also have another jumble of very large drives running mergerfs, snapraid and samba. It's a surprisingly well-working setup for lots of media and archival storage and secondary backups. Adding a drive and rebalancing using rsync of all things is a bizarre experience, … but it works.
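For anyone who wants to copy that idea, here is a minimal sketch of the snapraid side (all paths are made up):

    # /etc/snapraid.conf
    parity  /mnt/parity1/snapraid.parity
    content /var/snapraid.content
    content /mnt/disk1/.snapraid.content
    data d1 /mnt/disk1
    data d2 /mnt/disk2
    data d3 /mnt/disk3

    # typical maintenance: update parity, then verify a slice of the array
    snapraid sync
    snapraid scrub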


How many drives do you have?


Thanks. I’ll check out the guide - didn’t even think to look there tbh.

At the moment 4 drives. I might just say f it and get some more – got some accounting to do first. Which (of course) will also dictate what I do.

Unraid - is it really btrfs raid?

I tried Unraid and it would load a btrfs raid of drives.

In Unraid, when you create a "pool" (what used to be the "cache" drive is now just a generic pool, since you can create more than one), it will use BTRFS by default; however, you can change it to ZFS if you wish. The normal array drives use XFS by default, I think.


BTRFS RAID 1 simply stores each block twice on different drives. On which drive exactly is kind of random. RAID 10 does the same, while striping across the other drives.

So when using RAID 1 on btrfs you don't have your disks 'mirrored'. No two disks have to hold the exact same data. But since each block is guaranteed to be duplicated across two disks, you can lose one disk and still be safe.
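A quick sketch of what that looks like in practice (device names and mount point are placeholders):

    # btrfs raid1 across three differently sized disks - perfectly legal
    mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd
    mount /dev/sdb /mnt/pool
    # Shows how chunks are spread: every chunk exists on exactly two devices,
    # but no single device is a byte-for-byte mirror of another
    btrfs filesystem usage /mnt/pool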

Thanks for that, because I was still trying to wrap my head around btrfs raid 1. Good explanation.

As for this project, my grandiose plans have had to change somewhat. Buying the hardware to set up a server and maybe splurging on a few more drives (despite the credit-card interest…) will have to be put on hold.

On the plus side, my shiny new water heater / water softener / water filter all look really nice and happy in my garage.

Ah, the many joys of home ownership…

So the new plan is…

Take my 4 drives, install them in my desktop (4 SATA ports available), and set them up as btrfs raid 10.
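In case it helps anyone following along, the four-drive raid10 creation is a one-liner (device names are just examples):

    # Four-device btrfs raid10: data and metadata both striped and mirrored
    mkfs.btrfs -d raid10 -m raid10 /dev/sda /dev/sdb /dev/sdc /dev/sdd
    # Mounting any one member brings up the whole filesystem
    mount /dev/sda /mnt/array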

Then, since I already have one of these:

I'll move them over there and can do my future backups that way. Leaving everything on the Mobius set up as JBOD shouldn't break anything raid-wise. Hell, in my experience it doesn't even change the mount points for the drives.

This allows me to swap the drives out while keeping the "relative safety" of a disconnected system in case of a random power-supply fry-up.

Sorry for the necrobump, but I'd just like to mention that I've been using BTRFS RAID6 since the very first release (kernel 4.6, around 2016 I believe? maybe earlier) and I lost an array only once, and that was with the old, fatally flawed implementation (I did not lose any data though; it just started flipping to read-only mode and I had to build a new array and migrate the data). It started as a 3-bay RAID5 and then I kept rebalancing it over and over and over again to the 14-bay it is now (14x2TB). The ability of btrfs to freely change RAID profiles and extend an existing RAID is amazing; afaik ZFS only recently added this functionality.
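For anyone who hasn't done it, that grow-and-reshape workflow is basically the following (device and mount point are placeholders):

    # Add a new disk to the mounted filesystem
    btrfs device add /dev/sdX /mnt/nas
    # Restripe across all members and/or convert profiles in place
    # (raid6 for data plus raid1c3 for metadata is the usual pairing)
    btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /mnt/nas
    # Balances can take a long time; check progress with
    btrfs balance status /mnt/nas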

Performance is okay - I'm getting around 800 MB/s reads over a 10G connection to the NAS, but I'm using LUKS, and reading 14 encrypted drives at once is quite taxing; the CPU gets hammered pretty hard.

RAID56 in btrfs is fine. The devs are really careful with the wording because there was a fatal flaw in the old implementation that could trash your array while doing a scrub (I believe my lost array was affected by this issue; what's worse, it was unfixable for existing arrays - you had to make a new filesystem, because there was no in-place disk-format conversion implemented), and not many people use it, so they play it very safe. Btrfs really struggles with a reputation for being an unreliable, overcomplicated thing, so they try hard to focus on stability, and raid56 doesn't help them with that.

But I've been using btrfs exclusively everywhere since kernel 3.x, back in like 2013 or something, and I have not lost any data (though I did lose a few filesystems on laptops, because back in the early days it really didn't like kernel panics and dirty shutdowns; it's much better now though). I also use btrfs at work on servers that we use in production at big companies as integrators. And it's fine.


imo BTRFS native raid 6 is a complete no-go, and I would strongly discourage raid 5 use. However, BTRFS on top of mdadm raid 5/6 is serviceable.
These are the main issues currently affecting the reliability of BTRFS raid 5/6 in my eyes:

  • BTRFS raid5/6 scrub performance makes it unsafe, taking weeks when it should take <24 hours because of how it is implemented

  • Power failure or kernel panics will still corrupt metadata on raid 5/6; supposedly a metadata journal will eventually be implemented to help with this issue, but as I understand it, that hasn't been developed yet

  • BTRFS ENOSPC issues still persist

  • RAID write hole still exists (to be fair mdadm also suffers from this)

  • incorrect dev stats output misleading users about which drives are failing - I'm not 100% sure this is still an issue with the most recent build
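For reference, the output in question comes from (mount point is a placeholder):

    # Per-device error counters: write, read, flush, corruption, generation
    btrfs device stats /mnt/array
    # Print and zero the counters once you've investigated, so new errors stand out
    btrfs device stats -z /mnt/array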

  • BTRFS raid5/6 scrub performance makes it unsafe, taking weeks when it should take <24 hours because of how it is implemented

^ This is getting a bit better. I recently performed a scrub on a 28TB array and it took around 4 days. It used to be over a week on older versions. That said, please note that btrfs raid 5/6 corrects data on read/write, so I don't really perform scrubs that often.
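In case anyone wants to check their own numbers, the relevant commands are (mount point is a placeholder):

    # Start a scrub in the background
    btrfs scrub start /mnt/array
    # Progress, throughput, and any errors found/repaired so far
    btrfs scrub status /mnt/array
    # Read-only scrub: report errors without repairing them
    btrfs scrub start -r /mnt/array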

The worst part is power-loss corruption; that's real [F] and I gotta agree with that. Though it depends on how often you experience it, since it's not like every panic or unexpected power-off will damage the array.

The geometry of your array has pretty high IOPS/TB, which definitely helps keep that time down. High-capacity HDDs would probably be on the order of 6-10 times slower than yours, which would push scrub times pretty high. For comparison of scrub times, your array on hardware raid 5 would likely take ~3.5 hours.

This can get you into trouble if scrubs aren’t done often enough (assuming bitrot happens often, which isn’t usually the case); without an active read to all files on the array, files can bitrot past the point of them being recoverable with parity.
ZFS is in the same boat, and it seems like many users of it are content with scrubbing every 1-4 weeks.

But I would always run the metadata as raid1c3. That way, as far as I know, you should be safe in that regard.
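For anyone setting this up, converting an existing filesystem's metadata is a single balance (mount point is a placeholder):

    # Keep data where it is, but store metadata as three copies on three different drives
    btrfs balance start -mconvert=raid1c3 /mnt/array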


You should be pretty safe doing that.

I would say removing a failing drive is stupid if you can replace it instead.
I wasted several days trying to deal with a SMART drive failure that way.
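For the record, the in-place route looks like this (device names and mount point are placeholders):

    # Copy the failing member straight onto the new drive without shrinking the array
    btrfs replace start /dev/sdX /dev/sdY /mnt/array
    btrfs replace status /mnt/array
    # If the old drive is already dead or missing, pass its devid (listed here) instead of the path
    btrfs filesystem show /mnt/array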

28 days is hardly a “necrobump” :laughing: