Support hard drives of differing / mismatched sizes
Enable incremental upgrading of hard drives in batches as small as one
…before someone suggests a dumb choice like ZFS. (ZFS suggestion privileges may be purchased for the low price of gifting me the 3 additional drives necessary to complete the pool.)
Anyway, now that I’ve pissed off half the ZFS fanboys…
I have 2 big hard drives.
I will be using SnapRAID as my backup. With a single data disk, currently there is no point in using MergerFS.
Currently I am leaning towards XFS on both drives.
Data drive:
ext4 can be the easy option, but it’s just so meh.
xfs has some cool features. For example I can put the journal on the boot SSD and get a boost in perf (and some insignificant chunk of extra space). Plus, what else am I gonna do with all that free SSD space?
P.S. I am quite new to linux and not used to its directory structure so I’m taking suggestions on where to put this journal (using btrfs subvolumes).
The other option is btrfs. It has some really cool features like snapshots and transparent compression - the usefulness of which is questionable on what is supposed to be a media library, with already compressed data. Tho it’s probably gonna have some amount of small, highly compressible files. Subvolumes with the correct mount options could be neat. Snapshots, I’m not sure I have a use for.
Parity drive:
The parity drive cannot be ext4, because SnapRAID stores parity as a single file and ext4 has a max file size limit (16 TiB with the default 4 KiB block size) lower than the capacity of my drive.
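For anyone who wants to check their own drive against that limit, here is a rough back-of-the-envelope sketch. It assumes ext4's default 4 KiB block size and the usual limit of about 2^32 blocks per extent-mapped file; the 16 TB and 18 TB drive sizes are made-up examples, not the poster's actual hardware.

```python
# Rough check: could a single SnapRAID parity file that fills the whole
# drive stay under ext4's per-file size limit?
# Assumption: an extent-mapped ext4 file can span about 2**32 blocks.

def ext4_max_file_bytes(block_size: int = 4096) -> int:
    """Approximate largest file ext4 can hold for a given block size."""
    return (2 ** 32) * block_size


def parity_fits_on_ext4(drive_bytes: int, block_size: int = 4096) -> bool:
    """True if a parity file as large as the drive stays under the limit."""
    return drive_bytes <= ext4_max_file_bytes(block_size)


TB = 10 ** 12  # drive vendors quote decimal terabytes

print(ext4_max_file_bytes() / 2 ** 40)   # limit expressed in TiB
print(parity_fits_on_ext4(16 * TB))      # hypothetical 16 TB drive
print(parity_fits_on_ext4(18 * TB))      # hypothetical 18 TB drive
```

This is a worst case: the parity file only grows as large as the fullest data disk, so a big parity drive on ext4 only becomes a problem once the data actually gets there.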
xfs would avoid this problem and be the other simple choice.
btrfs is all nice and fancy, but I’m not sure the parity drive would benefit from compression much. Snapshots could be cool tho. For those that are aware, windows has something called Volume Shadow Copy. It basically makes incremental backups of a file. The combination of SnapRAID’s parity file + taking regular snapshots of it can potentially implement the same mechanism. How to manage and automate that, I have no idea.
You know you can run ZFS with a single disk, right?
I’m not sure about ext4 vs xfs, but btrfs gives you the ability to scrub and verify that your data is still intact. For long term data storage this is really useful because when your drive starts to fail you can verify that there hasn’t been any damage.
Snapshots can also be really useful for both local and remote data backups in conjunction with a tool like btrbk. Local backups allow for a quick restore when you accidentally delete a file (or someone else does so maliciously, e.g. ransomware). btrfs (and zfs) snapshots are efficient in the sense that they do not use additional storage space for multiple copies unless necessary.
Lastly, btrfs (and zfs) will allow you to “stripe” your data across multiple drives when you eventually buy more. With btrfs you can also remove drives if you so wish.
Also, the scrub part is SnapRAID, regardless of which FS it sits on top of, so we have that ability covered. Also covers the local backup thing.
I have not researched btrfs raid/drive pooling too much, but zfs raidz expansion has been coming soon™, right around the corner™ for a while. I will put money on seeing fusion before raidz expansion.
I think if you wish to use SnapRAID your conclusion of ext4/xfs makes sense. You can actually convert ext4 to btrfs later if you change your mind.
I’ll use terminology like btrfs (zfs) to help with some of the difference in terms.
When you add a device (vdev) to a filesystem (pool) you expand the size of the filesystem (pool) by the size of the device (vdev). This is sometimes called striping (or sometimes RAID0) because the data is shared across all devices (vdevs), but without any parity data. Each device (vdev) will still have integrity data (learn more here), so you can verify that the data is correct, but if you lose a device (vdev) you will lose the data.
btrfs compensates for this by allowing data on a single device to fail without losing data on other devices, at least with a btrfs “single” profile. Raid profiles will be different.
zfs compensates for this by allowing a single vdev to be composed of multiple devices with parity (i.e. mirror or raidz), but if you lose a vdev you lose the entire pool. A vdev can be a single drive.
raidz expansion would allow you to turn a vdev with 5 disks at raidz2 into a raidz2 with 6 disks. You can always add more vdevs.
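To put rough numbers on the above, here is a simplified capacity model. It is only a sketch: it ignores metadata, padding and allocation overhead, assumes every disk in a raidz vdev counts as the smallest disk, and the 8 TB disk sizes are invented for illustration.

```python
# Simplified usable-capacity model for the layouts described above.
# Sizes are in TB; real pools lose some extra space to overhead.

def striped_capacity(vdev_sizes):
    """Pool of independent vdevs (no parity): capacities simply add up."""
    return sum(vdev_sizes)


def raidz_capacity(disk_sizes, parity):
    """One raidz vdev: (disks - parity) times the smallest disk."""
    if len(disk_sizes) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes) - parity) * min(disk_sizes)


# Adding a second single-disk vdev grows the pool by that disk's size.
print(striped_capacity([8]), striped_capacity([8, 8]))

# raidz2 going from 5 to 6 disks, as in the expansion example above.
print(raidz_capacity([8] * 5, parity=2), raidz_capacity([8] * 6, parity=2))
```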
Ok, that makes sense. But it also confirms that zfs is worse than useless. I can make a 2-device vdev and 1-vdev pool now and when I get more drives, I can do nothing with them that won’t be actively harmful to the pool.
Now that we have that fact established, can we please stop talking about ZFS?
Like for example how do I put my xfs journal on another device (my boot ssd, which is much faster)? It’s formatted with btrfs. Do I make a subvolume for it? Do I need to mount it? Can I expose it as a device somehow - because mkfs.xfs docs say journal expects a device - or at least a partition?
Or like any other useful recommendation or comment that does not involve ZFS. Please.
Well, consider how much money went into R&D for it (literally billions), with genuinely smart engineers at its helm. It’s literally the big dinosaur in the room that helps keep the vast majority of the internet running by keeping data safe.
You might not like the features and the companies involved but it is what it is: the best file system for servers and NAS.
Institutions’ critical data runs on it. Your bank, your government, your hospital, pretty much stuff that matters.
That is why you find ZFS cultists everywhere. It is mature and had time to mature, which cannot be said for the alternatives.
So give in and join the ZFS cult, the cult with integrity.
Maybe I’m just unlucky or stupid, but I’ve done this more than half a dozen times, and I’ve had it work at all maybe 3 times. None of the ext4 to btrfs conversions were without issues, either.
It’s nice as a last resort, if you absolutely need to convert unimportant data from ext4 to btrfs and have no free space to copy back, but it’s not something I would recommend anyone do with any real data.
I also don’t really see how snapshotting parity data is ideal for a volume shadow copy. Wouldn’t it make much more sense to just use normal subvolume snapshots for this? Whatever folder you want to have shadow copies, make it a subvolume and mount it wherever it would go. Make a cronjob or script to snapshot on boot/login/every hour/whenever you want.
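A minimal sketch of what such a cron-driven snapshot script could look like. It only builds the `btrfs subvolume snapshot -r` command (a read-only snapshot) with a timestamped name; the subvolume and snapshot paths are hypothetical, and nothing is executed here.

```python
# Build the command for a timestamped, read-only btrfs snapshot.
# The paths used in the example are hypothetical placeholders.
from datetime import datetime
from pathlib import PurePosixPath


def snapshot_command(subvolume: str, snapshot_dir: str, now: datetime) -> list:
    """Return the argument list for a read-only snapshot named after `now`."""
    name = now.strftime("%Y-%m-%d_%H%M%S")
    dest = PurePosixPath(snapshot_dir) / name
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, str(dest)]


cmd = snapshot_command(
    "/mnt/data/media",          # hypothetical subvolume to protect
    "/mnt/data/.snapshots",     # hypothetical directory holding snapshots
    datetime(2024, 1, 2, 3, 4, 5),
)
print(" ".join(cmd))
```

To actually take the snapshot, the list could be passed to `subprocess.run(cmd, check=True)` from a script that cron runs as root, with `datetime.now()` in place of the fixed date.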
Looking at the plans and requirements, I also don’t understand the problem with ZFS or why you don’t want to use MergerFS. You can make single device vdevs and combine them after the fact with MergerFS, is what it looks like to me. In fact, isn’t that the whole point of MergerFS? That you can have a bunch of random disparate-sized disks, appearing as a single disk with same-named directories on each appearing as the same directory for the purposes of reading content?
It seems like, with snap raid and MergerFS, the whole point is that you don’t need to worry about adding new devices of mismatched sizes regardless of filesystem. It’s not like you’re creating a Raid5 volume through zfsutils, and therefore need matched drives so the OS can balance striping and parity; you’re using individual disks as individual disks with individual filesystems, and then presenting them as a single disk with a single filesystem for convenience, with a separate parity disk entirely. No need for disk matching, striping, or any worries about compatibility at all. You can add as many ZFS single device vdevs as you want, just like with xfs or ext4. With BTRFS, you have an option to do that built into the FS, similar to ZFS, but what’s the point if you’re using MergerFS anyway?
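As a toy illustration of the "same-named directories appear as one directory" idea: the sketch below merges the file listings of several plain directories. It is not how MergerFS is implemented (that is a FUSE filesystem with configurable policies); the `disk1`/`disk2` directories and file names are invented, and "first branch wins" is just an assumption made for this example.

```python
# Toy union view: files from several "disks" shown as one tree.
import os
import tempfile


def merged_view(branches):
    """Map each relative file path to the first branch that contains it."""
    view = {}
    for branch in branches:
        for root, _dirs, files in os.walk(branch):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), branch)
                view.setdefault(rel, branch)  # first branch wins
    return view


def demo():
    """Create two fake disks with a shared 'movies' directory and merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        disk1 = os.path.join(tmp, "disk1")
        disk2 = os.path.join(tmp, "disk2")
        os.makedirs(os.path.join(disk1, "movies"))
        os.makedirs(os.path.join(disk2, "movies"))
        open(os.path.join(disk1, "movies", "a.mkv"), "w").close()
        open(os.path.join(disk2, "movies", "b.mkv"), "w").close()
        return sorted(merged_view([disk1, disk2]))


print(demo())
```

Each underlying disk keeps its own filesystem and its own files, which is why the filesystem choice per disk does not have to match.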
I get literally none of the integrity benefits for my data (excluding metadata, which xfs/btrfs also delivers). @regulareel Magic some integrity my way. I’ll be waiting.
Well, the CTO of iXsystems said something like “single disk ZFS is so pointless it’s actually worse than not using ZFS”.
Allegedly.
@alkafrazin I have only one data disk. Nothing to mergerFS yet. I’ll put it when I get a second drive.
It’s very possible I’m misunderstanding how parity works.
This whole linux servering thing is very new to me.
Otherwise what you said about Snapraid/mergerfs is true and sort of the whole point. It’s just that people refuse to offer any advice or even acknowledge anything that’s not related to zfs.
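Since parity came up: here is a toy sketch of single parity using XOR, just to show the principle. This is not SnapRAID's actual on-disk format (SnapRAID works on blocks of files and also keeps checksums in its content files); the byte strings stand in for whole disks and are invented.

```python
# Toy single-parity model: parity is the XOR of all data "disks".
# Shorter disks are treated as if padded with zero bytes.

def xor_parity(disks):
    """Return parity bytes as long as the largest disk."""
    parity = bytearray(max(len(d) for d in disks))
    for disk in disks:
        for i, byte in enumerate(disk):
            parity[i] ^= byte
    return bytes(parity)


def rebuild(surviving, parity, lost_size):
    """Recover one lost disk from the parity plus all surviving disks."""
    data = bytearray(parity)
    for disk in surviving:
        for i, byte in enumerate(disk):
            data[i] ^= byte
    return bytes(data[:lost_size])


disk_a = b"big movie file"   # the larger data disk
disk_b = b"tiny"             # a smaller, mismatched data disk
parity = xor_parity([disk_a, disk_b])

print(len(parity) == len(disk_a))                          # parity is as big as the largest disk
print(rebuild([disk_b], parity, len(disk_a)) == disk_a)    # lose disk A, rebuild it
print(rebuild([disk_a], parity, len(disk_b)) == disk_b)    # lose disk B, rebuild it
print(xor_parity([disk_a]) == disk_a)                      # one data disk: parity is a plain copy
```

In this model the parity has to be at least as large as the biggest data disk, which is why mismatched data disks are fine as long as the parity drive is the largest. With a single data disk the parity degenerates into a copy of the data.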
The thing is that if you care about your data integrity then a single disk is simply not enough. When you use a single drive, the filesystem is essentially unimportant because when that drive fails the data is gone. No amount of filesystem magic can save you. You need backups, and you want redundancy.
You’ve correctly determined that you want an easy to use and scalable backup system. Snapraid seems great. It seems that many people use other options - but don’t let that stop you.
For a single drive, use the defaults of your OS (ntfs for windows server, ext4 for debian, etc) and start getting other functionality going. If you want a specific filesystem feature then use the filesystem that has it, but then you wouldn’t ask the question “which to use” but rather “how to use” - perhaps you want another thread for xfs questions.
When you add more drives you will want to consider how software RAID systems work (mergerfs, btrfs, zfs, etc) to give you redundancy in addition to your backup system. Worry about that when you save up for another drive.
Well, the advantage of ZFS is that it has a lot of the typical COW features you see on BTRFS, and is really heavily used, well documented, and very stable. The main reason to use it over BTRFS for a single-device vdev would be for that stability.
Otherwise, BTRFS does a very good job of detecting bitrot according to what I’ve read. In fact, based on what I read and can no longer find in the sludge of AI generated garbage, it can be a bit of a problem; btrfs did a better job of detecting errors in data than zfs, but was less good at recovering said data, and more difficult to recover and move on with using the possibly-corrupted data.
Sad I can’t find it anymore, it was a pretty educational comparison of several journaling filesystems and how well they could detect and recover from errors, with BTRFS actually topping the charts in error finding, and ZFS trailing a bit but being better at just carrying on anyway.
That said, if you really don’t want to use ZFS, I would say BTRFS is the best, for its ability to detect silent bitrot. ZFS has more stable soft-raid than BTRFS or MDADM, and supports dataset encryption, but BTRFS is more available on freshly installed systems, and is better integrated with the rest of linux. You don’t really need the performance advantage of a less reliable filesystem, like XFS or EXT4, and with the parity data, in theory, a total device failure should be recoverable. The availability of scrubbing your filesystem for errors is more valuable than just some extra write performance when full.
Speaking of which, anecdotally, I’ve found ZFS chokes very hard when full compared to other filesystems, even worse so than btrfs, but none of them handle it well, especially anything COW. It’s best to leave a good chunk of disk free for performance reasons, maybe ~5% or so should be enough?
I wonder how the server and storage world survived before ZFS was invented (written) by Sun? It did, so your statement can’t be true.
Most of that ‘maturing’ was done by Sun users, well before any Linux user came in contact with the FS. BTRFS was specifically built as a Linux-native alternative to ZFS after the already mature ZFS was ported to Linux.
Please leave that kind of comment alone; it’s not helpful. At all.
Was it this?
Anyway, alternatives for ZFS and BTRFS include BcacheFS or simply JFS+LVM+mdadm-RAID.
So, effectively ZFS reduces the capacity of your drives? No thanks
Personally I use btrfs for data drives and ext4 for parity (although I would use XFS if my parity was too big for ext4). As you say, compression is pretty much useless for media files but snapshots can be handy. You can use snapshots with snapraid to remove the risk of data loss from having to restore a disk after too many file changes or deletes have happened. You can do this by creating a snapshot before you run a sync, and this way snapraid is never out of sync. The only disadvantage to this (other than the added complexity) is that snapraid doesn’t support using inodes to detect move operations on snapshots, which just means that if you move a file snapraid will treat it as a remove and add rather than simply updating the metadata during a sync.
It’s also just handy to have a couple of snapshots just in case of accidental deletes or modifications, snapraid can help with this too but snapshots are much easier.
That’s a pretty good idea. And the snapshots are never expected to be significant because they will for the most part be static files.
I also found this: SnapRAID-BTRFS - Self-Hosted Show Wiki
However I was also looking through XFS today (already went through all the btrfs docs last week).
It is much more light-weight and presumably performant, especially with a lot of data. Multiple accounts (here and elsewhere) suggest that CoW filesystems become bogged down when filling up to capacity.
You can also put the journal on a different device which seems like an excellent way to utilize the free space and IOPS of the boot SSD while alleviating the IOPS load on the spinning rust.
And it also has enough safety features to detect data corruption and ensure its own consistency.
I am not sure what a good size for the journal is tho. Wikipedia says
In XFS, the journal primarily contains entries that describe the portions of the disk blocks changed by filesystem operations. Journal updates are performed asynchronously to avoid a decrease in performance speed.
So I’m considering that perhaps 256 MB is more than enough. (64 MB is the min size and I discovered that 2038 MB is the max size.)
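For reference, a sketch of what the external journal setup could look like. It only builds the command lines, using `mkfs.xfs -l logdev=...,size=...` and the `logdev=` mount option. The device names and mount point are hypothetical placeholders, the size bounds are simply the 64 MB and 2038 MB figures quoted in the post above (not verified here), and the log device is assumed to be a real partition on the SSD rather than a btrfs subvolume.

```python
# Build (but do not run) the commands for an XFS filesystem with an
# external log. All device names below are hypothetical placeholders.

MIN_LOG_MB = 64     # minimum quoted in the post above
MAX_LOG_MB = 2038   # maximum quoted in the post above


def xfs_external_log_commands(data_dev, log_dev, mount_point, log_mb=256):
    """Return (mkfs_cmd, mount_cmd) argument lists for an external log."""
    if not MIN_LOG_MB <= log_mb <= MAX_LOG_MB:
        raise ValueError(f"log size {log_mb} MB outside {MIN_LOG_MB}..{MAX_LOG_MB} MB")
    mkfs_cmd = ["mkfs.xfs", "-l", f"logdev={log_dev},size={log_mb}m", data_dev]
    mount_cmd = ["mount", "-t", "xfs", "-o", f"logdev={log_dev}", data_dev, mount_point]
    return mkfs_cmd, mount_cmd


mkfs_cmd, mount_cmd = xfs_external_log_commands(
    "/dev/sdX1",   # hypothetical data partition on the HDD
    "/dev/sdY3",   # hypothetical small partition on the boot SSD
    "/mnt/data",   # hypothetical mount point
)
print(" ".join(mkfs_cmd))
print(" ".join(mount_cmd))
```

Worth keeping in mind with this layout: the filesystem has to be mounted with the same `logdev=` every time, so the data drive depends on the SSD partition being present.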
All filesystems will bog down, especially with frequent modify/rewrites. CoW filesystems are more susceptible to this; it’s the price you pay for their transactional nature (integrity), snapshots, etc.
Defragmentation, maintenance and a reasonable snapshot policy will alleviate most of this.
The usual recommendation is max 80% capacity before upgrading. Filesystems can already show a slowdown at 50%, depending on what you are doing with them. And this is independent from the drives themselves slowing down, making it a pseudo-exponential relation the fuller a FS gets.
I ran a ZFS pool at 93% cap with 75ish% fragmentation and it wasn’t funny. After deleting 2k snapshots and moving stuff between datasets, things went butter smooth again.
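A small sketch for keeping an eye on that threshold. The 80% default is the rule of thumb from the post above, not a hard limit, and the `.` path in the example is just a stand-in for wherever the data drive is mounted.

```python
# Warn when a filesystem is fuller than a chosen threshold.
import shutil


def fill_fraction(total_bytes: int, used_bytes: int) -> float:
    """Fraction of the filesystem in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total size must be positive")
    return used_bytes / total_bytes


def over_threshold(total_bytes: int, used_bytes: int, threshold: float = 0.80) -> bool:
    """True once usage exceeds the threshold (80% by default)."""
    return fill_fraction(total_bytes, used_bytes) > threshold


def check_mount(path: str, threshold: float = 0.80) -> bool:
    """Check a mounted filesystem using the standard library."""
    usage = shutil.disk_usage(path)
    return over_threshold(usage.total, usage.used, threshold)


print(over_threshold(100, 93))   # the 93% pool from the anecdote above
print(over_threshold(100, 50))
print(check_mount("."))          # stand-in path for the data mount point
```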
And if you’re storing mostly media, that’s easy for a FS. Write once, read often. No problem. VMs and small stuff changing all the time is the worst for a FS.
Media is easy mode. Your FS loves it because of the large contiguous blocks, and your drives run in sequential mode all the time. Everyone is having a blast (except for the capacity counter).
And reading media is usually playing media, so a couple of MBits/s…nothing the slowest HDD can’t handle.
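To put a rough number on that, here is the arithmetic as a tiny sketch. Both figures are assumptions picked for illustration (a 100 MB/s sequential read rate and a 40 Mbit/s stream), and the result is an upper bound since it ignores seek overhead when several streams interleave.

```python
# How many media streams fit into a drive's sequential read throughput?
# Note the units: drive speed in megaBYTES/s, stream bitrate in megaBITS/s.

def max_streams(disk_mb_per_s: float, stream_mbit_per_s: float) -> int:
    """Upper bound on simultaneous streams, ignoring seek overhead."""
    return int(disk_mb_per_s * 8 // stream_mbit_per_s)


print(max_streams(100, 40))   # assumed 100 MB/s drive, assumed 40 Mbit/s stream
```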