I want to make a raidz1 out of a few 18TB drives plus a few smaller ones pooled together as composite drives. The layout would be the following:
1x 18TB
1x 18TB
1x 18TB
2x 8TB, 1x 2TB
1x 6TB, 4x 3TB
So each line would be one “drive”. I would pool the smaller ones together, make a zvol on top of them, and then add that to the raid array. So, a 5-“disk” raidz1 array where each “disk” is 18TB.
Is this a bad idea? Can this lead to some really bad situation that I should be aware of? My only thought is that the composite “drives” would have a higher failure rate than the real drives. Also, if a part of a composite drive dies, let’s say the 3TB one, could I somehow replace just that part of the zvol, keep the other disks, and let a scrub fix it?
So I guess what matters is that this would be an 11-disk array with a tolerance of one disk failure.
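Roughly, the naive zvol-on-a-pool version of what I mean would look like this; I haven’t tested it, and every device name and size here is just a placeholder to illustrate the idea:

```bash
# one "composite" row (2x 8TB + 1x 2TB) as a striped pool
zpool create row4 /dev/sdd /dev/sde /dev/sdf
# sparse zvol; the size would have to line up with the real 18TB disks
zfs create -s -V 16T row4/vdisk

# same for the 6TB + 4x 3TB row
zpool create row5 /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
zfs create -s -V 16T row5/vdisk

# the main array: three real 18TB disks plus the two zvols
zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc \
    /dev/zvol/row4/vdisk /dev/zvol/row5/vdisk
```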
I’ve been through this kind of thing before. Not entirely impossible, but there are some things to consider:
- Each of those lines:
1x 18TB
2x 8TB, 1x 2TB
1x 6TB, 4x 3TB
will have a slightly different actual size in terms of blocks, so you need to make sure you don’t end up in a situation where you can’t replace one with the other when something breaks.
- How do you plan to merge the non-uniform drives? LVM2? Non-uniform MD? ZFS doesn’t have the tools to merge smaller disks into a larger one, and splitting the larger disks into smaller pieces is a terrible idea here. (See the sketch after this list.)
- The general ZFS community will frown upon anything non-standard, so don’t even try getting their help on anything “custom” like this. Better yet, make sure they never learn about your setup
- It’s possible that failures within the “composite” drives will be less problematic if ZFS is presented with “partially OK” data after a disk replacement. Whether that’s actually the case remains to be seen, though; the ZFS community would rather tell you it’s not a good idea than answer the question.
- Plan ahead with drive replacement. What happens when the 8TB drive dies? Do you replace just it, or the whole row? What about the other drives?
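If you go the LVM2 route, merging one of your rows would look roughly like this. This is only a minimal sketch; the device names and the volume group/LV names are placeholders:

```bash
# mark the members of the 2x 8TB + 1x 2TB row as LVM physical volumes
pvcreate /dev/sdd /dev/sde /dev/sdf
# group them and carve one linear LV spanning all free extents
vgcreate row4 /dev/sdd /dev/sde /dev/sdf
lvcreate -n vdisk -l 100%FREE row4
# /dev/row4/vdisk is the "composite drive" you would hand to ZFS
```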
I did something similar by “emulating” a 4TB drive with 4x1TB, but I planned ahead with replacements for these 1TBs and merged them with MD.
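Roughly what that MD merge looks like (placeholder device names, adjust to taste):

```bash
# stripe the four 1TB drives into one ~4TB block device
# (--level=linear would concatenate them instead of striping)
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd
```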
(I discussed it about a year ago here.)
Good luck & have fun with your build.
Mainly I was thinking of the composite drives being striped zpools; each would have a zvol on top, and I would add that zvol to the main pool, i.e. the raidz1 with 5 “disks”.
What I am not sure about in this case is: if, let’s say, one drive in a composite disk fails, could I replace that drive in such a way that only that LBA range is swapped for a new disk and only that part has to be healed? Maybe this could be better achieved with lvm2, so the disks act in a linear fashion and parts of it can be swapped(?)
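Something like this is what I’m imagining for swapping one member out while it’s still readable (made-up names, and I’m not sure it works out this nicely in practice; if the drive is already dead, I guess the whole composite would have to be resilvered anyway):

```bash
pvcreate /dev/sd_new
vgextend row4 /dev/sd_new
pvmove /dev/sd_failing /dev/sd_new   # migrate just that member's extents
vgreduce row4 /dev/sd_failing        # then drop the old drive from the VG
```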
I see you went with lvm2 for your 4TB “composite” disk. Now I am thinking about going that way.
Yeah, I was looking for information and saw similar ideas on other forums, and they were quickly shut down. So I know that with this I might be coloring outside the lines a lot.
For backup I have a few 4TB drives that would be the replacements in case of failure. As a quick fix I would “shape” one of them into a 3TB, 6TB or 8TB.
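By “shape” I just mean partitioning a 4TB down to the needed size, roughly like this (placeholder device; the partition would have to be at least as big as the drive it replaces):

```bash
parted -s /dev/sdX mklabel gpt
parted -s /dev/sdX mkpart primary 1MiB 3001GB   # ~"3TB" partition, rest left unused
```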
Also, I really appreciate your reply!
Yes, apparently it can cause deadlocks (unless that’s been fixed lately which seems unlikely):
https://lists.freebsd.org/archives/freebsd-current/2022-November/002788.html
Could be interesting to see if it’s even possible to create the recursive pool on linux. I wouldn’t use it for anything real though.
Yeah, I wouldn’t recommend nesting ZFS, but laying ZFS on top of LVM or MD is free of these issues.
Actually I went for MDRAID for the 4x1T composite, but for anything more exotic LVM should be more user-friendly.
Thanks for pointing that out, I will then definitely stay with lvm2 and/or mdadm for the logical disks instead of nesting.
I see, thanks, so RAID0 of 4 disks.
Okay, so now I see the right option would be to create a (linear) lvm2 logical volume (with the possibility of using mdadm RAID0 for same-size disks, though that is mostly useful if all the disks in the logical volume are the same size).
Then use these logical volumes together with the physical disks (or maybe it’s better to put lvm2 on top of the 18TB physical disks as well, to have “unified” volumes?) with ZFS.
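So, putting the plan together, a sketch of what I mean (all device and VG names made up, just to check my understanding):

```bash
# initialize the members and build one linear LV per mixed "row"
pvcreate /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
vgcreate row4 /dev/sdd /dev/sde /dev/sdf                     # 2x 8TB + 1x 2TB
lvcreate -n vdisk -l 100%FREE row4
vgcreate row5 /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk   # 1x 6TB + 4x 3TB
lvcreate -n vdisk -l 100%FREE row5

# then the pool, mixing the raw 18TB disks with the two LVs
zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc \
    /dev/row4/vdisk /dev/row5/vdisk
```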
I think this is a great setup to learn the ins and outs of zfs. I would not consider storing anything in that pool that you’re not prepared to lose.
While technically true, expected reliability is a matter of statistics: the more drives you connect to a computer, the higher the likelihood that at least one of them experiences a failure.
Often the issue is temporary and goes away with the next boot (e.g. a signalling issue, power draw issue, wiggling cable, etc.). The impact on your pool is disproportionate, though: the directly connected drives get away with a quick resilver, while any failure inside the md/lvm array requires a full sync. At that point you will ask yourself why you’re doing this to yourself.
Another issue is that every drive consumes a noticeable amount of energy (5-12 W depending on activity level), so the energy consumption per TB of storage is heavily uneven across your pool.
Lastly, RAID and ZFS technologies, while similar from a 10,000-foot view, are quite different, and each is reasonably complex in its own right. Using them in combination is not advisable except for an educational setup; the chance of user/admin error goes up a lot.
There are extensive studies on “raid” failure rates, and failure rates (obviously) increase with the number of connected drives. Single-drive failures are reasonably likely (single-digit percentages); double-drive failures are orders of magnitude less likely. The point, again: the setup is fine for a homelab or educational purposes, but sketchy from a reliability point of view. In the end you are the one deciding how comfortable you are with the risk presented by your setup.
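As a rough back-of-the-envelope illustration (assuming, say, a ~2% annualized failure rate per drive, which is only a ballpark figure, and roughly a 2-day resilver window):

$$P(\text{at least 1 of 11 drives fails within a year}) = 1 - (1 - 0.02)^{11} \approx 0.20$$

$$P(\text{a second drive fails during the resilver}) \approx 10 \cdot 0.02 \cdot \tfrac{2}{365} \approx 0.001$$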
Note that until now I have not commented on performance: the write performance of your proposed configuration is limited to that of a single drive (~100 IOPS, abysmal), while read performance in a raidz1 config scales roughly with n-1 components, so about 400 IOPS.
If you consider a different setup instead (I’m completely ignoring the 2TB drive):
vdev1: raidz1 3x 18TB
vdev2: raidz1 2x 8TB + 1x 6TB
vdev3: raidz1 4x 3TB
Storage capacity is much lower (36+12+9 TB), but performance and reliability are way up.
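As a sketch (device names made up):

```bash
zpool create tank \
    raidz1 /dev/disk/by-id/18T-a /dev/disk/by-id/18T-b /dev/disk/by-id/18T-c \
    raidz1 /dev/disk/by-id/8T-a  /dev/disk/by-id/8T-b  /dev/disk/by-id/6T-a \
    raidz1 /dev/disk/by-id/3T-a  /dev/disk/by-id/3T-b  /dev/disk/by-id/3T-c  /dev/disk/by-id/3T-d
```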
In the end you need to prioritize the goals of your config.
Thank you!
The pros you mentioned (about the layout/setup) outweigh the cons (which are mostly just the smaller amount of usable storage).
I will use your recommended layout in my setup.
With this setup lvm and mdadm are not needed (those are Linux features anyway), so I can and will use TrueNAS CORE, expecting something more novice-friendly (less chance of user error in the setup) and a more mature ZFS implementation.
Also, just one more thing: some of the drives (the 8TB and 6TB ones) are DM-SMR. I know this can be (really) bad for write speed, but I am okay with that since this is what I have and I want to use them (in the future I definitely plan to rotate them out of the pool for larger CMR drives).
As far as pool failure goes, could an SMR drive flushing its cache (or going AWOL for SMR reasons) make the pool report a failure? Also, in case of a (physical) failure, as far as I understand, if I use a CMR drive for the resilver it should be okay time-wise.
So I think from an IO perspective maybe the best way would be to make two pools: one with the SMR drives and another with the CMR drives. This would split the usable space in two, but the SMR drives would not choke out the CMR drives. This would not change the RAID layout at the disk level, just the pooling of them(?)
So vdev1 and vdev3 in pool1 (CMR), and vdev2 in pool2 (SMR).
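So something like this (placeholder names again):

```bash
zpool create cmrpool \
    raidz1 /dev/disk/by-id/18T-a /dev/disk/by-id/18T-b /dev/disk/by-id/18T-c \
    raidz1 /dev/disk/by-id/3T-a  /dev/disk/by-id/3T-b  /dev/disk/by-id/3T-c  /dev/disk/by-id/3T-d

zpool create smrpool \
    raidz1 /dev/disk/by-id/8T-a /dev/disk/by-id/8T-b /dev/disk/by-id/6T-a
```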
I like that
Another point I want to make is that the slowest drive inside a vdev sets the pace for all the other drives: an operation is only done once the last drive reports back… and HDDs are slow, but a 2-3TB drive is way slower than an 18TB drive. I wouldn’t want 2TB HDD performance to set the speed limit.
Having different performance characteristics on different vdevs isn’t a problem, as ZFS uses an allocation throttle and will give slower vdevs less work to improve overall performance.
The pool config that @jode made is solid and well-reasoned, and probably similar to what I would do if I had this “zoo” of different HDDs myself.