Hello there,
I’m currently a little bored so I am looking into ZFS a bit. I was thinking about building a new(-ish) NAS sometime again when I get the time, so this is more or less a preparation, though I probably won’t use it for a while. Anyway though… to the questions (sorry it’s going to be one of those posts).
So the first thing I didn’t quite get is what the point of a vdev is, and how it differs from a pool.
What I understand from Wendell’s videos is that a vdev is a set of disks that handles its own parity, and a pool consists of one or multiple vdevs. So far so good, but I don’t quite understand the point.
If a vdev is handling its own parity, why would I throw multiple vdevs into a pool and not just use multiple pools?
From what I gather a pool can write the data over all of its vdevs, which would make sense just to increase the total storage capacity when adding drives later.
However, at that point I wonder how writing and file integrity work across multiple vdevs. Since each vdev could potentially handle its own parity differently (e.g. vdev1 is a RAID1, while vdev2 is a RAIDZ2, and vdev3 is a RAID0 for some reason), how does ZFS decide where to store the data?
For illustration: I have a block of data I write to the storage pool, and the data happens to land on vdev3. Now one of the drives in vdev3 dies for inexplicable reasons without warning. What now? Since RAID0 is not exactly recoverable, I’m wondering how this is handled. I mean, sure, the combination doesn’t make a lot of sense, but it is technically possible, right?
The same goes of course for 2 drive failures in vdev1 or 3 drive failures in vdev2, because I had a bad batch of drives or whatever.
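To make the scenario concrete, here is a small Python sketch of how I picture it. It rests on one big assumption, which is exactly the part I’m asking about: data is spread over all vdevs, so the pool is only healthy while every single vdev is still within its own failure tolerance. The names `Vdev` and `pool_survives` are made up for illustration and are not any real ZFS API.

```python
from dataclasses import dataclass

# Disks a layout can lose before its data is gone (toy model).
# A mirror is handled separately because it depends on the disk count.
TOLERATED = {"stripe": 0, "raidz1": 1, "raidz2": 2, "raidz3": 3}


@dataclass
class Vdev:
    """Toy model of a vdev (hypothetical, not a real ZFS interface)."""
    name: str
    layout: str      # "mirror", "stripe", "raidz1", "raidz2" or "raidz3"
    disks: int       # number of disks in the vdev
    failed: int = 0  # number of disks that have died

    def tolerated_failures(self) -> int:
        if self.layout == "mirror":
            # A mirror keeps working as long as one copy is left.
            return self.disks - 1
        return TOLERATED[self.layout]

    def survives(self) -> bool:
        return self.failed <= self.tolerated_failures()


def pool_survives(vdevs: list[Vdev]) -> bool:
    # ASSUMPTION: the pool has no redundancy across vdevs, so losing
    # any one vdev means losing the pool.
    return all(v.survives() for v in vdevs)


pool = [
    Vdev("vdev1", "mirror", disks=2),
    Vdev("vdev2", "raidz2", disks=6),
    Vdev("vdev3", "stripe", disks=2),
]
pool[1].failed = 2          # two dead disks in the RAIDZ2 vdev
print(pool_survives(pool))  # True
pool[2].failed = 1          # one dead disk in the stripe
print(pool_survives(pool))  # False
```

If that assumption holds, the mixed layout above would be exactly as fragile as its RAID0 vdev, no matter how much parity the other two have.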
On that end: Yes I know you’re supposed to have a separate set of backups in a different location, but that’s another story.
So, moving on: I keep stumbling over more terminology whose meaning I’m not quite sure of. In his videos Wendell mentioned datasets every now and then, but I’m not quite sure what exactly a dataset is in ZFS terms.
From what I understand from the FreeBSD page, “dataset” is just a generic term with no inherent meaning, since it doesn’t say what kind of dataset it is?
From the word alone I would have thought that each dataset would have a different purpose (for example in a NAS, one dataset would be the media storage, and one would be personal files), but that doesn’t seem to be the case.
Speaking of media storage and personal files, would this be a reasonable use of multiple pools, or would one just use a regular directory structure for that? Or is that where “filesystems” (weird name IMO, since ZFS is also a filesystem?) would come in? On the aforementioned FreeBSD page it says that filesystems are mounted somewhere and act like any other filesystem (as if it were a separate disk?). So creating a filesystem and mounting it as /home (or whatever else) would fit that description.
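Here is how I currently imagine the mounting part, as a rough Python sketch. It assumes that a filesystem mounts under a path matching its name unless it (or a parent) has an explicitly set mountpoint, in which case children hang below that path. The pool name `tank`, the dataset names, and the function `effective_mountpoint` are all hypothetical, and special settings such as a disabled mountpoint are ignored.

```python
def effective_mountpoint(dataset: str, explicit: dict[str, str]) -> str:
    """Return where `dataset` would be mounted in this toy model.

    `explicit` maps dataset names to mountpoints that were set by hand.
    """
    parts = dataset.split("/")
    # Walk from the dataset itself up towards the pool root and use the
    # nearest explicitly set mountpoint, appending the remaining names.
    for depth in range(len(parts), 0, -1):
        ancestor = "/".join(parts[:depth])
        if ancestor in explicit:
            base = explicit[ancestor].rstrip("/")
            return "/".join([base, *parts[depth:]]) or "/"
    # Nothing set anywhere: mount under a path that mirrors the name.
    return "/" + dataset


explicit = {"tank/home": "/home"}
print(effective_mountpoint("tank/media", explicit))       # /tank/media
print(effective_mountpoint("tank/home", explicit))        # /home
print(effective_mountpoint("tank/home/alice", explicit))  # /home/alice
```

If that is roughly right, then media storage and personal files would each be their own filesystem inside one pool rather than separate pools.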
And the last thing I stumbled over was volumes. On the page above it says they are block devices, useful for creating other filesystems on top of ZFS or as iSCSI extents. So from what I can tell, they don’t have much use for a home NAS?
So, if you read this far, thanks for the attention. I’m sorry for the long post, but ZFS seems to be a riddle for me…