OpenZFS has added lots of features in the last few years. One of the more interesting is the Special VDEV, which is intended to use fast devices (typically NVMe, or at least SATA/SAS SSDs) to store metadata and small files to speed things up.
You could set up multiple pools as you suggest, or you could center your pool around the hard drives and use your various NVMe drives in ways that speed up the hard drive pool.
In my experience, while I absolutely love ZFS, all-NVMe pools under ZFS never really perform as well as one would hope. Don't get me wrong: they are still faster than hard drives, but for whatever reason they never quite reach their potential. This likely has something to do with ZFS being optimized around hard drives from the core up. Maybe this will improve over time.
My main pool (which resides on my main proxmox box) has similar use cases to yours, and it is structured like this:
raidz2
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
raidz2
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
  16TB Seagate Exos X18 (7200rpm, SATA)
special
  mirror
    2TB Gen3 MLC Inland Premium NVMe drive
    2TB Gen3 MLC Inland Premium NVMe drive
    2TB Gen3 MLC Inland Premium NVMe drive
logs
  mirror
    375GB Optane DC P4800X
    375GB Optane DC P4800X
cache
  4TB WD Black SN850X (Gen4)
  4TB WD Black SN850X (Gen4)
This pool stores all of my data, including personal files, house shared folders, media library, etc.
The main data VDEVs are RAIDz2. I have two of them, in part because I started with a 6-drive pool like you are intending and added a second VDEV as my needs grew. ZFS also uses the two RAIDz2 VDEVs in parallel, which speeds things up. I rather accidentally fell into this configuration, but as luck would have it, it has worked very well for me.
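For reference, a layout like this can be built up over time with commands along these lines. The pool and device names below are just placeholders (in practice you'd want /dev/disk/by-id paths rather than sdX/nvmeX names):

# Create the pool with the first 6-disk RAIDz2 VDEV
zpool create tank raidz2 sda sdb sdc sdd sde sdf

# Later, add a second 6-disk RAIDz2 VDEV; ZFS stripes across the two
zpool add tank raidz2 sdg sdh sdi sdj sdk sdl

# Add the three-way mirrored special VDEV for metadata and small blocks
zpool add tank special mirror nvme0n1 nvme1n1 nvme2n1

# Add the mirrored SLOG and the (non-redundant) L2ARC cache devices
zpool add tank log mirror nvme3n1 nvme4n1
zpool add tank cache nvme5n1 nvme6n1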
The small files and metadata lookups are lightning fast because I have configured the special VDEV to handle these.
The special VDEV is a three-way mirror because if you lose it, you lose the entire pool, so I wanted to match the redundancy level of the main data VDEVs (the two RAIDz2s). Either of those can lose two drives without losing the pool, so I wanted the special VDEV to be able to do the same.
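One knob worth knowing about if you go this route is the special_small_blocks property, which controls how large a data block can be and still land on the special VDEV instead of the hard drives. A quick example (the dataset name here is made up):

# Store metadata plus any data blocks of 32K or smaller on the special VDEV
zfs set special_small_blocks=32K tank/shares

# See how much space each VDEV (including the special) is actually using
zpool list -v tank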
The log drives (or SLOG, as some call them) are dedicated drives for the ZFS Intent Log (ZIL). In normal operation these drives are only ever written to. They speed up sync writes by committing the write intent to the fast log device and acknowledging it back to the writing process so it can continue; the data itself is still committed from RAM to the main pool as usual. The only time the log is ever read from is during boot/import of a pool that has suffered an unclean shutdown. ZFS will then read the intent log and replay the final writes to where they belong before finishing the import.
In this application low latency is king, thus the Optanes. They only ever hold the last few seconds' worth of writes, so they can be very small. Smaller, older Gen3 Optanes work very well here, and they can still be found relatively inexpensively.
A SLOG drive isn't strictly required; it just speeds up sync writes. Without one you may find that some writes are very slow, as they go to a ZIL inside the main data pool instead. You can also speed things up by configuring your datasets to disable sync writes, which makes writes really fast, but if something happens (server crash or power failure) you lose whatever in-flight data was still in RAM and not yet committed to stable storage.
Depending on the workload, this can either matter a lot or not at all. If you are in the middle of transferring a ripped Blu-ray to your storage, it doesn't matter: you are going to restart that transfer from the beginning anyway, as a partial Blu-ray isn't of much use. For database writes or VM disk images, however, those last few seconds of in-flight data can mean the difference between corruption and no corruption. So you evaluate your needs and set things up accordingly.
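Conveniently, sync behavior is a per-dataset property, so you can tune it to the workload rather than the whole pool. For example (dataset names again made up):

# Media rips: losing a few seconds of in-flight data is harmless, skip the ZIL
zfs set sync=disabled tank/media

# VM images and databases: honor sync requests (the default); this is where the SLOG earns its keep
zfs set sync=standard tank/vms

# Or force every write to be treated as a sync write
zfs set sync=always tank/critical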
The cache devices are essentially a read cache. They also go by the name L2ARC. (The ARC is the cache in main RAM; the L2ARC is the second-level cache on the cache devices.)
You can lose the cache devices without losing your pool, so they don't need to be redundant. In my case the two drives are essentially spanned, providing a total of 8TB of NVMe read cache for the hard drives.
These can either help speed things up a lot or not at all, and it really depends on your working data set. ZFS really loves RAM, and each feature you add makes it want even more. So if the cache devices you add are smaller than your active working set, they can in some cases actually slow things down: there is now slightly less RAM for the main ARC (the L2ARC headers live in RAM), and the L2ARC itself just gets constantly thrashed because the working set doesn't fit.
Over the years (as SSDs have become more affordable) I have used the following for cache devices:
- none at all
- two 128GB SATA SSDs
- two 512GB SATA SSDs
- two 1TB SATA SSDs; and finally
- two 4TB NVMe SSDs.
In all but the last configuration, in my use case, the cache hit percentages were absolutely atrocious. I kept increasing the size hoping I would get to the point where a reasonable proportion of my random pool reads would finally be cached, such that it sped up reads overall. That did not happen (for me, with my use) until the cache hit 8TB.
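If you want to know where you stand before buying cache drives, the hit rates are easy to check. On Linux the ARC and L2ARC counters live in /proc/spl/kstat/zfs/arcstats, and something like this will spit out the ratios (it assumes an L2ARC is already attached, otherwise the L2 numbers divide by zero):

awk '/^hits / {h=$3} /^misses / {m=$3} /^l2_hits / {lh=$3} /^l2_misses / {lm=$3}
     END {printf "ARC hit%%: %.1f  L2ARC hit%%: %.1f\n", 100*h/(h+m), 100*lh/(lh+lm)}' /proc/spl/kstat/zfs/arcstats

# arc_summary (ships with OpenZFS) prints the same ratios with a lot more detail
arc_summary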
Just like you, my box is an all-in-one system (though I have since added a second, smaller node). It contains storage for my house, but it is also my VM box. I do this with proxmox.
I found that sharing one pool for both my VMs and my storage needs was suboptimal, as the VMs would take a hit during heavy disk activity (like dumping a 2TB disk image to the NAS at high write speeds over 40Gbit networking).
Because of this, I have additional pools on my box:
My boot pool (proxmox sets this up for you on install if you'd like, and it works very well):
mirror
  64GB Optane M10
  64GB Optane M10
logs
  mirror
    375GB Optane DC P4800X
    375GB Optane DC P4800X
This was primarily to give myself some redundancy on the boot pool for the proxmox machine. The Optane M-series are those small Optane drives Intel used to sell as dedicated cache devices for consumer machines.
They aren't exactly "enterprise," but they feel a lot like it, and they seem way more robust than anything else consumer-grade I have ever used. Such is the glory of Optane, I suppose.
They are only PCIe Gen3 x2, so sequential speeds aren't blisteringly fast, but they still have better random performance and I/O latency than any non-Optane SSD.
The best part is, a couple of years ago I bought a 20-pack of the little 16GB variants on eBay, brand new, for less than $50 just to play with. The proxmox install on my system is only 5GB, so the 16GB variants would be plenty for this and dirt cheap. (They have gone up a little as people have discovered how cheap they were and started using them for mini-PCs and as mirrored server boot drives, but they are still a very affordable option.)
And here is a really controversial part of my configuration.
You'll notice this pool has the same Optane drives listed as log devices as my main storage pool does. I noticed that the Optanes can take pretty much anything you throw at them without slowing down, up to crazy-high thread counts and queue depths.
So I decided to just partition these drives and use them as SLOGs across all of the pools in this box. I'd be lying if I said it was my idea, though. I got it from the main reviewer over at ServeTheHome (blanking on his name right now) in one of his Optane reviews from years ago, heaping praise on these drives.
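The mechanics are straightforward if you want to try the same thing. Roughly (device names, partition sizes, and pool names here are all placeholders):

# Carve each Optane into a few small partitions; a SLOG only needs a handful of GB
sgdisk -n 1:0:+16G -n 2:0:+16G -n 3:0:+16G /dev/nvme0n1
sgdisk -n 1:0:+16G -n 2:0:+16G -n 3:0:+16G /dev/nvme1n1

# Give each pool a mirrored log made of one partition from each drive
zpool add rpool  log mirror /dev/nvme0n1p1 /dev/nvme1n1p1
zpool add tank   log mirror /dev/nvme0n1p2 /dev/nvme1n1p2
zpool add vmdata log mirror /dev/nvme0n1p3 /dev/nvme1n1p3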
Some will tell you this is a terrible idea, but it has worked very well for me, and I have had zero problems in my five years of using Optane drives this way in the SLOG role (first two 280GB 900Ps, now two 375GB P4800Xs). And since you can lose them without losing your pool, it is pretty low risk. The only way a mirrored pair of SLOGs will lose you data is if your box goes down and both SLOG drives fail at the same time.
I expected that doing this might result in slowdowns when more than one pool tries to sync-write at the same time, but that simply hasn't happened. I can't explain it. Optanes are just that amazing. Pure magic. It's a shame they weren't profitable enough for Intel to keep them around.
Next pool is my VM Data pool:
mirror
  1TB Samsung 980 PRO
  1TB Samsung 980 PRO
logs
  mirror
    375GB Optane DC P4800X
    375GB Optane DC P4800X
This pool is a dedicated mirror for VM drive images.
Mirrors work best for performance, which is why I went that route. I also don't have a lot of storage needs for my many guests; most use only a few GB each, so I never needed more than this.
When you assign a ZFS pool as the VM storage pool, Proxmox by default creates subvols for containers and zvols (emulated block devices, rather than drive image files) on top of the ZFS pool for KVM guests. This is actually really cool: these virtual disks appear on the host system as /dev/zd0, /dev/zd1, etc. and behave like real block devices. They seem to perform well in VMs as well, and you can snapshot them using zfs snapshot, which is great too.
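As a quick illustration (the pool name and VM ID here are made up), a VM disk on such a pool looks roughly like this from the host:

# Proxmox creates one zvol per virtual disk, named after the VM ID
zfs list -t volume
#   e.g. vmpool/vm-100-disk-0

# It shows up as a block device on the host
ls -l /dev/zvol/vmpool/vm-100-disk-0

# And it can be snapshotted (and rolled back) like any other dataset
zfs snapshot vmpool/vm-100-disk-0@pre-upgrade
zfs rollback vmpool/vm-100-disk-0@pre-upgrade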
I will admit to making a mistake the last time I upgraded this pool, though, and that mistake was using consumer Samsung "Pro" drives. I have since learned that Samsung consumer SSDs (which their 980 Pro and 990 Pro models are, despite the "Pro" name) do not work well in server applications, especially where there is heavy load or long uptimes. They have a tendency to suffer weird firmware lockups and become unavailable until you power cycle the server.
My next upgrade is going to be to replace these with some more serious enterprise NVMe drives.
Some consumer parts work very well in servers, and I have a long history of using consumer SSDs from many brands in my servers (including the old SATA Samsung Pro drives). In the decade-plus I have been doing this, the Samsung 980 Pro and 990 Pro drives have been the first to bite me in the ass. Even my Micro Center house-brand (Inland Premium) Gen3 drives have been bulletproof.
Luckily I haven't had any data loss result from it, though. I guess that is one of the benefits of ZFS and how it handles redundancy.
Yeah, so these Samsung drives won’t be around much longer. You can read more about these issues with Samsung Pro drives in this discussion over on ServeTheHome.
And then there is my final pool in this box.
mirror
  1TB Inland Premium Gen3 NVMe
  1TB Inland Premium Gen3 NVMe
This is a dedicated pool for one of my VMs (well, actually an LXC container, but that's not relevant).
That container runs a dedicated MythTV backend. MythTV is an open-source project that works as a DVR/PVR system, recording your scheduled shows from your TV service. (Yeah I know, not too many people keep cable subscriptions anymore)
I used to record straight to the hard drive pool, but I found that my recordings occasionally had stutters in them, so I decided it would be better to record to a dedicated SSD pool. It is only 1TB in size, but a script runs every night at 4am and moves the oldest recordings to the main hard drive pool, in a kind of semi-manual storage tiering solution.
The MythTV database is great like that. It doesn't care where a file is, as long as it can find it on one of the filesystems that are defined for recording use and currently mounted, so by just moving the files every night it still knows where everything is and just works.
Note the absence of SLOG drives in this pool. They are unnecessary here: the entire pool is set to sync=disabled, and thus operates only with async writes. This is one of those situations where, if the TV recording isn't complete, I don't want it anyway; incomplete recordings get purged and re-recorded during a different airing instead. So the last few seconds of written data are completely irrelevant.
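For what it's worth, the nightly mover doesn't need to be anything fancy. Here's a rough sketch of the idea, with made-up paths and a simple age threshold standing in for my exact "oldest first" logic; both directories belong to the same MythTV storage group, so MythTV keeps finding the files after the move:

#!/bin/sh
# Move recordings older than 14 days from the SSD pool to the hard drive pool.
SRC=/mnt/recordings-ssd
DST=/mnt/recordings-hdd
find "$SRC" -maxdepth 1 -type f -name '*.ts' -mtime +14 -exec mv -n {} "$DST"/ \;

Dropped into /etc/cron.d with a line like "0 4 * * * root /usr/local/bin/move-recordings.sh" (paths are just examples), it runs every night at 4am.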
Anyway.
I figured that by sharing how I have things set up in a similar situation, you could learn a little about it and decide what might work for you. Maybe that is, as you originally proposed, a dedicated fast NVMe pool for your files, separate from your rips, or maybe that is using some of those NVMe drives in special or log configurations to speed up the hard drive pool and keep everything in one place.
Or maybe - like me - you want to have a dedicated VM mirror and boot pool.
The possibilities are endless.
I hope this was helpful.