Should NVMe SSDs be used for caching a SATA SSD pool? What about Optane for dedup / metadata devices on NVMe pools?

TL;DR

  1. Would SATA SSD pools benefit from an NVMe cache?
  2. Would NVMe pools benefit from Optane dedup / metadata devices?
  3. Would a single NVMe SSD boot drive in Windows benefit from an Optane stick as a cache drive through PrimoCache?

Long version:

Say I have a ZFS RAIDz1 pool consisting of 8x 2TB SATA SSDs (so 14TB of effective space), and I want to speed up the read and/or write performance of the pool.
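
For anyone checking the 14TB figure, here is a rough sketch of the arithmetic. It assumes a single RAIDZ vdev and ignores metadata, padding and reserved space; the function name is just made up for illustration.

```python
def raidz_usable_tb(drives: int, drive_tb: float, parity: int = 1) -> float:
    """Approximate usable capacity of one RAIDZ vdev, ignoring ZFS overhead.

    parity=1 is RAIDz1, parity=2 is RAIDz2, and so on.
    """
    if drives <= parity:
        raise ValueError("a RAIDZ vdev needs more drives than its parity level")
    # One drive's worth of space per parity level goes to redundancy.
    return (drives - parity) * drive_tb


# 8 drives of 2TB each in RAIDz1: 7 drives' worth of data space.
print(raidz_usable_tb(8, 2))
```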

Would adding NVMe SSDs as a read or write cache make enough of a difference to warrant buying those drives?

On a similar note, I saw that Wendell’s been excited about Optane drives going on sale, as the lower capacity M.2 drives are perfect for acting as ZFS dedup and metadata devices (as long as you have at least 2 for redundancy). But would Optane be helpful for NVMe pools that are already so fast?

Lastly, with my Windows PC, would supplementing my NVMe boot drive with an Optane cache through the PrimoCache app be beneficial or not worth it?

I can answer 1 and 3 quite readily. I’m not sure about 2, as my ZFS testing didn’t get that far; I use Ceph, but a lot of the knowledge is transferable. (I may have got a bit carried away writing this, LOL.)

For 3, the answer is almost certainly no. Unless the existing boot drive is bad, the overhead of caching will likely negate any benefit - and frankly, in my experience pretty much any SSD does just fine for booting a computer and loading apps quickly.

For 1, the answer is probably not, with a caveat (below the line)

The problem is that NVMe SSDs don’t tend to be much quicker at random reads/writes, especially against a pool where that load is spread across lots of SATA drives - your NVMe drive would need to be faster than not one SATA drive, but four or eight of them working together. Sequential reads can be an awful lot quicker on NVMe, but it’s unlikely that a cache drive will happen to hold what you want to read sequentially. And honestly, the 500 MB/s you get from a single SATA SSD is still plenty for the vast majority of tasks, and should scale up with your 8 drives.

For some context on this, I’ve got a 1TB Samsung 970 Pro and a 256GB SATA Micron 1100 in my system. Sure, sequential reads are 6x to 10x the speed on the NVMe drive, but random reads are only 1.5x to 2.5x the speed (going from the best case for the SATA drive, queue depth 32, to its worst case, queue depth 1).

If you have a large queue depth, random read performance will improve with your having 8 of the SATA drives - best case you will see 8x the performance of a single SATA SSD, which is way better than what a single NVMe cache drive could do. If you have a lot of random reads at queue depth 1 then sure, you might see a benefit, but this depends heavily on the data you want actually being in the cache, and ZFS is already really good at caching to RAM. If that is what you want, more RAM is a better solution: it’s hugely faster than any SSD (and still quite a bit faster than NVMe Optane).
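
To put rough numbers on that, here is a small sketch that expresses everything in units of "one SATA SSD". It takes the 1.5x and 2.5x figures from my own two drives above, and it assumes the idealised best case where the pool keeps one drive busy per outstanding request, so treat the results as an upper bound and not a benchmark. The function name is made up for illustration.

```python
def pool_vs_nvme(sata_drives: int, nvme_speedup: float, queue_depth: int) -> float:
    """Ratio of pool random-read throughput to a single NVMe drive's.

    nvme_speedup is how many times faster the NVMe drive is than ONE SATA
    SSD at this queue depth. Assumes ideal scaling: the pool can keep
    min(queue_depth, sata_drives) drives busy at the same time.
    """
    busy_drives = min(queue_depth, sata_drives)
    return busy_drives / nvme_speedup


# Queue depth 32: all 8 SATA drives busy, NVMe is ~1.5x one SATA drive.
print(round(pool_vs_nvme(8, 1.5, 32), 2))  # 5.33 -> the pool wins easily

# Queue depth 1: only one SATA drive busy, NVMe is ~2.5x one SATA drive.
print(round(pool_vs_nvme(8, 2.5, 1), 2))   # 0.4 -> the NVMe drive wins
```

A result above 1 means the SATA pool alone is ahead of the NVMe drive, which is why a cache only has a chance of helping in the low queue depth case.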

Random writes will scale nicely with the number of SATA SSDs being used, so there is no point caching them, with the caveat below.


So here’s the caveat: when you might want to add an NVMe drive/Optane.

Do you have any sync-writes on your pool?
Do your existing SATA SSDs have power loss protection?

If you have sync writes, these go very slowly on consumer SSDs. SSDs with power loss protection (PLP) bend the truth and will say that data has been committed to the drive when it hasn’t yet, because the SSD knows that with its power loss protection it will be able to safely write the data even if the drive loses power - so the data is as good as written.

Assuming that you don’t have PLP on your SATA SSDs, you could add a drive that is good at sync writes as a separate log device to hold the ZFS Intent Log (ZIL). This will greatly speed up sync writes on your pool, as they can then be buffered and written to the pool as normal writes.

Small Optane drives are good at this as they are cheap, fast at sync writes, and you only need a relatively tiny amount of capacity to store the ZIL. The caveat is that these Optane drives often don’t have a very long rated lifespan in terabytes written, but if you don’t have that many sync writes on the pool they might not be doing very much anyway, so they are likely ample for home use.

The Samsung PM9A3 is a fairly affordable NVMe SSD with PLP that comes in a range of form factors, and is the one I usually recommend, although Micron also have some good options. If you are going to use one as a ZIL then I’d suggest allocating just a small amount (say 32GB) to it and using the rest as a ‘fast’ ZFS pool.

You can test whether you will benefit from a ZIL device by setting your ZFS pool to sync=disabled and then running your typical workloads/benchmarks. If you notice an improvement then you will benefit.

Make sure to set sync=standard when you finish testing, as in-flight data is at risk while sync is disabled.
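
If you want to make it hard to forget that last step, something like the sketch below could wrap the test. It shells out to the standard `zfs get` / `zfs set` commands, so it needs ZFS installed and enough privileges to change properties. Note that it restores whatever value the `sync` property had before, which may not be `standard` if you had changed it previously. The helper names and the `tank` pool name are placeholders, not anything from a library.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], str]


def run_zfs(args: List[str]) -> str:
    """Run a command and return its stdout (raises if the command fails)."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


@contextmanager
def sync_disabled(dataset: str, run: Runner = run_zfs) -> Iterator[None]:
    """Temporarily set sync=disabled on a pool/dataset, then restore it."""
    # -H drops the header line, -o value prints only the property value.
    previous = run(["zfs", "get", "-H", "-o", "value", "sync", dataset]).strip()
    previous = previous or "standard"  # fall back if nothing was reported
    run(["zfs", "set", "sync=disabled", dataset])
    try:
        yield
    finally:
        # Runs even if the benchmark raises, so sync is not left disabled.
        run(["zfs", "set", f"sync={previous}", dataset])


# Usage (placeholders: "tank" is your pool, run_benchmark is your workload):
# with sync_disabled("tank"):
#     run_benchmark()
```

The `run` parameter is only there so the command runner can be swapped out, for example to print the commands instead of executing them while you check what it would do.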

It’s often recommended to get two ZIL devices, but actually your data is pretty much safe with just one - if the one ZIL device fails, your pool will just revert to its performance level without it.
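
For reference, attaching a log device comes down to a single `zpool add` command, with the `mirror` keyword added if you do go for two devices. The sketch below only builds and prints the command so you can read it before running anything; the pool name and device paths are placeholders for whatever your system uses.

```python
from typing import List


def slog_add_command(pool: str, devices: List[str]) -> List[str]:
    """Build the `zpool add` command that attaches one or more log devices.

    With two or more devices they are added as a mirror.
    """
    if not devices:
        raise ValueError("need at least one log device")
    cmd = ["zpool", "add", pool, "log"]
    if len(devices) > 1:
        cmd.append("mirror")
    return cmd + list(devices)


# Single log device, e.g. a small partition on an NVMe drive (placeholder path).
print(" ".join(slog_add_command("tank", ["/dev/nvme0n1p1"])))

# Mirrored pair of log devices (placeholder paths).
print(" ".join(slog_add_command("tank", ["/dev/nvme0n1p1", "/dev/nvme1n1p1"])))
```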

If you got this far there is a bit more reading here:
https://jrs-s.net/2019/07/20/zfs-set-syncdisabled/

That’s a lot of good info! I figure I should provide more info on the system specs so my questions make more sense in context.

CPU: Threadripper 3960X
Motherboard: MSI Creator TRX40 (two PCIE x16 slots and two x8 slots)
RAM: 128GB (4x32GB) 3600MHz
Top x16 Slot: Asus PCIe 4.0 x16 to four M.2 NVMe SSD Adapter - holds 4x 1TB Sabrent Rocket SSDs
Top x8 Slot: LSI 9207-8i SATA HBA - this will connect to eight 2TB WD Blue M.2 SATA SSDs.
Bottom x16 Slot: Another Asus SSD adapter like the one in the top slot. Right now I have two 1TB WD SN750 M.2 NVMe SSDs in it, and I was planning on getting 2 more drives to max it out, which is where I was thinking of either getting more SN750s or getting two Optane drives.
Bottom x8 Slot: a GPU (currently only have a GTX 1050 on hand, working on getting something better for later)
The individual drives don’t have power loss protection, but the whole server is connected to a 1500VA / 900W UPS.

I’ve gone through reading that very same ServeTheHome article you linked when I was building my first NAS, so I know that Async writes will always outperform Sync writes, as Async writes are buffered in system memory before being written to the pool, whereas Sync writes are always written to the pool immediately.

This system is less of a NAS and more of a remote workstation that I plan to offload all my programming / computation work to so that my personal PC can still be usable. I only have 10Gbps hardware between the PCs, so two SATA SSDs in a RAID 0 / striped config would already saturate the link, and drive caching improvements won’t make a difference over the network.

The general idea is to have a Windows VM and 1-2 Linux VMs on the server that have the different pools mounted and shared between them, where the data would be synced in real time. I think this would mean that I would be running all the pools in Sync write mode, so write caching isn’t going to help much.

That line of thought led me to wonder if getting two Optane drives to act as dedup or metadata devices for either the SATA or NVMe SSD pools would be a good idea, or if I should just add more 1TB M.2 NVMe SSDs to the rest of the NVMe drives I have on hand.

But based on the info you gave, it seems like Optane won’t really be helpful for that kind of stuff and I might as well just expand the pools with similar SSDs.
