Right-Sizing ZFS Drives for Boot, SLOG, Special, L2ARC

My first question is: is this the right place to ask about ZFS? I intend to install Proxmox as my bare-metal OS on a box that will be my only server. …but ZFS is a subject unto itself.

My real question is: can you provide a link that holds my hand through configuring ZFS hardware? Over the last several months, I’ve spent a few hours each week searching for a good guide while accumulating hardware for my next server. I have a good handle on most concepts and don’t need any extra help on vdevs or pools, for example.

The trouble is, while ZFS fixes the problem of correctly managing storage spaces so we’re not continually reconfiguring partitions and drives, it introduces a new challenge of making Day-one decisions for the initial build which can’t be reversed without equivalent pain.

I’m looking for a systems perspective. All the guides I’ve found are narrowly focused on one aspect or another, or focused on post-build.

I’m building around an AMD Epyc and motherboard with more PCIe lanes than I’ll use. It has two 10Gb network ports and 128GB RAM. I have two Optane 960GB to use as an Optane mirror and two SSD 960GB for an SSD mirror. The HDs are WD Red Plus.

I know ZFS prefers whole drives rather than partitions, but I expect the Optane and SSDs will need to be partitioned for various purposes.

I can install Proxmox to the WD drives with Root on ZFS. Since the server is never shut down and the cache remains warm, it would perform fine, but I’m willing to install it on SSDs if that helps with installing ZFS Boot Menu (ZBM) or with diagnostic boot-ups.

Here are the questions I have:

Between the HDs, Optane, and SSDs, where should I put the boot drive, SLOG, metadata special vdev, and L2ARC? Are there others?

How can I determine how large each should be? (I’m familiar with the answer, “depends on your use case.” …which is a non-answer. This isn’t some specialized, one-purpose server.)

Which of these can be put in partitions? (There’s a limit to M.2 connectors, so some drives must be multi-purpose.)

Your comments and advice are welcome.

First off, I’d always suggest putting root on a dedicated device. It makes it a lot easier to reinstall your OS or move your ZFS array to another server in the future. Honestly, a single M.2 SSD is fine for a home Proxmox setup; just keep good backups.

For extra devices like SLOG, L2ARC, and a metadata special device, the right question to ask is: do you actually need them? FYI, these can all be added after the fact (with some caveats; see the sketch after the list below), so I don’t think you need them initially. To give some more info on each:

  • In my opinion an L2ARC provides little benefit for most people, and adding RAM will provide far more benefit if you’re limited by ARC size.
  • SLOG can be useful for sync-write workloads (like VMs), but you’re still limited by the IOPS and latency of your HDDs. Using the SSDs as their own ZFS RAID1 pool instead can provide far more performance.
  • A special device can provide a lot of benefit. However, you should consider a 3-way mirror for proper redundancy: if you lose the special vdev, you lose the entire pool. If you’re only storing large media files, the added complexity isn’t really worth it.
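As a rough sketch of what “added after the fact” looks like (pool name and device IDs here are just placeholders):

    # add a mirrored SLOG to an existing pool
    zpool add tank log mirror /dev/disk/by-id/nvme-optaneA /dev/disk/by-id/nvme-optaneB
    # add (or later remove) an L2ARC cache device
    zpool add tank cache /dev/disk/by-id/nvme-ssdA
    zpool remove tank /dev/disk/by-id/nvme-ssdA
    # a special vdev can also be added later, but only newly written metadata lands on it --
    # existing metadata stays on the data vdevs until it is rewritten
    zpool add tank special mirror /dev/disk/by-id/nvme-ssdA /dev/disk/by-id/nvme-ssdB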

Day-one decisions for the initial build which can’t be reversed without equivalent pain.

I’d focus more on making the simplest configuration that meets your requirements. Often you don’t know the limitations of a setup until you’ve had a chance to use it. The only decision in ZFS that I’d consider permanent is your RAID-Z vdev setup - get the number of drives and RAIDZ level correct from the start.

I’m building around an AMD Epyc and motherboard with more PCIe lanes than I’ll use. It has two 10Gb network ports and 128GB RAM. I have two Optane 960GB to use as an Optane mirror and two SSD 960GB for an SSD mirror. The HDs are WD Red Plus.

I’d start with the two 960GB SSDs in a ZFS mirror for Proxmox’s root, then the WD HDDs in a raidz2 as a second ZFS pool for VM data (with two vdevs if you have 12 or more drives).
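A minimal sketch of that layout, assuming six Red Plus drives (device IDs are placeholders):

    # Proxmox installer: pick ZFS RAID1 (mirror) on the two 960GB SSDs for the root pool, then:
    zpool create -o ashift=12 tank raidz2 \
      /dev/disk/by-id/ata-WDC_RED_1 /dev/disk/by-id/ata-WDC_RED_2 \
      /dev/disk/by-id/ata-WDC_RED_3 /dev/disk/by-id/ata-WDC_RED_4 \
      /dev/disk/by-id/ata-WDC_RED_5 /dev/disk/by-id/ata-WDC_RED_6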

Then I’d test the Optane drives in their own ZFS pool and see if you can make that work with your setup. For example, you probably want your VM OS disks / C: drives on the Optane pool, and a second VM drive on the HDD pool for bulk data. What kind of Optane drives do you have, btw?

You can repurpose an Optane drive as a SLOG later if required.
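Something like this, as a sketch (pool/storage names and device IDs are placeholders):

    # Optane mirror as its own pool, then registered as VM storage in Proxmox
    zpool create -o ashift=12 optane mirror \
      /dev/disk/by-id/nvme-INTEL_SSDPE21D960GA_A /dev/disk/by-id/nvme-INTEL_SSDPE21D960GA_B
    pvesm add zfspool optane-vm --pool optane --content images,rootdir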

I know ZFS prefers whole drives rather than partitions, but I expect the Optane and SSDs will need to be partitioned for various purposes.

Don’t do partitions for ZFS/special/slog/l2arc.

I also concur.

my $0.02

I use a single SATA SSD for boot, with the default LVM auto-config,
a pair of NVMe drives for VM file storage as a ZFS mirror,
and spinning rust for data storage in another ZFS array.

But if you write a lot, you can add a pair of drives as a SLOG. I doubt you really need to, though, unless it’s a work server where data can’t just be resent.

I’d use the Optane pair as a ZFS mirror for VM drive storage.

L2ARC vs. special vdev: I would now go special vdev over L2ARC, and would be tempted to use the non-Optane SSDs as a mirrored special vdev for the spinning-rust pool.

But it’s your setup. And this just adds a drive for boot.

It definitely is possible to partition drives up and use different partitions for different tasks, but then you will get delays down the line, as well as wider disruption when a multi-purpose drive dies.

Straight single-use drives are easier to replace.

I almost always partition the drives down a little bit, then use the partition as the vdev provider, because manufacturers make them different sizes (1 TB SSDs, 960 GB SSDs, and suchlike; partition both down to 900 GB and you won’t miss the lost bit). It also means I can use /dev/disk/by-partlabel to assemble the array, and since I put the serial in the partition label, I know which drive to pull when it dies.
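For example, roughly how I’d do one drive (serial lookup, pool name, and sizes are illustrative):

    # read the drive's serial and carve a 900G partition named after it
    SERIAL=$(lsblk -ndo SERIAL /dev/sda)
    sgdisk -n1:0:+900G -c1:"tank-${SERIAL}" /dev/sda
    # repeat for the other drive, then assemble the pool from the partition labels
    zpool create tank mirror /dev/disk/by-partlabel/tank-SERIAL1 /dev/disk/by-partlabel/tank-SERIAL2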

I very much appreciate your detailed reply. Thank you. I wanted to hear your opinion before giving you my ignorant, inexperienced, newbie take.

My goal is to build a single-box server to do all that I do. With decades of Windows Server experience and very little Linux experience, I’m migrating to Linux and ZFS. I’ll be running several VMs, including SQL, Windows, and Linux. Therefore, I believe SLOG and metadata drives will be a great benefit.

I plan to start the build with just two SSDs, where I’ll install Proxmox. I thought this might be a viable plan since, at 960GB, the SSDs are highly over-provisioned. I hope to keep Proxmox on a protected management network segment.

I plan to install TrueNAS in a VM. To do that, I want to pass through the HDs, Optane, and the LAN network.
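For reference, the kind of passthrough I have in mind (a sketch only; the VM ID and PCI addresses are placeholders I’d look up with lspci):

    qm set 100 -hostpci0 0000:41:00.0   # HBA/SATA controller with the WD Reds attached
    qm set 100 -hostpci1 0000:42:00.0   # one of the Optane drives
    qm set 100 -hostpci2 0000:01:00.1   # second 10GbE port for the LAN segment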

What kind of optane drives do you have btw?

I have the SSDPE21D960GAM3, connected with a U.2-to-M.2 cable.

As I understand it, the Optane drives are much larger than you’d need for SLOG or metadata. My thoughts revolve around the limited number of M.2 slots; I have four. That’s why I thought it might make sense to partition the Optane drives for two or more purposes, and the reason for wanting to get the metadata drive size correct on the first try. There’s enough extra space that a third partition could support an L2ARC. Can this work?

IMO you have the wrong thought process here. When talking about ZFS and L2ARC, SLOG, metadata special device, etc., you only want to include them if they’ll give you a massive benefit, and most of them you can include later. Right now I’m not convinced that you need any of them, so I’d suggest leaving them out. In a year or two, when you know ZFS better, you can evaluate whether you need them from arc_summary output.
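For example, something like this (a rough sketch; exact labels vary between OpenZFS versions) will tell you whether the ARC is already doing the job:

    arc_summary | grep -i "hit ratio"   # high hit ratios mean an L2ARC won't add much
    arc_summary -s arc                  # ARC size and usage breakdown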

L2ARC is situational and you probably don’t need it; a metadata special vdev should really get whole drives, since it becomes part of the array; and a SLOG will kill most SSDs with writes, so it should be its own device.

I’ll be running several VMs, including SQL, Windows, and Linux.

Make the Optane drives their own ZFS mirror pool and put your SQL VM disks on there. It’ll have far better performance.

This is a false claim. All ZFS sees is a block device; ZFS doesn’t use SMART data at all. Thus, partitions will be just fine.

You’re right, but using partitions leads to a more complex pool configuration, and you’re mixing different kinds of I/O: the writes from the SLOG may affect the reads from your metadata special device (though maybe not for Optane).

Considering this is advice for setting up a first time ZFS array with no special requirements, I’d avoid using partitions.

I’ve dealt with a production ZFS system that killed its root SSD because someone decided to put an SLOG device on it as a partition. It resulted in one dead SSD and downtime for a rebuild, all for an SLOG that probably didn’t provide any speedup for its use case. (Not to mention the system had dedupe enabled with far too little RAM; after 3 years it ran at a snail’s pace.)

I would not suggest putting the SLOG and root on the same drive. Since you don’t really need a large SLOG (it won’t grow larger than your physical RAM), you can put the SLOG and cache on the same drive with partitions.

To the original poster: if it is not mission-critical, it is fine to set up the SLOG and cache without a mirror.

The SLOG is just a copy of unwritten data that is already in RAM. It is never read under normal conditions; ZFS just relies on RAM, and the SLOG is only read when the system is recovering from an unclean shutdown.
If the cache (L2ARC) fails, ZFS can still get the data from disk, so no mirror is needed.
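To put rough numbers on SLOG size (a rule-of-thumb sketch, not anything exact): it only holds a few transaction groups of sync writes before they are flushed to the pool, so size it around peak ingest times the txg interval times a couple of txgs.

    # e.g. ~2.5 GB/s from 2x 10GbE * 5 s default txg interval * 2 txgs in flight
    echo $(( 2500 * 5 * 2 )) MB    # ~25000 MB, so a few tens of GB is already generous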

Hmmm, how about:

2x 960G Optane disks:

  • 2x 32G partitions - mirror - for your ZIL / SLOG.
  • 2x 800G partitions - “special” mirror for metadata and small files.

HDDs: main storage for the pool.
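Roughly, assuming the HDD pool is called tank (partition sizes and names are illustrative):

    sgdisk -n1:0:+32G -c1:slog-a    /dev/nvme0n1
    sgdisk -n2:0:0    -c2:special-a /dev/nvme0n1
    sgdisk -n1:0:+32G -c1:slog-b    /dev/nvme1n1
    sgdisk -n2:0:0    -c2:special-b /dev/nvme1n1
    zpool add tank log     mirror /dev/disk/by-partlabel/slog-a    /dev/disk/by-partlabel/slog-b
    zpool add tank special mirror /dev/disk/by-partlabel/special-a /dev/disk/by-partlabel/special-b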

What’s the downside?