ZFS Caching - ZIL / SLOG and L2ARC - one, the other, or both?

I’ve built a NAS that currently has a few ZFS pools built, and I’m trying to figure out the best way to cache those pools.

This is an mITX system with the following drive configuration:

  • Samsung 860 Evo 250GB for OS (Ubuntu)
  • Samsung 970 Evo 500GB for caching
  • Three 8TB WD Red NAS drives

Two of those drives are in a ZFS RAIDZ (with only two disks it’s effectively a mirrored config), and the last is in its own single-disk pool. I’ll refer to them as Pool 1 and Pool 2, respectively.

I’d like to cache both pools using the Samsung 970 Evo SSD, but I’m not sure what would provide the best performance gains. From the research I’ve done, I gather that:

  1. ZIL / SLOG should be small, but fast with high IOPS
  2. L2ARC should be large

So if I want to cache both pools roughly equally, I figure I might partition the SSD like so (rough attach commands are sketched after the list):

  1. 100GB ZIL / SLOG for Pool 1
  2. 60GB ZIL / SLOG for Pool 2
  3. 200GB L2ARC for Pool 1
  4. 140GB L2ARC for Pool 2
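
Roughly what I had in mind command-wise; the pool names and device paths below are just placeholders for whatever the pools and the 970 actually show up as:

  # Assumed layout: the 970 appears as nvme0n1 and is carved into four partitions
  zpool add pool1 log /dev/nvme0n1p1      # 100GB SLOG for Pool 1
  zpool add pool2 log /dev/nvme0n1p2      # 60GB SLOG for Pool 2
  zpool add pool1 cache /dev/nvme0n1p3    # 200GB L2ARC for Pool 1
  zpool add pool2 cache /dev/nvme0n1p4    # 140GB L2ARC for Pool 2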

I should also mention the system has 32GB of RAM, and I’ve set the max ARC size to 26GB.
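
For reference, I capped the ARC via the module parameter, which I believe is the usual way on Ubuntu:

  # /etc/modprobe.d/zfs.conf -- 26 GiB expressed in bytes
  options zfs zfs_arc_max=27917287424
  # may need an "update-initramfs -u" and a reboot before it takes effect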

Hoping someone more familiar and experienced with ZFS and caching can help me out here.


Is there a reason you chose the setup you did with the HDDs?

Because you might have boosted them by just having a 3-deep mirror, or a RAIDZ with all 3 drives (effectively 2 data, 1 parity, though 8TB drives take so long to resilver that you might lose a second one while the first is rebuilding).

Then you could have just used the 970 as a write cache, and read directly from the drives / ARC.

Unless the pair of drives (which should be in a mirror rather than raidz, if only 2) is data you care about, and the other is just scratch / media you can recover later?
In which case, the single drive probably won’t need a log? or benefit from an l2arc?

I mean, sure, it’s possible to do what you say;

ZFS really only likes using a device for a single pool. Even if you split it up it will do what you want, but at hobbled speeds.

R/W interference may cripple the M.2 (the 970 is an M.2 NVMe, right?)

Those SLOG sizes seem large. I’d recommend something like an 8 or 16GB SLOG.

Yeah, that’s a seriously odd setup.

I’ve only recently looked into filesystems like ZFS and BTRFS, so I’m very much a noob at this.

I guess I could just put all of the drives into one pool and separate my data into separate folders.
So the config would end up being 2 drives usable + 1 for redundancy?

Is there a ratio for ZIL / SLOG and L2ARC size to pool / drive size? I couldn’t really find any definitive answers about how big to make any of them.

This is a much better solution. You want to have one large pool and use datasets to split things up.
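
Something along these lines, assuming you start the pool from scratch (this destroys whatever is on the disks, and the device names and dataset names here are only placeholders):

  # One pool, one 3-disk RAIDZ vdev -- use stable /dev/disk/by-id names in practice
  zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd
  # Then split things up with datasets instead of separate pools
  zfs create tank/media
  zfs create tank/backups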

It heavily depends on use case. I’ve got a use case that heavily leans WORM, so I have 480GB of L2ARC and 16GB of SLOG, in Optane.

You only need as much SLOG as ZFS will allow you to use. For example, ZFS will only cache a few seconds of writes in the log by default before waiting to flush some to disk.

While you can technically change this, it’s not recommended for newbies.
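
For the curious, the knob in question is the transaction group timeout, which defaults to 5 seconds on ZFS on Linux:

  # Current txg flush interval, in seconds (default is 5)
  cat /sys/module/zfs/parameters/zfs_txg_timeout
  # It can be raised at runtime, but leave it alone unless you know the trade-offs:
  # echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout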

So then a follow-up question: I know ZFS, like pretty much any filesystem of this kind, can combine multiple disks into one vdev, but how does it deal with drives of mismatched sizes?
With my 3 8TB drives, what if I were to replace one with a 12TB drive in the future?
Or what if I add a drive of smaller size than 8TB? What are my options to have resiliency and more storage space?

It won’t accept a smaller drive.

A larger size will be ignored until the vdev is a balanced set of same-sized devices; then the system will let you expand up to the larger size (automatically with the autoexpand property, or manually with zpool online -e).

This is a one way move, it doesn’t allow a shrink.

You can transfer data from one pool to another, though.
You can also add another vdev of identical composition, even if it’s a different capacity.

That’s why the general consensus for pools is to use mirrors if you want to grow over time; it’s easier to add another mirror, or swap larger drives into an existing pool, one vdev at a time.
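
The replace-and-grow steps look roughly like this (pool and device names are placeholders):

  # Swap the disks in a vdev one at a time, letting each resilver finish
  zpool replace tank /dev/sdb /dev/sde
  # Once every disk in the vdev is the larger size, grow into the new space
  zpool set autoexpand=on tank
  zpool online -e tank /dev/sde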

just my 2c

Please just read this. You really don’t need to mess with ZIL or SLOG at all to be honest.

https://jrs-s.net/2018/04/11/primer-how-data-is-stored-on-disk-with-zfs/


If you’ve got 3x8TB in a RAIDZ (RAID5-like), and you replace your first disk with a 12TB disk, it’ll only use 8TB of it until you replace the other two.

You currently cannot add a disk to an existing vdev. If you want to add more capacity, you need to add another entire RAIDZ vdev (i.e., 3 more disks of identical size).
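
Roughly like this (pool and device names are placeholders):

  # Capacity is added as a whole new RAIDZ vdev of matching width
  zpool add tank raidz /dev/sde /dev/sdf /dev/sdg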


Sounds like you really do need to read the link @Dynamic_Gravity posted.

Here is what my pool looks like.

I run five two-disk mirror vdevs with 1 hot spare. This allows me to piecemeal upgrade my storage 2 drives at a time so I don’t have to shell out $$$ all at once. Furthermore, it allows maximum redundancy and performance at the cost of 1/2 my total raw storage capacity. Lastly, the biggest kicker is that if you want great performance out of your pool, you need as many vdevs as you can muster. The biggest mistake everyone seems to make is that they want to roll with a RAIDZ pool because it sounds cool.

[root@sol ~]# zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 2h7m with 0 errors on Sun Dec  1 04:07:20 2019
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         ONLINE       0     0     0
          mirror-0                                   ONLINE       0     0     0
            scsi-35000c500344075c7                   ONLINE       0     0     0
            scsi-35000c5003438fcbb                   ONLINE       0     0     0
          mirror-1                                   ONLINE       0     0     0
            scsi-35000c50034290b9b                   ONLINE       0     0     0
            scsi-35000c500341ea143                   ONLINE       0     0     0
          mirror-2                                   ONLINE       0     0     0
            scsi-35000c500341e7f33                   ONLINE       0     0     0
            scsi-35000c50025c5279f                   ONLINE       0     0     0
          mirror-3                                   ONLINE       0     0     0
            scsi-35000c50025c4f27f                   ONLINE       0     0     0
            scsi-35000c50025c4f143                   ONLINE       0     0     0
          mirror-4                                   ONLINE       0     0     0
            ata-HGST_HDN724030ALE640_PK2234P9HAK9WY  ONLINE       0     0     0
            ata-HGST_HDN724030ALE640_PK2234P9K1456Y  ONLINE       0     0     0
        spares
          scsi-35000c500103c591b                     AVAIL
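
Growing it later is just a matter of adding another pair (device names are placeholders):

  # Add one more two-disk mirror vdev to the existing pool
  zpool add tank mirror /dev/disk/by-id/scsi-NEW_DISK_A /dev/disk/by-id/scsi-NEW_DISK_B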

Which can be calculated from the max link bandwidth and the time between flushes.
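
For example, with rough numbers: a 10GbE link tops out around 1.25GB/s, and the default 5-second transaction group timeout means at most ~6.25GB of writes in flight; even doubled for safety that lands well under 16GB, which is why an 8-16GB SLOG is plenty.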


This is one thing I miss in ZFS coming from BTRFS: I wish ZFS had a copies=2 that guaranteed separate device placement. That would be the equivalent of BTRFS -m raid1 -d raid1.
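
The property itself exists, it just doesn’t make that placement guarantee (the dataset name here is a placeholder):

  # Keeps two copies of every block in this dataset, but possibly on the same disk
  zfs set copies=2 tank/important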

I wouldn’t bother with raidz until you get to 10/15 spinning disks total. Yes it can work with 3 and technically it’s cheaper per byte, but evolving the pool is such a hassle.


Ehhhhhhhhhh I disagree. I see your point though.

My number is 4 disks for raidz.
