I’ve built a NAS that currently has a few ZFS pools built, and I’m trying to figure out the best way to cache those pools.
This is an mITX system with the following drive configuration:
Samsung 860 Evo 250GB for OS (Ubuntu)
Samsung 970 Evo 500GB for caching
Three 8TB WD Red NAS drives
Two of those drives are in a ZFS RAIDZ, and the last is in a single ZFS pool (technically a “mirrored” config). I’ll refer to them as Pool 1 and Pool 2, respectively.
I’d like to cache both pools using the Samsung 970 Evo SSD, but I’m not sure what would provide the best performance gains. From what research I did, I read that:
ZIL / SLOG should be small, but fast with high IOPS
L2ARC should be large
So if I wanted to cache both pools pretty equally, I figure I might partition the SSD like so:
100GB ZIL / SLOG for Pool 1
60GB ZIL / SLOG for Pool 2
200GB L2ARC for Pool 1
140GB L2ARC for Pool 2
I should also mention the system has 32GB of system RAM, where I set the max ARC size to 26GB.
Hoping someone more familiar and experienced with ZFS and caching can help me out here.
Is there a reason you chose the setup you did with the HDDs?
Because you might have boosted them by just having a 3-deep mirror, or a raidz with 3 drives (effectively 2 data, 1 parity, but 8TB drives take so long to resilver that you might lose a second if the first pops).
Then you could have just used the 970 as a write cache, and read direct from the drives / arc.
Unless the pair of drives (which should be in a mirror rather than raidz, if only 2) is data you care about, and the other is just scratch / media you can recover later?
In which case, the single drive probably won’t need a log, or benefit from an L2ARC?
I mean, sure, it’s possible to do what you say;
ZFS only likes using a device for a single pool. Even if you split it up it will do what you want, but at hobbled speeds.
R/W interference may cripple the M.2 (the 970 is an M.2 NVMe, right?)
100GB ZIL / SLOG for Pool 1
60GB ZIL / SLOG for Pool 2
Those seem large. I’d recommend like an 8 or 16GB SLOG.
Yeah, that’s a seriously odd setup.
I’ve only recently looked into FS like ZFS and BTRFS, so I’m very much a noob at this.
I guess I could just put all of the drives into one pool and separate my data into separate folders.
So the config would end up being 2 drives usable + 1 for redundancy?
Is there a ratio for ZIL / SLOG and L2ARC size to pool / drive size? I couldn’t really find any definitive answers about how big to make any of them.
This is a much better solution. You want to have one large pool and use datasets to split things up.
It heavily depends on use case. I’ve got a use case that heavily leans WORM, so I have 480GB of L2ARC and 16GB of SLOG, in Optane.
You only need as much SLOG as ZFS will allow you to use. For example, ZFS will only cache a few seconds of writes in the log by default before waiting to flush some to disk.
You can technically change this, but it’s not recommended for newbies.
So then a follow-up question: I know ZFS, like pretty much any filesystem of this kind, can combine multiple disks into one vdev, but how does it deal with drives of mismatched sizes?
With my 3 8TB drives, what if I were to replace one with a 12TB drive in the future?
Or what if I add a drive of smaller size than 8TB? What are my options to have resiliency and more storage space?
It won’t accept a smaller drive.
The extra space on a larger drive will be ignored until every device in the vdev is the same larger size; then the system will allow you to go in and manually expand the vdev up to the larger size (zpool online -e, or the pool’s autoexpand property).
This is a one way move, it doesn’t allow a shrink.
You can transfer from one pool to another though.
You can also add a vdev of identical composition, even if it has a different capacity.
That’s why the general consensus for pools is to use mirrors if you want to grow over time: it is easier to add another mirror, or swap larger drives into an existing pool, one vdev at a time.
just my 2c
Please just read this. You really don’t need to mess with ZIL or SLOG at all to be honest.
If you’ve got 3x8TB in a RAIDZ (raid5), and you replace your first disk with a 12TB disk, it’ll only use 8TB of it until you replace the other two.
You currently cannot add a disk to an existing vdev. If you want to add more capacity, you need to add another entire RAIDZ vdev (i.e. 3 more disks of identical size).
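To put rough numbers on that, here is a small Python sketch. It is only back-of-the-envelope arithmetic: it ignores metadata, padding and reserved space, and the function names raidz_usable_tb and mirror_pool_usable_tb are made up for illustration.

```python
def raidz_usable_tb(drive_sizes_tb, parity=1):
    """Rough usable capacity of one RAIDZ vdev (overhead ignored)."""
    if len(drive_sizes_tb) <= parity:
        raise ValueError("need more drives than parity devices")
    # Every member is treated as if it were as small as the smallest drive.
    return (len(drive_sizes_tb) - parity) * min(drive_sizes_tb)


def mirror_pool_usable_tb(mirror_vdevs):
    """Rough usable capacity of a pool made of mirror vdevs.

    Each mirror contributes the size of its smallest disk.
    """
    return sum(min(vdev) for vdev in mirror_vdevs)


if __name__ == "__main__":
    print(raidz_usable_tb([8, 8, 8]))      # three 8TB drives
    print(raidz_usable_tb([12, 8, 8]))     # one drive swapped for a 12TB
    print(raidz_usable_tb([12, 12, 12]))   # all three swapped
    print(mirror_pool_usable_tb([[8, 8], [12, 12]]))  # grow by adding a mirror
```

The first two cases both come out to 16TB, which is the point above: the single 12TB disk buys nothing until the other two are replaced too, at which point the vdev can grow to 24TB.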
Sounds like you really do need to read the link
Here is what my pool looks like.
I run five two-disk mirror vdevs with 1 hot spare. This allows me to piece-meal upgrade my storage 2 drives at a time so I don’t have to shell out $$$ all at once. Furthermore, it allows maximum redundancy and performance at the cost of 1/2 my total raw storage capacity. Lastly, the biggest kicker is that if you want to have great performance out of your pool you need to have as many vdevs as you can muster. The biggest mistake that everyone seems to make is that they want to roll with a RAIDZX pool because it sounds cool.
# zpool status
  scan: scrub repaired 0B in 2h7m with 0 errors on Sun Dec 1 04:07:20 2019
        NAME                                         STATE     READ WRITE CKSUM
        tank                                         ONLINE       0     0     0
          mirror-0                                   ONLINE       0     0     0
            scsi-35000c500344075c7                   ONLINE       0     0     0
            scsi-35000c5003438fcbb                   ONLINE       0     0     0
          mirror-1                                   ONLINE       0     0     0
            scsi-35000c50034290b9b                   ONLINE       0     0     0
            scsi-35000c500341ea143                   ONLINE       0     0     0
          mirror-2                                   ONLINE       0     0     0
            scsi-35000c500341e7f33                   ONLINE       0     0     0
            scsi-35000c50025c5279f                   ONLINE       0     0     0
          mirror-3                                   ONLINE       0     0     0
            scsi-35000c50025c4f27f                   ONLINE       0     0     0
            scsi-35000c50025c4f143                   ONLINE       0     0     0
          mirror-4                                   ONLINE       0     0     0
            ata-HGST_HDN724030ALE640_PK2234P9HAK9WY  ONLINE       0     0     0
            ata-HGST_HDN724030ALE640_PK2234P9K1456Y  ONLINE       0     0     0
The SLOG size you actually need can be calculated from the max link bandwidth and the time between flushes.
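Here is a rough sketch of that calculation in Python. The 5 second flush interval matches the OpenZFS default for zfs_txg_timeout as far as I know; the factor of 2 for transaction groups in flight and the function name slog_size_gb are my own assumptions for illustration, not anything official.

```python
def slog_size_gb(link_gbit_per_s, flush_interval_s=5.0, txgs_in_flight=2):
    """Estimate how much SLOG could ever be filled, in GB (decimal).

    link_gbit_per_s  -- fastest rate sync writes can arrive, in Gbit/s
    flush_interval_s -- seconds between flushes to the pool (assumed 5)
    txgs_in_flight   -- safety factor: transaction groups held at once (assumed 2)
    """
    bytes_per_s = link_gbit_per_s * 1e9 / 8  # bits -> bytes
    return bytes_per_s * flush_interval_s * txgs_in_flight / 1e9


if __name__ == "__main__":
    for link in (1, 10):
        print(f"{link} GbE link: about {slog_size_gb(link):.2f} GB of SLOG")
```

Under those assumptions a 1GbE NAS can only ever fill a bit over 1GB of log and a 10GbE one about 12.5GB, which is why the 8 or 16GB suggested earlier in the thread is already generous and 100GB would mostly sit idle.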
This is one thing I miss in ZFS from BTRFS - I wish ZFS had a copies=2 that would guarantee separate device placement. This would be equivalent to BTRFS -m raid1 -d raid1.
I wouldn’t bother with raidz until you get to 10/15 spinning disks total. Yes it can work with 3 and technically it’s cheaper per byte, but evolving the pool is such a hassle.
Ehhhhhhhhhh I disagree. I see your point though.
My number is 4 disks for raidz.