ZFS Metadata On New Server

Hey guys, I'm looking to set up a home server using TrueNAS Scale in the next 2 days, migrating from my current server. Here are the parts I have:

Still waiting on the RAM, and I don't need the 10G networking atm. However, I'm unsure how a metadata cache fits into my setup. I was intending for one of the Optane drives to be a boot drive for TrueNAS, but it seems like they'd be better used for a metadata cache at this point. The SN850 was intended to be a general cache for the ZFS array. I'm not even sure if that's how this works? I might be getting things mixed up, and I think I may need an additional SSD for a boot drive.

2x Optane for Metadata cache
1x SN850 for cache
1x boot drive?

I also wanted to have an SSD for fast storage that is external to the ZFS pool, which means I'm now completely out of slots and need an add-on card. Does anyone see any mistakes above? I want to make sure I'm planning this out correctly, since ideally this will be up and running by this weekend.

Boot drive can be small old SSD AFAIK.

I have no idea what a metadata cache is.
There can be special vdevs that store only metadata and small-block files, but those are not caches (losing them would break the pool).
There can also be an L2ARC, which only contains data blocks evicted from ARC (RAM), while metadata remains in RAM.

Read "OpenZFS - The Final Word in File Systems" for ZFS basics, and watch "ZFS Metadata: Special Device And You!" on YouTube for the special device for metadata.
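To make the distinction concrete, this is roughly what adding each one looks like on the command line (pool name and device paths below are placeholders, not a recommendation for your specific hardware):

```
# Special vdev: pool-critical, so add it as (at least) a mirror.
# If this vdev is lost, the whole pool is lost with it.
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# L2ARC: just a read cache, a single device is fine.
# Losing it only costs cached blocks; the pool keeps working.
zpool add tank cache /dev/nvme2n1
```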

I have no idea what a metadata cache is.
There can be special vdevs that store only metadata and small-block files, but those are not caches (losing them would break the pool).

I believe that's what I'm referring to:
ZFS Metadata Special Device: Z (there is a thread about it here). The way I understand it, a directory of the ZFS array is stored on the SSD to speed up browsing.

But that would take two of the 3 M.2 connections on the board (assuming I put the Optanes in a mirror as suggested above), which doesn't leave room for both a cache and a boot drive.

Beware that the special device is not a metadata cache.
It's a pool-critical device which, if lost, would render the whole pool useless.
So such a special vdev should be set up as at least a mirror.

Though unless you already know what kind of operations and performance you'll have, it's better to monitor with a default/minimum setup first and add those special devices later.
(You can still connect the drives, just don't add them as vdevs yet.)
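For example, you could watch how the default setup behaves before committing any of the NVMe drives (commands below assume TrueNAS SCALE's OpenZFS tooling):

```
# Summary of ARC size, hit rates, and metadata usage with RAM only
arc_summary

# Live ARC activity, sampled every 5 seconds
arcstat 5
```

If the ARC is already absorbing most reads, a special vdev or L2ARC may not buy much.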

Alright, since it seems you can do this at any time, I'll set it up without the metadata layer. I'll just use one of the Optanes I have as the boot drive for now and get more in the future, once I better understand how ZFS works and how that metadata layer works.

I think a mirror of Optane should be sufficient once I do decide to set it up, given their incredibly low failure rates.

I think any cheap/old SSD can be used as a boot drive, even a SATA one.

I'd just buy 2 x cheap new SSDs; a brand-new 120GB SSD is like $15 now, cheaper than an SD card.

I set up my pool with 2 x Intel DC S3700 800GB SSDs for metadata, and it sure makes the whole array feel snappy!
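For anyone wanting the shape of that, it's roughly this (pool name and device paths are made up, not my actual disks):

```
# HDD raidz2 data vdev plus a mirrored SSD special vdev for metadata
zpool create tank \
    raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf \
    special mirror /dev/sdg /dev/sdh
```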

I bought those Optane drives to be boot drives; they only have 64GB of storage, so they're not much use for anything else.

Do you need that much storage for metadata? I was reading that it was ~1GB per TB in the array. Surely you don’t have 800TB of storage.

No, but they are really durable SSDs and I already had them. SSDs are so cheap now that it wasn't really worth ditching them for newer, smaller SSDs.
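If you want to sanity-check the ~1GB-per-TB rule of thumb on an existing pool rather than guessing, zdb can print a space breakdown by block type, metadata included (read-only, but it can take a long time on a large pool; the pool name is a placeholder):

```
# Block statistics for the pool, broken down by type
zdb -bb tank
```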

You can also store small files on the special vdev. I personally have millions of them, and that speeds up the pool far more than the metadata does (which is mostly stored in ARC anyway). So a TB of special vdev has its value; my vdev has 420GB allocated as of yesterday, 70GB of it being metadata.
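For reference, the small-file redirection is a per-dataset property; something like this (the dataset name and the 32K cutoff are just examples, tune to your own record sizes):

```
# Blocks of 32K or smaller written to this dataset land on the special vdev
zfs set special_small_blocks=32K tank/media

# Check per-vdev allocation to see how full the special vdev is getting
zpool list -v tank
```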


Can you configure a dedicated cache drive as well as a special vdev, as separate entities? I was under the impression you could.

Sure. L2ARC (cache) is totally separate from the special vdev. I have both running on my server: 2TB L2ARC, 1TB special vdev.


Ensure you have redundancy for the special vdev…
Losing L2ARC is fine, losing special vdev isn’t…

What drives are you using?

Cheap consumer NVMe drives: 1x 2TB for L2ARC and a mirror of 1TB drives (all three are Mushkin PCIe 3.0 drives).

Gonna use some old SATA SSDs for L2ARC.
Will add some consumer NVMe SSDs as a special vdev if needed.

You want to use NVMe for L2ARC and the SATA SSDs for the special vdev. The special vdev doesn't need much bandwidth, as it's mostly about fast access times. But for L2ARC you want bandwidth, because that's housing actual large user data.


But L2ARC might be rarely used (it depends on the actual L2ARC hit ratio, which largely depends on system setup and workload).
Metadata and small-block files might require more performance (IOPS & latency).

Random Reddit post about SATA vs NVMe SSDs

L2ARC is always in use unless you only access data that is already in the ARC, so it should be the fastest device. The special and data vdevs only come into play if ARC + L2ARC don't have the data stored.

Also, don't get fooled by the hit-rate statistics from ZFS. The calculations are weird and misleading at best. An ARC hit rate of 99.6% meant nothing when I was reading 150GB of audio files yesterday. Everything was pulled straight from L2ARC, and my special vdev and HDDs didn't get a single IOP. The L2ARC hit rate stated 15% or 20% or so. It means nothing.
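That sort of thing is easy to see live with per-vdev iostat (pool name is a placeholder); the cache device shows up as its own line, so you can watch whether the HDDs are actually being touched:

```
# Per-vdev reads/writes every 5 seconds; L2ARC appears under "cache"
zpool iostat -v tank 5
```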


Well, how do I measure how much I/O/data is read from L2ARC though?