Advice on how to best utilize my ZFS pool

shoddyguard · February 7, 2019, 2:23am

Hi there,
Long time fan of L1T, first time poster

I have 40TB of a nice fresh zpool waiting to be used, but I’m not sure how best to use it given my unique setup.

This is a Debian-based machine which acts as a hypervisor, using KVM for a bunch of VMs, one of which is my Windows Domain Controller, as I run a domain at home.

I was thinking of creating a massive virtual HDD in the zpool and having Windows handle all the file sharing. That way I could also take advantage of Windows’ nice block-level dedupe, but I’m not sure whether I’d take a massive performance penalty by doing this, or whether it presents data integrity problems. (I’m very new to ZFS.)

Alternatively, the Debian machine is domain-joined, so I could use the ZFS storage on bare metal, as it were, and use Samba to make it available to Windows.

Most of the data that will be going on here is rips of my Blu-rays, which my family will stream via Plex, so decent performance is top of my list.

Any advice is greatly appreciated, and I’m happy to go with the flow if I’m missing a better solution.

Many thanks
Steve

oO.o · February 7, 2019, 2:47am

Can you give us more specs?

CPU, memory, drive config, gpu (if any).

How many VMs running which OS’s?

Are you running plex in a vm? What OS?

I’m not sure if I’d recommend this for a very large pool, but I’m also not a Windows guy, so maybe it’s not terrible. The way you want to do it is to create a zvol which you’ll use as a drive device for your vm.

As far as dedup goes, if you have enough RAM, deduping your VMs’ OS drives can make sense because there’s a lot of redundancy there, but for mass storage for SMB sharing, maybe Windows dedup is better (I have no idea).

shoddyguard · February 7, 2019, 3:12am

Apologies I should have included that first >.<

Specs:

Processor: E5-1620 v3 (3.5GHz)
RAM: 128GB Kingston ECC DDR4
GPUs: 1x AMD…something (basic GFX for Debian), 1x NVIDIA Quadro P600 (GPU decoder for Plex)
SSDs: 2x 1TB Intel NVMe drives (host boot, storage, SLOG, L2ARC)
HDDs: 12x 4TB drives (RAIDZ3)

VMs:

Nginx - uses almost no resources, reverse proxy mostly (CentOS based).
Windows Server 2016 - domain controller, DHCP, DNS etc. (the beefiest of the VMs, consumes around 16GB of RAM under load).
Ubuntu 18.04 - Plex, mostly uses the NVIDIA GPU (passed through to the VM from the host) except for older Hi10P stuff, where it nabs a lot of the CPU.
Windows Server 2019 - mostly testing, off most of the time.

This zpool is completely empty at the moment so I can destroy it and rebuild in a different format if needed.

I considered mirrors, as I heard they are better for random I/O workloads such as SQL or VMs, but my current server is using around 24TB, so with mirrors I wouldn’t actually have enough storage (it’s impossible to add more drives, as the case only has space for 13 3.5" drives).

So I’m looking to get the best performance out of the drives I currently have.

At the moment I have the VMs’ boot drives all on the NVMe drives, so that part is fine.

I don’t really care about dedupe and wasn’t going to use it in ZFS, as I heard it was a resource hog and I think the RAM is better spent on VMs and such. However, if I was going to be passing the storage to Windows, then I would use the built-in dedupe there, as it’s actually pretty decent while not being a resource hog.

oO.o · February 7, 2019, 5:18am

I’ll let someone more familiar with Windows weigh in on how to deal with the smb shares, but here are my thoughts fwiw:

Experiment with gzip for your media library. If your processor isn’t too bogged down by VM workload, you have the clock speed to deal with more compression than just lz4.
If you can afford to lose 4TB of storage, go for 2x raidz2 instead of 1 raidz3.
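
For a rough sense of the trade-off between these layouts, here is a back-of-the-envelope sketch in Python. It assumes the 12x 4TB drives from the specs above, counts only data drives, and ignores ZFS metadata, padding, and the TB/TiB difference, so real usable space will be lower. The function name is made up for illustration.

```python
def usable_tb(drives_per_vdev: int, parity_per_vdev: int, vdevs: int,
              drive_tb: float = 4.0) -> float:
    """Rough usable capacity: data drives times drive size, no overhead."""
    return (drives_per_vdev - parity_per_vdev) * vdevs * drive_tb


# Three ways to arrange the same 12 drives.
layouts = {
    "1x 12-wide raidz3": usable_tb(12, 3, 1),
    "2x 6-wide raidz2": usable_tb(6, 2, 2),
    "6x 2-way mirrors": usable_tb(2, 1, 6),
}

for name, tb in layouts.items():
    print(f"{name}: ~{tb:.0f} TB")
```

Under these assumptions raidz3 comes out at about 36 TB, two raidz2 vdevs at about 32 TB (the 4TB difference mentioned above), and mirrors at about 24 TB.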

thro · February 7, 2019, 6:12am

De-dupe is not free (irrespective of the platform doing it). ZFS takes a performance hit to do it in-line, live. Not sure how Windows de-dupe works, but they all work by maintaining a hash table and doing the work either off-line (a maintenance task overnight or whatever) or in-line (the ZFS way). In-line (ZFS) requires a lot of RAM to hold the de-dupe table or performance tanks. That’s where the 1-2 GB of RAM per TB figure that is so often mis-quoted comes from: it’s for turning on de-dupe. Without de-dupe, the RAM requirement isn’t anything serious. Off-line (like a NetApp, and possibly Windows if it is not doing it in-line)… well, it’s a scheduled task during which your performance will tank.
lz4 compression on ZFS is “almost free” with a half-decent CPU these days. I noticed no performance hit turning it on with a 4 drive striped mirror on a shitty little N54L AMD Turion based box. Your CPU is … WAY faster than that thing.

I wouldn’t bother with a windows VM, i’d just turn on lz4 compression and share it out directly from the zfs box.

Bringing windows into the picture imho is just un-necessary overhead and complexity.

You will pay some level of performance/resource penalty for de-dupe whether or not you are doing it with windows or ZFS.

For a storage box with a single user or a small number of users, you may be surprised at just how little de-dupe saves. ZFS has the ability to give you an estimate of how much space de-dupe will save on a pool before you turn it on. I’d put your data on it, run that, and be dismayed at how little you’d save before trying to chase de-duplication savings that probably do not exist.
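
To put rough numbers on this, a small Python sketch: it applies the 1-2 GB of RAM per TB rule of thumb quoted above to a pool size, and builds the command line for the estimate. I believe the tool in question is `zdb -S <pool>`, which simulates dedup on existing data and prints the projected ratio; the pool name `tank` and the helper names are placeholders, and nothing here actually runs `zdb`.

```python
import shlex


def dedup_ram_estimate_gb(pool_tb: float, gb_per_tb=(1.0, 2.0)) -> tuple:
    """Apply the 1-2 GB RAM per TB rule of thumb; returns (low, high) in GB."""
    low, high = gb_per_tb
    return pool_tb * low, pool_tb * high


def dedup_simulation_cmd(pool: str) -> list:
    """Command that simulates dedup on an existing pool without enabling it."""
    return ["zdb", "-S", pool]


low, high = dedup_ram_estimate_gb(40)
print(f"40 TB pool: roughly {low:.0f}-{high:.0f} GB of RAM for the dedup table")
print(shlex.join(dedup_simulation_cmd("tank")))
```

The simulation can take a long time and a fair amount of memory on a full pool, so it is probably best run when the box is otherwise quiet.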

edit:
for VMs… unless you’re doing VDI or something, most of the VM (usually) will be stuff other than the shared windows files. You can already share most of the base OS with templates and thin clones a lot of the time. So de-dupe won’t be the massive win you may initially think of for general lab type VM storage, instead of just using thin clones/templates.

oO.o · February 7, 2019, 6:25am

lz4 is a no-brainer as a default and can easily save you 20% capacity or more on compressible data, but with a 3GHz+ clock on your processor, you can start looking at gzip for large sequential data.

lz4 is so inexpensive that you can actually see performance increases because the time saved writing compressed data to disk is greater than the time spent compressing the data.
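
A toy model of why this can happen, in Python: if the disks are the bottleneck, writing fewer bytes per logical byte raises the logical write rate, as long as the compressor keeps up. All throughput figures below are hypothetical, not measurements, and the function name is made up.

```python
def logical_write_mb_s(disk_mb_s: float, compress_ratio: float,
                       compressor_mb_s: float) -> float:
    """Logical MB/s the application sees.

    disk_mb_s: raw write speed of the pool
    compress_ratio: logical size / on-disk size (1.0 = incompressible)
    compressor_mb_s: how fast the CPU can compress the stream
    """
    return min(compressor_mb_s, disk_mb_s * compress_ratio)


# No compression: the disks set the pace.
no_compression = logical_write_mb_s(500, 1.0, float("inf"))
# A ratio of 1.25 corresponds to saving 20% of the space on disk.
with_lz4 = logical_write_mb_s(500, 1.25, 2000)

print(no_compression, with_lz4)
```

With a slow compressor the `min()` flips the other way and compression becomes the bottleneck, which is the usual argument against gzip on latency-sensitive storage.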

Agree 100% here. I only suggested dedup for VM OS storage because there is inherently a lot of duplication there. However, with only 4 VMs, it’s not really an issue. You should definitely stick with lz4 and not gzip on the VM storage though.

thro · February 7, 2019, 6:27am

^ Agreed. edited my post above before i saw your comment on VM storage.

oO.o · February 7, 2019, 6:28am

That’s interesting, I didn’t realize templates shared base OS data, I assumed they just duplicated. Do you have any reading material on this?

thro · February 7, 2019, 6:31am

Depends on the VM software - and they DO diverge.

But for a lab… my process with VMware Workstation is to spin up an OS, patch it, sysprep it, snapshot it, turn it into a VMware template.

Then clone from it with a VMware Workstation thin clone and it only stores the diff from the base snapshot.

Different hypervisor software may differ, but if you’re doing that with Workstation then sure, there may be duplicate patches, etc. in your VM template diffs (as time goes on and patch tuesdays happen), but you’re talking about a small amount vs. the OP’s 40TB ZFS pool… not worth the hassle to chase…

thro · February 7, 2019, 6:35am

You could also do that with ZFS provisioned iSCSI volumes - ZFS snapshot/clone them to create copies of them for other VMs - only the diff from the base disk will require more space in your pool.

oO.o · February 7, 2019, 6:36am

Ah, ok yes that makes sense. I wonder to what extent this is possible with kvm. I honestly have never made the time to fully explore vm templates even though they make a whole lot of sense.

One instance where I go full gzip-9 + dedup is on the ISO storage. If you’re keeping a copy of several linux/bsd/whatever ISO mirrors, then the space savings there are pretty great.

thro · February 7, 2019, 6:37am

As above with KVM (i too am thinking about it as i am trying to move my workflow to it), if it’s a local VM on local disk, put your VM disk in its own data-set, snapshot the dataset, clone to new data-set… spin up new VM with that base disk in the new dataset… should work?

Not quite as elegant as vmware right-click thin-clone, but hey… free/open/etc.

e.g.

you may have a bunch of datasets (as datasets can be a child of another dataset)

/mnt/tank/VMs
/mnt/tank/VMs/foo-vm
/mnt/tank/VMs/foo2-vm

/mnt/tank/VMs is your KVM disk location

need a new clone?

snapshot /mnt/tank/VMs/foo-vm
clone to new dataset under /mnt/tank/VMs
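
In terms of actual commands, those two steps would look roughly like this. It is a small Python dry-run helper that only prints the `zfs` command lines; the snapshot name `base`, the clone name `foo3-vm`, and the pool name `tank` are assumptions. Note that `zfs` takes dataset names (`tank/VMs/foo-vm`), not mount paths.

```python
import shlex


def clone_vm_dataset(source: str, new_name: str, snap: str = "base") -> list:
    """Return the zfs commands to snapshot `source` and clone it as a sibling."""
    parent = source.rsplit("/", 1)[0]   # e.g. "tank/VMs"
    snapshot = f"{source}@{snap}"       # e.g. "tank/VMs/foo-vm@base"
    return [
        ["zfs", "snapshot", snapshot],
        ["zfs", "clone", snapshot, f"{parent}/{new_name}"],
    ]


for cmd in clone_vm_dataset("tank/VMs/foo-vm", "foo3-vm"):
    print(shlex.join(cmd))
```

The clone stays dependent on the snapshot it was made from, so the snapshot cannot be destroyed while the clone exists.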

edit:
one thing i’ve eventually learned with zfs is to think less in terms of folders, and more in terms of data-sets.

you can do cool things with datasets.

oO.o · February 7, 2019, 6:46am

Yeah, but then again, if you’re using kickstart or similar where the point of departure is really during install, then you can argue that it could make more sense to just dedupe. But not for OP with only a few VMs. If there were a dozen though…

Yeah, absolutely. Having everything in tank is a rookie mistake. Make a dataset/zvol for each use case. Also, if you are using dedupe, it is pool-wide, so it will dedupe across all datasets in the pool that have dedup enabled. So if you have an ISO dataset and VM zvols, both with dedupe enabled, they can potentially be deduped against each other (more likely if the compression is the same though).
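
As a sketch of what one dataset per use case might look like, the Python snippet below prints `zfs create` commands with per-dataset properties. The dataset names and property choices are illustrative only, loosely following the suggestions in this thread (lz4 as the default, gzip-9 plus dedup only for ISOs); it does not run anything.

```python
import shlex

# Hypothetical layout: one dataset per use case, each with its own properties.
LAYOUT = {
    "tank/media": {"compression": "lz4"},
    "tank/VMs": {"compression": "lz4"},
    "tank/isos": {"compression": "gzip-9", "dedup": "on"},
}


def create_cmd(dataset: str, props: dict) -> list:
    """Build a `zfs create` command with one -o flag per property."""
    cmd = ["zfs", "create"]
    for key, value in props.items():
        cmd += ["-o", f"{key}={value}"]
    return cmd + [dataset]


for dataset, props in LAYOUT.items():
    print(shlex.join(create_cmd(dataset, props)))
```

Properties set this way are inherited by child datasets unless overridden, which is part of why splitting things up is cheap.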

thro · February 7, 2019, 6:51am

Even more than that, rather than thinking about “do i need a new dataset” the mentality really needs to be “why shouldn’t i make this new folder a dataset”?

It’s a total mindfuck coming from other file systems (and no guide i’ve seen seems to stress this enough?), but things like snapshots (and thus zfs send for backup, etc.) are per-dataset. I’d almost suggest that more often than not you really want a dataset rather than a folder in one.

But i can’t talk. I don’t have everything in tank at home, but i only have a few datasets. I should have more.

shoddyguard · February 7, 2019, 12:36pm

Thanks for the advice guys
I’m in the UK hence my delayed reply!

Cool, I’ll create some Samba shares then.

TBH Dedupe really isn’t a concern for me, it was more that if using ZFS to store VM volumes for shares was a good idea then I’d enable it in Windows cos it’s as good as free in terms of performance.

It’s been interesting reading your discussions on datasets, I’ve clearly got a bit more research to do yet!

I can absolutely afford to lose 4TB, so I have just split my array into 2x RAIDZ2.

2bitmarksman · February 8, 2019, 12:19am

ZFS does well enough for Samba sharing, but if you really want to use Windows for sharing data, present an iSCSI drive to the Windows VM and have it as your share drive.

oO.o · February 8, 2019, 1:08am

The windows vm is running on the nas so I think it makes more sense to just use a zvol.

thro · February 8, 2019, 1:12am

Also, BIG performance difference - if you’re running VMs on a dataset, be sure to turn off access time tracking.

Otherwise there’s a write to update access time for every read or write to a file… which on a datastore used for VM disks tanks things massively.

Not sure if that applies to iSCSI volumes or not, but couldn’t hurt to turn it off.

The ZFS property is “atime” (set atime=off on the dataset).

https://prefetch.net/blog/index.php/2006/07/25/disabling-access-time-atime-updates-on-zfs-file-system/
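
For reference, this is a per-dataset property that is changed with `zfs set` and checked with `zfs get`. A dry-run sketch in Python that only prints the commands; the dataset name `tank/VMs` is a placeholder.

```python
import shlex


def atime_off_cmds(dataset: str) -> list:
    """Commands to disable access-time updates on a dataset and verify it."""
    return [
        ["zfs", "set", "atime=off", dataset],
        ["zfs", "get", "atime", dataset],
    ]


for cmd in atime_off_cmds("tank/VMs"):
    print(shlex.join(cmd))
```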

re: datasets - just be aware that if you turn on deduplication on anything, i’m pretty sure the dedup table is shared by the entire pool, and performance on the entire pool tanks if you don’t have enough RAM for the deduplication hashes. Turning it off again only stops new writes from being deduplicated; data that is already deduped stays in the table until it is rewritten or deleted, so getting rid of it completely can mean rewriting the data or rebuilding the pool.

So definitely check the estimate for deduplication before playing, unless you’re prepared to recreate your pool.

oO.o · February 8, 2019, 1:35am

Yeah, you want to use zvols for VMs, either directly for local ones or via iSCSI for remote hypervisors. Using a file for VM storage isn’t ideal (except maybe in ESXi, but they designed a filesystem around that use case).
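
A zvol is created with `zfs create -V <size>` and, on Linux, shows up as a block device under `/dev/zvol/`, which is what you would hand to the VM as its disk. A minimal Python sketch that just builds the command and the expected device path; the volume name `tank/VMs/winshare` and the size are made up for illustration.

```python
import shlex


def zvol_create_cmd(name: str, size: str, sparse: bool = False) -> list:
    """Build the zfs command that creates a zvol of the given size."""
    cmd = ["zfs", "create"]
    if sparse:
        cmd.append("-s")  # thin-provisioned: no space reserved up front
    return cmd + ["-V", size, name]


def zvol_device_path(name: str) -> str:
    """Block device path for a zvol on Linux."""
    return f"/dev/zvol/{name}"


print(shlex.join(zvol_create_cmd("tank/VMs/winshare", "10T", sparse=True)))
print(zvol_device_path("tank/VMs/winshare"))
```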

2bitmarksman · February 8, 2019, 2:02am

Oh, didn’t catch that. Yeah, a zvol would be better in that case. Though I suppose I should have clarified that by presenting an iSCSI drive to Windows, I meant a separate dataset and zvol for that particular drive/datastore.