Virtualizing (virt-manager) a storage device with ZFS

So I noticed that you can have ZFS host the virtual hard disks for VMs. Is there anything written up anywhere online about the benefits and drawbacks of doing this? I was specifically wondering whether ZFS's RAM caching would help those virtual drives (speed and latency improvements), or hurt them.

I’m currently running Arch Linux with ZFS pools for both root (1 TB NVMe drive) and home (5 TB spinning-rust drive).


I think you mean zvols. If each VM disk is its own zvol, snapshotting it is really fast and easy.

You can use ZFS properties like compression on a zvol even if it's formatted with a different filesystem or has its own partition table inside. To be fair, that also works if you just use sparse files on top of a regular dataset.

In the end it's a matter of preference, I guess.

When creating a zvol, add -s to make it a sparse volume. I'm not sure whether you can overbook the pool that way, though.
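Something like this, roughly (pool and zvol names here are just placeholders):

```
# Create a 50 GiB sparse (thin-provisioned) zvol -- -s skips the space reservation
zfs create -s -V 50G tank/vms/testvm

# ZFS properties still apply even though the guest puts its own filesystem on it
zfs set compression=lz4 tank/vms/testvm

# Snapshots of the whole virtual disk are near-instant, and easy to roll back
zfs snapshot tank/vms/testvm@clean-install
zfs rollback tank/vms/testvm@clean-install
```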


Disregard the above comments. Using qcow2 on a ZFS-backed VM drive is redundant and offers no gains, only performance losses: that is two layers of copy-on-write stacked on each other. You want to designate your VM storage as raw instead.

And deduplication is RARELY a benefit in any ZFS pool, and most definitely not if you are really running single-drive pools on a consumer desktop.

As for "RAM caching for those storage drives": ZFS will actually use up more of your RAM (for its ARC) than if you just used qcow2 on ext4 or whatever.
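If you already have qcow2 images, switching over is straightforward with qemu-img (file names here are just examples):

```
# Convert an existing qcow2 image to a raw image
qemu-img convert -p -f qcow2 -O raw vmdisk.qcow2 vmdisk.raw

# Or create a new raw image directly
qemu-img create -f raw vmdisk.raw 50G
```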


Conceptually, it’s reasonable to think ZVOLs should perform better for VM storage.

In practice, it’s much more complicated than that. One thing to note is that ZVOLs have been a bit neglected code-wise and were still waiting on some significant optimizations and fixes, last I checked about a year ago.

As it stands, some people find ZVOLs to be better, and others find even CoW-on-CoW with qcow2 on datasets to be better. Ultimately you'll have to benchmark and see for yourself, and compare the real-world experience too. If you look online, you'll find wildly different conclusions favoring one over the other.

For my home-use VMs (as in, I'm not trying to deal with hundreds of the things in a professional environment), I personally prefer dealing with easily movable qcow2 files on datasets (with recordsize limited) over the mandatory CLI management of otherwise invisible ZVOLs. It's been a while, but I never found a performance difference big enough to care about. And if I were lusting after all the MB/s I could get, I'd do raw files on datasets.
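A rough sketch of that kind of setup (dataset name and sizes are just placeholders, and matching the qcow2 cluster size to the dataset recordsize is a common tuning suggestion, not a hard requirement):

```
# Dataset dedicated to VM images, with a smaller recordsize than the 128K default
zfs create -o recordsize=64K -o compression=lz4 tank/vm-images

# qcow2 image whose cluster size matches the dataset's recordsize
qemu-img create -f qcow2 -o cluster_size=64k /tank/vm-images/testbox.qcow2 40G
```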


I run all my VMs off zvols; even for testing stuff out, I have zvols on my laptop. I really like having all the management and administration handled on the ZFS side, with no tweaking needed client-side inside the VM.

But:
Learn about volblocksize. Configuring a zvol properly requires some knowledge of the guest filesystem and sometimes also the underlying RAID configuration.
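Roughly like this (names and sizes are placeholders, and note that volblocksize can only be set when the zvol is created):

```
# volblocksize is fixed at creation time -- pick it to suit the guest filesystem,
# e.g. a larger block for an NTFS guest formatted with 16K clusters
zfs create -V 100G -o volblocksize=16K tank/vms/win10-data
```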

But having checksumming, compression, snapshots, overbooking/thin provisioning (sparse volumes), and other ZFS features underneath your Windows NTFS VM is really nice.

And zvols usually consume more pool space than the guest has actually stored, because of block-size mismatches and write amplification, so a zvol's space usage can exceed 100% of its nominal capacity.

I create all my VM disks via ZFS shell commands and select /dev/zvol/poolname/zvolname in virt-manager. On my server, everything is handled in Proxmox via iSCSI shares.
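The CLI side is roughly this (domain, pool, and zvol names are placeholders):

```
# Create the zvol, then hand the block device node to an existing libvirt domain
zfs create -V 40G tank/vms/testbox
virsh attach-disk testbox /dev/zvol/tank/vms/testbox vdb \
    --targetbus virtio --persistent
```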
