I have a question that may seem trivial to many of you, yet it is giving me a real headache, and I would highly appreciate your input.
Summary: What’s the fastest way to share data from a ZFS pool on a Linux host with Linux and Windows KVM guests?
Background:
Currently I have a small home server running Ubuntu 20.04 with a 40 TB RAIDZ2 pool and dedup enabled.
To use it more efficiently and reproducibly I want to use virtualisation. I have two kinds of systems to run: Ubuntu for my scientific work (PhD at home due to CoViD) and Windows for some of the things you sadly can't do on Linux, like running Adobe CC.
Using KVM/virt-manager I have figured out GPU passthrough, snapshots, etc., but storage access to the bulk storage on the RAIDZ2 is still a bit of a black box for me.
(I also do not want to make any huge mistakes and mess up the RAIDZ2 that holds my scientific data.)
What I want to do is the following: run the VMs off images on the boot SSD, but access the data on the RAIDZ2 from within the VMs, giving the Linux VM access to the scientific data for work and the Windows VM access to the image files for Adobe CC.
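For the Linux guest I was imagining something like a virtio-9p filesystem passthrough, roughly along these lines (just a sketch; the domain name "ubuntu-vm", the dataset path /tank/science, and the mount points are placeholders for my actual names):

```
# On the host: define a 9p share of the ZFS dataset and attach it to the
# guest's persistent config (takes effect on the next boot of the VM).
cat > science-share.xml <<'EOF'
<filesystem type='mount' accessmode='mapped'>
  <source dir='/tank/science'/>
  <target dir='science'/>
</filesystem>
EOF
virsh attach-device ubuntu-vm science-share.xml --config

# In the Linux guest: mount the share via the virtio transport.
sudo mount -t 9p -o trans=virtio,version=9p2000.L science /mnt/science
```

Is that the kind of setup people actually use, or is there a faster option?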
Samba did work for both, but the performance was horrible (no better than 1 Gb Ethernet, even though the data sits on the same machine as the VMs).
Did I make a (network?) configuration error, or is Samba simply that slow?
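Is checking the guest NIC model and the raw host-to-guest throughput the right first step? I assume something like the following, where "ubuntu-vm" is a placeholder for my domain name and 192.168.122.1 is libvirt's default host-side bridge address:

```
# Emulated NIC models (rtl8139, e1000) top out far below virtio-net;
# look for <model type='virtio'/> in the interface definition.
virsh dumpxml ubuntu-vm | grep -A3 '<interface'

# Measure raw TCP throughput between host and guest, with Samba
# taken out of the equation. On the host:
iperf3 -s
# In the guest:
iperf3 -c 192.168.122.1
```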
What would you recommend as the best way to access the data in my setup?
I would much prefer not to squeeze it into an image, as this would make data management much more cumbersome (the scientific data is about 18 TB with dedup on; heck, I do not even know whether QEMU images can hold 18 TB, much less how they mix with dedup. Performance with RAIDZ2 and dedup is slow even now.)
Do the two approaches also work with Windows as a guest?
Can I still access the data on the RAIDZ2 from the host OS, or, more importantly, from other VMs simultaneously?
(This question seems especially pertinent when using zvols as virtual drives.)
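For concreteness, by "zvol as a virtual drive" I mean something like the following sketch ("tank/win-scratch" and the domain name "win10" are placeholders):

```
# Create a sparse (thin-provisioned) zvol and attach it to the Windows guest
# as a virtio block device; the guest would then format it as NTFS
# (virtio drivers required in Windows).
zfs create -s -V 2T tank/win-scratch
virsh attach-disk win10 /dev/zvol/tank/win-scratch vdb --targetbus virtio --persistent
```

My worry is that a block device formatted by one guest cannot safely be mounted by the host or a second VM at the same time.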
What level of performance are you expecting from your array, and can it actually deliver it consistently?
What are the specs on your host system? Do you have at least 1 GB of RAM per TB of array capacity? How many spindles are in the array? Do you have SSD caching set up? Have you undertaken any performance tuning in the OS or on the array?
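To put numbers on those questions, the usual places to look on a ZFS-on-Linux box are something like this (the pool name "tank" and the device names are examples, not a recommendation for your hardware):

```
# How much RAM the ARC is actually using versus its target size:
arc_summary | head -n 30        # or: cat /proc/spl/kstat/zfs/arcstats

# Adding an SSD read cache (L2ARC) and a mirrored SLOG for sync writes:
zpool add tank cache /dev/nvme0n1
zpool add tank log mirror /dev/nvme1n1 /dev/nvme2n1
```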
The network connection between host and guest using virtio is well over 10 Gbit/s and pretty efficient. The SMB daemon itself under Linux is pretty fast these days; I have no trouble saturating a 10 Gb fiber connection with SMB when reading or writing to a flash array.
It doesn't matter what storage protocol is used if the underlying hardware can't perform. CIFS vs. NFS vs. iSCSI are application- and workload-specific choices, but they don't address the potential underlying problem.
If SMB is terrible, NFS and iSCSI will also likely be terrible.
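A quick way to establish that baseline is to benchmark the pool locally on the host before any sharing protocol is involved, roughly like this (/tank/science is a placeholder path; pick a size larger than your RAM, or the ARC will serve the reads from cache):

```
# Sequential read and write at the block sizes a file share typically sees.
fio --name=seqread --directory=/tank/science --rw=read --bs=1M \
    --size=64G --group_reporting
fio --name=seqwrite --directory=/tank/science --rw=write --bs=1M \
    --size=64G --group_reporting
# If these numbers already hover around 1 Gbit/s, no protocol will fix it.
```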
Not a fan of either of these solutions for the "enterprise at home" crowd. Dedup is so expensive and doesn't offer the huge gains people think it does. Even Oracle and the BSDs recommend NOT using dedup unless you have certain very specific use cases for it. Compression, on the other hand, is almost free with a modern CPU and surprisingly effective.
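Concretely, it's easy to check what each feature is actually buying you (the pool name "tank" is a placeholder):

```
# What compression currently saves (a ratio above 1.00x means it is helping):
zfs get compressratio tank

# lz4 is the cheap, safe default on any modern CPU (applies to new writes):
zfs set compression=lz4 tank

# How big the dedup table (DDT) has grown; it has to fit in RAM to be fast:
zpool status -D tank
```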
Multi-redundant RAID (RAIDZ2, RAIDZ3) is a crutch people lean on as they fall into the trap of forgetting that "RAID is not backup."
My takeaway is: if you are on SSD storage, use mirrors, and have 5 GB of RAM per TB of storage, the experience is good. (Having one CPU thread per drive in every vdev is nice to have, too.)
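For reference, the kind of layout I mean is striped mirrors, along these lines (the device names are examples):

```
# Two mirrored pairs striped together: good random I/O, easy to grow a pair
# at a time, and a resilver only touches the affected mirror.
zpool create -o ashift=12 fastpool \
    mirror /dev/nvme0n1 /dev/nvme1n1 \
    mirror /dev/nvme2n1 /dev/nvme3n1
```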