I have a question that may seem trivial to many of you, yet it is giving me a real headache, and I would highly appreciate your input.
Summary: What’s the fastest way to share data from a ZFS pool on a Linux host with Linux and Windows KVM guests?
Background:
Currently I have a small home server running Ubuntu 20.04 with a 40 TB RAIDZ2 pool and dedup enabled.
To use it more efficiently and reproducibly I want to use virtualisation. I have two kinds of systems to run: Ubuntu for my scientific work (PhD at home due to CoViD) and Windows for some of the things you sadly can't do on Linux, like running Adobe CC.
Using KVM/virt-manager I have figured out GPU passthrough, snapshots, etc., but storage access to the bulk storage on the RAIDZ2 is still a bit of a black box for me.
(I also do not want to make any huge mistakes and mess up the RAIDZ2 that holds my scientific data.)
What I want to do is the following: run the VMs off images on the boot SSD, but access the data on the RAIDZ2 from within the VMs, giving the Linux VM access to the scientific data for work and the Windows VM access to the image files for Adobe CC.
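For the Linux guest I was imagining something like a virtio-9p filesystem passthrough, roughly along these lines (just a sketch; the domain name "ubuntu-vm", the dataset path /tank/science, and the mount points are placeholders for my actual names):

```
# On the host: define a 9p share of the ZFS dataset and attach it to the
# guest's persistent config (takes effect on the next boot of the VM).
cat > science-share.xml <<'EOF'
<filesystem type='mount' accessmode='mapped'>
  <source dir='/tank/science'/>
  <target dir='science'/>
</filesystem>
EOF
virsh attach-device ubuntu-vm science-share.xml --config

# In the Linux guest: mount the share via the virtio transport.
sudo mount -t 9p -o trans=virtio,version=9p2000.L science /mnt/science
```

Is that the kind of setup people actually use, or is there a faster option?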
Samba did work for both, but the performance was horrible (no better than 1 Gb Ethernet, even though the data sits on the same machine as the VMs).
Did I make a (network?) configuration error, or is Samba simply that slow?
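Is checking the guest NIC model and the raw host-to-guest throughput the right first step? I assume something like the following, where "ubuntu-vm" is a placeholder for my domain name and 192.168.122.1 is libvirt's default host-side bridge address:

```
# Emulated NIC models (rtl8139, e1000) top out far below virtio-net;
# look for <model type='virtio'/> in the interface definition.
virsh dumpxml ubuntu-vm | grep -A3 '<interface'

# Measure raw TCP throughput between host and guest, with Samba
# taken out of the equation. On the host:
iperf3 -s
# In the guest:
iperf3 -c 192.168.122.1
```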
What would you recommend as the best way to access the data in my setup?
I would much prefer not to squeeze it into an image, as this would make data management much more cumbersome (the scientific data is about 18 TB with dedup on; heck, I do not even know whether QEMU images can hold 18 TB, much less how they mix with dedup. Performance with RAIDZ2 and dedup is slow even now.)
Do the two approaches also work with Windows as a guest?
Can I still access the data on the RAIDZ2 from the host OS, or, more importantly, from other VMs simultaneously?
(This question seems especially pertinent when using zvols as virtual drives.)
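For concreteness, by "zvol as a virtual drive" I mean something like the following sketch ("tank/win-scratch" and the domain name "win10" are placeholders):

```
# Create a sparse (thin-provisioned) zvol and attach it to the Windows guest
# as a virtio block device; the guest would then format it as NTFS
# (virtio drivers required in Windows).
zfs create -s -V 2T tank/win-scratch
virsh attach-disk win10 /dev/zvol/tank/win-scratch vdb --targetbus virtio --persistent
```

My worry is that a block device formatted by one guest cannot safely be mounted by the host or a second VM at the same time.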
What level of performance are you expecting from your array, and can it actually deliver it consistently?
What are the specs on your host system? Do you have at least 1 GB of RAM per TB of array capacity? How many spindles are in the array? Do you have SSD caching set up? Have you undertaken any performance tuning in the OS or on the array?
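To put numbers on those questions, the usual places to look on a ZFS-on-Linux box are something like this (the pool name "tank" and the device names are examples, not a recommendation for your hardware):

```
# How much RAM the ARC is actually using versus its target size:
arc_summary | head -n 30        # or: cat /proc/spl/kstat/zfs/arcstats

# Adding an SSD read cache (L2ARC) and a mirrored SLOG for sync writes:
zpool add tank cache /dev/nvme0n1
zpool add tank log mirror /dev/nvme1n1 /dev/nvme2n1
```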
The network connection between host and guest using virtio is well over 10 Gbit/s and pretty efficient. The SMB daemon itself under Linux is pretty fast these days; I have no trouble saturating a 10 Gb fiber connection with SMB when reading or writing to a flash array.
It doesn't matter what storage protocol is used if the underlying hardware can't perform. CIFS vs. NFS vs. iSCSI are application- and workload-specific choices, but they don't address the potential underlying problem.
If SMB is terrible, NFS and iSCSI will also likely be terrible.
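A quick way to establish that baseline is to benchmark the pool locally on the host before any sharing protocol is involved, roughly like this (/tank/science is a placeholder path; pick a size larger than your RAM, or the ARC will serve the reads from cache):

```
# Sequential read and write at the block sizes a file share typically sees.
fio --name=seqread --directory=/tank/science --rw=read --bs=1M \
    --size=64G --group_reporting
fio --name=seqwrite --directory=/tank/science --rw=write --bs=1M \
    --size=64G --group_reporting
# If these numbers already hover around 1 Gbit/s, no protocol will fix it.
```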
Not a fan of either of these solutions for the "enterprise at home" crowd. Dedup is so expensive and doesn't offer the huge gains people think it does. Even Oracle and the BSDs recommend NOT using dedup unless you have certain very specific use cases for it. Compression, on the other hand, is almost free with a modern CPU and surprisingly effective.
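Concretely, it's easy to check what each feature is actually buying you (the pool name "tank" is a placeholder):

```
# What compression currently saves (a ratio above 1.00x means it is helping):
zfs get compressratio tank

# lz4 is the cheap, safe default on any modern CPU (applies to new writes):
zfs set compression=lz4 tank

# How big the dedup table (DDT) has grown; it has to fit in RAM to be fast:
zpool status -D tank
```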
Multi-redundant RAID (RAIDZ2, RAIDZ3) is a crutch people lean on as they fall into the trap of forgetting that "RAID is not backup."
My takeaway is: if you are on SSD storage, use mirrors, and have 5 GB of RAM per TB of storage, the experience is good. (Having one CPU thread per drive in every vdev is nice to have, too.)
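For reference, the kind of layout I mean is striped mirrors, along these lines (the device names are examples):

```
# Two mirrored pairs striped together: good random I/O, easy to grow a pair
# at a time, and a resilver only touches the affected mirror.
zpool create -o ashift=12 fastpool \
    mirror /dev/nvme0n1 /dev/nvme1n1 \
    mirror /dev/nvme2n1 /dev/nvme3n1
```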