Thoughts about correct/incorrect usage of ZFS over iSCSI

(post deleted by author)

In before “help, my metadata is eating my NAS”.

Windows and Linux both have a basic page cache, and don’t they cache user data from network shares? I never paid attention to that. It’s not ARC, sure, but every OS puts memory to good use. And you still get network speed if the data is cached on the NAS.

I find NFS/SMB to be easier to use than zvols, which have several drawbacks: metadata overhead, >100% capacity utilization depending on pool layout and block size, and not being able to use special vdevs for small files. And we haven’t even covered sharing a LUN between several (concurrent) machines yet.

And then there is the question of what blocksize to use for an all-purpose 10T disk. Hard to optimize.
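To put a rough number on the >100% utilization point (assuming a raidz2 pool with ashift=12, i.e. 4K sectors, and a 6-wide vdev; the widths here are just an example): raidz pads every allocation to a multiple of parity+1 sectors, so

8K volblock:   2 data + 2 parity = 4 sectors, padded to 6  -> 24K on disk per 8K written (~3x)
128K volblock: 32 data + 16 parity = 48 sectors, no padding -> 192K on disk per 128K written (~1.5x)

which is how a small-blocksize zvol on raidz ends up reporting more used space than its nominal size.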

I suggest you just create a zvol with the -s flag to make it sparse/thin-provisioned and see how everything works and performs. I like zvols for VMs, but if they need proper storage, I use NFS shares.
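A minimal sketch of that, with the pool name (tank), the zvol name and the 16K volblocksize just as placeholders:

zfs create -s -V 10T -o volblocksize=16K tank/vmstore
zfs get volsize,volblocksize,used,refreservation tank/vmstore

The second command just shows how little space the sparse zvol actually consumes until you start writing to it.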

(post deleted by author)

When I first set up my storage server, I was using fileio with targetd on ZFS… but that’s because at the time I saw better performance with files on ZFS datasets.

Once zvol performance improved with targetd, I switched over.

I’ve been using zvols with iSCSI via targetd for the last few years.

I guess I should re-explore NFS vs iSCSI, but I haven’t had a need yet.

I only use it for my homelab… so it’s fine.
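For reference, exporting a zvol over iSCSI looks roughly like this with targetcli (the LIO shell; targetd drives the same kernel target through its API). Pool, zvol and IQN names below are made up:

targetcli /backstores/block create name=vm01 dev=/dev/zvol/tank/vm01
targetcli /iscsi create iqn.2003-01.org.example.nas:vm01
targetcli /iscsi/iqn.2003-01.org.example.nas:vm01/tpg1/luns create /backstores/block/vm01
targetcli /iscsi/iqn.2003-01.org.example.nas:vm01/tpg1/acls create iqn.2004-01.org.example:client01
targetcli saveconfig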

Well, with zvols you will get write amplification when the blocksize is larger than your files. This is not like datasets, where recordsize only defines the maximum size of a record and ZFS adaptively uses smaller records for small files.
So you probably don’t want a large blocksize if you’re storing a variety of file sizes. But compression works better with a higher blocksize, which mitigates some of the wasted space. Write amplification also takes a heavy toll on performance.
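A concrete example, with numbers picked just for illustration:

64K volblocksize, 4K random write -> read the whole 64K block, modify it, write 64K back = ~16x write amplification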

A low blocksize, on the other hand, amplifies metadata quite a bit. Expect >100GB of metadata for a full 8k 10T zvol. If you can cache that amount, all is fine.
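Rough back-of-the-envelope behind that figure, assuming ~128 bytes of block pointer per block (before metadata compression):

10 TiB / 8K blocks  = ~1.3 billion blocks -> ~160 GiB of block pointers
10 TiB / 64K blocks = ~170 million blocks -> ~20 GiB of block pointers

That is roughly what you would want to fit in ARC (or on a special vdev) in the 8K case.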

You overestimate how much data a VM uses. While it’s nothing to brag about, I had 2x 48-port switches connected via a single 10G pipe (I know, risky; I inherited it, but I also never took the time to set up the second 10G pipe in an active-standby configuration), connected to HP ProLiant MicroServers with 2x 1Gbps each in balance-alb and to my Proxmox hosts, each with 2x 1Gbps in balance-alb. Each host was running between 20 and 50 VMs. Most had 32 or 64GB disk images (qcow2 on NFS), some had second disks with 100 to 300GB, and fewer than 10 VMs went over 500GB.

I had about 10 hosts and 10 or so NASes, of which around 7 were MicroServers. No server had more than 2x 1Gbps NICs. The NASes were running ext4 on RAID10 md arrays. Now, the 3 other NASes were the bulk storage ones: a Dell PowerVault, an Intel something-or-other server board, and 2 HP ProLiant DL380s (I think one Gen7, one Gen8). Those were the heavy lifters.

For a home setup, I think going with small ZVOLs and passing them via iSCSI should be fine. I’m pretty sure iSCSI should be lighter weight than qcow2 vdisks on NFS, since you kinda take the virtualization out of the equation and you work with straight block devices.
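On the client side it is only a couple of open-iscsi commands to pick such a LUN up as a plain block device (the portal address and IQN here are made up):

iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m node -T iqn.2003-01.org.example.nas:vm01 -p 192.168.1.10 --login

After login the LUN shows up as a regular /dev/sdX you can hand to the VM.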

A 1Gbps pipe should be enough for at least 10 VMs, likely more. VMs, if they aren’t being pushed, don’t really do much; they barely transfer a few KB/s. And if you are insane enough to go with diskless / ramdisk installs, you can run as many VMs as you’d like.

That said, it’s probably lighter if you go with netboot and rootfs on NFS (classic diskless) than with iSCSI on ZFS. But then, you may have some headaches with root on NFS, like I get when updating my system: anything involving lots of small files seems to move at the speed of turned-off light. Which is why a ramdisk install, like the frugal install of Alpine, is very enticing; however, I do not know how to replicate that with Void. I’m stuck with root on NFS for now, but maybe I should switch to iSCSI; it should not be too hard.
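For reference, the classic-diskless part is just a kernel command line handed out by the netboot loader, something like the line below (the export path is the one from the df output that follows; the NFS options are a guess):

root=/dev/nfs nfsroot=192.168.150.120:/data/nfs/hc4/rootfs,tcp,vers=3 ip=dhcp rw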

[oddmin@bikyhc4 ~]$ df -h
Filesystem                                     Size  Used Avail Use% Mounted on
udev                                           1.9G     0  1.9G   0% /dev
tmpfs                                          372M  1.3M  370M   1% /run
192.168.150.120:/data/nfs/hc4/rootfs  916G  355G  515G  41% /
shm                                            1.9G     0  1.9G   0% /dev/shm
cgroup                                         1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs                                          1.9G     0  1.9G   0% /tmp

vs Alpine frugal install:

bikypi3:~# df -h
Filesystem                Size      Used Available Use% Mounted on
devtmpfs                 10.0M         0     10.0M   0% /dev
shm                     455.8M         0    455.8M   0% /dev/shm
/dev/mmcblk0p1          511.0M    125.9M    385.0M  25% /media/mmcblk0p1
/dev/mmcblk0p2            1.3G     18.6M      1.2G   2% /media/mmcblk0p2
tmpfs                   455.8M     58.4M    397.4M  13% /
tmpfs                   182.3M    184.0K    182.2M   0% /run
/dev/loop0               26.0M     26.0M         0 100% /.modloop

There’s something about that modloop that I haven’t figured out. But I’d have to really strip down Void for that; a base install would not fit in the 4GB of RAM of the HC4. And then I’d need RAM to run stuff too…

bikypi2# du -sh hc4/rootfs/
3.8G    hc4/rootfs/

tl;dr
VMs don’t use a lot of network bandwidth when they are idle, so iSCSI on small ZFS zvols should be just fine. If you want to save on network bandwidth anyway, and on system resource consumption, an Alpine frugal/diskless install with the lbu backup location on NFS and TFTP boot should give you way less network traffic, plus bragging rights for having that setup.
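For anyone curious, the moving parts look roughly like this; the dnsmasq snippet and the paths are placeholders, not a tested setup.

On the NAS, dnsmasq can hand out the Alpine netboot kernel/initramfs over TFTP (/etc/dnsmasq.conf):

enable-tftp
tftp-root=/data/tftpboot
dhcp-boot=boot.ipxe

On the Alpine box, keep the lbu overlay (apkovl) on an NFS mount instead of local media:

mount -t nfs 192.168.1.10:/data/nfs/alpine-lbu /media/lbu
lbu package /media/lbu/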