Linux + ZFS + NVME

Open question: any performance tips for NVMe M.2 drives (PCIe x4) on Linux, using ZFS?

I am thinking queue depths, concurrent IO, etc.

Tips as in configuration parameters for the kernel as well as ZFS.
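
For context, the kind of kernel-side knobs I have in mind (the device name is just an example):

cat /sys/block/nvme0n1/queue/scheduler        # NVMe goes through the multiqueue block layer; 'none' is usually the sensible choice
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler
cat /sys/block/nvme0n1/queue/nr_requests      # effective queue depth at the block layer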

Multiple M.2 drives? So you're using NVMe for the core of the zpool, not ZIL or L2ARC?

Tell me more.

I have 2 NVMe drives in a mirrored pool for the system drive (/ and /boot) and for VM boot volumes (no zvols; a ZFS dataset holding the vmdk files). I use it for desktop-based VMs for work, Windows flavors.

I'm just wondering if I can set configuration parameters for the pure NVMe vdev, since the NVMe architecture differs quite a bit from SATA.
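
For instance, things like ashift, which is fixed per vdev at pool creation (the pool and device names below are just examples):

zpool get ashift rpool                              # 12 = 4k sectors, 13 = 8k; 0 means it was auto-detected
cat /sys/block/nvme0n1/queue/physical_block_size    # what the drive reports as its sector size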


Just looking at the basics. The drives are Samsung 950s (consumer series, not pro).

Well, that's pretty slick. I can tell you that, at least in Proxmox, VMs tend to operate on zvols with an 8k volblocksize. The logbias setting is generally left at latency, but since you're on NVMe drives you might try throughput and see how that works for you.
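
If you want to poke at those, it's roughly this (the pool/dataset name is made up, substitute your own):

zfs get logbias,recordsize rpool/vm/win10     # see what the dataset is doing now
zfs set logbias=throughput rpool/vm/win10     # bias sync writes toward throughput instead of latency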

Inside a Windows VM, you might test these using CrystalDiskMark. Outside of Windows, in Linux-land, you might consider:

fio --name=random-writers --ioengine=sync --rw=randwrite --bs=4k --direct=0 --size=256m --numjobs=16 --end_fsync=1

Making sure that you're hitting disk, of course, and not just testing the speed of your memory. Speaking of cache, if you're using VMs, you can claw back some of your memory from ZFS by setting the ARC max to like 2GB or 4GB. Generally your guest VM will do caching of its own.
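
For reference, capping the ARC on ZFS on Linux looks roughly like this (4GB shown, expressed in bytes):

# persistent, via /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_max=4294967296

# or on the fly:
echo 4294967296 | sudo tee /sys/module/zfs/parameters/zfs_arc_max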

I'll go over my list of ZFS tuning options and see what I can find. Most of my tuning has spinning disks in mind. We may need to alter it for godawful fast SSDs.

Yeah, obviously I can't complain, but hey, we're on this forum to check and learn whether we're getting the most out of it, right?

I went for this setup because I use the VM desktop a LOT for work (we're tied to W10 here), so I didn't want to be annoyed too often waiting for spinning rust (especially since W10 is write-heavy and I'm using mirrors). So I went for a motherboard with dual M.2 slots and bought lower-capacity NVMe drives for the system drives. I also have some "scrap" space on them for temp storage (e.g. organizing photos) and I reserved some space for SLOG for the spinning-rust pools. Quite frankly, my workload does not benefit a lot from that.
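
(For reference, the SLOG bit is nothing fancy, just a small partition handed to the spinning-rust pool as a log vdev, along the lines of:

zpool add tank log /dev/nvme0n1p4

where tank and the partition number are placeholders for my setup.)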

If ZFS implemented a 100% tiered approach with some sort of 'staged IO' where I could decide to force incoming writes to SSD or M.2, that would be fun. I hear that might be in the pipeline.

Man I love technology. I'll report back on those tests.
