ZFS slow NVME mirror?

Hi

I have two NVMe Samsung 990 Pros. My goal was to keep my spinning rust from spinning up as much as possible, so I partitioned both drives 50/50, assigned partition 1 of each as a special device on the main pool, and then created a second pool from the two partition 2s.
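Roughly what that looks like (sizes are approximate and the commands are a reconstruction from memory, so treat this as a sketch rather than exact history):

    # split each 990 Pro roughly 50/50 (sizes illustrative; repeat for the second drive)
    sgdisk -n1:0:+465G -n2:0:0 /dev/disk/by-id/nvme-Samsung_SSD_990_PRO_1TB_S6Z1NF0WB12620F

    # partition 1 of each drive -> mirrored special vdev on the spinning pool
    zpool add NAS special mirror \
        /dev/disk/by-id/nvme-Samsung_SSD_990_PRO_1TB_S6Z1NF0WB12620F-part1 \
        /dev/disk/by-id/nvme-Samsung_SSD_990_PRO_1TB_S6Z1NJ0W333776L-part1

    # partition 2 of each drive -> its own mirrored pool
    zpool create Fast mirror \
        /dev/disk/by-id/nvme-Samsung_SSD_990_PRO_1TB_S6Z1NF0WB12620F-part2 \
        /dev/disk/by-id/nvme-Samsung_SSD_990_PRO_1TB_S6Z1NJ0W333776L-part2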

Now, this is working nicely to some degree, and the drives no longer spin up unless needed about 90% of the time. I have another post open about my previous setup, which used just a special device and for some reason still wrote to the rust (special_small_blocks set lower than recordsize).
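(For reference, the relevant knob in that earlier setup was special_small_blocks versus recordsize; something like this, with the 64K value purely illustrative:)

    # blocks at or below special_small_blocks land on the special vdev,
    # everything larger goes to the HDDs
    zfs get recordsize,special_small_blocks NAS
    zfs set special_small_blocks=64K NAS    # illustrative value, below the 128K default recordsize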

In this new config I seem to max out write speeds at around 600 MB/s, which seems low for these drives? (sync=disabled, compression=off)

Setting sync=always (which I usually have on; I have an Optane SLOG on my spinning pool) reduces this to around 125 MB/s.

Adding compression (zstd-3) on top of that averages around 90 MB/s.
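(For clarity, those three numbers correspond to toggling the pool root's properties roughly like this:)

    # baseline (~600 MB/s): async writes, no compression
    zfs set sync=disabled Fast
    zfs set compression=off Fast

    # ~125 MB/s: force every write through the ZIL
    zfs set sync=always Fast

    # ~90 MB/s: sync=always plus zstd compression
    zfs set compression=zstd-3 Fast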

Pools are as follows:

        NAME                                                    STATE     READ WRITE CKSUM
        Fast                                                    ONLINE       0     0     0
          mirror-0                                              ONLINE       0     0     0
            nvme-Samsung_SSD_990_PRO_1TB_S6Z1NF0WB12620F-part2  ONLINE       0     0     0
            nvme-Samsung_SSD_990_PRO_1TB_S6Z1NJ0W333776L-part2  ONLINE       0     0     0

        NAME                                                    STATE     READ WRITE CKSUM
        NAS                                                     ONLINE       0     0     0
          raidz1-0                                              ONLINE       0     0     0
            ata-WDC_WD10EURX-63C57Y0_WD-WCC4JCC6KRU9            ONLINE       0     0     0
            ata-WDC_WD10EURX-63C57Y0_WD-WCC4JL4W41VE            ONLINE       0     0     0
            ata-WDC_WD10EURX-83UY4Y0_WD-WCC4J3LCTDFA            ONLINE       0     0     0
            ata-WDC_WD10EZRX-00A3KB0_WD-WCC4J3EXK40P            ONLINE       0     0     0
        special
          mirror-1                                              ONLINE       0     0     0
            nvme-Samsung_SSD_990_PRO_1TB_S6Z1NF0WB12620F-part1  ONLINE       0     0     0
            nvme-Samsung_SSD_990_PRO_1TB_S6Z1NJ0W333776L-part1  ONLINE       0     0     0
        logs
          nvme-INTEL_SSDPE21D280GA_PHM2749000VL280AGN           ONLINE       0     0     0

Is this kind of speed expected? A downside of partitioning vs whole drive?

Are you referring to writing to “Fast” or to “NAS”?

Again - which pool?

ZFS was designed to scale out HDD-based storage and overcome its reliability and performance challenges.
The design assumptions (e.g. 10+ ms latency for accessing HDDs) don't apply to NAND-only pools, so ZFS is still grappling with its migration into that space.

Maxing out at 600 MB/s on Fast, with sync off and compression off.

The ZFS design centers on the observation that a CPU can perform millions of operations while waiting for a single HDD op.
That allows complex calculations to happen seemingly without a performance penalty.

When using NVMe devices, there is very little time to perform extra calculations.
You'll find that to achieve reasonable performance on a ZFS pool of NVMe devices you'll have to turn off a lot of ZFS functionality.

This means you need to ask yourself why exactly you want to use ZFS on NVMe devices (which features matter to you). Have you explored alternatives (mdraid + xfs/ext4, btrfs, …)?

I agree that 600 MB/s write speeds are just sad, unless there is no better alternative :slight_smile:

I am totally on board with using ZFS lol. It simplifies the backups I have, plus compression and redundancy.

I am just unsure whether this is the speed I should be expecting. 90 MB/s with the features I want turned on is even more sad!

I was also unsure whether partitioning had some effect on this.

FYI, adding my SLOG to the NVMe Fast mirror does increase the speed to around 300 MB/s with sync=always and zstd-3.

Still not brilliant for NVMe.
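(For that test I just moved the Optane over; assuming it's the same device that normally serves NAS, that's roughly:)

    # assumption: reusing the Optane that was the NAS SLOG
    zpool remove NAS nvme-INTEL_SSDPE21D280GA_PHM2749000VL280AGN
    zpool add Fast log /dev/disk/by-id/nvme-INTEL_SSDPE21D280GA_PHM2749000VL280AGN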

On an NVMe pool you don't need a SLOG.
I suggest you set logbias=throughput on the NVMe pool and leave sync=standard.
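i.e. something like:

    # bypass the indirect ZIL/SLOG path for this pool; sync writes go
    # straight to the main data vdevs
    zfs set logbias=throughput Fast
    zfs set sync=standard Fast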

This is what I thought…But…

The Optane SLOG appears to increase sync writes from 90 to 300 MB/s.

Without sync I am getting around 550-600 MB/s. Same with logbias, as above. Still seems low.

How about read performance from the NVMe-only pool?

BTW, partitioning should work fine. Just realize that if a drive fails, it may affect multiple pools.

Read performance is about the same, around 600 MB/s.

Yep — I partitioned knowing that if a drive fails I'd have to replace its partitions in both pools.

Do you use ZFS native encryption?

Encryption is off on all filesystems.
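(Easy to confirm per dataset, for anyone following along:)

    # should show encryption=off for every dataset in both pools
    zfs get -r -t filesystem,volume encryption NAS Fast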

Have you tested the drives without ZFS? Have you made sure the bandwidth to the drives is not a bottleneck? I would rule that out before digging into ZFS settings. Weird things happen from time to time, and you may have a bad link or a bad controller.
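A non-destructive way to do that is to read straight off one of the raw partitions with fio (device path taken from the zpool output above; assumes fio is installed):

    # sequential read from the raw NVMe partition, bypassing ZFS entirely
    fio --name=rawread --readonly \
        --filename=/dev/disk/by-id/nvme-Samsung_SSD_990_PRO_1TB_S6Z1NF0WB12620F-part2 \
        --rw=read --bs=1M --direct=1 --ioengine=libaio --iodepth=16 \
        --runtime=30 --time_based

If that already tops out around 600 MB/s, the problem is the link or the passthrough rather than ZFS.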

Hmmm… I just formatted one of the special partitions to ext4 and am only getting around the same speed (actually a bit lower, ~550 MB/s).

I have one of these ASUS Hyper cards… maybe that is causing bottlenecks.


Yeah, I was going to ask how it is connected. But if you're using it in an x16 slot with a relatively modern motherboard you should have miles of headroom. Do you know the slot is actually wired for x16, or have you checked the specs?

I would also imagine that if these are mirrored, the effective write speed should actually be 2 × 600 MB/s, no? For 1.2 GB/s total? Still way below spec.

Exactly. It is in an x16 slot, for sure. X570 Prime Pro, 5600X.

Crazy low

To add some complexity, these are passed through from Proxmox, but lspci tells me:

LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <64us

which I assume is 4 lanes of Gen 4 PCIe.
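(I should probably also compare LnkSta, since LnkCap is only what the port/device is capable of rather than what actually got negotiated; something like this, with the -s address being whatever lspci lists for the Samsung controller:)

    # capability vs. negotiated link speed/width for the NVMe controller
    lspci -vv -s <address of the 990 Pro> | grep -E 'LnkCap:|LnkSta:'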

Does this seem relevant (I know it's a different board)? One of the x16 slots on your board is definitely wired for x4 (it's running off the chipset rather than the CPU, I think). If you have an old enough processor this could actually be PCIe 3…

https://www.reddit.com/r/Amd/comments/gn4zwa/question_pcie_bifurcation_pointless_on_x570_for/

The ASUS specs will tell you how all of your slots are wired, based on which series of processor you have installed.

Test them without passthrough to eliminate that as the bottleneck. Also, what are you using to test? I can usually get 1-2 GB/s sequential writes and 2-3 GB/s sequential reads on a ZFS NVMe mirror using fio with large block sizes.
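Something along these lines, as a sketch (dataset path and sizes are placeholders):

    # sequential write test on the NVMe pool
    mkdir -p /Fast/fiotest
    fio --name=seqwrite --directory=/Fast/fiotest --rw=write --bs=1M \
        --size=8G --numjobs=4 --iodepth=16 --ioengine=libaio \
        --end_fsync=1 --group_reporting

    # sequential read test (fio lays out its own files first if needed)
    fio --name=seqread --directory=/Fast/fiotest --rw=read --bs=1M \
        --size=8G --numjobs=4 --iodepth=16 --ioengine=libaio \
        --group_reporting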

I'm just moving a big MP4 with rsync, to and from a ramdisk on my virtual machine.
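(If it matters, the dd equivalent would take rsync's per-file overhead out of the picture; the paths are just wherever the ramdisk and pool are mounted:)

    # single large sequential copy, fsync at the end so caching doesn't flatter the number
    dd if=/mnt/ramdisk/big.mp4 of=/Fast/test.mp4 bs=1M conv=fsync status=progress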