ZFS crypto engine vs. dm-crypt on ZVOL + XFS / single vs. multi thread?

(context Ubuntu Server 22.04 LTS, 4-wide raidz1 pool)

I have an encrypted ZFS dataset (aes-256-gcm), which exhibits rather low r/w throughput.

For the giggles, i’ve tested another approach to encrypting a portion of my zpool: i created a thinly provisioned (sparse) zvol and encrypted it via LUKS/cryptsetup (aes-xts-plain64, 512-bit key), with XFS on top.
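Roughly, the stack was built like this - a hedged sketch, not my literal commands; the pool/volume names, the size, and the mount point are placeholders:

```shell
# sparse (thin) 1T zvol on the pool; -s = don't reserve space up front
zfs create -s -V 1T tank/cryptvol

# LUKS container with the same cipher/key size as mentioned above
cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 \
    /dev/zvol/tank/cryptvol
cryptsetup open /dev/zvol/tank/cryptvol cryptvol

# XFS on top of the dm-crypt mapping
mkfs.xfs /dev/mapper/cryptvol
mount /dev/mapper/cryptvol /mnt/zvolLUKSxfs
```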

On a Core i3-9100F CPU, i’m observing about a 10x gain in r/w performance.

Observing htop while cat-ing a large file through pv to /dev/null reveals the following two facts:

  1. the ZFS (on Linux) crypto engine appears to be single-threaded - only 1 core loaded
  2. the zvol + LUKS + XFS pipeline performs about 10x better - all cores loaded

Out of curiosity, how fast is cat in one case vs. the other?

I’m using dm-crypt under btrfs on a very low-powered ODROID N2+ over USB. I’m limited by USB HDD performance - about 200 MB/s - and it never really uses more than 1 core worth of time for crypto stuff.

cryptsetup benchmark says the tiny CPU can do 700 MB/s.

cat /zfs_crypt_ds/largefile | pv > /dev/null
4.31GiB 0:03:35 [20.5MiB/s]   (i’ve seen as low as 3-5 MB/s)

cat /mnt/zvolLUKSxfs/same_largefile | pv > /dev/null
4.31GiB 0:00:29 [ 148MiB/s]

My i3-9100F produces the following cryptsetup benchmark:

aes-cbc = yeah, nah - afaik CBC (cipher block chaining) mode has known weaknesses for disk encryption; XTS is the modern choice
aes-xts        256b      3519.9 MiB/s      3530.4 MiB/s
aes-xts        512b      2863.9 MiB/s      2847.4 MiB/s

edit: the whole pool is actually quite fast - those are fancy Ultrastar DC HC310 6TB HDDs - zpool scrub flies at ~600MB/s

By now, copying over 3+ TB has finally finished - let’s say that poor little i3-9100F has been stressed more than my patience. (screen session, mc, all automated - i couldn’t care less)

Now i wonder: is this what usually happens when you operate encrypted ZFS datasets? At least htop tells me that the ZFS crypto engine is single-threaded.

Has anyone seen throughput greater than ~16 MB/s (avg) on an encrypted ZFS dataset?
Is it my CPU, an i3-9100F?

Why is dm-crypt/LUKS + XFS at least 8-10x faster, off of a zvol?
Is it because that kind of pipeline is multi-threaded, as htop indicates?

What are YOUR results? (with platform specs, please)

Am i doing something wrong - and if so, what exactly?

Just to suggest an alternative, maybe you could look at a program like gocryptfs. It’s a FUSE filesystem that provides transparent encryption of a folder - no need for a zvol + dm-crypt + XFS stack. It also makes it a lot easier to back up the encrypted folder using rsync.
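A minimal sketch of what that looks like - the directory names here are made up for the example:

```shell
# one-time: create the encrypted backing directory (prompts for a password)
gocryptfs -init /srv/encrypted

# mount: plaintext view at /mnt/plain, ciphertext stored in /srv/encrypted
mkdir -p /mnt/plain
gocryptfs /srv/encrypted /mnt/plain

# back up the ciphertext directly - no keys needed on the target
rsync -a /srv/encrypted/ backup-host:/backup/encrypted/
```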

I’m a little surprised at how slow your encryption is here, even single-threaded. You clearly have working AES-NI. There are apparently some issues on non-AVX CPUs, but your CPU should have AVX.

What version of ZFS on Linux are you running? I’d suggest making a new issue on the openzfs github, this might be an unknown issue.

It’s the version from the Ubuntu 22.04 LTS repo:

$ zfs --version
zfs-2.1.2-1ubuntu3
zfs-kmod-2.1.2-1ubuntu3

edit: the i3-9100F has AVX2 - see the Intel Core i3-9100F product specifications (6M cache, up to 4.20 GHz)

That CPU should be plenty fast enough. If dm-crypt is fine, then I’d hazard a guess it’s a kernel + ZFS version interaction. I’ve lost count of the number of times kernel-internal API changes broke ZFS performance; all I know is the current Arch Linux LTS kernel + ZFS DKMS package seem fine. And FreeBSD just always trucks on regardless :slight_smile:

Tried some tests on three machines I have. In the ZFS tests, cat actually seemed to be the bottleneck - it was using nearly one whole core, with CPU to spare.

Files were created from /dev/urandom; caches were dropped (echo 3 > /proc/sys/vm/drop_caches) or the ZFS pool was exported/imported between runs:
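For anyone wanting to repeat this, each run looked roughly like the following - pool and file names are placeholders:

```shell
# make sure the read is cold: flush dirty pages, then drop the page cache
sync
echo 3 > /proc/sys/vm/drop_caches

# for ZFS, also empty the ARC by cycling the pool
zpool export tank && zpool import tank

# the actual throughput test
cat /tank/dataset/bigfile.bin | pv > /dev/null
```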

Machine 1 (Lenovo Yoga laptop)

zfs-2.1.4 / Arch Linux kernel 5.15.45 / Intel Core i5-11300H

ZFS

ZFS dataset: aes-256-gcm, no compression, single NVMe WD SN700

cat * (many big files) | pv > /dev/null
 277GiB 0:03:00 [ 1.54GiB/s]

dmcrypt

cryptsetup benchmark | grep aes-xts
        aes-xts        256b      5276.9 MiB/s      5289.6 MiB/s
        aes-xts        512b      4672.7 MiB/s      4641.1 MiB/s

No spare partition to try dmcrypt + xfs.

Machine 2 (Broadwell Xeon NAS)

zfs-2.1.4 / FreeBSD 13.1-RELEASE / Intel Xeon E3-1285Lv4

ZFS

ZFS + aes-256-gcm, zstd-1, single SATA Seagate Exos ST18000NM000J

cat bigfile.bin | pv > /dev/null
16.5GiB 0:01:47 [ 156MiB/s]

ZFS + aes-256-gcm, zstd-1, two SATA Crucial MX500 2TB SSDs

tar cf - lots_of_small_files | pv > /dev/null
68GiB 0:00:18 [ 323MiB/s]

No dmcrypt support to try.

Machine 3 (desktop)

zfs-2.1.4 / Arch Linux kernel 5.15.48 / EPYC 74F3

ZFS

dataset: aes-256-gcm, no compression, single NVMe WD SN850

cat 100G.bin | pv > /dev/null
97.7GiB 0:00:51 [1.91GiB/s]

Seems to be bottlenecked on cat - most cores remain idle.
cat was using one core at 95%, with the 6 z_rd_int_{0…5} threads using 15% CPU each.

dmcrypt

cryptsetup benchmark | grep aes-xts
        aes-xts        256b      3915.3 MiB/s      4483.6 MiB/s
        aes-xts        512b      3768.9 MiB/s      3798.2 MiB/s

XFS on dm-crypt, default luksFormat options - aes-xts-plain64, 512-bit key

cat /mnt/xfs/250G.bin | pv > /dev/null
5.47GiB/s

edit 1: I was curious how fast AES could go on this box - 191 GB/s? Now if only programs could use multiple threads when reading files …

openssl speed -evp aes-256-xts -multi $(nproc)
evp           13653662.79k 58340319.00k 123940645.46k 173261454.68k 190404837.38k 191108440.11k
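For context, openssl speed reports throughput in thousands of bytes per second, so the last figure above really is ~191 GB/s. A quick sanity check of that conversion (the value is just copied from the output):

```shell
# openssl speed prints kB/s; divide by 1e6 to get GB/s
awk 'BEGIN { printf "%.1f GB/s\n", 191108440.11 / 1e6 }'
```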

Are you thinking of something specific? … or was this off-hand wishful thinking?

(I can think of some hacks that might help particular use cases, like LD_PRELOAD triggered prefetching :slight_smile: ).

Wow, that’s a lot of tests - thank you!

I’m usually accessing those files via sshfs, and i’ve managed to almost saturate (on average) the 2.5 GbE network link (Realtek, PCIe and USB) with the zvol + dm-crypt + XFS pipeline - good enough for me. :smiley: It feels a bit snappier than native ZFS crypto.