Fixing massive IOWAIT: double LUKS, VM-nested AES-NI

Hi!

I’ve noticed massive all-core utilization when uploading to my Nextcloud VM.

My setup:

  • Xeon E3-1230 v5 (Skylake)
  • VM host is fully LUKS-encrypted, every single storage device
  • 3 HDD spinners, via LUKS, in a ZFS raidz1 (RAID5-like) pool, /storage
  • ZFS slog/l2arc for /storage, on separate SSD, also via LUKS
  • the Nextcloud VM (Linux/KVM) runs entirely off of a raw zvol beneath /storage
  • the Nextcloud VM itself LUKS-encrypts everything, effectively double-crypto
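
A quick sanity check for the “VM-nested AES-NI” part: whether the guest sees the aes CPU flag at all depends on the CPU model the hypervisor exposes to it. Below is a minimal sketch, assuming an x86 Linux guest with the usual "flags" lines in /proc/cpuinfo. The function names are my own and purely illustrative.

```python
def has_aes_flag(cpuinfo_text):
    """Return True if any x86 'flags' line in /proc/cpuinfo text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Compare whole tokens so that e.g. 'vaes' does not count as 'aes'.
        if sep and key.strip() == "flags" and "aes" in value.split():
            return True
    return False


def guest_has_aes(path="/proc/cpuinfo"):
    """Read the (assumed) cpuinfo path and check it for the aes flag."""
    with open(path) as fh:
        return has_aes_flag(fh.read())
```

Calling guest_has_aes() inside the Nextcloud VM and again on the VM host should return True in both places if AES-NI is passed through. A False in the guest would suggest its LUKS layer falls back to software AES.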

Symptoms:

  • When I upload to the Nextcloud VM over gigabit, “htop” on the VM host (with “Display options” > “Detailed CPU time […] IO-Wait […]” enabled) shows a massive load on all cores/threads in grey bars, >90%. I guess that’s IO-Wait, due to double AES-NI utilization, right?
  • …it’s not? Why? What else?
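
To put numbers on what the grey bars show, the aggregate "cpu" line in /proc/stat can be sampled twice and the deltas compared. As far as I understand, iowait is idle time with I/O still outstanding, while encryption work would normally show up as system time, so a split like this should help tell the two apart. This is a rough sketch, assuming a reasonably recent Linux kernel that reports at least the eight counters listed in FIELDS. The helper names are hypothetical.

```python
import time

# First eight counters of the aggregate "cpu" line in /proc/stat.
# Guest time is already included in "user", so it is not counted again.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            if len(values) < len(FIELDS):
                raise ValueError("fewer counters than expected")
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before_text, after_text):
    """Percentage of CPU time per category between two /proc/stat samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def measure(interval=5.0, path="/proc/stat"):
    """Sample the stat file twice, 'interval' seconds apart, and return shares."""
    with open(path) as fh:
        first = fh.read()
    time.sleep(interval)
    with open(path) as fh:
        second = fh.read()
    return cpu_shares(first, second)
```

Running measure() on the VM host during an upload and looking at the "iowait" versus "system" entries would show whether the cores are mostly waiting on the disks or mostly busy in the kernel.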

I don’t actually NEED that Nextcloud VM to be double-encrypted, since it’s in the same “threat model” category/level as the VM host itself …which already has encryption for all of its data-at-rest.

Questions:

  • Can I reasonably expect better performance from setting up a fresh Nextcloud VM without LUKS crypto …just plain XFS?
  • EXT4, with its “ext4lazyinit” kernel thread, seems to be a bit of an IOPS/capacity hog. Quite a nuisance, imho, so none of that rubbish; XFS all the way! (mkfs, and be done!)