QAT - HowTo on Linux -- and SRIOV -- And Proxmox 8

Background

Intel QAT can be pretty awesome for gzip and zstd compression, but it can be a little tricky to get it enabled on Proxmox. I also don’t really recommend doing this on “production” proxmox boxes unless you really understand the ramifications and potential problems you can create for yourself. You end up using zfs-dkms instead of the proxmox-built version of zfs.

At some point zfs on proxmox may natively be built with QAT accelerator support.

If you aren’t in the know about QAT it is an accelerator that can accelerate data transformation operations; mainly compression and encryption. It shows up like a PCIe device, but is typically built into the CPU (there used to be PCIe add-in cards, but mostly those are slower than what you can do on-cpu today). It is a technology pushing 10+ years old now, was incredibly useful on embedded CPUs like the Cherry-Trail Intel Atom CPUs, but is now on many (but not all) Xeon CPUs.

This has been a long time coming

…and some were excited to talk about QAT+ZFS when we were just talking QAT+nginx (hat tip @Exard3k haha … )

Doing this on a boot-from-ZFS system is not recommended, since QAT likely won’t be detected properly at boot time, and reloading ZFS with QAT support after boot then becomes a problem.

This is also good background reading from intel:
https://cdrdv2-public.intel.com/632506/632506-qat-getting-started-guide-v2.0.pdf

Config Notes

Driver for Older/Ancient QAT

https://www.intel.com/content/www/us/en/download/708765/intel-quickassist-technology-driver-for-linux-hw-version-1-5.html

There are many versions of QAT hardware. Be aware of this. This one goes all the way back to cherry trail atom and is almost certainly not what you need. I mention this because the other documents I was looking at to figure this out for myself make no mention of this as a possible pitfall at all.

Driver for Modernish QAT

There is firmware, a driver, and a userspace library (qatlib) that you will need. Generally the “linux-firmware” package should include the firmware; in my case it was qat_4xx*.bin. If you need to DIY it, the files normally shipped in the linux-firmware package come from the kernel.org repository kernel/git/firmware/linux-firmware.git (repository of firmware blobs for use with the Linux kernel).

Ensure the needed firmware is present at /lib/firmware (or where your Linux distribution keeps all its other firmware).
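
A quick way to check for the firmware is to list the matching files. This is only a sketch: the function name list_qat_firmware is made up for illustration, the qat_4xx prefix is simply what matched on my system, and the trailing wildcard is there because some distributions ship firmware compressed (for example with a .xz or .zst suffix).

```shell
# Hypothetical helper: print QAT firmware files found in a firmware directory.
# Usage: list_qat_firmware [dir] [prefix]
# Returns non-zero when nothing matches.
list_qat_firmware() {
    fw_dir="${1:-/lib/firmware}"
    fw_prefix="${2:-qat_4xx}"
    fw_found=1
    for fw_file in "$fw_dir"/"$fw_prefix"*.bin*; do
        # An unmatched glob stays literal, so confirm the file exists.
        [ -e "$fw_file" ] || continue
        basename "$fw_file"
        fw_found=0
    done
    return "$fw_found"
}
```

Call it as `list_qat_firmware /lib/firmware`; pass a different prefix as the second argument for other QAT generations.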

# The kernel module in my case loaded just fine, and I had the firmware
dmesg -e | grep -i qat_4xxx

# ... and the device showed up with these IDs
# lspci -d :4940 -k
2b:00.0 Co-processor: Intel Corporation Device 4940 (rev 40)
        Subsystem: Intel Corporation Device 0000
        Kernel driver in use: 4xxx
        Kernel modules: qat_4xxx
e0:00.0 Co-processor: Intel Corporation Device 4940 (rev 40)
        Subsystem: Intel Corporation Device 0000
        Kernel driver in use: 4xxx
        Kernel modules: qat_4xxx
e5:00.0 Co-processor: Intel Corporation Device 4940 (rev 40)
        Subsystem: Intel Corporation Device 0000
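
With several QAT endpoints in one box, it can be easier to boil that output down to one line per device. The sketch below reads `lspci -k` output on stdin; the function name qat_driver_summary is made up, and it assumes the usual lspci layout where device lines start in the first column and detail lines are indented.

```shell
# Hypothetical helper: read `lspci -k` output on stdin and print
# "<bus address> <driver>" per device, with "none" when no
# "Kernel driver in use" line was seen for that device.
qat_driver_summary() {
    awk '
        /^[0-9a-f]/ {                    # a new device line (not indented)
            if (dev != "") print dev, drv
            dev = $1
            drv = "none"
        }
        /Kernel driver in use:/ { drv = $NF }
        END { if (dev != "") print dev, drv }
    '
}
```

Usage would be `lspci -d :4940 -k | qat_driver_summary`. A device reported as "none" is worth a closer look in dmesg.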

There are sometimes packages in your distro for qatlib; it may just be an apt search away for you. If not, build it from the intel/qatlib repository on GitHub.

You should have a qat service that you must enable and start once you complete this installation.

systemctl status qat

Did you know QAT can do VFIO/SR-IOV as of Sapphire Rapids?

Yup, it’s true. You must explicitly enable IOMMU and SR-IOV in the BIOS, and add intel_iommu=on to the kernel boot command line.
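
After rebooting, it is worth confirming that the running kernel actually got the flag. A minimal sketch, assuming the standard /proc/cmdline file; the function name cmdline_has is made up, and the file path is a parameter only so the check can be tried against a sample file.

```shell
# Hypothetical helper: succeed if the kernel command line contains the
# given parameter as a whole word. Defaults match the flag named above.
# Usage: cmdline_has [param] [file]
cmdline_has() {
    cl_param="${1:-intel_iommu=on}"
    cl_file="${2:-/proc/cmdline}"
    # One parameter per line, then an exact fixed-string line match.
    tr ' ' '\n' < "$cl_file" | grep -qxF -- "$cl_param"
}
```

For example: `cmdline_has intel_iommu=on && echo "IOMMU flag present"`.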

Then verify:

# For me on Sapphire Rapids, I had these Device IDs. QAT PF id is 0x4940, and the VF id is 0x4941
# lspci -d :4941
# lspci -vn -d :4940|grep -i SR-IOV
        Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
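
The same information is also visible in sysfs, which reports how many VFs each PF supports and how many are currently enabled. This sketch assumes the standard sriov_numvfs and sriov_totalvfs attributes under /sys/bus/pci/devices; the function name qat_sriov_status is made up, and the sysfs root is a parameter so it can be pointed at a test tree.

```shell
# Hypothetical helper: for every Intel PCI device with the given device ID,
# print "<address> <enabled VFs>/<total VFs>".
# Usage: qat_sriov_status [device_id] [sysfs_root]
qat_sriov_status() {
    sr_id="${1:-0x4940}"
    sr_root="${2:-/sys/bus/pci/devices}"
    for sr_dev in "$sr_root"/*; do
        [ -r "$sr_dev/vendor" ] && [ -r "$sr_dev/device" ] || continue
        [ "$(cat "$sr_dev/vendor")" = "0x8086" ] || continue
        [ "$(cat "$sr_dev/device")" = "$sr_id" ] || continue
        # Devices without the SR-IOV capability have no sriov_* files.
        [ -r "$sr_dev/sriov_totalvfs" ] || continue
        echo "$(basename "$sr_dev") $(cat "$sr_dev/sriov_numvfs")/$(cat "$sr_dev/sriov_totalvfs")"
    done
}
```

Running `qat_sriov_status` with no arguments checks for the 0x4940 PF ID mentioned above.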

… but this will have to be left to a future video :)

Using QAT in user space

This is pretty easy; I’d suggest adding a qat group and adding your user to it:

 sudo groupadd qat
 sudo usermod -a -G qat <YOUR_USER>
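
To confirm the membership took effect, something like the sketch below should do. The function name user_in_group is made up; it relies only on the standard `id -nG` output.

```shell
# Hypothetical helper: succeed if the given user belongs to the given group.
# Usage: user_in_group <user> <group>
user_in_group() {
    # id -nG prints the user's group names separated by spaces.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}
```

For example: `user_in_group "$USER" qat || echo "not in qat yet"`. Keep in mind that an already-running login session only picks up a new group after logging out and back in.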

Command Notes

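# Are the zfs and QAT kernel modules loaded?
lsmod | grep -i zfs
lsmod | grep -i qat
# Does the installed zfs module mention QAT?
modinfo zfs | grep qat
# ZFS kstats (a QAT-enabled build should list a qat entry)
ls -al /proc/spl/kstat/zfs/
# QAT service status
service qat_service status
# Is QAT disabled in ZFS?
cat /sys/module/zfs/parameters/zfs_qat_disable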
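
Since the exact parameter names can differ between ZFS versions, a slightly more forgiving check is to list every zfs module parameter that starts with zfs_qat. This is a sketch under that assumption; the function name zfs_qat_params is made up, and the directory is a parameter so it can be tested against a scratch directory.

```shell
# Hypothetical helper: print "name=value" for each zfs module parameter
# whose name starts with zfs_qat. Returns non-zero when none exist, which
# likely means the loaded zfs module was built without QAT support.
# Usage: zfs_qat_params [parameters_dir]
zfs_qat_params() {
    zp_dir="${1:-/sys/module/zfs/parameters}"
    zp_found=1
    for zp_file in "$zp_dir"/zfs_qat*; do
        [ -r "$zp_file" ] || continue
        echo "$(basename "$zp_file")=$(cat "$zp_file")"
        zp_found=0
    done
    return "$zp_found"
}
```

For example: `zfs_qat_params || echo "no zfs_qat parameters found"`.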

# maybe needed for proxmox
# /etc/modprobe.d/zfs.conf
blacklist spl
blacklist zfs


Further Stuff I read Working On This

Intel Reference PDF

https://www.intel.com/content/www/us/en/content-details/710059/intel-quickassist-technology-software-for-linux-getting-started-guide-customer-enabling-release.html

Notes from Phoronix on the November '23 Updates

They like to add support for new algorithms from time to time.

Is there a way to use QAT in TrueNAS Scale?
I see the kernel modules are loaded, but it looks like the userspace libraries and the qat service are missing.
I have a box with C3558, which is basically a TrueNAS Mini X, and it’s a shame it’s not being used.

for core, it is juuuuuust getting patched upstream for 4xxx qat devices into freebsd, once that’s done there may be a way.

for scale, it’s just a pcie device. lspci and see them? gotta compile your own kernel or dkms zfs but should be doable one of those paths

lspci sees it, and Linux kernel seems to be loading modules:

# lsmod | grep qat
qat_c3xxx              16384  0
intel_qat             208896  1 qat_c3xxx
crc8                   16384  1 intel_qat
authenc                16384  2 intel_qat,essiv
# dmesg | grep qat
c3xxx 0000:01:00.0: qat_dev0 started 6 acceleration engines

Looks like only userspace and zfs support are missing.

then testing zfs support is only a recompile away for you

Ahh man, this is very tempting for my D-2146NT, QAT sitting unutilized right now. But I’m booting from ZFS, and it’s more work than I want right now. Maybe when I wrap up some projects, but really hoping for OS support.

It’s nice to see some content exploring QAT. A while back I picked up a Dell Virtual Edge Platform VEP4600. If you search Google you can easily find this document hosted by Dell which contains everything worth knowing about the system: “vep4600_tech_guide_en-us.pdf”. I would post a link, but the forum doesn’t want to let me. I had an opportunity to purchase one of these open-box, but otherwise in seemingly brand-new condition, for significantly less than I could find them selling for in pre-owned or refurbished condition elsewhere. I believe the one I purchased has the config that comes with 8 cores/32GB memory. It has an Intel Skylake-D CPU, specifically the Xeon-D 2100, and comes with QAT, which is supposed to have some support for accelerating crypto operations as well as compression operations.

My hope was that, given the CPU along with the support for QAT, it would serve as a very nice opnsense router, and using its dual 10GbE ports I was hoping it would allow me to bridge the gap between my 2.5GbE cable modem and my 10GbE switch allowing me to take advantage of the extra 200Mbps on the 1.2Gbps service I am getting from my ISP. Unfortunately, I later discovered that, unlike most of my other 10GbE equipment, these ports did not have support for also running at 2.5GbE/5GbE speeds, and so it simply runs at 1GbE, foiling my plan to finally put that extra 200Mbps to use. I installed it in my homelab server rack and though I had considered installing Proxmox on it, I decided to use vmware esxi instead since that is one of the operating systems that Dell configures these with out of the factory, and Dell provides a custom build of esxi for this machine specifically. I configured a VM inside of esxi, installed opnsense on that VM, and then I was off to the races.

Of course, it wasn’t that easy, and I had more to do to have any hope of getting QAT working properly inside a virtualized opnsense instance. From Intel I was able to find and build QAT esxi drivers that enabled passing QAT virtual functions to my VMs using SR-IOV. Although opnsense supports QAT when running natively (which I assume works out of the box for the most part on bare metal), that didn’t seem to work for the QAT virtual functions I was passing through from esxi. So, again from Intel, I found QAT drivers for freebsd and was able to successfully build and install them in opnsense. At that point everything seemed to be detected properly inside my opnsense instance. However, when I ran the tests that Intel’s documentation suggests for confirming that operations are actually being offloaded and accelerated by QAT, the counters for operations handled by QAT, which were supposed to go up, just stayed at 0. I spent a number of hours trying different things, but so far I still haven’t been able to confirm that it’s working properly (though it’s been several months and I haven’t gone back and tried again since then). This hasn’t really been a problem for me: I do occasionally VPN into my homelab from my mobile hotspot when I’m not at the house, but that’s pretty much all it has to handle, so performance isn’t really an issue either way. For me, playing with QAT is mostly for fun as a learning exercise.

Though they aren’t officially supported by Dell like the vmware esxi build I currently have installed, I’ve considered trying to install Proxmox or just try to install opnsense directly on bare metal and see if I have any better luck with either of those. I also thought about trying pfsense instead of opnsense, though I doubt there would be a difference between the two and I prefer opnsense all other things equal.

@coryg89 with QAT on pfsense I noticed that not all QAT accelerators are enabled by the driver. You need to make sure your specific device is supported; pfsense should have a list in their documentation.
pfsense is who contributed the QAT bits to the project, so OPNsense is probably experiencing the same issues, as they are relying on pfsense contributions for QAT.

@samsausage Hm, thanks for the tip. My QAT device shows up for me as “c6xx”. In Intel’s documentation they show QAT c6xx as being associated with both the Intel C62x chipset and also with some of the dedicated adapters, Intel QuickAssist Adapter 8960/8970, possibly others as well.

Looking in the pfsense documentation, I was unable to find an exhaustive list of supported devices, but on their documentation page titled “Cryptographic Accelerators” it does mention some supported devices:

QAT devices are supported on certain Intel-based platforms such as select models of c3000 and c2000 SoCs, and also by QAT add-on cards. Several Netgate hardware models include QAT devices, such as the 4100, 5100, 6100, 7100, 8200, and more.

My device doesn’t appear to be explicitly listed, though the text seems to suggest that other devices not explicitly listed could be supported. If my device was not supported by default by the driver built into pfsense, that may be why I had to build and install my own QAT driver for freebsd from Intel before I could get it to show up. Anyway, on the same page the pfsense documentation has a section on confirming the accelerator is being used, and when I ran vmstat -i | grep qat it did seem to be showing up; it’s just that the count of interrupts being handled by the accelerator didn’t seem to increase no matter what I tried.

Hello,
I got Proxmox working with an Intel QAT 8970 in hybrid mode, providing one instance for the host (compression and crypto) and 32 VF instances for virtual machines.

root@xxx:~# grep qat /proc/crypto
driver       : pkcs1pad(qat-rsa,sha512)
driver       : qat-rsa
module       : intel_qat
driver       : qat_aes_gcm
module       : intel_qat
driver       : qat_aes_cbc_hmac_sha512
module       : intel_qat
driver       : qat_aes_cbc_hmac_sha256
module       : intel_qat
driver       : qat_aes_xts
module       : intel_qat
driver       : qat_aes_ctr
module       : intel_qat
driver       : qat_aes_cbc
module       : intel_qat

root@xxxx:~# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1081006 iterations per second for 256-bit key
PBKDF2-sha256    1466539 iterations per second for 256-bit key
PBKDF2-sha512    1094546 iterations per second for 256-bit key
PBKDF2-ripemd160  634731 iterations per second for 256-bit key
PBKDF2-whirlpool  485451 iterations per second for 256-bit key
argon2i       5 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      5 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       342.2 MiB/s       339.6 MiB/s
    serpent-cbc        128b        79.3 MiB/s       509.7 MiB/s
    twofish-cbc        128b       180.1 MiB/s       322.9 MiB/s
        aes-cbc        256b       351.1 MiB/s       333.5 MiB/s
    serpent-cbc        256b        81.7 MiB/s       510.7 MiB/s
    twofish-cbc        256b       181.9 MiB/s       324.9 MiB/s
        aes-xts        256b       374.1 MiB/s       351.6 MiB/s
    serpent-xts        256b       456.1 MiB/s       455.3 MiB/s
    twofish-xts        256b       300.7 MiB/s       303.0 MiB/s
        aes-xts        512b       368.9 MiB/s       353.3 MiB/s
    serpent-xts        512b       465.1 MiB/s       456.3 MiB/s
    twofish-xts        512b       305.4 MiB/s       303.6 MiB/s
root@xxx:~# cat /sys/kernel/debug/qat_c6xx_0000\:83\:00.0/fw_counters
+------------------------------------------------+
| FW Statistics for Qat Device                   |
+------------------------------------------------+
| Firmware Requests [AE  0]:                4495 |
| Firmware Responses[AE  0]:                4495 |
| RAS Events        [AE  0]:                   0 |
+------------------------------------------------+
| Firmware Requests [AE  1]:                4494 |
| Firmware Responses[AE  1]:                4494 |
| RAS Events        [AE  1]:                   0 |
+------------------------------------------------+
| Firmware Requests [AE  2]:                4494 |
| Firmware Responses[AE  2]:                4494 |
| RAS Events        [AE  2]:                   0 |
+------------------------------------------------+
| Firmware Requests [AE  3]:                4495 |
| Firmware Responses[AE  3]:                4495 |
| RAS Events        [AE  3]:                   0 |
+------------------------------------------------+
| Firmware Requests [AE  4]:                4494 |
| Firmware Responses[AE  4]:                4494 |
| RAS Events        [AE  4]:                   0 |
+------------------------------------------------+
| Firmware Requests [AE  5]:                4494 |
| Firmware Responses[AE  5]:                4494 |
| RAS Events        [AE  5]:                   0 |
+------------------------------------------------+
| Firmware Requests [AE  6]:                4495 |
| Firmware Responses[AE  6]:                4495 |
| RAS Events        [AE  6]:                   0 |
+------------------------------------------------+
| Firmware Requests [AE  7]:                4494 |
| Firmware Responses[AE  7]:                4494 |
| RAS Events        [AE  7]:                   0 |
+------------------------------------------------+
| Firmware Requests [AE  8]:                4494 |
| Firmware Responses[AE  8]:                4494 |
| RAS Events        [AE  8]:                   0 |
+------------------------------------------------+
| Firmware Requests [AE  9]:                4495 |
| Firmware Responses[AE  9]:                4495 |
| RAS Events        [AE  9]:                   0 |
+------------------------------------------------+
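
To see whether requests are actually reaching the card, it helps to total these counters before and after a workload and compare. A minimal sketch, assuming the table layout shown above; the function name qat_fw_totals is made up.

```shell
# Hypothetical helper: read a fw_counters dump on stdin and print
# "requests=<sum> responses=<sum>" across all acceleration engines.
qat_fw_totals() {
    awk '
        # The count is the second-to-last field, just before the closing "|".
        /Firmware Requests/  { req  += $(NF-1) }
        /Firmware Responses/ { resp += $(NF-1) }
        END { printf "requests=%d responses=%d\n", req, resp }
    '
}
```

Usage would be along the lines of `cat /sys/kernel/debug/qat_c6xx_0000\:83\:00.0/fw_counters | qat_fw_totals`, run as root since debugfs is normally root-only. If the totals do not move while the workload runs, nothing is being offloaded.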

However, performance was quite disappointing compared to Intel E5-2660v4 AES-NI:

root@xxx:~# /etc/init.d/qat_service status
Checking status of all devices.
There is 3 QAT acceleration device(s) in the system:
 qat_dev0 - type: c6xx,  inst_id: 0,  node_id: 1,  bsf: 0000:83:00.0,  #accel: 5 #engines: 10 state: down
 qat_dev1 - type: c6xx,  inst_id: 1,  node_id: 1,  bsf: 0000:85:00.0,  #accel: 5 #engines: 10 state: down
 qat_dev2 - type: c6xx,  inst_id: 2,  node_id: 1,  bsf: 0000:87:00.0,  #accel: 5 #engines: 10 state: down
root@xxx:~# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1068884 iterations per second for 256-bit key
PBKDF2-sha256    1474790 iterations per second for 256-bit key
PBKDF2-sha512    1093405 iterations per second for 256-bit key
PBKDF2-ripemd160  634731 iterations per second for 256-bit key
PBKDF2-whirlpool  483660 iterations per second for 256-bit key
argon2i       5 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      5 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       589.8 MiB/s      2339.8 MiB/s
    serpent-cbc        128b        82.4 MiB/s       513.4 MiB/s
    twofish-cbc        128b       183.2 MiB/s       326.0 MiB/s
        aes-cbc        256b       444.1 MiB/s      1841.0 MiB/s
    serpent-cbc        256b        81.8 MiB/s       512.3 MiB/s
    twofish-cbc        256b       183.4 MiB/s       327.8 MiB/s
        aes-xts        256b      2011.0 MiB/s      2028.8 MiB/s
    serpent-xts        256b       470.7 MiB/s       457.1 MiB/s
    twofish-xts        256b       305.9 MiB/s       304.0 MiB/s
        aes-xts        512b      1607.0 MiB/s      1610.4 MiB/s
    serpent-xts        512b       467.4 MiB/s       457.8 MiB/s
    twofish-xts        512b       305.6 MiB/s       304.1 MiB/s
root@xxx:~# lsmod | grep -E "zfs|qat"
qat_c62xvf             36864  0
qat_c62x               20480  0
zfs                  6213632  6
spl                   143360  1 zfs
qat_api               655360  3 zfs
intel_qat             409600  4 qat_c62x,qat_api,usdm_drv,qat_c62xvf
uio                    24576  1 intel_qat
authenc                12288  1 intel_qat
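
To put a number on the gap between the two benchmark runs above, the throughput figures can be pulled out of saved `cryptsetup benchmark` output and divided. This is a sketch only; the function names bench_enc and bench_ratio are made up, and it assumes the column layout shown above (algorithm, key size, encryption, decryption).

```shell
# Hypothetical helper: print the encryption throughput (MiB/s) for one
# algorithm and key size from a file holding `cryptsetup benchmark` output.
# Usage: bench_enc <file> <algorithm> <key>
bench_enc() {
    awk -v alg="$2" -v key="$3" '$1 == alg && $2 == key { print $3; exit }' "$1"
}

# Hypothetical helper: ratio of two throughput figures, two decimals.
# Fails instead of dividing by zero.
bench_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { if (b + 0 == 0) exit 1; printf "%.2f\n", a / b }'
}
```

For example, `bench_ratio 2011.0 374.1` compares the two aes-xts 256b encryption results above and prints 5.38, meaning the run with the QAT devices down was a bit over five times faster for that cipher.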

Overall underwhelming performance for crypto for now.

Cheers

So here is what I found out. It displays correctly within the OS; however, you won’t be able to actually use a hybrid setup. You have to decide between PF (host use) or VF (VM use). The downside is that PF does not work with IOMMU enabled. Sadge.

You end up using VF as disabling IOMMU prevents any use of passthrough/vfio. In my case NVIDIA vGPU.

Quote from documentation:

Note: Using the boot flag intel_iommu=on prevents using the QAT physical function (PF) on the host. To use Intel® QAT on the host with this flag, refer to the virtualization app note cited above for the instructions to use Intel® QAT virtual functions (VFs) on the host. access is required to make grub changes
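
When sorting out which functions stay with the host and which go to VMs, it can help to check which driver each device is currently bound to. A minimal sketch, assuming the standard sysfs layout where a bound device has a driver symlink; the function name pci_bound_driver is made up.

```shell
# Hypothetical helper: print the name of the driver bound to a PCI device
# directory in sysfs, or "none" when no driver is attached.
# Usage: pci_bound_driver <sysfs device dir>
pci_bound_driver() {
    if [ -L "$1/driver" ]; then
        # The symlink target ends in the driver's name.
        basename "$(readlink "$1/driver")"
    else
        echo none
    fi
}
```

For example: `pci_bound_driver /sys/bus/pci/devices/0000:83:00.0`. A VF handed to a VM would typically show vfio-pci here.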

Besides that, the card constantly complains about being run on NUMA 0 while being attached to NUMA 1 PCIe.

Valid for Intel QAT Accelerator 8970 and probably all HW 1.x

Impressive Performance with PCIe Passthrough of 3 VF in a Windows VM:
