QAT for Speedy NGINX

Background

I went to Intel Innovation 2023 to try to get a better understanding of Intel’s overall strategy for server processors going forward, at least when it comes to “Uncore” stuff like accelerators.

I’ve already done a lot of work with Glenn Berry on Microsoft SQL Server and QAT. Intel’s QAT enables a unique and game-changing bit of functionality to do with backup acceleration and compression. On a busy SQL Server it doesn’t take very much CPU away from doing the SQL Server stuff, but you still get great hardware-accelerated compression when your server processor has QAT. (Software can still be used in lieu of hardware, but that eats into the available CPU headroom.)

Compression/decompression isn’t the only useful thing QAT can do: it can also offload TLS encryption, like the kind used on the web, for a web server. This can dramatically lower the amount of CPU overhead spent setting up and tearing down TLS connections.

I set out to investigate.

Intel Docker/Kubernetes resources

This is the main resource:

This is a great source of knowledge and ready-to-roll demos not just for QAT, but also FPGA, DSA and even just GPU demos.

It can be a little discouraging when one of the first search hits for Docker/QAT Intel demos is this demo, last updated 3 years ago:
https://hub.docker.com/r/intel/crypto-reference-stack

… it is possible to find much more recent demonstrations of TLS acceleration with nginx as used by other commercial projects on github:

Getting Started

When running Docker, or anything else, it is important that the host has the QAT driver installed and working.

https://www.intel.com/content/www/us/en/content-details/710059/intel-quickassist-technology-software-for-linux-getting-started-guide-customer-enabling-release.html

This guide is the most reasonable place to start on that, if a little dense.

Before proceeding further, use the qzip utility to test that the QATengine can be used for data compression, to confirm it’s working properly.

While the Docker images contain QATzip and QATengine, you must configure QATzip and QATengine on each host that the containers run on. The QATzip configuration files are located at QATzip/config_file and the QATengine configuration files are located at QAT_Engine/qat_hw_config.

There are multiple versions of the configuration files optimized for different adapters and usage scenarios. Select the ones that match your adapter and usage pattern, and copy them to the /etc directory. Note that QATzip looks for NumberDcInstances and QATengine looks for NumberCyInstances. Thus you will need to merge the QATzip and QATengine configuration files together, as you need both in NGINX.

For example, /etc/c6xx_dev0.conf might look similar to the following:

^ This is from the readme on OpenVisualCloud’s QAT nginx container for Docker.

If you’re using Kubernetes, of course, you can just follow Intel’s device plugins for Kubernetes.
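As a quick sanity check on the merge described above, here is a minimal Python sketch that confirms a merged file defines both instance counts. The sample text, its section names, and the helper names are assumptions for illustration only; this is not the content of the readme’s example file, and real configuration files carry many more keys.

```python
import configparser

# Hypothetical, heavily trimmed stand-in for a merged config file.
# Section names and values are illustrative, not copied from the readme.
SAMPLE_MERGED = """\
[GENERAL]
ServicesEnabled = cy;dc

[SHIM]
NumberCyInstances = 1
NumberDcInstances = 1
"""

KEYS = ("NumberCyInstances", "NumberDcInstances")


def instance_counts(conf_text):
    """Return {section: {key: count}} for sections that define either key."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; configparser lowercases by default
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        counts = {k: parser.getint(section, k) for k in KEYS
                  if parser.has_option(section, k)}
        if counts:
            found[section] = counts
    return found


def has_both_services(conf_text):
    """True if some section requests at least one Cy and one Dc instance."""
    return any(all(c.get(k, 0) > 0 for k in KEYS)
               for c in instance_counts(conf_text).values())


if __name__ == "__main__":
    print(instance_counts(SAMPLE_MERGED))
    print(has_both_services(SAMPLE_MERGED))
```

To check a real file, read its text (for example from /etc/c6xx_dev0.conf) and pass it to has_both_services. A real file may need extra parser options that this sketch does not handle.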

I Want To Use This

The Dockerfile has lots of cool hard-learned lessons in it:

… one can take a lot away from the openssl setup they’ve automated in the container here if you are contemplating adopting this for production.

Testing

Writeup/TODO/ Paste youtube video here

Final Notes

So the one fly in the ointment here is that in 99% of use cases your web server is NOT continuously tearing down and setting up encrypted sessions. The bottleneck won’t be how many TLS connections one can open and then close. In the real world, the web server will allow clients to keep the connection open for either a fixed # of seconds, a fixed # of requests, or some combination of parameters like that. This connection re-use is what helps speed things along. QAT is still quite useful for shifting load off of the CPU for both opening new connections and maintaining old connections, though.

Real world? It’s going to depend on your workload. If you’re running a microservice then you might actually see 90% of the benefit I’ve shown off here. Running a bog-standard webapp? You could still easily squeeze in +20% more connections over what you would be able to do without QAT.

In my mind the real problem with QAT adoption is that it needs to be ubiquitous and that it needs to be on more folks’ radar.
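To put rough numbers on connection re-use, here is a back-of-the-envelope Python sketch. It assumes an idealized model in which every connection serves exactly the same number of requests before closing; the function name and the figures are made up for illustration and are not measurements from the testing above.

```python
def handshakes_per_second(requests_per_second, requests_per_connection):
    """Full TLS handshakes per second under an idealized keep-alive model.

    Assumes every connection serves exactly `requests_per_connection`
    requests and is then closed, so one handshake is amortized over
    that many requests.
    """
    if requests_per_connection < 1:
        raise ValueError("requests_per_connection must be at least 1")
    return requests_per_second / requests_per_connection


if __name__ == "__main__":
    # Illustrative figures only.
    for per_conn in (1, 10, 100):
        print(per_conn, handshakes_per_second(10_000, per_conn))
```

With one request per connection every request pays for a handshake. With 100 requests per connection only one in a hundred does, which is why a handshake-heavy benchmark overstates what a typical web app would see.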


Netflix played around with kTLS a few years back and got pretty good results using that (no QAT requirements).

It’s obviously not the same thing but probably viable for more users in general.


It’s an Intel-only thing in a server market where AMD has convincing arguments as well, and QAT is a premium feature on a select number of SKUs. I’m not sure if the PCIe cards are up to date or generally recommended, but these are still exotic equipment. I may as well look into other DPUs or FPGA products.

OpenZFS will soon have access to QAT and other similar technologies to boost compression, checksums and parity calculations. I’ve seen other storage solutions having QAT support as well.

Software is slowly adopting accelerator technology; I just need something other than selected (expensive) 4th gen Xeon SKUs to use it.

QAT support is already in OpenZFS and, depending on data paths, hardware acceleration doesn’t necessarily mean faster in all cases.

I’ve been waiting for this, just watched the video. I recently bought a Supermicro SYS-E200-12A-8C which has an Atom C5325, just because of its QAT. I am a bit confused as to versions of QAT: which SKUs have which version, and the feature differences between versions. Perhaps someone can clarify or point to the right resources. Another avenue of curiosity is how to use it in virtualized situations, in my case Proxmox based. I read that there you can either use it in user space or kernel space, or maybe it was either host or guest mode.

Resources aren’t as readily available as a lot of the popular tools/topics that seem to capture the attention of tech/homelab content creator crowd, so appreciate the attention you’re trying to bring to QAT in general. STH has some high level content as well.

I have a vague idea to experiment with Kubernetes service mesh TLS acceleration. I also looked into using it as a pfSense/OPNsense box, but it seems uses of QAT are limited there at best, not to mention that on pfSense it’s available only in the Plus version, which has gone back to being paid only. Also saw the ZFS stuff mentioned above - exciting.

Anyway, looking forward to digging into what you posted; will report if I find something interesting. I would love to see detailed tutorial-like content and/or anything related to QAT - maybe on level1linux?

YouTube ate my comment on the “Understanding Quick Assist Technology” video - so I’m going to post about it here. My guess is I made the mistake of typing “crypto” once lol, anyway…

If comparing QAT TLS performance, keep in mind that OpenSSL regressed badly on performance in version 3.0 - the version shipped in Ubuntu 22.04 LTS. The release was a complete disaster performance-wise. So I would advise that any comparison take this into account. The TLS performance of OpenSSL 3.0 is in no way representative of the performance of AES-NI, for example.

3.2 is a tad better but there is still a long way to the performance of OpenSSL 1.1.1 in many workloads, and Ubuntu is sticking to 3.0 even in the upcoming 24.04 for its LTS status it seems.
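As a small aid when recording benchmark conditions, here is a hedged Python sketch that reports the OpenSSL release series. It only sees the library the Python interpreter itself is linked against, which may differ from the one nginx or the benchmark tool uses, so treat it as a starting point. The helper name is made up for illustration.

```python
import ssl


def openssl_series(version_info):
    """Map a version tuple to a release series such as '1.1.1' or '3.0'.

    1.x releases are grouped by three components (e.g. 1.1.1),
    3.x and later by major.minor (e.g. 3.0, 3.2).
    """
    major, minor, fix = version_info[:3]
    if major == 1:
        return f"{major}.{minor}.{fix}"
    return f"{major}.{minor}"


if __name__ == "__main__":
    # Version string and series of the library this interpreter links to.
    print(ssl.OPENSSL_VERSION)
    print(openssl_series(ssl.OPENSSL_VERSION_INFO))
```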

References (I’m not allowed to post links, sorry):
HAProxy wiki “SSL Libraries Support Status” - see OpenSSL section
OpenSSL GitHub issue #20286 (there are many related open issues)

What’s a reasonable alternative other than using the ancient OpenSSL on LTS Ubuntu?

Hm maybe running stuff in 20.04 based containers on 22.04 host? But that just buys you a bit more than a year in terms of support. And you will be missing some optimizations for modern cpus in various userspace bits and pieces.

At $job we’ve moved some projects to WolfSSL, and will be building a package with OpenSSL 1.1.1 that some other packages can selectively build and link against, at least for as long as some major distros offer security support for 1.1.1. But that’s not exactly an end-user friendly workaround, is it: packaging, re-building and adapting stuff. :persevere: