Supermicro Hardware
Background
Wow, this one has a lot of backstory if you’re just walking in in the middle of it. Maybe check out the video for more context.
Intel is out in front with usable multi-tenant GPU facilities for the masses – far ahead of Nvidia, AMD and everyone else. Well, technically, Nvidia is far ahead of everyone with GRID, virtualized GPUs and multi-tenant solutions, but those solutions cannot be bought. They can only be rented via expensive ongoing subscriptions.
No longer; Intel Flex 170 GPUs are now a viable solution. Intel officially supports Red Hat’s OpenShift (and that platform can do VDI, inferencing for AI and a lot more). Here, we will go into what I have to do, as of March 9, 2024, to get Intel’s i915 driver working with the Flex 170 on Proxmox 8.1 with the default kernel (via DKMS).
Intel also officially supports Ubuntu 22.04 Server (5.15 kernel) and 22.04 Desktop (6.5 kernel); however, the Flex 170 is only supported in a server context. Oops.
Not to worry – so many customers are climbing the walls to get away from VDI subscriptions that even Proxmox is looking to directly support Intel GPUs.
SR-IOV in Client GPUs!?!?
The other reason I say Intel is far ahead of everyone else here is that they quietly enabled SR-IOV for the iGPU in 10th, 11th, 12th, 13th and 14th gen client CPUs. SR-IOV support is not fully baked in 10th/11th gen, but it does actually work well in Alder Lake and newer. Yes, even for Plex Media Server use cases!
For the play-by-play on SR-IOV in client (nothing really to do with this how-to, except that it’ll be really important for understanding what happened with A770 SR-IOV later) check out this forked copy of Intel’s SR-IOV DKMS driver from strongtz:
and also this excellent blog writeup specifically for Proxmox from Derek Seaman:
…and finally this gist:
I love seeing these fringe use cases being adopted, and documented, by the community at large. Good work yall!!!
Intel Official vs Unofficial
Intel has pulled something off – these GPUs can be used with both Citrix and VMware, officially, in SR-IOV mode, plus some more advanced modes specifically with VMware, but… what about a full open source stack? That’s unofficial. Red Hat OpenShift is fully supported, but it is pricey, a bit of a niche use case, and just doesn’t have the mindshare to be a viable replacement for small-to-medium MSP type use cases the way that a VMware or Hyper-V based VDI solution does. OpenShift better enables a mixed workload use case where you might do VDI or inferencing… but for today, let’s focus on Proxmox.
After this I could see myself investigating how it works with xcp-ng (a much older kernel code base) and Nutanix (???).
Let’s get Proxmox Going
Clone Intel’s Repo
cd /usr/src
git clone https://github.com/intel-gpu/intel-gpu-i915-backports.git
cd intel-gpu-i915-backports
git checkout backport/main
^ If you read the readme carefully, Intel says YES to 6.5 kernel versions in the context of Ubuntu 22.04 LTS Desktop but not Server (kernel 5.15), and says nothing at all about Proxmox (sad trombone).
Not to worry, it does work, and here’s how.
I needed to make some changes to get DKMS to build. It’s a one-line change: edit Makefile.backports, jump to line 434, and comment it out with a #:
else
# $(info "OSV_NOT SUPPORTED")
endif
My OSV is SUPPORTED too, dang it! Supported by meeeeeeee. I think this and line 430 maybe have typos with tabs instead of spaces? There is a pending pull request from my buddy over @ Proxmox to fix that issue. For our purposes these edits will suffice.
I also needed to edit scripts/backport-mki915dkmsspec and comment out this line:
# Obsoletes: intel-platform-vsec-dkms intel-platform-cse-dkms
…because there is some problem with intel-platform-vsec-dkms for this kernel version and we actually kinda do still need those symbols.
With those files edited, we’re almost ready to make the module.
We need to make sure 1) we have development tools installed and 2) we have the kernel headers:
apt install proxmox-headers-6.5.13-3-pve # your version may be different; uname -a and apt search to figure it out. It should not be an older/lower version than this, though
apt install gcc g++ make binutils flex autoconf libtool devscripts dh-dkms dkms # etc
With the dependencies installed, you can build inside the source folder now:
make i915dkmsdeb-pkg
This should build some deb files in the parent directory:
-rw-r--r-- 1 root root 3140588 Mar 9 18:33 intel-i915-dkms_1.23.10.32.231129.32+i1-1_all.deb
-rw-r--r-- 1 root root 5123 Mar 9 18:33 intel-i915-dkms_1.23.10.32.231129.32+i1-1_amd64.buildinfo
-rw-r--r-- 1 root root 1245 Mar 9 18:33 intel-i915-dkms_1.23.10.32.231129.32+i1-1_amd64.changes
From here you can apt install /usr/src/intel-i915-blahblah and that should build/compile the DKMS module.
You may also need to unload the intel_vsec module – rmmod intel_vsec. That used to be its own module with its own package; now it is bundled in and won’t have its own package going forward. We might need it? Probably not, except possibly for troubleshooting.
What about firmware? Check the dmesg output when you try to modprobe i915 and see if you’re missing firmware. You can grab it from the Linux kernel website.
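A quick way to check once the module is installed (the grep pattern is just a rough filter for firmware/GuC/HuC messages):
modprobe i915
dmesg | grep -iE 'i915.*(firmware|guc|huc)'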
apt install --reinstall ../intel-i915-dkms_1.23.10.32.231129.32+i1-1_all.deb
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'intel-i915-dkms' instead of '../intel-i915-dkms_1.23.10.32.231129.32+i1-1_all.deb'
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 0 not upgraded.
Need to get 0 B/3,141 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 /usr/src/intel-i915-dkms_1.23.10.32.231129.32+i1-1_all.deb intel-i915-dkms all 1.23.10.32.231129.32+i1-1 [3,141 kB]
(Reading database ... 111884 files and directories currently installed.)
Preparing to unpack .../intel-i915-dkms_1.23.10.32.231129.32+i1-1_all.deb ...
AUXILIARY_BUS is enabled for 6.5.13-1-pve.
AUXILIARY_BUS is enabled for 6.5.13-1-pve.
AUXILIARY_BUS is enabled for 6.5.13-1-pve.
Module intel-i915-dkms-1.23.10.32.231129.32 for kernel 6.5.13-1-pve (x86_64).
Before uninstall, this module version was ACTIVE on this kernel.
i915-compat.ko:
- Uninstallation
- Deleting from: /lib/modules/6.5.13-1-pve/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
i915.ko:
- Uninstallation
- Deleting from: /lib/modules/6.5.13-1-pve/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
i915_spi.ko:
- Uninstallation
- Deleting from: /lib/modules/6.5.13-1-pve/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
iaf.ko:
- Uninstallation
- Deleting from: /lib/modules/6.5.13-1-pve/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
[snip]
depmod....
Deleting module intel-i915-dkms-1.23.10.32.231129.32 completely from the DKMS tree.
Unpacking intel-i915-dkms (1.23.10.32.231129.32+i1-1) over (1.23.10.32.231129.32+i1-1) ...
Setting up intel-i915-dkms (1.23.10.32.231129.32+i1-1) ...
Loading new intel-i915-dkms-1.23.10.32.231129.32 DKMS files...
AUXILIARY_BUS is enabled for 6.5.13-1-pve.
Building for 6.5.13-1-pve
Building initial module for 6.5.13-1-pve
AUXILIARY_BUS is enabled for 6.5.13-1-pve.
Done.
AUXILIARY_BUS is enabled for 6.5.13-1-pve.
AUXILIARY_BUS is enabled for 6.5.13-1-pve.
… this is what a good installation looks like. It’s best to configure the kernel, then reboot.
Configure the kernel
I’m using GRUB to boot, so I needed to add intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7
to the end of my kernel command line by adding it to /etc/default/grub. (If you’re booting via systemd-boot, you’ll need to update /etc/kernel/cmdline instead.) The last two parameters there don’t seem to work anymore, but I know I’ve used them in the past.
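For reference, the relevant line in /etc/default/grub ends up looking something like this – keep whatever options were already in your default line (the Proxmox default is just quiet):
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7"
Then run update-grub (or proxmox-boot-tool refresh on systemd-boot installs) and reboot; afterwards, cat /proc/cmdline to confirm the parameters actually made it onto the kernel command line.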
Post-reboot
Log back into the console of your Proxmox node and run lspci, looking for the Flex GPU:
lspci |grep Flex
45:00.0 Display controller: Intel Corporation Data Center GPU
If you see more than one, that’s great: the virtual functions are already enabled and you can skip the next step. Otherwise, it’s time to enable virtual functions.
Enabling Virtual functions
I’m not sure why, but the old kernel parameters from 6.1 and before for the i915 driver (i915.enable_guc=3 i915.max_vfs=7) don’t seem to work anymore, so we’ll set the virtual functions manually via sysfs.
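A minimal sketch of doing that, assuming the card sits at 45:00.0 as in the lspci output below (adjust the address for your system); the /sys/bus/pci/devices path is a symlink that resolves through the bridge discussed a little further down:
echo 7 > /sys/bus/pci/devices/0000:45:00.0/sriov_numvfs
Note this does not persist across reboots by itself – you’d want a small systemd unit or an /etc/sysfs.conf entry (sysfsutils) to reapply it at boot.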
After that lspci should be a little more filled out:
45:00.0 Display controller: Intel Corporation Data Center GPU Flex 170 (rev 08)
45:00.1 Display controller: Intel Corporation Data Center GPU Flex 170 (rev 08)
45:00.2 Display controller: Intel Corporation Data Center GPU Flex 170 (rev 08)
45:00.3 Display controller: Intel Corporation Data Center GPU Flex 170 (rev 08)
45:00.4 Display controller: Intel Corporation Data Center GPU Flex 170 (rev 08)
45:00.5 Display controller: Intel Corporation Data Center GPU Flex 170 (rev 08)
45:00.6 Display controller: Intel Corporation Data Center GPU Flex 170 (rev 08)
45:00.7 Display controller: Intel Corporation Data Center GPU Flex 170 (rev 08)
Also note that if you run lspci and see your device is 0000:45 like mine, and then look for that in /sys/devices… you might not see it. That’s normal; you have to get there via the bridge. Check out the red highlights in the above screenshot – that’s where sriov_numvfs is located.
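If you’d rather not trace the bridge by hand, this quick search will turn up the right path:
find /sys/devices -name sriov_numvfs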
The Flex 170 supports up to 31 virtual functions, but for practical VDI usage I’d really only recommend about 7 high-res dual-monitor seats per card, or up to 15-ish 2x1080p clients at most. The fact that a 1-socket, 32-core Intel Emerald Rapids Xeon Gold 6538N plus a single Flex 170 can easily support a dozen information-worker VDI roles is pretty significant.
I could play Crysis at 1080p reasonably okay in a 7-slice config. That’s 14 good-size VDI seats per Supermicro pizza box.
Proxmox VE 8.1 - Configure Resources in Datacenter
The right way to handle this, so that VM migration works in a cluster, is to map these PCIe devices to resources. That way, when moving a VM between hosts, Proxmox knows which PCIe device on the destination host corresponds to the one on the source.
Assign the resource, never the raw device, via the Datacenter area of Proxmox:
…For every host that you assign resources to in this pool, Proxmox will pick the appropriate device depending on which physical host the VM is running on.
Configure the first virtual function (00.1) in one of the Proxmox VMs you’ve already created (it is easier to install and set up Windows first, THEN add the Flex GPU, FWIW). I also recommend enabling Remote Desktop and making sure all of that works before adding the PCIe device.
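If you prefer the CLI, adding the mapped device to a VM looks roughly like this – the VM ID (100) and mapping name (flex-gpu) are placeholders for whatever you created above, and pcie=1 assumes a q35 machine type:
qm set 100 --hostpci0 mapping=flex-gpu,pcie=1
(A raw-device assignment like qm set 100 --hostpci0 0000:45:00.1,pcie=1 also works for a quick test on a single node, but then you lose the migration-friendly mapping.)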
Once that’s done, boot up the VM and install the Windows drivers for the GPU.
The Windows client driver you’ll need:
https://www.intel.com/content/www/us/en/download/780185/intel-data-center-gpu-flex-series-windows.html
The Future
I am not sure about SR-IOV’s future, in general. I really thought Intel was on to something with GVT-g, but that has been deprecated. This exact same functionality plus the awesome Looking Glass project on “consumer” GPUs like the A770, or even AMD or Nvidia GPUs, cannot get here fast enough. With this functionality it becomes much easier to share GPU compute between hosts and guests.
AMD doesn’t realize it, but this would also neatly solve their AI problem, because Windows and Linux can co-exist on the same hardware, seamlessly, with this type of functionality. Had SR-IOV come to AMD client GPUs as easily as it has to Intel Xe graphics, I think AMD GPU adoption for AI would be much farther along than it is today.
Minisforum MS-01 SR-IOV
This guide basically applies to the MS-01; HOWEVER, please ensure that you are able to SSH into your host prior to starting this guide AND that you are able to reboot with the network coming up automatically. There is a good chance that when you start this guide, the local console stops working:
[ 3.567985] i915 0000:00:02.0: vgaarb: deactivate vga console
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics] (rev 04)
I also strongly recommend setting:
intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7 i915.modeset=1
in /etc/default/grub so that the console never gets initialized. The console seems to cause null-pointer problems when trying to use the VFs – sometimes.
FWIW, the SR-IOV functions on the iGPU seem much closer to “beta” quality than not; this should stabilize around kernel 6.8.
That’s what the iGPU looks like on the MS-01; once the backports i915 DKMS driver is loaded, try to add some virtual functions:
Success!
echo 4 > /sys/devices/pci0000:00/0000:00:02.0/sriov_numvfs
…and the output from dmesg:
[ 273.854452] pci 0000:00:02.1: [8086:a7a0] type 00 class 0x030000
[ 273.854486] pci 0000:00:02.1: DMAR: Skip IOMMU disabling for graphics
[ 273.854554] pci 0000:00:02.1: Adding to iommu group 22
[ 273.854560] pci 0000:00:02.1: vgaarb: bridge control possible
[ 273.854561] pci 0000:00:02.1: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 273.854634] i915 0000:00:02.1: enabling device (0000 -> 0002)
[ 273.854664] i915 0000:00:02.1: Running in SR-IOV VF mode
[ 273.855425] i915 0000:00:02.1: GuC interface version 0.1.9.0
[ 273.856592] i915 0000:00:02.1: [drm] GT count: 1, enabled: 1
[ 273.856623] i915 0000:00:02.1: [drm] VT-d active for gfx access
[ 273.856642] i915 0000:00:02.1: [drm] Using Transparent Hugepages
[ 273.857782] i915 0000:00:02.1: GuC interface version 0.1.9.0
[ 273.859519] i915 0000:00:02.1: GuC firmware PRELOADED version 0.0 submission:SR-IOV VF
[ 273.859545] i915 0000:00:02.1: HuC firmware PRELOADED
[ 273.869944] i915 0000:00:02.1: [drm] Protected Xe Path (PXP) protected content support initialized
[ 273.870312] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.1 on minor 1
[ 273.871590] pci 0000:00:02.2: [8086:a7a0] type 00 class 0x030000
[ 273.871607] pci 0000:00:02.2: DMAR: Skip IOMMU disabling for graphics
[ 273.871652] pci 0000:00:02.2: Adding to iommu group 23
[ 273.871657] pci 0000:00:02.2: vgaarb: bridge control possible
[ 273.871659] pci 0000:00:02.2: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 273.871698] i915 0000:00:02.2: enabling device (0000 -> 0002)
[ 273.871712] i915 0000:00:02.2: Running in SR-IOV VF mode
[ 273.872471] i915 0000:00:02.2: GuC interface version 0.1.9.0
[ 273.873587] i915 0000:00:02.2: [drm] GT count: 1, enabled: 1
[ 273.873615] i915 0000:00:02.2: [drm] VT-d active for gfx access
[ 273.873630] i915 0000:00:02.2: [drm] Using Transparent Hugepages
[ 273.874834] i915 0000:00:02.2: GuC interface version 0.1.9.0
[ 273.876399] i915 0000:00:02.2: GuC firmware PRELOADED version 0.0 submission:SR-IOV VF
[ 273.876401] i915 0000:00:02.2: HuC firmware PRELOADED
[ 273.886244] i915 0000:00:02.2: [drm] Protected Xe Path (PXP) protected content support initialized
[ 273.886549] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.2 on minor 2
[ 273.887242] pci 0000:00:02.3: [8086:a7a0] type 00 class 0x030000
[ 273.887258] pci 0000:00:02.3: DMAR: Skip IOMMU disabling for graphics
[ 273.887293] pci 0000:00:02.3: Adding to iommu group 24
[ 273.887297] pci 0000:00:02.3: vgaarb: bridge control possible
[ 273.887298] pci 0000:00:02.3: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 273.887321] i915 0000:00:02.3: enabling device (0000 -> 0002)
[ 273.887331] i915 0000:00:02.3: Running in SR-IOV VF mode
[ 273.888310] i915 0000:00:02.3: GuC interface version 0.1.9.0
[ 273.889780] i915 0000:00:02.3: [drm] GT count: 1, enabled: 1
[ 273.889792] i915 0000:00:02.3: [drm] VT-d active for gfx access
[ 273.889803] i915 0000:00:02.3: [drm] Using Transparent Hugepages
[ 273.890680] i915 0000:00:02.3: GuC interface version 0.1.9.0
[ 273.892330] i915 0000:00:02.3: GuC firmware PRELOADED version 0.0 submission:SR-IOV VF
[ 273.892332] i915 0000:00:02.3: HuC firmware PRELOADED
[ 273.902980] i915 0000:00:02.3: [drm] Protected Xe Path (PXP) protected content support initialized
[ 273.903357] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.3 on minor 3
[ 273.904165] pci 0000:00:02.4: [8086:a7a0] type 00 class 0x030000
[ 273.904177] pci 0000:00:02.4: DMAR: Skip IOMMU disabling for graphics
[ 273.904210] pci 0000:00:02.4: Adding to iommu group 25
[ 273.904213] pci 0000:00:02.4: vgaarb: bridge control possible
[ 273.904214] pci 0000:00:02.4: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 273.904237] i915 0000:00:02.4: enabling device (0000 -> 0002)
[ 273.904246] i915 0000:00:02.4: Running in SR-IOV VF mode
[ 273.904647] i915 0000:00:02.4: GuC interface version 0.1.9.0
[ 273.906426] i915 0000:00:02.4: [drm] GT count: 1, enabled: 1
[ 273.906452] i915 0000:00:02.4: [drm] VT-d active for gfx access
[ 273.906461] i915 0000:00:02.4: [drm] Using Transparent Hugepages
[ 273.907348] i915 0000:00:02.4: GuC interface version 0.1.9.0
[ 273.909055] i915 0000:00:02.4: GuC firmware PRELOADED version 0.0 submission:SR-IOV VF
[ 273.909079] i915 0000:00:02.4: HuC firmware PRELOADED
[ 273.919835] i915 0000:00:02.4: [drm] Protected Xe Path (PXP) protected content support initialized
[ 273.920261] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.4 on minor 4
[ 273.921120] i915 0000:00:02.0: Enabled 4 VFs
Errors?
If you see something like IOV0: Initialization failed (-EIO) GT wedged
it just means you need firmware. The logs even tell you where to get it, farther up: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
Download what you’re missing and put it in /lib/firmware.
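A minimal sketch, assuming dmesg is asking for the ADL-P GuC blob as in the adlp_guc_70.bin example discussed further down – substitute whatever filename your dmesg output actually names:
cd /lib/firmware/i915
wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/i915/adlp_guc_70.bin
update-initramfs -u -k all   # only matters if i915 loads from the initramfs
Then modprobe i915 again (or reboot) and re-check dmesg.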
If you don’t see the firmware you need there, chances are it’s here:
git://anongit.freedesktop.org/drm/drm-firmware
… this is where the diffs/“pull requests” to the Linux kernel in general come from, from the Intel driver team. You can git clone this into /usr/src and copy what you need to /usr/lib/firmware/i915.
(This is also a good troubleshooting step – you can get bleeding-edge firmware newer than what’s on the Linux kernel website.)
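Something like this, if you want to try it – the in-repo path is an assumption (the layout should mirror linux-firmware), so poke around with find if the file isn’t where you expect:
cd /usr/src
git clone git://anongit.freedesktop.org/drm/drm-firmware
find drm-firmware -name 'adlp_guc*'
cp drm-firmware/i915/adlp_guc_70.bin /usr/lib/firmware/i915/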
At the time I’m writing this, that’s not the case – the drm-firmware tree isn’t ahead of what’s on kernel.org – and you can inspect individual firmware files to confirm. Our GuC firmware is from just a couple of weeks ago, 2024-02-24:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/i915/adlp_guc_70.bin
I don’t know that it was necessary, but in /usr/lib/firmware/i915/ I did:
ln -s adlp_guc_70.bin adlp_guc_70.13.1.bin
because it seemed to be looking for that specific filename. I could see that in the commit history, though, so the 70.bin was just the “latest version” – probably a bug in the backports driver looking for a specific firmware version.
Null Pointer Dereference
Over the course of working on the video for the Linux channel, the i915 backports repo changed enough that the iGPU SR-IOV stuff stopped working, kinda. Officially, i915 backports does not really support SR-IOV on iGPUs.
This repo (the strongtz fork linked earlier) is the best resource for using SR-IOV on iGPUs in general. There is a good chance it will work when i915 backports doesn’t. You will need to remove the other i915 DKMS package if you use this repo, however. At least before kernel 6.8, I recommend this repo for iGPUs.
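A rough sketch of making the switch – the package name matches what we built earlier and the repo URL is the strongtz project; follow that repo’s README for its current build/install steps, since they have changed over time:
apt remove intel-i915-dkms     # drop the backports DKMS package first
dkms status                    # confirm no stale i915 modules remain in the tree
cd /usr/src
git clone https://github.com/strongtz/i915-sriov-dkms.git
# …then build/install per that repo’s README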
Code 43
I was getting Code 43 at first, but it went away after setting vendor_id (using args: in the VM conf) AND updating the driver to this version:
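For the args: piece, here’s a hedged sketch of what I believe is the usual trick – overriding the Hyper-V vendor_id that Windows sees. The string itself is arbitrary (12 characters max) and the VM ID in the path is a placeholder:
# /etc/pve/qemu-server/100.conf
args: -cpu host,hv_vendor_id=intelvdi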
Alarming Things That Remain
It can be annoying to enable more than 7 virtual functions via i915 backports; this seems to be a vestige of the fact that we’re on Proxmox.
This isn’t Flex 170 related, as far as I can tell, but is likely PVE kernel related with bleeding edge Xeons:
[ 4342.301217] x86/split lock detection: #AC: CPU 1/KVM/13757 took a split_lock trap at address: 0x7ef3d050
[ 4475.445905] x86/split lock detection: #AC: CPU 0/KVM/13756 took a split_lock trap at address: 0xfffff80756e769af
… I haven’t had a chance to really dig into that yet. It only happens during PCIe or driver init, if it’s going to happen at all. So things are stable once they’re running, but rebooting a VM on a heavily loaded machine has maybe a 1 in 10 chance of being weird. I haven’t seen this before on my other boxes. Maybe it’s EMR Xeon related. Rebooting the VM a second time seems to resolve it.