KVM with MxGPU (FirePro S7150)

I have bought a Dell PowerEdge R730 and an AMD FirePro S7150 graphics card, which I was hoping to use for hosting hardware-accelerated VMs (VDI). I have created VMs before with KVM and PCIe passthrough, but I am struggling to get my MxGPU-compatible graphics card to work.

I am by no means a Linux guru, more of a hobbyist, so assume my knowledge of both Linux and KVM is minimal. I am listing everything I have done so far, so sorry for the wall of text.

I have looked at the AMD driver page for KVM, which points to a GitHub repo for GIM. It states that GIM has only been tested on a couple of old OS and kernel versions:

The tested host OS for GIM is Ubuntu16.04.2. All other hypervisor SW(KVM, XEN, QEMU, LIBVIRT) versions are aligned with default version of OS.

To be more specific, it lists Ubuntu 16.04.2 Server with kernel 4.4.0-75-generic. GIM also supplies a kernel patch made for 4.4.0-75-generic.

I have therefore installed Ubuntu 16.04.2, which by default comes with kernel version 4.4.0-173-generic. I ended up downloading kernel version 4.4.0-75-generic and setting it as my default kernel in GRUB in order to follow the recommendations from the GIM repo as closely as possible. After doing this I get a problem with my Ethernet adapters: lshw -class network does not list them any more. I think that's because the Ethernet drivers are not available in that kernel version.
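One way to test that guess without another reboot is to look for the NIC driver inside each installed kernel's module tree. The sketch below is only an illustration: kernel_has_module is a made-up helper name, and tg3 is a placeholder driver name because the post does not say which NIC the R730 has (the driver= field in the lshw -class network output on the working kernel shows the real one). On Ubuntu 16.04 many drivers ship in the separate linux-image-extra-<version> package, so a missing driver may simply mean that package was not installed for the older kernel.

```shell
# kernel_has_module KVER MODNAME [ROOT]
# Succeeds if a MODNAME.ko file exists under ROOT/lib/modules/KVER.
# ROOT defaults to empty (the real filesystem); it exists only to ease testing.
kernel_has_module() {
    kver=$1
    mod=$2
    root=${3:-}
    [ -d "$root/lib/modules/$kver" ] || return 1
    find "$root/lib/modules/$kver" -name "$mod.ko*" | grep -q .
}

# Example (tg3 is a placeholder driver name, not taken from the post):
#   for k in 4.4.0-75-generic 4.4.0-173-generic; do
#       kernel_has_module "$k" tg3 && echo "$k: found" || echo "$k: missing"
#   done
```

If the driver turns out to be missing only for 4.4.0-75-generic, installing the matching linux-image-extra package for that kernel would be the first thing to try.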

Since I got problems with my Ethernet adapters, I changed back to kernel 4.4.0-173-generic. I cloned the GIM repo and compiled the driver, which gives me a gim.ko file. When I run make install I get the following error:

make -C /lib/modules/4.4.0-173-generic/build M=/home/miivers/projects/drivers/MxGPU-Virtualization/drv modules_install
make[1]: Entering directory '/usr/src/linux-headers-4.4.0-173-generic'
INSTALL /home/miivers/projects/drivers/MxGPU-Virtualization/drv/gim.ko
At main.c:222:
- SSL error:02001002:system library:fopen:No such file or directory: bss_file.c:175
- SSL error:2006D080:BIO routines:BIO_new_file:no such file: bss_file.c:178
sign-file: certs/signing_key.pem: No such file or directory
DEPMOD 4.4.0-173-generic
make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-173-generic'

To me it seems like signing the kernel driver fails. I therefore ran sudo apt install shim-signed and sudo update-secureboot-policy --new-key. When I ran make install again I got the same error. I found a guide for signing kernel modules and followed it, but along the way I got the error "EFI variables are not supported on this system". After some googling I found a page which says that older Ubuntu versions have issues signing kernel modules on non-UEFI systems.

I restarted, changed my boot mode from BIOS to UEFI and disabled Secure Boot. After doing this I could no longer see my drive in the boot menu (I have installed Ubuntu on the redundant SD card module), so I reverted these changes.

I then tried adding "module.sig_enforce=0" to the Linux command line in /etc/default/grub, ran sudo update-grub and rebooted. After that I ran insmod gim.ko. lsmod now shows:
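The sign-file message comes from the signing step of modules_install, which looks for certs/signing_key.pem in the headers directory. It is commonly reported as harmless when signature enforcement is off, and insmod did accept the module here. To confirm that the GRUB change actually reached the running kernel, one can inspect /proc/cmdline. The helper below is a minimal sketch, and cmdline_has is a hypothetical name.

```shell
# cmdline_has PARAM
# Reads a kernel command line on stdin and succeeds if PARAM appears
# as a whole space-separated word.
cmdline_has() {
    tr ' ' '\n' | grep -Fxq -- "$1"
}

# Example on the host:
#   cmdline_has module.sig_enforce=0 < /proc/cmdline && echo "parameter is active"
```

If the parameter is missing after a reboot, the usual suspects are a typo in GRUB_CMDLINE_LINUX or booting a different menu entry than the one update-grub regenerated.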

Module Size Used by
gim 1007616 0

So it is loaded but not in use.

This is the rest of the lsmod output:

lsmod | grep -i amd
amdkfd 131072 1
amd_iommu_v2 20480 1 amdkfd
amdgpu 991232 0
i2c_algo_bit 16384 1 amdgpu
ttm 98304 1 amdgpu
drm_kms_helper 155648 1 amdgpu
drm 364544 4 ttm,drm_kms_helper,amdgpu

And the lspci output:

lspci -k
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
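The "Used by" column in lsmod counts dependent modules and open references, so a 0 there does not by itself say which driver owns the card; the "Kernel driver in use" line from lspci -k does. For GIM to manage the S7150, that line presumably needs to stop reporting amdgpu. The helper below is a rough sketch for pulling that field out in a script. driver_in_use is a made-up name, and it assumes the default lspci slot format without a domain prefix.

```shell
# driver_in_use SLOT
# Reads `lspci -k` output on stdin and prints the "Kernel driver in use"
# value for the device whose header line starts with SLOT (e.g. 05:00.0).
# Prints nothing if no driver is bound or the slot is not present.
driver_in_use() {
    awk -v slot="$1" '
        # A device header line starts with bus:device.function
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7] / { in_dev = (index($0, slot) == 1) }
        in_dev && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
            exit
        }
    '
}

# Example on the host:
#   lspci -k | driver_in_use 05:00.0
```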

The following lines are added to /etc/modprobe.d/blacklist.conf:

blacklist amdgpu
blacklist amdkfd

I thought this should blacklist amdgpu?
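A blacklist line only stops modprobe from auto-loading a module by alias; it does not unload a driver that is already bound. On Ubuntu the modprobe.d files are also copied into the initramfs, so the image typically has to be regenerated with sudo update-initramfs -u, followed by a reboot, before the entry applies at early boot. As a minimal sketch (both function names are invented for illustration), these helpers check that the entry is really in the file and whether the module is still loaded:

```shell
# is_blacklisted MODULE FILE
# Succeeds if FILE contains an uncommented "blacklist MODULE" line.
is_blacklisted() {
    grep -Eq "^[[:space:]]*blacklist[[:space:]]+$1[[:space:]]*\$" "$2"
}

# is_loaded MODULE
# Reads `lsmod` output on stdin and succeeds if MODULE is in the first column.
is_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example on the host:
#   is_blacklisted amdgpu /etc/modprobe.d/blacklist.conf && echo "blacklist entry present"
#   lsmod | is_loaded amdgpu && echo "amdgpu is still loaded"
```

If both messages are printed after regenerating the initramfs and rebooting, something is loading amdgpu explicitly rather than by alias, which a blacklist entry does not prevent.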

Well, I am not sure what I should do next. To me it seems amdgpu is loaded even though I blacklisted it. The GIM driver seems to be loaded (manually) but not in use. I am looking for input on what I should try next.

Is SR-IOV enabled in the system's firmware?

SR-IOV is enabled in the BIOS and virtualization is enabled. I should add that I have tested with ESXi before, where the graphics card shows up as 16 devices. ESXi requires enterprise licenses, so I am trying out KVM.

From what I understand, I need to prevent the system from using amdgpu as the driver for the graphics card. That's the whole point of blacklist amdgpu, or am I missing something? So when I run lspci -k I should not see amdgpu as the driver in use?
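The firmware setting and what the host kernel sees can both be checked from lspci. The sketch below assumes that, once gim owns the card, the virtual functions show up as extra lspci entries much like the 16 devices seen under ESXi; that is an expectation rather than something confirmed in this thread. Both helper names are hypothetical.

```shell
# has_sriov_cap
# Reads `lspci -vvv -s SLOT` output on stdin and succeeds if the device
# advertises the SR-IOV extended capability.
has_sriov_cap() {
    grep -q 'Single Root I/O Virtualization (SR-IOV)'
}

# count_matching PATTERN
# Reads `lspci` output on stdin and prints how many lines contain PATTERN.
count_matching() {
    grep -Fc -- "$1" || true   # grep -c still prints 0 when nothing matches
}

# Example on the host (root is needed to read extended capabilities):
#   sudo lspci -vvv -s 05:00.0 | has_sriov_cap && echo "SR-IOV capability visible"
#   lspci | count_matching 'FirePro S7150'
```

A count that stays at 1 after loading gim would suggest that no virtual functions were created.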

If I were you, I would start with VMware to confirm that everything works.
From what I have heard from many people, MxGPU is very finicky to set up.

Try this official deployment guide first,
make sure everything works well with R730,
after that, move to KVM

I will be following your path very soon with an S7150 once I get one at a decent price.

Let me know how it goes

Will this work as multiple GPUs on Unraid if the amdgpu driver is blacklisted?