GPU Passthrough on Ubuntu 20.04

All,

We are currently setting up some VMs on an Ubuntu 20.04 host and are having issues with GPU passthrough and stability.

We are deploying this via MAAS 3.4 RC1.

I have included our server details and kernel options below in case anything looks wrong.

GPU Passthrough Issue 1:

When we start an openSUSE Leap 15.4 VM, we sometimes see the following error on the host console at startup. When this happens, LXD and the host appear to freeze:

[3295.911851] vfio-pci 0000:86:00.0 timed out waiting for pending transaction: performing function level reset anyway

The NVIDIA card is currently installed at PCI address 0000:86:00.0.
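
As a quick check on the host, the driver binding and IOMMU group of the card can be read from sysfs. The following is a minimal sketch, assuming the standard Linux sysfs layout under /sys/bus/pci/devices; the function name pci_binding is hypothetical and used only for illustration.

```python
import os

def pci_binding(address, sysfs_root="/sys/bus/pci/devices"):
    """Return (driver, iommu_group) for a PCI address.

    Each value is None when the corresponding sysfs link is absent,
    for example when no driver is bound or the IOMMU is disabled.
    """
    dev = os.path.join(sysfs_root, address)

    def link_name(name):
        # 'driver' and 'iommu_group' are symlinks; the last path
        # component of the target is the driver name / group number.
        path = os.path.join(dev, name)
        if os.path.islink(path):
            return os.path.basename(os.readlink(path))
        return None

    return link_name("driver"), link_name("iommu_group")
```

Calling pci_binding("0000:86:00.0") on the host should report vfio-pci as the driver if the binding from the kernel options took effect.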

GPU Passthrough Issue 2:

We would like the VM's display output to come through port 1 on the GPU.

Host Hardware Details:

SuperMicro Server: H13SSW (https://www.supermicro.com/en/products/motherboard/h13ssw) 

BIOS Version: 1.5

CPU: AMD EPYC 9224 24-Core Processor (SP5, Genoa)

RAM: 128GB DDR5 

Disk: 4 x NVMe (2TB each)

GPU: PNY RTX A2000 6GB  

Host OS: Ubuntu 20.04 LTS 

Host Kernel Version: 5.4.0-156-generic

Host VM Engine: LXD

Kernel Options:

console=tty1 console=ttyS1,115200n8 kvm_amd.nested=1 modprobe.blacklist=amdgpu,nouveau,nvidia,nvidiafb,nvidia-drm,radeon,radeonfb,snd_hda_intel amd_iommu=on iommu=pt rd.driver.pre=pci-stub,vfio,vfio_virqfd,vfio_iommu_type1,vfio-pci-core,vfio-pci vfio_iommu_type1.allow_unsafe_interrupts=1 video=vesafb:off,efifb:off pci-stub.ids=10de:2531,10de:228e vfio-pci.ids=10de:2531,10de:228e quiet splash vt.handoff=1 vfio-pci.rombar=0
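
To confirm that these options actually reached the running kernel after the MAAS deployment, the contents of /proc/cmdline can be parsed and inspected. This is a minimal sketch only: parse_cmdline and vfio_ids are hypothetical helper names, and the mapping of "-" to "_" is there because the kernel treats the two characters as equivalent in parameter names.

```python
def parse_cmdline(cmdline):
    """Map each kernel parameter to the list of values it was given.

    Flags without '=' (such as 'quiet') get a value of None. Repeated
    parameters (such as 'console') keep every value in order. Quoted
    values containing spaces are not handled by this sketch.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        key = key.replace("-", "_")  # 'vfio-pci.ids' -> 'vfio_pci.ids'
        params.setdefault(key, []).append(value if sep else None)
    return params

def vfio_ids(params):
    """Return the vendor:device IDs from the last vfio-pci.ids entry."""
    values = params.get("vfio_pci.ids", [])
    if values and values[-1]:
        return values[-1].split(",")
    return []
```

On the host, passing the text of /proc/cmdline to parse_cmdline and then calling vfio_ids on the result should list 10de:2531 and 10de:228e if the options above were applied.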

Here is the instance configuration from lxc config show:

architecture: x86_64
config:
  limits.cpu: "44"
  limits.memory: "68719476736"
  limits.memory.hugepages: "false"
  security.secureboot: "false"
  volatile.cloud-init.instance-id: 5af7d1a7-ae38-4d51-ab93-23b2380ff02d
  volatile.eth0.host_name: tapd11fb535
  volatile.eth0.hwaddr: [REDACTED MAC ADDRESS OF NIC]
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.rtxa2000.last_state.pci.driver: vfio-pci
  volatile.rtxa2000.last_state.pci.slot.name: 0000:86:00.0
  volatile.uuid: c53e5812-032f-4b2e-9865-5ee64d410042
  volatile.uuid.generation: c53e5812-032f-4b2e-9865-5ee64d410042
  volatile.vsock_id: "402844175"
devices:
  eth0:
    boot.priority: "1"
    name: eth0
    nictype: bridged
    parent: b-p129s0f0np0
    type: nic
  root:
    boot.priority: "0"
    path: /
    pool: default
    size: "1000000000000"
    type: disk
  rtxa2000:
    gputype: physical
    pci: "86:00.0"
    type: gpu
ephemeral: false
profiles: []
stateful: false
description: ""
created_at: 2023-08-17T17:03:27.310526007Z
name: loving-fly
status: Running
status_code: 103
last_used_at: 2023-08-17T19:26:51.420296284Z
location: none
type: virtual-machine

Any help would be greatly appreciated, as we are trying to get this working as soon as possible.