All,
We are currently setting up some VMs on an Ubuntu 20.04 host and are having issues with GPU passthrough and stability.
We are deploying this via MAAS 3.4 RC1.
Our server details and kernel options are included below in case anything does not look right.
GPU Passthrough Issue 1:
When we start an openSUSE Leap 15.4 VM, we sometimes see the following error on the host console at startup. When this happens, LXD and the host appear to freeze:
[3295.911851] vfio-pci 0000:86:00.0 timed out waiting for pending transaction: performing function level reset anyway
The NVIDIA card we have installed is currently at PCI address 0000:86:00.0.
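In case it helps with diagnosis, here is a minimal Python sketch of how the driver binding and IOMMU group for this address could be checked from sysfs. It assumes the standard /sys/bus/pci/devices layout; the function name pci_passthrough_state and its sysfs_root parameter are hypothetical names used only for illustration and are not part of LXD or any other tool.

```python
import os


def pci_passthrough_state(addr, sysfs_root="/sys"):
    """Return (driver, iommu_group_members) for a PCI address like 0000:86:00.0.

    Hypothetical helper: driver is None when nothing is bound, and the member
    list is empty when the device or its IOMMU group is not visible.
    """
    dev = os.path.join(sysfs_root, "bus", "pci", "devices", addr)

    # 'driver' is a symlink to the bound driver's directory; absent if unbound.
    driver_link = os.path.join(dev, "driver")
    driver = None
    if os.path.islink(driver_link):
        driver = os.path.basename(os.path.realpath(driver_link))

    # Every device sharing the IOMMU group is listed under iommu_group/devices.
    group_dir = os.path.join(dev, "iommu_group", "devices")
    members = sorted(os.listdir(group_dir)) if os.path.isdir(group_dir) else []
    return driver, members


if __name__ == "__main__":
    driver, members = pci_passthrough_state("0000:86:00.0")
    print("bound driver:", driver)
    print("IOMMU group members:", members)
```

For passthrough we would expect the driver to be vfio-pci, and ideally the group to contain only the GPU and its audio function.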
GPU Passthrough Issue 2:
We would like the VM's video output to come through port 1 on the GPU.
Host Hardware Details:
Supermicro Server: H13SSW (https://www.supermicro.com/en/products/motherboard/h13ssw)
BIOS Version: 1.5
CPU: AMD EPYC 9224 24-Core Processor (SP5, Genoa)
RAM: 128GB DDR5
Disk: 4 x NVMe (2TB each)
GPU: PNY RTX A2000 6GB
Host OS: Ubuntu 20.04 LTS
Host Kernel Version: 5.4.0-156-generic
Host VM Engine: LXD
Kernel Options:
console=tty1 console=ttyS1,115200n8 kvm_amd.nested=1 modprobe.blacklist=amdgpu,nouveau,nvidia,nvidiafb,nvidia-drm,radeon,radeonfb,snd_hda_intel amd_iommu=on iommu=pt rd.driver.pre=pci-stub,vfio,vfio_virqfd,vfio_iommu_type1,vfio-pci-core,vfio-pci vfio_iommu_type1.allow_unsafe_interrupts=1 video=vesafb:off,efifb:off pci-stub.ids=10de:2531,10de:228e vfio-pci.ids=10de:2531,10de:228e quiet splash vt.handoff=1 vfio-pci.rombar=0
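To make the options above easier to review, here is a minimal Python sketch that splits a kernel command line into individual parameters so the vfio-pci.ids and modprobe.blacklist values can be read off separately. The function name parse_cmdline is hypothetical, and the sketch assumes no quoted values containing spaces, which holds for the line above but not for every command line. On the host, the input string would come from /proc/cmdline; a shortened copy of our options is used here so the example is self-contained.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: [values]}.

    Hypothetical helper: bare flags such as 'quiet' map to [None], and a
    repeated name such as 'console' keeps every value in order.
    Assumes no quoted values with embedded spaces.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params.setdefault(key, []).append(value if sep else None)
    return params


if __name__ == "__main__":
    # Shortened copy of the options from this post.
    sample = (
        "console=tty1 console=ttyS1,115200n8 amd_iommu=on iommu=pt "
        "modprobe.blacklist=amdgpu,nouveau,nvidia "
        "vfio-pci.ids=10de:2531,10de:228e quiet splash"
    )
    params = parse_cmdline(sample)
    print("vfio-pci.ids:", params["vfio-pci.ids"][0].split(","))
    print("blacklisted:", params["modprobe.blacklist"][0].split(","))
    print("consoles:", params["console"])
```

Keeping every value for repeated names matters here because console is passed twice.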
Here is the device information via lxc config:
architecture: x86_64
config:
  limits.cpu: "44"
  limits.memory: "68719476736"
  limits.memory.hugepages: "false"
  security.secureboot: "false"
  volatile.cloud-init.instance-id: 5af7d1a7-ae38-4d51-ab93-23b2380ff02d
  volatile.eth0.host_name: tapd11fb535
  volatile.eth0.hwaddr: [REDACTED MAC ADDRESS OF NIC]
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.rtxa2000.last_state.pci.driver: vfio-pci
  volatile.rtxa2000.last_state.pci.slot.name: 0000:86:00.0
  volatile.uuid: c53e5812-032f-4b2e-9865-5ee64d410042
  volatile.uuid.generation: c53e5812-032f-4b2e-9865-5ee64d410042
  volatile.vsock_id: "402844175"
devices:
  eth0:
    boot.priority: "1"
    name: eth0
    nictype: bridged
    parent: b-p129s0f0np0
    type: nic
  root:
    boot.priority: "0"
    path: /
    pool: default
    size: "1000000000000"
    type: disk
  rtxa2000:
    gputype: physical
    pci: "86:00.0"
    type: gpu
ephemeral: false
profiles: []
stateful: false
description: ""
created_at: 2023-08-17T17:03:27.310526007Z
name: loving-fly
status: Running
status_code: 103
last_used_at: 2023-08-17T19:26:51.420296284Z
location: none
type: virtual-machine
Any help would be wonderful as we are trying to get this functioning ASAP.