Falcon Northwest MIG - 8 gamers 1 PC with RTX Pro 6000

Are you okay with Linux guests? Did you get past the part where you list the UUIDs for the instances and then use those UUIDs in KVM to pass that function of the GPU through to the guest?

I’ve been a little too swamped to do a full walkthrough.

is not bad, but not fully complete I guess.

What's your output of:

sudo nvidia-smi mig -lgip

then after you create the instances

sudo nvidia-smi

then

sudo nvidia-smi -L

?
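
If it helps, here is a rough sketch for pulling just the MIG UUIDs out of the `nvidia-smi -L` listing so they can be pasted into the VM config. The function name `mig_uuids` is made up for this post, and it assumes the listing uses lines that look like `MIG 1g.24gb     Device  0: (UUID: MIG-...)`.

```shell
#!/bin/sh
# Print one MIG UUID per line from `nvidia-smi -L` output read on stdin.
# Assumption: MIG lines look like
#   "  MIG 1g.24gb     Device  0: (UUID: MIG-xxxx)"
# The parent "GPU 0: ..." line is skipped because it does not start with "MIG".
mig_uuids() {
    sed -n 's/^[[:space:]]*MIG .*(UUID: \(MIG-[^)]*\)).*$/\1/p'
}

# Usage on a real host (not executed here):
#   sudo nvidia-smi -L | mig_uuids
```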

Then virsh edit the domain and add, for example:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
  <source>
    <address uuid='USE-INSTANCE-UUID-FROM-ABOVE'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</hostdev>
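
A minimal sketch of generating that stanza from a UUID instead of typing it by hand. `mdev_hostdev_xml` is a hypothetical helper name. It leaves out the guest-side `<address type='pci' .../>` element on the assumption that libvirt picks a free guest slot when none is given.

```shell
#!/bin/sh
# Emit a libvirt <hostdev> stanza for a mediated device.
# $1: the device UUID to place in <source>.
# Assumption: omitting the guest PCI <address> lets libvirt choose one.
mdev_hostdev_xml() {
    hx_uuid=$1
    cat <<EOF
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
  <source>
    <address uuid='${hx_uuid}'/>
  </source>
</hostdev>
EOF
}

# Usage (not executed here; "myguest" is a placeholder domain name):
#   mdev_hostdev_xml "$UUID" > /tmp/mig-hostdev.xml
#   virsh attach-device myguest /tmp/mig-hostdev.xml --config
```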

Then from there install a Linux guest for diagnostics. What do you see in lspci inside the guest?

You might have to blacklist nouveau on boot. I remember that something terrible happened (a panic, iirc) when nouveau tried to load.
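
For the nouveau part, something along these lines inside the guest is the usual approach. Treat it as a sketch: `write_nouveau_blacklist` and the file name in the usage note are made up for this post, and the initramfs usually needs regenerating afterwards (for example with dracut on Fedora or mkinitcpio on Arch). Adding `modprobe.blacklist=nouveau` to the kernel command line is an alternative.

```shell
#!/bin/sh
# Write a modprobe.d snippet that keeps nouveau from loading.
# $1: destination file (written to a scratch path first, then installed as root).
write_nouveau_blacklist() {
    printf '%s\n' \
        '# keep nouveau away from the passed-through GPU' \
        'blacklist nouveau' \
        'options nouveau modeset=0' > "$1"
}

# Usage (not executed here):
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo install -m 644 /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
```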

On Fedora everything up to spawning the MIG instances works: nvidia-smi shows they get created and get a UUID assigned. Libvirt/KVM does not find them though.

On Arch I cannot find a compute-only driver to test this, and the full-GPU driver always binds processes to the card, keeping me from spawning MIG instances, even in non-display mode.

Sanity check: the Blackwell is on 22:00.0, so the virsh example should look like this, right?

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
  <source>
    <address uuid='USE-INSTANCE-UUID-FROM-ABOVE'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x22' slot='0x00' function='0x0'/>
</hostdev>

Or did I mess up the bus and the slot?

The source section uses only the UUID, since that is the source device, so it's a little different from how passthrough is normally done.

The address is the address inside the VM, not the host address of the GPU.

I have to use virsh edit manually with the UUID to assign them.

Enabled MIG Mode for GPU 00000000:22:00.0

Warning: persistence mode is disabled on device 00000000:22:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Successfully created GPU instance ID  3 on GPU  0 using profile MIG 1g.24gb+gfx (ID 47)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  3 using profile MIG 1g.24gb (ID  0)
Successfully created GPU instance ID  4 on GPU  0 using profile MIG 1g.24gb+gfx (ID 47)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  4 using profile MIG 1g.24gb (ID  0)
Successfully created GPU instance ID  5 on GPU  0 using profile MIG 1g.24gb+gfx (ID 47)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  5 using profile MIG 1g.24gb (ID  0)
Successfully created GPU instance ID  6 on GPU  0 using profile MIG 1g.24gb+gfx (ID 47)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  6 using profile MIG 1g.24gb (ID  0)
GPU 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (UUID: GPU-aaa)
  MIG 1g.24gb     Device  0: (UUID: MIG-d27bbb)
  MIG 1g.24gb     Device  1: (UUID: MIG-ccc)
  MIG 1g.24gb     Device  2: (UUID: MIG-ddd)
  MIG 1g.24gb     Device  3: (UUID: MIG-eee)

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
  <source>
    <address uuid='d27bbb'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x22' slot='0x00' function='0x0'/>
</hostdev>
error: Failed to start domain 'archlinux'
error: device not found: mediated device 'd27bbb' not found

Do you see it under /sys/bus/mdev/devices?

What about the output of ls -l there?

Oh, whether display is 'on' or 'off' may be important too.
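
A quick sketch for that check: it prints whatever is registered under the mdev bus and complains if the bus directory is not there at all. `list_mdevs` is a hypothetical name, and the optional argument only exists so it can be pointed at a fake sysfs tree for testing.

```shell
#!/bin/sh
# List mediated devices known to the kernel, one UUID per line.
# $1: sysfs root (optional, defaults to /sys).
# Returns 2 if the mdev bus directory is missing, 0 otherwise
# (an empty list is still a success).
list_mdevs() {
    lm_dir="${1:-/sys}/bus/mdev/devices"
    if [ ! -d "$lm_dir" ]; then
        echo "no mdev bus at $lm_dir (is the mdev module loaded?)" >&2
        return 2
    fi
    for lm_entry in "$lm_dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        if [ -e "$lm_entry" ] || [ -L "$lm_entry" ]; then
            basename "$lm_entry"
        fi
    done
    return 0
}

# Usage (not executed here):
#   list_mdevs
```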

There is no /sys/bus/mdev for me.

I don’t have the GPUs in this config anymore to test, and I haven’t had a chance to get back to the how-to. You do have /sys/ set up, right? Is there anything under there for mdev devices?

Btw, for the supported profile, you are using a MIG profile that's +gfx, right? Otherwise you only get compute resources, no gfx resources.

What do you mean by having sys set up?

Directly under sys there is nothing relating to mdev either.

The article I attach relates to vGPU but mentions that there should be a /sys/class/mdev_bus/domain\:bus\:slot.function/mdev_supported_types/ option but for me there is no mdev_bus there either.

It also mentions there should be a /sys/bus/pci/devices/0000:22:00.0/mdev_supported_types/ but it is not there either. It seems like there is nothing at all relating to mdev functionality in Fedora 42 on my computer and it is a pretty fresh install.
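
For reference, this is roughly the check I am running against the PCI path from the article. It is only a sketch: `mdev_types_for` is a name I made up, and the second argument is there just so the function can be tried against a fake sysfs tree.

```shell
#!/bin/sh
# List the mdev types a PCI device advertises, one per line.
# $1: PCI address, e.g. 0000:22:00.0
# $2: sysfs root (optional, defaults to /sys)
# Returns 1 if the device has no mdev_supported_types directory.
mdev_types_for() {
    mt_dir="${2:-/sys}/bus/pci/devices/$1/mdev_supported_types"
    if [ ! -d "$mt_dir" ]; then
        echo "no mdev_supported_types for $1" >&2
        return 1
    fi
    ls "$mt_dir"
}

# Usage (not executed here):
#   mdev_types_for 0000:22:00.0
```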

I am referencing this article:

does lsmod at least show you’ve got vfio_mdev?

mount | grep sysfs
ls /sys/class/mdev_bus/ maybe?

vfio_pci               20480  0
vfio_pci_core         106496  1 vfio_pci
irqbypass              16384  2 vfio_pci_core,kvm
vfio_iommu_type1       53248  0
vfio                   77824  4 vfio_pci_core,vfio_iommu_type1,vfio_pci
iommufd               139264  1 vfio

… ohhhh I read vfio_mdev has been deprecated and is only mdev now.
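
A small sketch for checking a module by exact name in `lsmod` output, since grepping for `mdev` would also match `vfio_mdev` on older kernels. `has_module` is a hypothetical helper that reads the `lsmod` listing on stdin.

```shell
#!/bin/sh
# Succeed if the module named $1 appears in `lsmod` output read on stdin.
# Only the first column is compared, and the match is exact.
has_module() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage (not executed here):
#   lsmod | has_module mdev && echo "mdev is loaded"
```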

A sudo modprobe mdev yielded a few new directories …

/sys/kernel/btf/mdev
/sys/kernel/debug/printk/index/mdev
/sys/class/mdev_bus
/sys/bus/mdev
/sys/module/mdev

I think I might need to load the mdev module first and then try again.
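
To avoid running modprobe by hand after every reboot, a modules-load.d entry should do it on a systemd distro like Fedora. This is a sketch under the assumption that having mdev loaded at boot is actually what is needed here, which I have not confirmed. `write_mdev_autoload` and the `mdev.conf` file name are my own choices.

```shell
#!/bin/sh
# Write a modules-load.d snippet so systemd loads the mdev module at boot.
# $1: destination file (written to a scratch path first, then installed as root).
write_mdev_autoload() {
    printf '%s\n' \
        '# load the mediated device core at boot' \
        'mdev' > "$1"
}

# Usage (not executed here):
#   write_mdev_autoload /tmp/mdev.conf
#   sudo install -m 644 /tmp/mdev.conf /etc/modules-load.d/mdev.conf
```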

Now I have /sys/bus/mdev/devices but it is empty. Strange, but I guess that is more where I have to look.

That’s weird. Try redoing/re-probing all the NVIDIA stuff.

Thank you for your help so far, the mdev module is a step in the right direction. I read a thread by Red Hat where somebody mentioned, in regard to vGPU, the mdev devices not showing up, and the answer was that maybe the driver was not installed correctly, since it is apparently up to the driver to register the cards with mdev.

I will set up a couple of different distributions and install drivers from different sources in the hope that I can dissect this, or at least see a pattern emerge. I am still not 100% sure whether it is a software or hardware issue, but since I am limited on hardware I’ll go this route first.

Of course I’ll report on the forum if I figure this out.