Tesla P4: at wit's end with vGPU

ucav117 · March 20, 2023, 7:47pm

I need a sanity check and maybe some help/advice from someone that really knows what they are doing here.

So I have been trying to get vGPU working on my Proxmox VM server, and I feel like I am just chasing my tail and have gotten turned around on what I should be doing.

OK, so I have the Tesla P4 in my Proxmox server. I have the datacenter drivers and vgpu_unlock-rs installed on Proxmox (I used the Craft Computing guide: Proxmox GPU Virtualization Tutorial with Custom Profiles thanks to vGPU_Unlock-RS - YouTube). The card reports perfectly fine in nvidia-smi:

mdevctl types also seems to return good info:
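
The per-type data that mdevctl types prints (type id, name, available instances, description) comes from the kernel's mdev sysfs tree, so it can also be read directly. A minimal sketch, assuming the standard layout under /sys/bus/pci/devices/<address>/mdev_supported_types/; the framebuffer=...M pattern in the description is an assumption about how the NVIDIA host driver words it, and the helper names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class MdevType:
    type_id: str                    # directory name, e.g. "nvidia-59"
    name: str                       # profile name, if the driver provides one
    available_instances: int
    framebuffer_mib: Optional[int]  # parsed from description; None if not found


def read_attr(path: Path, default: str = "") -> str:
    """Read one sysfs attribute; 'name' and 'description' are optional per type."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return default


def parse_framebuffer_mib(description: str) -> Optional[int]:
    # Assumption: the NVIDIA host driver describes types with text such as
    # "num_heads=4, frl_config=60, framebuffer=2048M, ..."; adjust if yours differs.
    match = re.search(r"framebuffer=(\d+)M", description)
    return int(match.group(1)) if match else None


def list_mdev_types(pci_addr: str, sysfs_root: Path = Path("/sys")) -> list[MdevType]:
    """Roughly the per-type data `mdevctl types` shows, read straight from sysfs.

    Raises FileNotFoundError if the device exposes no mdev_supported_types,
    which often means the vGPU host driver is not bound to it.
    """
    types_dir = sysfs_root / "bus/pci/devices" / pci_addr / "mdev_supported_types"
    types = []
    for type_dir in sorted(types_dir.iterdir()):
        types.append(MdevType(
            type_id=type_dir.name,
            name=read_attr(type_dir / "name"),
            available_instances=int(read_attr(type_dir / "available_instances", "0")),
            framebuffer_mib=parse_framebuffer_mib(read_attr(type_dir / "description")),
        ))
    return types
```

On the host, list_mdev_types("0000:0e:00.0") should list the same types as mdevctl types; 0000:0e:00.0 is the P4's address as it appears in the error log.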

But when I attach the vGPU to a VM and try to start it I get this error:

mdev instance '00000000-0000-0000-0000-000000000104' already existed, using it.
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:0e:00.0/00000000-0000-0000-0000-000000000104,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio 00000000-0000-0000-0000-000000000104: error getting device from group 38: Input/output error
Verify all devices in group 38 are bound to vfio-<bus> or pci-stub and not already in use
TASK ERROR: start failed: QEMU exited with code 1

Or this error:

TASK ERROR: mdev instance '00000000-0000-0000-0000-000000000103' already exits, but type is not 'nvidia-59'
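
Both errors are about an mdev UUID that already exists. A minimal sketch for seeing which mdevs exist and what type each one is, assuming the kernel's /sys/bus/mdev/devices/<uuid>/mdev_type symlink. The UUID helper is an assumption about how Proxmox builds these UUIDs (hostpci index in the first group, VM ID in the last), which would make the two errors belong to VMs 103 and 104; all function names here are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def proxmox_mdev_uuid(vmid: int, index: int = 0) -> str:
    # Assumption: Proxmox derives the mdev UUID from the hostpci index and VM ID,
    # so ...-000000000104 would be VM 104, hostpci0.
    return f"{index:08d}-0000-0000-0000-{vmid:012d}"


def existing_mdevs(sysfs_root: Path = Path("/sys")) -> dict[str, str]:
    """Map every existing mdev UUID to its type id via the mdev_type symlink."""
    devices_dir = sysfs_root / "bus/mdev/devices"
    if not devices_dir.is_dir():
        return {}  # no mdev bus: nothing created yet, or the mdev module is not loaded
    return {dev.name: (dev / "mdev_type").resolve().name for dev in sorted(devices_dir.iterdir())}


def diagnose(uuid: str, wanted_type: str, mdevs: dict[str, str]) -> Optional[str]:
    """Describe a type conflict for this UUID, or return None if there is none.

    A matching type is reused ("already existed, using it"), but that reuse can
    still fail later if something else is holding the device.
    """
    current = mdevs.get(uuid)
    if current is not None and current != wanted_type:
        return f"{uuid} already exists as {current}, but the VM asks for {wanted_type}"
    return None
```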

I am hoping that others who have done vGPU setups on proxmox can shed some light on what my issue is.

lemij31400 · March 20, 2023, 8:23pm

I have an A2 at home, so I can try to figure it out and see if I can help!

AbsolutelyFree · March 20, 2023, 9:04pm

Does nvidia-smi vgpu show anything?

I like Craft Computing, but his instructions are essentially just a less verbose version of these. Try following along with those and see how far you get.

Also, there is a known bug with vGPU on Proxmox when it comes to destroying vGPUs; read through this thread for info and a fix that works. A patch was submitted, but I don't believe it has been accepted yet.
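
Until a fix is in place, a leftover vGPU can be torn down by hand. A minimal sketch using the kernel's generic mdev "remove" attribute; this is ordinary sysfs cleanup, not necessarily the fix from the linked thread, and remove_mdev is a hypothetical helper. It needs root.

```python
from pathlib import Path


def remove_mdev(uuid: str, sysfs_root: Path = Path("/sys")) -> bool:
    """Destroy a leftover mediated device; returns False if no such UUID exists.

    Writing "1" to the device's 'remove' attribute is the kernel's generic mdev
    teardown. The vendor driver may refuse (an OSError is raised) while a
    running VM still uses the device, so stop that VM first.
    """
    remove_attr = Path(sysfs_root) / "bus/mdev/devices" / uuid / "remove"
    if not remove_attr.exists():
        return False
    remove_attr.write_text("1")
    return True
```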

EDIT: Reading your error messages more closely, it looks like both errors are saying that a vGPU already exists (under a different UUID in each), and that the type of vGPU you are attempting to create is different from the one that already exists. Your screenshot of the mdevctl types output also shows that no type other than nvidia-285 has any available instances. Be aware that you cannot mix and match instance types. I believe types can be mixed if they have the same amount of VRAM (for instance GRIDP40-8C and GRIDP40-8Q), but I might be wrong about that. They definitely cannot have different amounts of VRAM, though; that I know for certain.
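
The mixing rule can be written down as a check on a planned set of vGPUs for one card. A minimal sketch with a hypothetical helper: the strict mode assumes every vGPU on the card must be the identical type, while the loose mode encodes the post's less certain belief that types may differ only when their framebuffer sizes match.

```python
from __future__ import annotations


def mix_is_allowed(planned: list[tuple[str, int]], require_same_type: bool = True) -> bool:
    """Check a plan for ONE physical GPU; planned holds (type_id, framebuffer_mib) per vGPU.

    require_same_type=True is the conservative rule (every vGPU uses the identical
    type). False applies the looser rule: types may differ only if their
    framebuffer sizes match.
    """
    if require_same_type:
        return len({type_id for type_id, _ in planned}) <= 1
    return len({fb for _, fb in planned}) <= 1
```

Different framebuffer sizes fail under both modes, which is the part of the rule the post is certain about.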

Also, the fact that no mdev type other than nvidia-285 reports available instances implies that your system must already have a vGPU running with framebuffer=8192. If no vGPUs were running, each mdev type would report a number of available instances equal to the card's total VRAM divided by the framebuffer size configured for that type.
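
The rule of thumb above is simple integer division. A minimal sketch, assuming the P4's nominal 8192 MiB is the figure profiles are sized against; custom vgpu_unlock-rs profiles can change the per-type framebuffer, and the driver's own max_instance value is the authority. The helper names are hypothetical.

```python
from __future__ import annotations

P4_VRAM_MIB = 8192  # nominal 8 GiB; assumption: profile framebuffers are sized against this


def expected_idle_instances(framebuffer_mib: int, total_vram_mib: int = P4_VRAM_MIB) -> int:
    """Instances a type should offer when no vGPU is running: VRAM // framebuffer."""
    return total_vram_mib // framebuffer_mib


def types_reporting_fewer(observed: dict[str, tuple[int, int]],
                          total_vram_mib: int = P4_VRAM_MIB) -> list[str]:
    """observed maps type_id -> (framebuffer_mib, available_instances as reported).

    Any type reporting fewer than the idle figure hints that part of the card is
    already allocated to an existing vGPU.
    """
    return [
        type_id
        for type_id, (fb, available) in observed.items()
        if available < expected_idle_instances(fb, total_vram_mib)
    ]
```

For example, on an idle P4 a 2048 MiB type should report 4 available instances, a 1024 MiB type 8, and an 8192 MiB type 1.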

system · December 19, 2023, 3:05pm

This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.