NVIDIA Single Slot Card for VFIO Passthrough (RTX 4000 Ada)

So, I'm kicking around what my replacement workstation is going to look like. I use VFIO passthrough for my work VMs, which don’t exactly require the most powerful GPU in the world, but I would like to be able to play around with some ML stuff. As the latest AMD cards still have reset bugs, I’m going to look at NVIDIA. Naturally, the card that caught my eye is the RTX 4000 Ada 20GB. Has anyone used this with VFIO and passed it through to a VM? My plan is to use it for my work VMs, and also to use it under Linux for playing with some ML stuff.

As an alternative there is the T1000, but it obviously wouldn’t be able to do much, if any, ML.

I would be using this in an AMD TRX50 platform.

Thanks!

The NVIDIA RTX 4000 Ada is exactly what I am using for my Windows guest on a Linux host. It's the usual song and dance: blacklisting the host driver and passing through the whole card.
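
For anyone who hasn't done that song and dance before, the host side typically comes down to getting vfio-pci to claim the card before the regular GPU driver does. Below is a minimal sketch that builds the modprobe config text for that. The vendor:device IDs are placeholders (read the real ones from `lspci -nn`), the driver names are an assumption about what is installed on the host, and the helper name `vfio_modprobe_conf` is made up for illustration.

```python
def vfio_modprobe_conf(device_ids, drivers=("nouveau", "nvidia")):
    """Build modprobe.d text that hands the given PCI IDs to vfio-pci.

    device_ids: iterable of "vendor:device" strings, one per function
                of the card (GPU and its HDMI audio device).
    drivers:    host GPU drivers that must not grab the card first
                (assumed names; adjust to what your host loads).
    """
    device_ids = list(device_ids)
    if not device_ids:
        raise ValueError("need at least one vendor:device ID")
    # vfio-pci binds to every ID listed in its 'ids' option.
    lines = ["options vfio-pci ids=" + ",".join(device_ids)]
    # softdep asks modprobe to load vfio-pci before each GPU driver.
    lines += [f"softdep {drv} pre: vfio-pci" for drv in drivers]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder IDs, NOT the real RTX 4000 Ada values.
    print(vfio_modprobe_conf(["10de:aaaa", "10de:bbbb"]), end="")
```

The output would usually go into a file under `/etc/modprobe.d/`, and on most distros the initramfs needs regenerating afterwards so the setting applies early in boot.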

I should note that I have an issue with the system resetting randomly regardless of load. The issue only happens when one specific VM is active and it’s the one with the GPU. So far, I have managed to rule out faulty RAM and huge pages. The only known variables I have not ruled out are the GPU and probably some VM settings I forgot I tweaked because I can have other VMs run just fine. Strange coincidence that the issues started on an otherwise perfectly stable VM a week after I replaced the GeForce GT 1030 with the RTX 4000 Ada.

Maybe check USB devices as well. On my B650 board the second and third USB controllers are very picky. It's instant-reboot time when I connect any USB hub or certain devices to these passed-through controllers.

I’ve got a pretty large fleet (50+) of 2000 Ada and 4000 Ada GPUs in production right now, passed through on Proxmox. They work great, and have had no reset issues to date. (The cards are pooled, so each GPU gets reassigned to a new VM probably every 8 hours or so. I’d find out quickly if they didn’t reset well.)


Thanks! One question I have is if it also provides sound by any chance? For instance my current card (an AMD Vega 64) has a sound device (from lspci): Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]

You gotta pass through both functions from the same graphics card.

<hostdev mode="subsystem" type="pci" managed="yes">
  <source>
    <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
  </source>
  <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
  <source>
    <address domain="0x0000" bus="0x01" slot="0x00" function="0x1"/>
  </source>
  <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
</hostdev>

That’s for my host, of course. Your values may differ depending on your system setup (like where you plugged your graphics card).
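
If you'd rather not hand-edit the addresses, here is a rough sketch that finds every function sharing a slot with the GPU and emits a `<hostdev>` entry for each one. The helper names and the sample address list are made up for illustration; on a real host the addresses would come from the directory names under `/sys/bus/pci/devices`. It leaves out the guest-side `<address>` element, on the assumption that libvirt assigns one when the domain is defined.

```python
import xml.etree.ElementTree as ET


def parse_pci_address(addr):
    """Split 'dddd:bb:ss.f' into (domain, bus, slot, function) ints."""
    domain, bus, rest = addr.split(":")
    slot, function = rest.split(".")
    return int(domain, 16), int(bus, 16), int(slot, 16), int(function, 16)


def sibling_functions(addr, all_addrs):
    """Return all addresses sharing domain, bus and slot with addr."""
    key = parse_pci_address(addr)[:3]
    return sorted(a for a in all_addrs if parse_pci_address(a)[:3] == key)


def hostdev_xml(addr):
    """Build a libvirt <hostdev> element for one host PCI function."""
    domain, bus, slot, function = parse_pci_address(addr)
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain:04x}",
        bus=f"0x{bus:02x}",
        slot=f"0x{slot:02x}",
        function=f"0x{function:x}",
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Sample addresses only; replace with your host's device list.
    host_devices = ["0000:01:00.0", "0000:01:00.1", "0000:02:00.0"]
    for fn in sibling_functions("0000:01:00.0", host_devices):
        print(hostdev_xml(fn))
```

With the sample list this emits two entries, one for function 0 (the GPU) and one for function 1 (the audio device), and skips the unrelated device on bus 02.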

Here is the audio device in the guest:

The USB devices have been fine. I’ve been passing through the same controllers for over a year.

I suspect it’s actually a power quality issue now, and I’ll get to test it today. If not, there are more tests that I can employ to narrow down the source of the problem (saving the RTX 4000 Ada for last).

EDIT: Nope. Not the power. But turning huge pages back on aggravates the system quite a bit more, increasing the rate of random resets.
