Anyone know if AMD’s current line of Pro GPUs supports passthrough?
I’ve done it with old Nvidia GPUs, but I have no experience with these.
I’m considering retiring the workstation I use altogether, and just running it in a VM in my server in the rack, with a passed-through GPU.
I like the W7500 in part because it is a pro card, and likely more tested for stuff like VM passthrough. Also I like it because it is 70W or less, and single slot, which means it both fits easily in the server, and doesn’t need a power cable.
Appreciate any thoughts on this plan, or if there is a better GPU to go with.
I pass through an AMD Radeon Pro W6600 to a macOS VM that I use every day. It doesn’t suffer from the “reset bug” and works great all around, aside from getting a bit hot.
I have seen a couple reports of reset issues coming back on RDNA 3 cards, so you’ll likely want to find someone using a similar card before you buy anything.
I am passing one through with the studio driver installed on both the hypervisor and guest so there aren’t any kernel panics when the VM is powered off and the device is passed back to host.
My host OS is Debian 12 running the 5.10 kernel from the Debian 11 repos. My host GPU is an RTX A5000 handling the Debian side of things (because that’s where I need the heavy lifting).
Since I have two GPUs with separate drivers, and have the AMDGPU driver blacklisted on my host, I can’t speak to passing the GPU between the host and guest.
Interesting, I didn’t realize that happened on workstation cards. I thought that was a consumer card problem, and that passthrough was officially supported on pro-cards.
A couple of years ago I had read that Nvidia now officially supported forwarding their professional/workstation GPUs to VMs, so I guess I had presumed that AMD had followed suit.
I’d go with an RTX A2000 instead if I could only find a single slot version, but all of them seem to be those stubby little dual slot half height cards.
Do you know if this is even a problem on a Linux host if you blacklist the amdgpu kernel driver and block the PCIe device string?
In theory, switching would never become relevant then, as the GPU has never communicated with the host.
(Of course, theory and reality are often not the same thing.)
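For anyone who wants to confirm the "GPU has never talked to the host driver" part, here is a minimal sketch of how one might check which kernel driver is actually bound to each AMD display device. It only reads the standard sysfs PCI entries (`vendor`, `class`, and the `driver` symlink). The function names are made up for illustration, and it says nothing about whether the reset bug will bite, only whether amdgpu or vfio-pci currently owns the card.

```python
import os
from typing import Iterator, Optional, Tuple

AMD_VENDOR_ID = "0x1002"          # PCI vendor ID for AMD/ATI
DISPLAY_CLASS_PREFIX = "0x03"     # PCI base class 0x03 = display controller


def bound_driver(device_dir: str) -> Optional[str]:
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    link = os.path.join(device_dir, "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at .../bus/pci/drivers/<name>; keep only <name>.
    return os.path.basename(os.readlink(link))


def amd_display_devices(
    sysfs_root: str = "/sys/bus/pci/devices",
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (pci_address, driver_name) for every AMD display-class device."""
    for addr in sorted(os.listdir(sysfs_root)):
        dev = os.path.join(sysfs_root, addr)
        try:
            with open(os.path.join(dev, "vendor")) as f:
                vendor = f.read().strip()
            with open(os.path.join(dev, "class")) as f:
                pci_class = f.read().strip()
        except OSError:
            continue  # not a readable PCI device entry; skip it
        if vendor == AMD_VENDOR_ID and pci_class.startswith(DISPLAY_CLASS_PREFIX):
            yield addr, bound_driver(dev)


if __name__ == "__main__":
    if os.path.isdir("/sys/bus/pci/devices"):
        for addr, driver in amd_display_devices():
            print(addr, driver or "(no driver bound)")
```

If the blacklist and the vfio-pci binding took effect, the card should show up with `vfio-pci` (or no driver) rather than `amdgpu`. The card's HDMI audio function is a separate PCI function with a different class, so it is not listed here.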
The A4000 is both single slot and maybe has a more proven track record with passthrough, but it does require a power connector, and because my server power supplies don’t have PCIe power connectors, I’d be using adapters (yuck, dangerous, although it wouldn’t be pulling THAT much power…)
It would also (even used on eBay) be a bit of a budget increase…
That said, reliability is key here. I can’t have this “workstation VM” not work. I depend on it for my work.
Yes, the reset bug is an issue no matter what you blacklist. It is a hardware/firmware issue with certain AMD cards. On my system, it manifested as the UEFI resetting my machine with a PCIe error after I tried to reboot the VM.
I second the A4000 choice. None of my NVIDIA cards since Pascal have given me issues with VFIO. If I didn’t need macOS support, an A4000 certainly would have been my choice.
Anecdotally, if you Google “RX 7600 reset bug,” the results that mention RDNA 3 cards do not look pretty.
I wish I could make the A2000 work since it is a 70w part that doesn’t require a power cable, but it only comes in the stubby half height dual slot config.
(well, there is a sketchy single slot one from (I think) a Chinese manufacturer that uses a mobile A2000 chip and requires a sketchy hacked driver from a Chinese server that I wouldn’t touch with a proverbial 39.5ft pole)
The A4000 is a full height single slot card which fits nicely, but it is a 140w part, so it requires a 6pin power connector, which the server does not have.
The server has more than abundant power available (dual 920W redundant power supplies) but the connectors available to me are somewhat limited.
It has six 4pin “molex” power connectors that go straight to the backplane. I could tap into those, but they are already tasked with powering 24x 7200rpm enterprise drives, so I don’t think they have enough spare capacity.
It also has two 8pin EPS connectors, both of which are connected to the server motherboard (Supermicro H12SSL-NT)
The motherboard (even decked out as it is) probably doesn’t draw anywhere near the max power though. When it comes to 12v it has 144w (from 24pin ATX) and then 336w from each of the 8pin EPS connectors for a total of 816w. No matter how cautiously I overestimate power requirements when I add up all of the components (CPU, RAM, NICs, SAS HBA, fans, etc.) I can’t get to more than 600w at most, and when I look at the IPMI/BMC logs, the server usually averages about 250w. The peak it ever pulled was ~500W, and that was a real outlier. And this is for all the parts, even those (mainly backplane and hard drives) not powered through the motherboard.
Of course, running my workstation as a VM I will probably be loading it up more, but I still think the margin here is more than enough to justify pulling 75w for the GPU off one of the 8pin EPS connectors.
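The margin argument above can be written out as a quick sanity check. This is just the arithmetic from the post, using the connector figures quoted there (144w from the 24pin, 336w per 8pin EPS) and the 600w / 500w load estimates. The variable and function names are only for illustration.

```python
# 12v capacity available to the motherboard, per the figures quoted above.
ATX_24PIN_12V_W = 144
EPS_8PIN_W = 336
EPS_CONNECTOR_COUNT = 2

board_12v_capacity_w = ATX_24PIN_12V_W + EPS_CONNECTOR_COUNT * EPS_8PIN_W

# Extra load proposed for one EPS connector: a 6pin PCIe feed for the GPU.
GPU_TAP_W = 75


def headroom_w(system_load_w: float) -> float:
    """Capacity left after the existing load plus the GPU tap."""
    return board_12v_capacity_w - system_load_w - GPU_TAP_W


if __name__ == "__main__":
    print("12v capacity:", board_12v_capacity_w, "w")
    print("headroom at 600w overestimate:", headroom_w(600), "w")
    print("headroom at 500w observed peak:", headroom_w(500), "w")
```

Note that the 600w and 500w figures are whole-server numbers that include the drives and backplane, which are not fed through the motherboard, so the real headroom on the motherboard connectors should be larger than this shows.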
I’d need an 8pin EPS/CPU (female) Y connector with the Y having an 8pin EPS/CPU (male) connector on one side, and a 6pin PCIe (male) connector on the other.
I’ve found 8pin EPS/CPU Y splitters and 8pin EPS/CPU to 6pin PCIe adapters separately, but daisy chaining these together seems sketchy. I’d very much like to avoid reliability/short/fire situations that can arise from high current adapters.
Maybe the best solution is to make my own. I’ve crimped Mini-Fit connector pins before (though only male ones) and it isn’t that hard. If I do, I could also make it the perfect length so the wires aren’t flopping around where I don’t want them.
It’s still a little sketchy, but totally doable. I am going to have to think about it.
If you’ve got a Supermicro chassis, the power distributor board is happy to supply more than the rated current on any single cable. These are not multi-rail PSUs.
I’ve got an old Supermicro 4U 743TQ chassis with two 920 watt PSUs that may be identical to yours (PWS-920P-SQ). They’re rated at 75 amps, each, on the 12 volt rail. Make sure your cables use a quality wire gauge and a couple hundred watts of extra current won’t be an issue.
And yes, those are the same power supplies. I got them years ago after the 1200W units the server came with (I think they were PWS-1K27A-1R) were uncomfortably loud.
So, what you are saying is the cabling for the EPS connectors included with the stock power distribution board can handle higher amperage, or would I need to desolder the stock cables and solder my own all the way back to the board?
Edit:
Hmm. So, 8pin EPS is rated for 336 watts at 12v, so that is 28 amps, but this is over 4 pairs of wires, so I gather they must be 7 amps each.
According to my wire size calculator, for DC, I’d need 14awg cable for 7amps.
The only Mini-Fit Jr. crimp terminals I can find are 18-24awg in size.
Am I missing something? Or am I just not finding the right pins?
Edit2:
Never mind, I can find pins for slightly larger sizes (up to 16awg) but according to my calculations 16awg still isn’t enough for 7 amps… Yet, Molex rates them for up to 13 amps.
To say I am confused would be an understatement.
Maybe my wire size calculator is wrong.
Edit3:
Yeah, never mind, the wire size calculator I was using seems to not line up with any of the other information I can find. Even 18awg seems more than enough for the worst case current I’d see over the lengths I’d be using, and since 18awg seems to be the easiest to find, I’ll probably just go with that.
Edit4:
Here is what I am assuming to be my worst case current.
Scenario:
Motherboard somehow maxes out the 8pin EPS cable it is plugged into (despite me seeing no way that would be possible based on the power specs of everything that is plugged in, added up)
8 Pin EPS spec is 336w, a total of 28 amps at 12v, but split over four +12v/ground pairs, so each pair is 7 amps.
6 pin PCIe max spec is 75w, so 6.25a total at 12v, but this one is split over three +12v/ground wire pairs, so 2.08 amps each.
In an extremely unlikely scenario with both the motherboard and the GPU maxed out, I’d see 7 amps on one wire pair, and 9.08 amps on the remaining three pairs.
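Spelled out as code, the per-pair numbers above work out like this. It is only the arithmetic from the scenario (336w over four pairs, 75w over three pairs, all at 12v), and the helper name is made up for illustration.

```python
def amps_per_pair(watts: float, volts: float, pairs: int) -> float:
    """Current carried by each +12v/ground pair if the load is shared evenly."""
    return watts / volts / pairs


# 8pin EPS at its 336w spec, split over four pairs.
eps_amps = amps_per_pair(336, 12, 4)

# 6pin PCIe at its 75w spec, split over three pairs.
pcie_amps = amps_per_pair(75, 12, 3)

# Three of the four EPS pairs also carry one PCIe pair's worth of current.
shared_pair_amps = eps_amps + pcie_amps

if __name__ == "__main__":
    print(f"EPS only pair:        {eps_amps:.2f} a")
    print(f"PCIe only pair:       {pcie_amps:.2f} a")
    print(f"shared pair (x3):     {shared_pair_amps:.2f} a")
```

This assumes the current splits evenly across the pairs, which a poorly crimped pin would upset.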
So for 9.08a, the new calculator I found (with a 1% allowable voltage drop) says I need wire no smaller than 24 gauge.
This calculator is a little odd though, as it does not have a place to specify wire length, or insulation type.
Either way, I think I’ll probably be fine with 18ga. Especially since the section of max load will only be between the female and male 8pin connectors, and that can technically be made really short. The longer part of the cable that will go to the GPU only needs to handle the 2.08a current.
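Since the calculator has no field for wire length, here is a rough sketch of doing the voltage drop estimate by hand. It uses the standard AWG diameter formula and the resistivity of annealed copper at 20 °C. The two lengths (0.1 m for the short shared section, 0.5 m for the run to the GPU) are my assumptions, not measurements, and the function names are just for illustration. It only covers resistive voltage drop, not heating in a bundle or the terminal's own current rating.

```python
import math

COPPER_RESISTIVITY_OHM_M = 1.724e-8  # annealed copper at 20 °C


def awg_diameter_mm(awg: int) -> float:
    """Conductor diameter from the standard AWG definition."""
    return 0.127 * 92 ** ((36 - awg) / 39)


def wire_resistance_ohm(awg: int, length_m: float) -> float:
    """DC resistance of a solid copper conductor of the given gauge and length."""
    area_m2 = math.pi / 4 * (awg_diameter_mm(awg) * 1e-3) ** 2
    return COPPER_RESISTIVITY_OHM_M * length_m / area_m2


def voltage_drop_v(awg: int, amps: float, one_way_length_m: float) -> float:
    """Drop over the +12v wire and its ground return (twice the one-way length)."""
    return amps * wire_resistance_ohm(awg, 2 * one_way_length_m)


if __name__ == "__main__":
    allowed_v = 0.01 * 12  # 1% of 12v
    # Assumed lengths: 0.1 m shared section, 0.5 m run to the GPU.
    for label, amps, length_m in [("shared section", 9.08, 0.1),
                                  ("GPU run", 2.08, 0.5)]:
        drop = voltage_drop_v(18, amps, length_m)
        print(f"{label}: {drop * 1000:.0f} mV drop (limit {allowed_v * 1000:.0f} mV)")
```

With those assumed lengths, 18awg stays well under the 1% limit on both sections. Change the lengths to match the actual cable.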
At least in my 743TQ, the wires coming off the distribution board are 16 awg. I have PCIe power adapters that go as low as 20 awg that have never given me any trouble. I’m sure 18 will work just fine.
It looks as if it is not fully plugged in, but that was not the case. I tried to pry it off after the fact, before taking this picture. I presume one of the pins was poorly crimped resulting in local heating.
This happened on my old workstation (Sandy-E 6C/12T Core i7-3930K in an Asus P9X79 WS, which I had overclocked to 4.8GHz under some oversized cooling) during an overnight video encode operation.
(As an aside, that thing was quite beastly in late 2011 when I got it. It was already fast at stock, but at 4.8GHz nothing could touch it for years and a couple of CPU generations, as Ivy Bridge and Haswell didn’t seem to clock as high.)
I’ve been a bit leery of power extensions ever since.
Shockingly enough, the entire system (except for the PSU cable and extension) survived this and went on to serve for another year until I finally retired the old beast in 2019 when I built my Threadripper 3960X system.
Never thought I’d ever get 8 years out of a CPU. I used to upgrade them every few months when I was back in college in the very early oughts…