How to SR-IOV Mod the W7100 GPU

Could someone help me locate the S7150.bin file? I have been searching TechPowerUp. The file I found was 228431.rom, which is 128K, but the dump I did from the W7100 card is 512K. I cannot locate a 512K file. Thank you for your help.

Please help me install the GIM driver on Kubuntu 20.04 :pleading_face:

Please excuse the intrusion.

Are you sure the actual payload is 512K or is it just zero-padded? Open it in a hex editor and take a look.
Often they just fit a larger chip than the image needs, either because of supply chain constraints or because the same chip is used across multiple cards.
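If you would rather not eyeball it in a hex editor, here is a minimal Python sketch of the same check. It assumes the padding is a trailing run of 0x00 or 0xFF bytes (erased flash usually reads back as 0xFF), and the function names are made up for illustration. Treat the result as an estimate: a payload that genuinely ends in 0x00 will look slightly shorter, and a dump can contain more than one image.

```python
from pathlib import Path

# Assumption: padding is a trailing run of 0x00 or 0xFF bytes.
# Erased flash normally reads back as 0xFF; some dumps are padded with 0x00.
PAD_BYTES = (0x00, 0xFF)


def payload_length(data: bytes) -> int:
    """Return how many bytes precede the trailing run of padding bytes."""
    end = len(data)
    while end > 0 and data[end - 1] in PAD_BYTES:
        end -= 1
    return end


def summarize_dump(path: str) -> str:
    """Read a ROM dump and compare its total size with its apparent payload size."""
    data = Path(path).read_bytes()
    used = payload_length(data)
    return f"{len(data) // 1024}K total, about {used / 1024:.1f}K before trailing padding"
```

Calling `summarize_dump("w7100_dump.rom")` (hypothetical filename) on the 512K dump would show whether the real content is closer to 128K.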

Is there anyone who managed to run this successfully on a recent kernel on a TRX40 Aorus Master Xtreme?
I had some luck in building the GIM driver on Fedora 34 but loading it just locks the machine.
SysRq’d stack trace shows the driver’s stuck at atom_exec_bios_table.

I haven’t been able to install Proxmox 6.4. The installer doesn’t start due to graphics driver issues, I assume.

Are you using it as an add-in or primary GPU? Once you mod it, it only works as a secondary card.

As an add-in in the 3rd slot. It shows up in lspci after flashing the VBIOS but unfortunately there is no output. It’s a Barco MXRT-7600.
1st slot is occupied by a 6800 XT.

I’ve had trouble getting it to work with the AMDGPU open source drivers, because under Fedora they crash (the 6800 XT works fine). I’m currently trying to get it to work under Debian testing with the AMDGPU Pro drivers and the original VBIOS. If it works, I’m going to try flashing it again.

Edit: I was able to get the W7100 working with the regular BIOS on Debian testing. After flashing the S7150 BIOS, modprobe gim just panics but, fortunately, doesn’t end up locking the machine up.
panic.txt (8.7 KB)

Looking at this in May of 2021, I am seeing W7100s going for over $400. I was wondering if there is a way to turn a Vega 56 into a mini V340, since I have one at arm’s length.

Did anyone get past that hot reload problem with the W7100? I’ve got the 8GB W7100 with the 512KB 25Q41B dual/quad EEPROM.

I get to the part where this payload gets delivered (dmesg -wH above):
https://raw.githubusercontent.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualization/master/drv/gim_vbios_patch.h

Then the whole machine hard hangs. I wonder if this is the hot reload bug that’s been discussed… Anyone have any ideas? I could really use some help here as I’m finding myself somewhat stuck.

So far I have flashed the S7150 firmware with amdvbflash, made the modifications to GIM, and compiled it as described in @wendell’s OP. The problem comes when trying to modprobe gim; that is when we get the crash.

Update on this. It appears that the crash is happening after GIM delivers the blob payload as linked above and then attempts to read it back. A friend of mine added some additional debug messages to GIM (basically just some printf around where we crash). I know the EEPROM is called “dual/quad” because it has a few different storage modes/regions.
In the screenshot above you can see GIM reads RLCV option ROM version 113 but wants to patch up to version 129. My guess is that it might be possible to preemptively patch this firmware somewhere in the EEPROM (assuming that’s the chip where it’s stored) so that GIM doesn’t need to try to flash it during modprobe, as the blob version dependency would already be satisfied.
My guess is amdvbflash isn’t able to flash this region, as I’ve tried dumping various VBIOS images I pulled off real S7150 GPUs and reflashing them to this W7100.


Next I guess I’ll try reflashing it via flashrom and see if I can somehow force this RLCV option ROM to load, since amdvbflash doesn’t seem to have the ability to affect this.

With additional debug messages added around where the failure seems to happen.

These added printf messages might indicate the read is the problem. Maybe we can just write it and not bother to verify the result of the write and otherwise continue execution. Will report back if that works. Any help would be much appreciated!

Can this also be done on an AMD Radeon Pro WX5100 8GB (obviously with a different ROM and perhaps a different driver)?
Just wondering if someone knows if it would work with it since I could get one quite cheap and am interested in trying this.

It might be possible but I’m not even sure anyone has successfully done this with the W7100.

Made a bit of progress on this over the weekend. It turns out the cores have some subtle firmware differences between the single-slot S7150 images and the dual-slot S7150x2 images, even though they identify themselves the same way. The actual chip topology includes slight variations in stream processor count, etc. If you use one of the SPI flash images ripped from an S7150x2, lspci -vvn will enumerate all the proper capabilities and the chip will initialize correctly, but your kernel locks up if you try to modprobe gim. I still don’t have everything working, but I will post a more detailed update soon, provided I actually get it working, for anyone else who might be interested in reproducing a vGPU multiplexing setup on a W7100.

General question for the thread: has anyone in here actually managed to get a W7100 into SR-IOV mode, producing IOMMU groups for each of its virtual functions, or is everyone in here stuck at some stage of converting the card and getting a modified version of GIM to modprobe without crashing the host?
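For anyone who wants to answer that for their own card, here is a rough Python sketch that lists the virtual functions under a physical function and the IOMMU group each one landed in. It relies on the standard Linux sysfs layout (`virtfnN` symlinks under the physical function and an `iommu_group` symlink on each device). The assumption is that GIM registers its virtual functions through the kernel’s normal SR-IOV path so those links exist; the function name and the `sysfs_root` parameter are made up for illustration.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def vf_iommu_groups(pf_bdf: str, sysfs_root: str = "/sys") -> Dict[str, Optional[str]]:
    """Map each virtual function of a physical function to its IOMMU group.

    pf_bdf is the physical function's PCI address, e.g. "0000:03:00.0".
    A value of None means the virtual function has no iommu_group link.
    """
    pf_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / pf_bdf
    # virtfn0, virtfn1, ... are symlinks to the virtual function device directories.
    links = sorted(pf_dir.glob("virtfn*"), key=lambda p: int(p.name[len("virtfn"):]))
    groups: Dict[str, Optional[str]] = {}
    for link in links:
        vf_dir = link.resolve()
        group_link = vf_dir / "iommu_group"
        if group_link.exists():
            # The symlink target ends in the group number, e.g. .../iommu_groups/42
            groups[vf_dir.name] = os.path.basename(os.path.realpath(group_link))
        else:
            groups[vf_dir.name] = None
    return groups
```

An empty result means no virtual functions were created at all; entries mapped to `None` mean the functions exist but the IOMMU did not give them a group.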

Just got GIM loaded on the W7100! There might be a few additional small tweaks to GIM outside those from OP. Will spend a bit of time validating with real world workloads and then I’ll try to do a proper recap with steps to reproduce.


Working W7100 flashable vbios image here:
https://mega.nz/file/WHQBCYyL#I1NyUI4DOajqe2-Pb2yPImOoaDWaiWFdxk1CfRBcBfo

Flash the binary file linked above with the following utility:

Further posts to come.

Edit: Guests are still having problems interfacing with this device - do not consider this working. I will post a workaround for some guest drivers if/when I’m able to resolve it. I think the guest drivers are seeing a device ID that doesn’t exist (Virtual W7100) and they don’t know what to do with it. I have some ideas on possible workarounds, although I guess we’ll see if they work. Stay tuned.

Edit 2: It turns out the VendorID:DeviceID resolves to 1002:0000 in the guest, which is undefined. To fix this you need to either modify the in-kernel amdgpu driver, or use amdgpu-dkms and modify the deb package containing the sources. I prefer the DKMS method since it doesn’t require you to recompile the kernel every time you want to update your guest. I don’t think AMD will upstream a patch for this since we’re technically doing something they don’t officially sanction, but if they were willing I’d submit a small patch to make this work by default.
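To confirm you are hitting the same 1002:0000 problem, a quick check inside a Linux guest could look like the sketch below. It reads the `vendor` and `device` files that sysfs exposes for every PCI function; the helper names and the `sysfs_root` parameter are my own and not part of any driver.

```python
from pathlib import Path
from typing import List, Tuple

AMD_VENDOR_ID = 0x1002


def amd_pci_ids(sysfs_root: str = "/sys") -> List[Tuple[str, int, int]]:
    """Return (address, vendor_id, device_id) for every AMD PCI function."""
    found = []
    devices_dir = Path(sysfs_root) / "bus" / "pci" / "devices"
    for dev in sorted(devices_dir.iterdir()):
        # sysfs stores these as hex strings such as "0x1002\n"
        vendor = int((dev / "vendor").read_text().strip(), 16)
        device = int((dev / "device").read_text().strip(), 16)
        if vendor == AMD_VENDOR_ID:
            found.append((dev.name, vendor, device))
    return found


def undefined_device_ids(devices: List[Tuple[str, int, int]]) -> List[str]:
    """Return the addresses whose device ID reads as 0x0000."""
    return [addr for addr, _vendor, device in devices if device == 0x0000]
```

If `undefined_device_ids(amd_pci_ids())` returns the virtual function’s address, the guest driver has nothing to match against, which fits the behaviour described above.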

Edit 3: You can actually spoof the vendor ID and device ID on the host side, as well as pass in a guest ROM image, from KVM. This essentially fools the guest into believing that the device it is seeing is an S7150V (virtual device). This appears to have been the final missing piece. Doing it this way means there is no need to recompile drivers, and it works perfectly with existing compiled drivers, including on Windows guests where recompilation may not be possible because driver sources are unavailable.
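For reference, here is a small Python sketch of what that spoofing might look like as a QEMU argument. It is not the exact command I used. It assumes QEMU’s experimental `x-pci-vendor-id` and `x-pci-device-id` properties plus `romfile` on `vfio-pci` (check `-device vfio-pci,help` on your QEMU build), and the helper name, the PCI address, the ROM path, and the 1002:692f ID for the S7150V are all assumptions to verify against your own hardware.

```python
def vfio_device_arg(host_bdf: str, vendor_id: int, device_id: int, romfile: str) -> str:
    """Build a QEMU -device argument that overrides the IDs the guest sees.

    Assumes the vfio-pci properties x-pci-vendor-id, x-pci-device-id and
    romfile are available in the QEMU build being used.
    """
    props = [
        "vfio-pci",
        f"host={host_bdf}",
        f"x-pci-vendor-id=0x{vendor_id:04x}",
        f"x-pci-device-id=0x{device_id:04x}",
        # QEMU treats "," as a property separator, so a literal comma is doubled.
        "romfile=" + romfile.replace(",", ",,"),
    ]
    return "-device " + ",".join(props)


# Hypothetical values: VF address, assumed S7150V IDs, made-up ROM path.
example = vfio_device_arg("0000:03:02.0", 0x1002, 0x692F, "/var/lib/roms/s7150v.rom")
```

With libvirt the same properties would have to go through a QEMU command-line passthrough section of the domain XML rather than a plain hostdev entry.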

Yet another update:

The W7100 I have been working on throughout this thread has been able to produce VMs and run graphics workloads one VM at a time. Once I created a second VM and ran a graphics workload next to the first VM running its own graphics workload, the W7100 would lock up the host. The following error message would loop in dmesg shortly before the host either crashed entirely or stopped responding to SSH commands:

So after a ton of messing around with the specific W7100 I had been working on, I finally decided to try another card just to see if it would work. It turns out that after performing the exact same process (flashing the identical image, applying the same modifications to GIM, etc.) I was able to load up my unit test with a bunch of VMs running simultaneous DirectX 12 programs without issue. The lockup appears to be difficult to reproduce. The only other possibility I can think of is that I somehow physically damaged the first card during the modification process, but this seems unlikely. Anyhow, if you’re doing this modification yourself and run into any trouble, feel free to direct message me. I’d love to help anyone who has decided to go down this (somewhat) odd path of applying these modifications.

Given I have both the non-working and working modified cards I may spend some time trying to narrow the differences between them. If this does actually come down to something like slight variations between card hardware I’d like to create a few scripts and automated tests perhaps to check if a given W7100 is actually capable of being successfully modified in this way.

Can you still get video output on the host once you enable SR-IOV?

Great question, I have no idea as I was just using this over SSH.
I can try to dig up this old setup and test it, but idk if I’ll be able to get it to work again since it’s kind of been forgotten. Just get an Nvidia GPU instead, those work much better now. Relevant guide documentation here, with links to various wikis, etc. included at the bottom of the page:

I think it might be possible to merge AMDGPU with GIM to provide shared host DRM and SR-IOV functionality. Intel’s i915 driver currently supports this out of the box and it is possible using Nvidia drivers as well. AMD likely won’t invest in developing this capability, as they tend to be more reactive than forward-looking. The OpenMdev project might try to spend some time on a proof of concept AMDGPU + GIM merged driver for the W7100.
https://openmdev.io/index.php/Merged_Drivers

That being said, the value in spending time on this hardware/software is questionable at best, as there’s really no strong reason for anyone to use a GPU architecture EOL’d in 2017 when there are so many compelling alternatives.

Yeah, I have turned to used Tesla cards for now, and if I want display capability I can use GeForce cards. vgpu_unlock is the best tool we have right now.

I’m late to the party…whatever happened with this? I came here from the One GPUs, Two OSes video as I’ve been watching what seems like the entire back catalog of Level1 videos as I’m hyping myself up for my first home server build (at least, “real” home server- I’ve got some Pis and old desktops set up to do stuff like Pihole, DHCP, NAS, game servers, etc.).

Anyways, I was watching and when he waved around the W7100, it reminded me of the W5100 I bought cheap for an extremely budget rig that is mostly used for my kids to play Minecraft, Fall Guys, etc. It is currently gathering dust as I upgraded to a GTX 1060.

The thing is, they are definitely underpowered, but if the goal is just to test or do a proof of concept, they are plentiful and cheap. Mine was about $25 with shipping on eBay, and in the worst case scenario it makes a fine pass-through card.

I honestly don’t care if it lives or dies, so I’m happy to experiment or whatever, but I’ll admit to being a bit out of my depth. I would say I have the kind of familiarity with Linux that most people have with Windows: I use Linux somewhat often as a Master’s student in Cybersecurity, I have a Pinebook Pro with both Ubuntu and Armbian Budgie on it, and I use WSL 2 a lot (mostly Kali and Ubuntu). I can use the CLI and generally make it do what I want… until something weird or esoteric breaks, and then I’m usually stuck and back to following tutorials from SO or some random website.

So if anyone has any ideas on where to start, please let me know. Thanks!