Vega 10 and 12 reset application

This has been superseded by the new vendor-reset project, which doesn’t require any kernel patching. This application and patch, along with all other Navi/Vega reset patches on this website, are now obsolete and should not be used.


Hi All,

As some of you may be aware, I have been working to find either a workaround or a fix for the AMD Vega reset bug. Last week I posted a cry for help on AMD’s subreddit to show AMD how much demand there is for a fix. As a result, an AMD engineer got in touch and has guided me to a possible solution to the problem.

Over the weekend I spent considerable time implementing what seems to be a working reset for Vega 10 and 12. Initial testing by a few people confirms that it works on Vega 10; however, it needs further testing.

You must apply this patch to your kernel to prevent vfio-pci from attempting to reset the GPU incorrectly.

Please note that this application is intended as an interim workaround while I work on implementing this in the kernel for vfio.

Download reset-test.tar.gz

Usage is simple. Obviously you must not be using the GPU at the time, and it should be bound to vfio-pci.
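A minimal sketch of checking the vfio-pci binding requirement before running the tool, assuming a Linux host with sysfs mounted at `/sys`. It reads the `driver` symlink under `/sys/bus/pci/devices/<address>/` and confirms it points at `vfio-pci`. The function names are hypothetical helpers, not part of reset-test, and whether the GPU is idle still has to be checked separately.

```python
import os
import re
from pathlib import Path
from typing import Optional

# Full PCI address: 4 hex digits of domain, 2 of bus, 2 of device and a
# function number 0-7, e.g. "0000:24:00.0". This is a format check only.
PCI_ADDR_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def is_valid_pci_address(addr: str) -> bool:
    """Return True if addr looks like a full domain:bus:device.function address."""
    return PCI_ADDR_RE.match(addr) is not None


def bound_driver(addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the driver bound to the device, or None if unbound.

    sysfs exposes the binding as a symlink:
    <sysfs_root>/bus/pci/devices/<addr>/driver -> .../drivers/<name>
    """
    link = Path(sysfs_root, "bus", "pci", "devices", addr, "driver")
    if not link.is_symlink():
        return None
    # The last path component of the link target is the driver name.
    return Path(os.readlink(link)).name


def ready_for_reset(addr: str, sysfs_root: str = "/sys") -> bool:
    """True only for a well-formed address whose device is bound to vfio-pci."""
    return is_valid_pci_address(addr) and bound_driver(addr, sysfs_root) == "vfio-pci"
```

The `sysfs_root` parameter exists only so the check can be exercised against a fake directory tree. On a real host, `ready_for_reset("0000:24:00.0")` should return `True` before you run the tool with the same address: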

./reset-test 0000:24:00.0

The expected output is:

============================================================================

AMD Vega 10/12 Reset Application (Version: 1.0)
Copyright (c) 2019 Geoffrey McRae <[email protected]>

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

This tool is intended as an interim workaround while I port this into the
kernel driver. If you like my work and want to support it you can contribute
using the following methods:

* Ko-Fi   - https://ko-fi.com/lookingglass
* Patreon - https://www.patreon.com/gnif
* BTC     - 14ZFcYjsKPiVreHqcaekvHGL846u3ZuT13

============================================================================

Attempting Vega 10 reset
CMD_READMODIFYWRITE  0x00000e1c
CMD_WRITE            0x00000e1f
CMD_READMODIFYWRITE  0x00000e2b
CMD_READMODIFYWRITE  0x00000e2b
CMD_WAITFOR          0x0001667c
CMD_READMODIFYWRITE  0x00000e2b
CMD_READMODIFYWRITE  0x00000e2b
CMD_READMODIFYWRITE  0x00000e2b
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x00000e2b
CMD_DELAY_MS
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_WAITFOR          0x00000e2b
CMD_READMODIFYWRITE  0x00000e2b
CMD_DELAY_MS
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_WAITFOR          0x0001667c
CMD_READMODIFYWRITE  0x0001667c
CMD_READMODIFYWRITE  0x00000e2b
CMD_READMODIFYWRITE  0x00000e2b
CMD_READMODIFYWRITE  0x00000e2b
CMD_WAITFOR          0x00000e2b
CMD_WRITE            0x00000052
CMD_WRITE            0x00000053

At this point the GPU should successfully POST inside a VM, even after a dirty shutdown or VM crash.
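The log above lists four kinds of operation (read-modify-write, plain write, wait-for and delay) applied to a handful of register offsets. The thread does not publish the tool's command table, so the following is only a sketch of how such a sequence could be represented as data and interpreted. It assumes that a read-modify-write replaces the masked bits of a register and that a wait-for polls until the masked bits match an expected value. All class and function names, the mask/value/timeout fields and the `RegisterFile` simulation are hypothetical, and the sketch never touches real hardware.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

# HYPOTHETICAL encodings: the tool's output only shows a command name and a
# register offset, so the mask/value/timeout fields here are assumptions.


@dataclass
class ReadModifyWrite:
    reg: int
    mask: int   # bits to replace
    value: int  # new contents for the masked bits


@dataclass
class Write:
    reg: int
    value: int


@dataclass
class WaitFor:
    reg: int
    mask: int
    expected: int
    timeout_ms: int = 100


@dataclass
class DelayMs:
    ms: int


Command = Union[ReadModifyWrite, Write, WaitFor, DelayMs]


class RegisterFile:
    """Simulated 32-bit registers standing in for MMIO; nothing touches hardware."""

    def __init__(self, regs: Optional[Dict[int, int]] = None) -> None:
        self.regs: Dict[int, int] = dict(regs or {})

    def read(self, reg: int) -> int:
        return self.regs.get(reg, 0)

    def write(self, reg: int, value: int) -> None:
        self.regs[reg] = value & 0xFFFFFFFF


def _fmt(name: str, reg: int) -> str:
    # Pad the command name so the offsets line up in one column, like the log.
    return f"{name:<21}0x{reg:08x}"


def run(commands: List[Command], hw: RegisterFile,
        log: Callable[[str], None] = print) -> None:
    """Execute a command list in order, logging each step."""
    for cmd in commands:
        if isinstance(cmd, ReadModifyWrite):
            log(_fmt("CMD_READMODIFYWRITE", cmd.reg))
            old = hw.read(cmd.reg)
            # Clear the masked bits, then set them from value.
            hw.write(cmd.reg, (old & ~cmd.mask) | (cmd.value & cmd.mask))
        elif isinstance(cmd, Write):
            log(_fmt("CMD_WRITE", cmd.reg))
            hw.write(cmd.reg, cmd.value)
        elif isinstance(cmd, WaitFor):
            log(_fmt("CMD_WAITFOR", cmd.reg))
            # Poll until the masked bits match, or give up after timeout_ms.
            deadline = time.monotonic() + cmd.timeout_ms / 1000
            while (hw.read(cmd.reg) & cmd.mask) != cmd.expected:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"0x{cmd.reg:08x} did not reach expected value")
                time.sleep(0.001)
        elif isinstance(cmd, DelayMs):
            log("CMD_DELAY_MS")
            time.sleep(cmd.ms / 1000)
        else:
            raise ValueError(f"unknown command: {cmd!r}")
```

Keeping the sequence as data rather than straight-line code makes it easy to log, review and dry-run against a simulated device, which is about the only way to sanity-check a sequence without risking a real card.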

A reset for Vega 20 and Navi is possible, but as I do not have these devices to develop against, I cannot safely implement it. Poking blindly at the wrong registers is dangerous and can destroy the GPU.

If you would like to see Navi supported as well, you can contribute to the cost of purchasing a suitable card below:

Edit: Funding is complete! Thank you everyone for your support!

In the past you did a GoFundMe. I wonder if people would be willing to fund another one so you can get a Navi GPU.

Good idea, done:

Edit: Funding is complete! Thank you everyone for your support!

Good work

I have a Radeon 7 I’m more than happy to risk for science. If it would be at all helpful, I’m more than willing to throw it in a non-production machine and set up a tunnel to it for you, assuming you can find out what you need without physical access.

Thank you kindly for the offer; however, while writing this, the GPU ends up in a hung state that often crashes the entire machine, requiring a physical power-down. I also need to be able to see the physical GPU output, hear the fan spin up and down, etc., to validate what is going on, as there is no other form of debug output available from the GPU.

I’ve edited the OP, but will post this here as well to make sure it’s not missed.

You must apply this patch to your kernel to prevent vfio-pci from attempting to reset the GPU incorrectly.

I’m not currently using VFIO, but I just want to say thanks for working on this, as I may in future. I just haven’t gotten around to it because I’m time-poor at home…

Flicked you a small BTC donation 🙂

Wanted to thank you for doing what AMD wouldn’t do. Hopefully a similar solution will apply to the other GPUs, including Polaris and Fiji.
I remember Linus from LTT struggled with the Fury Nanos in his 7 Gamers rig precisely because of this bug.
Quick question, though: what is Vega 12 as far as graphics cards go?

Vega 12 is the chip used in the Vega Pro workstation cards.

It is funny that a GoFundMe fundraiser is needed to sort out AMD issues. At this point they really do start to look like the old saying: AMD is for poor people. I guess the only path forward is for AMD marketing to just send you a full line-up of GPUs, two of each, and call it a day.

I sent a small donation. Thank you for doing this!

Noice. Even if you don’t get the Radeon 7 fixed, that’s OK. I’ll just use my R7 for the host and get a Navi GPU for my gaming VM.

I have an R7 that I want to pass through to a VM; let me know how I can help test.

Thanks for the offer but in order to add support for R7 I will need one on hand.

Well, let me know. I would like to dive deeper into the PCIe structure of my system, and I know some of the ins and outs of the Linux kernel.

I am still working with AMD on getting the bugs and specifics of this reset worked out. Other generations will also require one-on-one time with an engineer for the very same reason: it’s not about poking around in the kernel, but about having the actual datasheet for the Vega/Navi GPUs.

It would be really cool if someone on the AMD side could help test your code.

I’m sure AMD has a plentiful supply of AMD GPUs, and this additional support for their cards under Linux is going to cost them basically nothing in R&D, seeing as you’re doing it. Judging by the Linux kernel submissions they make, they clearly already have Linux developers for Polaris, Vega and Navi cards…

Has anyone you’ve spoken with at AMD indicated whether or not this may be possible?

Thank you for working on this issue. Do you know if the application works equally well with a macOS guest? A fast, modern GPU working with a macOS VM would be phenomenal!

Ordered a 5700 XT. It’ll be here August 1st. Once it’s installed, my Radeon 7 is totally free for testing. I’m more than willing to cover shipping both ways and sign a piece of paper stating I won’t hold you responsible if it dies or falls into the ocean, blah blah. Is that something that would be helpful? I do need the card back, but not for a while, and if it dies or gets lost I’ll just get another 5700 XT when I need it.

It certainly would be helpful! PM me when you’re ready and I will pass on the shipping details. Thanks!
