Hi gnif, we would welcome you to hang around in the Unraid forums as well. Though Unraid sounds like a storage-focused OS (and that part pretty much works), it also tries to be a seamless GPU pass-through/virtualization (QEMU)/gaming platform… and that's where the challenges are.
Maybe this thread could be your starting point? (It's where your L1T post was featured before.)
I am currently away but will return in a couple of days. When I do, I will set up a GoFundMe for this GPU, as others have also expressed interest via PM in providing funding this way.
@gnif, is AMD itself doing anything toward a full fix (since you are in contact with them)? It's been over six months since your fix, and a LOT longer since this problem was identified…
AMD are not doing anything directly. I have been continuing to work with AMD on this. A patch for Vega has not yet been made available, as Vega is far more complex to reset than Navi, and Navi still has issues that are also present on the Vega generation. Once these Navi issues are resolved, I will move back to working on the Vega cards.
So Vega owners are left on the sidelines, sigh. Before kernel 5.4 I did not experience the Vega reset bug in a guest, for whatever reason (go figure), but since 5.4.0 I have it. Oddly enough, a Windows 10 guest doesn't show the issue.
The reset application doesn't work for me: "Failed to exit BACO", with or without the quirk applied.
Recent kernel activity that looks related also seems to focus on Navi:
This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x. Clear state buffer (resides in vram) is corrupted after 1st baco reset, upon gfxoff exit, CPF gets garbage header in CSIB and hangs.
Hi,
I just signed up to share my experience with this reset tool. I tried the patch with kernel versions 5.3.18 and 5.4.2, but sadly it doesn't work for me either:
# ./reset-test 0000:43:00.0
============================================================================
AMD Vega 10/12 Reset Application (Version: 1.0)
Copyright (c) 2019 Geoffrey McRae <[email protected]>
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
This tool is intended as an interim workaround while I port this into the
kernel driver. If you like my work and want to support it you can contribute
using the following methods:
* Ko-Fi - https://ko-fi.com/lookingglass
* Patreon - https://www.patreon.com/gnif
* BTC - 14ZFcYjsKPiVreHqcaekvHGL846u3ZuT13
============================================================================
Attempting Vega 10 reset
CMD_READMODIFYWRITE 0x00000e2b
CMD_DELAY_MS
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_WAITFOR 0x0001667c
Wait for timed out.
Failed to exit BACO
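The tail of that log reads as a poll that never succeeded: the tool issued CMD_WAITFOR on 0x0001667c, printed "Wait for timed out", and then "Failed to exit BACO". As a rough illustration only (this is not code from the reset tool), a wait-for step of this kind generally means "read a register repeatedly until the masked bits reach an expected value, and give up after a bounded number of tries". The function name `wait_for`, the `read_reg` callback, the mask, the expected value and the retry limits below are all hypothetical.

```python
import time
from typing import Callable


def wait_for(
    read_reg: Callable[[int], int],
    offset: int,
    mask: int,
    expected: int,
    attempts: int = 100,
    delay_s: float = 0.001,
) -> bool:
    """Hypothetical sketch: poll a register until (value & mask) == expected.

    read_reg is an assumed callback that returns the register value at
    `offset`. Returns True if the condition was met, False on timeout.
    """
    for _ in range(attempts):
        if (read_reg(offset) & mask) == expected:
            return True
        time.sleep(delay_s)  # back off briefly before the next read
    return False  # the equivalent of "Wait for timed out"
```

Under that reading, a timeout means the GPU never reported the state the tool was waiting for after the preceding read-modify-write steps, which is consistent with the card not coming out of BACO.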
These are my PC specs:
AMD Ryzen Threadripper 1950x
Gigabyte X399 Aorus Gaming 7, BIOS v.F12
ROG Strix RX Vega 64 OC Ed.
I have a few questions (mostly for @gnif) that I can't find answers to in this thread:
Which kernel version did you try the patch and reset tool with?
Is there a way to check whether the kernel has been correctly patched?
@gnif, if I remember correctly, you asked someone in the Navi thread whether they had a Threadripper. Is this processor problematic with this patch?
So, no clues as to why it doesn't work with my configuration? Are there any tests I could run to help with this problem?
Thanks for your work btw!
Yes, I already patched the kernel; that's why I asked which kernel version you worked with and how to confirm it was patched correctly (maybe a check somewhere?). When you apply the patch, does it report errors on some hunk?
If you got errors, the patch did not apply. It simply adds the GPU to the list of quirked devices to prevent a bus reset, and it is very simple to apply by hand to drivers/pci/quirks.c.
Also make sure that it covers your GPU; you might need to add a line for yours. Check your GPU's PCI device ID using lspci -nn.
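To answer "was it patched, and does it cover my card?" without reading quirks.c by eye, here is a small sketch that compares the device ID printed by `lspci -nn` against the no-bus-reset entries in a kernel source tree. It assumes the patch adds lines in the same form as the kernel's existing `quirk_no_bus_reset` fixups, i.e. `DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x<device>, quirk_no_bus_reset);`. The helper names are hypothetical, and this checks the source tree only, not the kernel you are actually running, so you still need to rebuild and boot it.

```python
import re

# "[vendor:device]" as printed by `lspci -nn`, e.g. "[1002:687f]".
# The class code (e.g. "[0300]") has no colon, so it never matches.
PCI_ID_RE = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")

# Assumed form of a no-bus-reset entry in drivers/pci/quirks.c.
FIXUP_RE = re.compile(
    r"DECLARE_PCI_FIXUP_HEADER\(\s*PCI_VENDOR_ID_ATI\s*,\s*0x([0-9a-fA-F]{4})\s*,"
    r"\s*quirk_no_bus_reset\s*\)"
)

PCI_VENDOR_ID_ATI = 0x1002  # AMD/ATI PCI vendor ID


def device_id_from_lspci(line: str) -> tuple[int, int]:
    """Return (vendor, device) from one line of `lspci -nn` output."""
    matches = PCI_ID_RE.findall(line)
    if not matches:
        raise ValueError(f"no [vendor:device] pair in: {line!r}")
    vendor, device = matches[-1]  # the ID pair is printed last on the line
    return int(vendor, 16), int(device, 16)


def quirked_ati_devices(quirks_c: str) -> set[int]:
    """Collect AMD/ATI device IDs that have a quirk_no_bus_reset fixup."""
    return {int(dev, 16) for dev in FIXUP_RE.findall(quirks_c)}


def gpu_is_covered(lspci_line: str, quirks_c: str) -> bool:
    """True if the GPU on this lspci line has a no-bus-reset fixup."""
    vendor, device = device_id_from_lspci(lspci_line)
    return vendor == PCI_VENDOR_ID_ATI and device in quirked_ati_devices(quirks_c)
```

Typical use: capture one line with `lspci -nn -s 43:00.0`, read the text of drivers/pci/quirks.c from your patched tree, and pass both to `gpu_is_covered`. A False result means either the patch did not apply or it has no line for your card's device ID.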