Is navi (5700 xt) stable on linux?

Freesync works well for me. Actually never had any flickering whatsoever. But I take issue with 30 W at idle and about 5-10° more than necessary because the memory won’t downclock.

Went with opensuse tumbleweed back in october, kernel 5.3 and mesa 19.2 and never had any issues with it since. I’ve heard a lot of issues from people on newer kernels and mesa so I guess it is just down to luck or what/how the different distros implement the drivers. Feels like there is a possibility the drivers are incomplete somehow on some distros. (?)

Did anyone try running OpenCL on 5700/5700XT (open-source drivers exclusively)? I’ve been having trouble on Fedora: some LLVM bitcode is missing and I can’t run CL kernels, even though the platform and device are detected. Maybe some distros don’t have that problem?

[The_Riddick]
Freesync or VRR cannot run under Linux with two monitors connected. You must disable one in software fully.
I’ve heard absolutely no news about this ever being fixed, it’s quite a bummer I know. ALSO keep in mind Linux does VRR differently so it may be better or much worse than windows. For me it’s worse than windows and I get intermittent flickering under Linux (using Gsync VRR Freesync mode).
Not sure if AMD’s freesync mode is better than NVIDIA’s VRR compatibility mode.

that’s very unfortunate to hear about freesync not working in multi monitor configurations. though that’s my same experience on nvidia on linux too. gsync and freesync doesn’t work in multi either. so i just turned off one monitor when i really wanted the sync. which lately has just been for the outer worlds which i run fullscreen for immersion. most of my other games like world of warcraft and wc3 reforged i just run in a window. and usually have a video playing and my web browser on the second screen.

do you have sane clocks though? not highly elevated with two high refresh rate screens? i haven’t gotten around to hooking up my second yet to test… i tried reading the arch wiki on how to set up xorg and gave it a try yesterday, but i gave up. i got so used to just using nvidia-settings after all these years. was so simple with nvidia. so i’m just using xfce display settings to set 144hz and letting xorg do the rest automatically…

[ragnarLootbox]
Freesync works well for me. Actually never had any flickering whatsoever. But I take issue with 30 W at idle and about 5-10° more than necessary because the memory won’t downclock.

that’s odd. my 5700 xt idles at around 8-10 W. core is reporting 800mhz and gddr6 100mhz at idle, from checking frequently with
sudo cat /sys/kernel/debug/dri/0/amdgpu_pm_info
what distro kernel version are you on? i’m on arch with 5.5.10.
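If you want to log those idle numbers instead of eyeballing them, here is a minimal Python sketch. It assumes the debugfs file prints lines shaped like `100 MHz (MCLK)`, `800 MHz (SCLK)` and `8.0 W (average GPU)`. That layout is an assumption and may differ between kernel versions. `parse_pm_info` and the sample text are made up for illustration.

```python
import re

# Hypothetical sample text. The real file is
# /sys/kernel/debug/dri/0/amdgpu_pm_info (root only) and its exact
# layout is an assumption that may vary between kernel versions.
SAMPLE = """GFX Clocks and Power:
\t100 MHz (MCLK)
\t800 MHz (SCLK)
\t8.0 W (average GPU)
"""

# Matches "<number> MHz (<label>)" or "<number> W (<label>)".
LINE_RE = re.compile(r"^\s*([\d.]+)\s*(MHz|W)\s*\(([^)]+)\)\s*$")


def parse_pm_info(text):
    """Return {label: (value, unit)} for every clock or power line found."""
    readings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            value, unit, label = match.groups()
            readings[label] = (float(value), unit)
    return readings


for label, (value, unit) in parse_pm_info(SAMPLE).items():
    print(f"{label}: {value:g} {unit}")
```

On a live system you would pass in the contents of the debugfs file (read as root) instead of `SAMPLE`.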

My monitor is just a cheap QNIX, the freesync range isn’t great. I can adjust it on windows without issue, but like I said under Linux even with default settings (48-65 default range) it has blackout issues last time I tried.

ATM I’m not using Linux because I was planning to do a NVMe rebuild with a 2TB NVMe but well… covid-19 said no.

Basically I ran games under NTFS in Linux and I really think it was killing the performance too much (basically 1/10th the speed of windows ntfs performance).

The new setup WAS going to be a full EXT4 configuration, can’t do that now with my crappy 500GB NVMe (Fallout4 with mods eats 200GB by itself).

What I will do now is HOLD off on any further upgrades until end of year or whenever the AUD recovers, then get maybe an 8-core APU, a new mobo with dual NVMe slots and a 2TB NVMe, and keep the 500GB one as an OS drive for both Linux and Windows (atm it’s just on an SSD).

The 2TB NVMe was going to cost me $360AUD before shit happened, now it’s like $600-700AUD!!! yeet!

5.5.8-1-MANJARO

I believe it is bound to the pixel clock. At 70hz the memory is stuck at 800mhz; at 60hz it idles properly. I’ve not yet worked out how I need to modify my custom EDID to get proper downclocking.
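For comparing candidate EDID modes, the pixel clock is just total pixels per frame (active plus blanking) times the refresh rate. A small Python sketch; the 2720 x 1481 totals are a hypothetical reduced-blanking 2560x1440 timing, not values taken from anyone's actual EDID, and the function names are made up.

```python
def pixel_clock_mhz(h_total, v_total, refresh_hz):
    """Pixel clock in MHz: pixels per frame (including blanking) * frames per second."""
    return h_total * v_total * refresh_hz / 1e6


def refresh_for_clock(h_total, v_total, clock_mhz):
    """Inverse: the refresh rate (Hz) a given pixel clock yields for these totals."""
    return clock_mhz * 1e6 / (h_total * v_total)


# Hypothetical totals for a 2560x1440 mode with reduced blanking.
H_TOTAL, V_TOTAL = 2720, 1481

for hz in (60, 70):
    print(f"{hz} Hz -> {pixel_clock_mhz(H_TOTAL, V_TOTAL, hz):.2f} MHz")
```

This only shows how refresh rate and blanking feed into the pixel clock. Where the driver's threshold for memory downclocking sits is not something the sketch can tell you.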

Sapphire Pulse RX 5700 on POP_OS! 18.04 LTS here. Clicked the card in - boot - just works … pun intended … Kernel 5.3 + mesa 2.10 + vulkan (oibaf repo) and nothing but good words …

hey guys, can navi cause MCEs? I just had a hard system lockup followed by my machine rebooting on its own. I received this in journalctl:

Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: Machine check events logged
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: CPU 7: Machine Check: 0 Bank 5: bea0000000000108
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffad66d6fe MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: PROCESSOR 1:870f10 TIME 1585120217 SOCKET 0 APIC 2 microcode 8701013
Mar 25 00:10:19 nix64 kernel: #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: Machine check events logged
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc1196eb6 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1585120217 SOCKET 0 APIC 9 microcode 8701013
Mar 25 00:10:19 nix64 kernel: #16 #17 #18 #19 #20 #21 #22 #23

I never had this before. I’ve owned my 3900x for a while now and I’m pretty taken aback by it.
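The status word in those lines (`bea0000000000108`) can be picked apart by hand. A rough Python sketch that decodes only the architectural x86 MCi_STATUS flag bits (63 down to 57) and the low 16-bit error code; vendor-specific fields are deliberately ignored, and `decode_status` is a made-up helper, not part of any tool.

```python
# Architectural MCi_STATUS flag bits common to x86 machine-check banks.
STATUS_FLAGS = {
    63: "VAL (register holds a valid error)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_status(status_hex):
    """Return (list of set flag names, MCA error code) for a hex status string."""
    status = int(status_hex, 16)
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    error_code = status & 0xFFFF  # low 16 bits carry the MCA error code
    return flags, error_code


flags, code = decode_status("bea0000000000108")
for flag in flags:
    print(flag)
print(f"MCA error code: {code:#06x}")
```

For this value UC and PCC are both set, i.e. an uncorrected error with processor context marked corrupt, which fits a hard lock followed by a reboot. On its own it does not say whether the CPU or something on the bus triggered it.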

Could the card cause this issue? It is possible but this seems to be something going on with the CPU itself. Are you running the latest linux-firmware package?

i have:

core/linux-firmware 20200224.efcfa03-1 [installed]

Installed. i have the latest AGESA 1.0.0.4 Patch B. I also have a Corsair RMX 850 watt PSU, no overclocks / PBO, everything at stock auto defaults except XMP. My kit is a Crucial Ballistix Sport LT 3200mhz kit. It’s on the QVL and I never had issues for 6 months now. The only change was the GPU swap.

It happened when I was in the middle of playing Warcraft 3 reforged :confused:

That’s interesting. Like I said, the card could cause that if there was an IRQ that hung due to card access.

Do you have IOMMU enabled on your system? You may try disabling that and see if the issue goes away. Some of the Navi cards are having issues with that turned on.

dmesg | grep IOMMU states the following:

[ 1.240358] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.245782] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 1.247402] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 1.280673] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>

So I assume it’s enabled?
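The dmesg lines show the IOMMU hardware was detected. A second check is whether the kernel actually populated any IOMMU groups in sysfs. A minimal Python sketch, assuming `/sys/kernel/iommu_groups` is empty or missing when the IOMMU is off; `iommu_active` is a made-up helper name.

```python
from pathlib import Path


def iommu_active(groups_dir="/sys/kernel/iommu_groups"):
    """True if the directory exists and contains at least one IOMMU group."""
    path = Path(groups_dir)
    return path.is_dir() and any(path.iterdir())


print("IOMMU groups present:", iommu_active())
```

The directory is a parameter only so the function can be pointed at a scratch directory for testing.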

edit:
i do want to add that yesterday i tested my 5700 xt on the windows 10 install i keep on an ssd for testing. i just wanted to run 3dmark and bench it to see if it’s performing as it should. i decided to play a round of call of duty warzone with my buddy, and when i alt-tabbed out to chrome my screen flashed black and locked up as well (no mouse or keyboard, couldn’t even toggle caps lock). windows recovered after a little over a minute and reported two TDRs caused by the amd gpu driver in event viewer.

thinking about it, that crash was very similar to this crash on linux. same symptoms: screen flashed black, hard lock, couldn’t toggle caps lock, and after around a minute or so the machine rebooted on its own instead. the self reboot is the only difference from the windows “crash.” on linux i was alt-tabbing out of warcraft 3 reforged after i lost my game. i also had corectrl running to set a manual fixed fan speed, and it was monitoring temps and whatnot.

i understand linux doesn’t have as graceful a way to handle gpu crashes, so that’s why i asked if it’s possible the MCE is related to the GPU. according to that wikipedia article, they can come from i/o, memory, cpu, or the bus. if the gpu hangs or something, how would linux handle it? i would assume in the form of a kernel panic or an MCE, depending on how it crashes? when i was doing my initial research into navi stability on linux, i also noticed people talking about their screens going black and their machines randomly rebooting. very similar symptoms to what i just endured, which means it must have been an MCE for them. so your statement about iommu has me interested in whether that’s the cause.
edit:
should i disable it via bios (my bios has an option for it according to my manual) or the kernel line amd_iommu=off?

You can disable IOMMU through EFI, or you can soft-disable it by passing the argument to the kernel. They essentially do the same thing in GNU/Linux land.
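After a reboot, it is easy to confirm the argument actually made it onto the kernel command line. A short Python sketch; the example command line is hypothetical and `kernel_param` is a made-up helper, not an existing API.

```python
def kernel_param(cmdline, name):
    """Return the value of name=value, '' for a bare flag, or None if absent."""
    value = None
    for token in cmdline.split():
        key, _, val = token.partition("=")
        if key == name:
            value = val  # keep the last occurrence seen
    return value


# Hypothetical command line for illustration only.
EXAMPLE = "BOOT_IMAGE=/vmlinuz-linux root=/dev/nvme0n1p2 rw amd_iommu=off quiet"
print(kernel_param(EXAMPLE, "amd_iommu"))

# On a live system, read the real one instead:
# kernel_param(open("/proc/cmdline").read(), "amd_iommu")
```

If this returns `None` on the running system, the bootloader config was probably not regenerated after editing it.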

Yeah, sounds like it is Navi teething issues then. The Linux kernel can and does handle stuff like that gracefully, but you need to ensure you have the modules compiled and/or loaded for your kernel, and you need to set up a watchdog timer if your distro does not already do that.

For example, my Poor-dozer 990FX board has a 90-second hardware watchdog that will try to reset the hardware. In my Debian install the default watchdog timeout is something like 120 or 200 seconds. Debian will reboot if the hardware cannot be recovered.

With the Vega and newer cards I think the AMDGPU reset bug is still a thing until Kernel 5.5 or so. The auto reboot in Windows is probably an MS Windows watchdog that realizes the fault and that it is not recoverable and then just goes for a power cycle.

Again, I am on a 990FX chipset with GCN 1.1 GPU. But my ASUS TUF board has the builtin hardware watchdog.

alright, it might be my cpu… i’ll try replacing it.

Could just be a firmware issue, or possibly a cache coherency issue due to all of the Intel mitigations that are deemed necessary for all x86 CPUs (I don’t know if Intel is still pulling those shenanigans though).

amd responded to my claim and told me to test on windows, since they can’t help with linux, and then get back to them, even though in my initial comment i told them i tested on windows too. they also asked if i had updated to the latest bios… which i had put down that i did lol.

i guess the support guy saw linux in the first sentence and noped right out of it.

AMD: “We do not support Linux. Try in MS Windows.”
Laughs in Intel

The 5.5.11 Kernel is correcting a lot of my issues it seems. My system no longer has to deal with hangs briefly after start ups or when I turn a monitor on or off. Going to have to check to see if things still flip out when turning on my cintiq.

as an update on my MCE issues: i bit the bullet and just ordered another 3900x to test with, but UPS delayed my shipment :confused: support with amd has gone nowhere and responses have been very slow. i told them a second time that i tested with windows and haven’t gotten any response back. pretty annoying, but it is what it is i suppose. i wish the US was like europe when it comes to going through your retailer in times like this :confused:

in regards to the 5700 xt, when everything was running smoothly with my CPU, performance was great. my experience with it was very pleasant on linux. i’m really excited to get everything fixed up to finally play around with it more.

I wonder if the stock voltage settings are what is killing your CPU.