
Is Navi (5700 XT) stable on Linux?

I’m on Fedora 31 and it’s rock solid. I gamed through XCOM 2: War of the Chosen and am now playing through The Witcher 3. Double-digit hours, if not into the hundreds.

No problems on Fedora.

On Windows 10 I get weird shit when I play some old games that Fedora won’t play, like Supreme Commander… But I hate Windows and rarely use it.

Ubuntu was far better to game on, but I just like staying on Fedora because of Wayland, smooth picard video, etc.

This is on the newer kernels. I had terrible trouble when I bought the 5700 XT at launch. I was stuck on Windows for months.

Got my 5700 XT Pulse a few days ago, and it’s been totally stable; performance is a big improvement over my Vega 64 Nitro, especially on heat. Borderlands 3, Metro Exodus and Deep Rock Galactic all run just great, considering. That was 400€ here in Germany. A decent 2070 Super model would’ve been 150€ extra, so the price/performance ratio is killer.
I’m on Manjaro and so far I’ve had zero issues except one:

I assume it’s the known AMD pixel clock issue where all non-standard timings leave the memory clock stuck high, so the card can’t idle properly. So my idle temps are in the mid 40s. I’m using a 70 Hz FreeSync monitor, and going down to 60 Hz is a workaround for now.
If anyone has a proper fix for that I’d love to hear it :smiley:
I’ve already looked into supplying X with a custom EDID, but that’s not working.

I’ve been using my 5700 XT on Manjaro for about 4 weeks now and have had a positive experience. On Windows this same card was experiencing constant black-screen crashes in most games, which seemed to be related to extended high usage when gaming. Driver changes, fan tuning, undervolting, etc. all failed to resolve it. I tried going back to Linux on a whim and it’s been rock solid the entire time, without any tweaks involved. I haven’t had a single system crash since hopping back, regardless of game played or GPU utilization. The only thing I’ve tweaked since that initial setup was to set a fan profile in radeon-profile to cut down the noise a bit when on the desktop. Might not be representative of everyone, so YMMV.

giving an update!
just got my 5700 xt from amazon. i ran “pacman -Ss nvidia | grep installed”, removed everything that was linked to nvidia, and deleted my 20-nvidia.conf xorg file. then i ran “pacman -Syu mesa lib32-mesa xf86-video-amdgpu vulkan-radeon lib32-vulkan-radeon libva-mesa-driver lib32-libva-mesa-driver mesa-vdpau lib32-mesa-vdpau”, shut down, swapped cards, and now i’m sitting right here playing the outer worlds.

my initial thoughts are about the pulse. the default bios fan profile appears to be very conservative. after playing outer worlds for nearly two hours while monitoring with lm_sensors, memory hit 90c. so i loaded up corectrl and set a fixed fan rate of 45%; i can’t hear it over my case fans, and memory temps dropped to 72c lol. junction temp went from 87c down to 75c, edge from 72c to 60c.
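The edge, junction and memory readings mentioned above can also be read directly from the driver's hwmon files instead of going through lm_sensors. What follows is only a minimal Python sketch. It assumes the card exposes `temp*_label` / `temp*_input` file pairs under `/sys/class/drm/card0/device/hwmon/hwmon*/`; the `card0` index and the helper names `read_gpu_temps` and `find_amdgpu_hwmon` are assumptions for illustration. hwmon reports temperatures as integer millidegrees Celsius.

```python
from pathlib import Path


def read_gpu_temps(hwmon_dir):
    """Return {label: degrees_celsius} for every temp*_input file in hwmon_dir."""
    temps = {}
    for input_file in sorted(Path(hwmon_dir).glob("temp*_input")):
        label_file = input_file.with_name(input_file.name.replace("_input", "_label"))
        if label_file.exists():
            label = label_file.read_text().strip()
        else:
            # Fall back to the sensor prefix (e.g. "temp3") when no label file exists.
            label = input_file.name.split("_")[0]
        # hwmon exposes temperatures as integer millidegrees Celsius.
        temps[label] = int(input_file.read_text().strip()) / 1000.0
    return temps


def find_amdgpu_hwmon(card="card0"):
    """Locate the hwmon directory of a DRM card; the card index is an assumption."""
    candidates = sorted(Path(f"/sys/class/drm/{card}/device/hwmon").glob("hwmon*"))
    return candidates[0] if candidates else None
```

When `find_amdgpu_hwmon()` returns a directory, passing it to `read_gpu_temps()` gives a dictionary that can be polled in a loop while a game is running.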

after i play for a few more i’ll get my freesync hooked up. anyone know if amd has the same issue as nvidia with multi 144hz displays? on my 2070 super dual displays at 144hz kept my core at 1.2ghz and memory at 14000mhz. i had to set the one to 60hz and the other to 120hz to allow my card to down clock at idle. i’m hoping on amd i wouldn’t have to do that at idle.

edit:
for mpv hardware acceleration atm i have it set to vo=gpu & hwdec=vaapi. so far videos are extremely smooth playing. but i do have a question, would vdpau be better? on nvidia i was just using nvdec.


Freesync or VRR cannot run under Linux with two monitors connected. You must fully disable one in software.

I’ve heard absolutely no news about this ever being fixed; it’s quite a bummer, I know. ALSO keep in mind Linux does VRR differently, so it may be better or much worse than Windows. For me it’s worse than Windows and I get intermittent flickering under Linux (using Gsync VRR Freesync mode).

Not sure if AMD’s Freesync mode is better than NVIDIA’s VRR compatibility mode.

Freesync works well for me. Actually never had any flickering whatsoever. But I take issue with 30 W at idle and about 5-10° more than necessary because the memory won’t downclock.

Went with opensuse tumbleweed back in october, kernel 5.3 and mesa 19.2 and never had any issues with it since. I’ve heard a lot of issues from people on newer kernels and mesa so I guess it is just down to luck or what/how the different distros implement the drivers. Feels like there is a possibility the drivers are incomplete somehow on some distros. (?)

Did anyone try running OpenCL on a 5700/5700 XT (open-source drivers exclusively)? I’ve been having trouble on Fedora: some LLVM bitcode is missing and I can’t run CL kernels, even though the platform and device are detected. Maybe some distros don’t have that problem?

[The_Riddick]
Freesync or VRR cannot run under Linux with two monitors connected. You must fully disable one in software.
I’ve heard absolutely no news about this ever being fixed; it’s quite a bummer, I know. ALSO keep in mind Linux does VRR differently, so it may be better or much worse than Windows. For me it’s worse than Windows and I get intermittent flickering under Linux (using Gsync VRR Freesync mode).
Not sure if AMD’s Freesync mode is better than NVIDIA’s VRR compatibility mode.

that’s very unfortunate to hear about freesync not working in multi-monitor configurations. though that’s been my experience with nvidia on linux too: gsync and freesync don’t work with multiple monitors there either. so i just turned off one monitor when i really wanted the sync, which lately has just been for the outer worlds, which i run fullscreen for immersion. most of my other games, like world of warcraft and wc3 reforged, i just run in a window, and i usually have a video playing and my web browser on the second screen.

do you have sane clocks though? not elevated with two high refresh rate screens? i haven’t gotten around to hooking up my second yet to test… i tried reading the arch wiki on how to set up xorg and tried yesterday, but i gave up. i got so used to just using nvidia-settings after all these years. it was so simple with nvidia. so i’m just using xfce display settings to set 144hz and letting xorg do the rest automatically…

[ragnarLootbox]
Freesync works well for me. Actually never had any flickering whatsoever. But I take issue with 30 W at idle and about 5-10° more than necessary because the memory won’t downclock.

that’s odd. my 5700 xt idles at around 8-10 w. core is reporting 800mhz and gddr6 100mhz at idle, from checking frequently with
sudo cat /sys/kernel/debug/dri/0/amdgpu_pm_info
what distro and kernel version are you on? i’m on arch with 5.5.10.
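For comparing idle numbers between systems, the output of that command can be turned into a small dictionary. This is a hedged Python sketch only: it assumes the file prints one reading per line in the form `value unit (name)`, for example `100 MHz (MCLK)`. The `SAMPLE` text is a hypothetical illustration built from the figures quoted above, not captured output, and `parse_pm_info` is a made-up helper name.

```python
import re

# Matches lines such as "\t100 MHz (MCLK)" or "\t9.0 W (average GPU)".
READING = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(MHz|mV|W)\s*\(([^)]+)\)\s*$")


def parse_pm_info(text):
    """Return {name: (value, unit)} for each 'value unit (name)' line."""
    readings = {}
    for line in text.splitlines():
        match = READING.match(line)
        if match:
            value, unit, name = match.groups()
            readings[name] = (float(value), unit)
    return readings


# Hypothetical sample in the assumed layout, not captured output.
SAMPLE = """GFX Clocks and Power:
\t100 MHz (MCLK)
\t800 MHz (SCLK)
\t9.0 W (average GPU)
"""

idle = parse_pm_info(SAMPLE)
print(idle["MCLK"], idle["SCLK"], idle["average GPU"])
```

Reading the real file needs root because it lives in debugfs, and the `0` in the path depends on which DRM device the card is.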

My monitor is just a cheap QNIX, the freesync range isn’t great. I can adjust it on windows without issue, but like I said under Linux even with default settings (48-65 default range) it has blackout issues last time I tried.

ATM I’m not using Linux because I was planning to do a NVMe rebuild with a 2TB NVMe but well… covid-19 said no.

Basically I ran games under NTFS in Linux and I really think it was killing the performance too much (basically 1/10th the speed of windows ntfs performance).

The new setup WAS going to be a full EXT4 configuration, can’t do that now with my crappy 500GB NVMe (Fallout4 with mods eats 200GB by itself).

What I will do now is HOLD off on any further upgrades until the end of the year or whenever the AUD recovers, then get maybe an 8-core APU, a new mobo with dual NVMe slots and a 2TB NVMe, and keep the 500GB one as an OS drive for both Linux and Windows (atm it’s just on an SSD).

The 2TB NVMe was going to cost me $360AUD before shit happened, now it’s like $600-700AUD!!! yeet!

5.5.8-1-MANJARO

I believe it is bound to the pixel clock. At 70 Hz the memory is stuck at 800 MHz; at 60 Hz it idles properly. I’ve not yet understood how I need to modify my custom EDID to achieve proper downclocking.
It is somehow related to a certain pixel clock.
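For anyone experimenting with custom timings, the pixel clock of a mode is the total number of pixels per frame (blanking included) multiplied by the refresh rate. A minimal Python sketch of that arithmetic follows. The horizontal and vertical totals are hypothetical values for a 2560x1440 panel with reduced blanking, not the timings of the monitor discussed here, and the function names are made up for illustration.

```python
def pixel_clock_mhz(h_total, v_total, refresh_hz):
    """Pixel clock in MHz: pixels per frame (including blanking) times frames per second."""
    return h_total * v_total * refresh_hz / 1_000_000


def refresh_for_pixel_clock(clock_mhz, h_total, v_total):
    """Inverse relation: the refresh rate a given pixel clock yields for these totals."""
    return clock_mhz * 1_000_000 / (h_total * v_total)


# Hypothetical totals for a 2560x1440 panel with reduced blanking.
H_TOTAL, V_TOTAL = 2720, 1481
for hz in (60, 70):
    print(f"{hz} Hz -> {pixel_clock_mhz(H_TOTAL, V_TOTAL, hz):.2f} MHz")
```

An EDID detailed timing descriptor stores the pixel clock in units of 10 kHz, so the result has to be converted before it goes into a custom EDID. Whether a particular pixel clock lets the memory downclock is something only testing on the card can show.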

Sapphire Pulse RX 5700 on Pop!_OS 18.04 LTS here. Clicked the card in, booted, and it just works … pun intended … Kernel 5.3 + mesa 2.10 + vulkan (oibaf repo) and nothing but good words …

hey guys, can navi cause MCE errors? i just had a hard system lockup followed by my machine rebooting on its own. i received this in journalctl:

Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: Machine check events logged
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 5: bea0000000000108
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffad66d6fe MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1585120217 SOCKET 0 APIC 2 microcode 8701013
Mar 25 00:10:19 nix64 kernel: #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: Machine check events logged
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: CPU 15: Machine Check: 0 Bank 5: bea0000000000108
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc1196eb6 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Mar 25 00:10:19 nix64 kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1585120217 SOCKET 0 APIC 9 microcode 8701013
Mar 25 00:10:19 nix64 kernel: #16 #17 #18 #19 #20 #21 #22 #23

I never had this before. I’ve owned my 3900x for a while now and I’m pretty taken aback by it.
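The 16-digit hex value after `Bank 5:` in the log is the machine-check status register, and its top bits say how severe the event was. Below is a minimal Python sketch that pulls those values out of the log lines and decodes only the architectural x86 flag bits (63 down to 57) plus the low 16-bit error code. It does not attempt to interpret vendor-specific fields or to say which hardware unit bank 5 is; the helper names are made up for illustration.

```python
import re

# Architectural flag bits in the upper part of an x86 machine-check status value.
STATUS_FLAGS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled for this error
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
}

MCE_LINE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]{16})")


def decode_status(status):
    """Decode the flag bits and the low 16-bit error code of a status value."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    decoded["error_code"] = status & 0xFFFF
    return decoded


def parse_mce_lines(log_text):
    """Return one decoded record per 'Machine Check' line found in the log text."""
    events = []
    for match in MCE_LINE.finditer(log_text):
        cpu, bank = int(match.group(1)), int(match.group(2))
        events.append({"cpu": cpu, "bank": bank, **decode_status(int(match.group(3), 16))})
    return events
```

For `bea0000000000108` this reports UC and PCC set with OVER clear, meaning an uncorrected error with possibly corrupt processor context, which fits a hard lockup followed by a reboot. It does not by itself say whether the CPU or another device triggered it.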

Could the card cause this issue? It is possible but this seems to be something going on with the CPU itself. Are you running the latest linux-firmware package?

i have:

core/linux-firmware 20200224.efcfa03-1 [installed]

Installed. My motherboard is an MSI X570 Unify with the latest BIOS installed, 7C35vA3, which has AGESA 1.0.0.4 Patch B. A3 was released on 1-16-2020. I also have a Corsair RMx 850 watt PSU, no overclocks / PBO, everything at stock auto except XMP. My kit is a Crucial Ballistix Sport LT 3200mhz 32gb kit. It’s on the QVL and I’ve never had issues in 6 months now. The only change was the GPU swap.

It happened when I was in the middle of playing Warcraft 3 reforged :confused:

That’s interesting. Like I said, the card could cause that if there was an IRQ that hung due to card access.

Do you have IOMMU enabled on your system? You may try disabling that and see if the issue goes away. Some of the Navi cards are having issues with that turned on.

dmesg | grep IOMMU states the following:

[ 1.240358] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.245782] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 1.247402] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 1.280673] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>

So I assume it’s enabled?

edit:
i do want to add that yesterday i tested my 5700 xt on the windows 10 install i keep on an ssd for testing. i just wanted to run 3dmark and bench it to see if it’s performing as it should. i decided to play a round of call of duty warzone with my buddy, and when i alt-tabbed out to chrome my screen flashed black and locked up as well (no mouse or keyboard, couldn’t even toggle caps lock), but windows recovered after a little over a minute and reported two TDRs caused by the amd gpu driver in event viewer. thinking about it, that crash was very similar to this crash on linux. same symptoms: screen flashed black, hard lock, couldn’t toggle caps lock, and after around a minute the machine rebooted on its own instead. the self-reboot is the only difference from the windows “crash.” on linux i was alt-tabbing out of warcraft 3 reforged after i lost my game. i also had corectrl running to set a manual fixed fan rate, which was also monitoring temps and whatnot.

i understand linux doesn’t have as graceful a way to handle gpu crashes, so that’s why i asked if it’s possible the MCE is related to the GPU. according to that wikipedia article, they can come from i/o, memory, cpu, or bus errors. if the gpu hangs or something, how would linux handle it? i would assume in the form of a kernel panic or MCE, depending on how it crashes? when i was doing my initial research into navi stability on linux, i also noticed people talking about their screens going black and their machines randomly rebooting. very similar symptoms to what i just endured, which means it must have been an MCE for them. so your statement about iommu has me interested in whether that’s the cause.
edit:
should i disable it via bios (my bios has an option for it according to my manual) or the kernel line amd_iommu=off?

You can disable IOMMU through EFI, or you can soft-disable it by passing the argument to the kernel. They essentially do the same thing in GNU/Linux land.
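After a reboot it is worth confirming which state the system is actually in. A minimal Python sketch, assuming the soft-disable is done with the `amd_iommu=off` argument mentioned above and that an active AMD IOMMU shows up as `AMD-Vi` lines in the kernel log; the helper names are made up for illustration.

```python
def iommu_disabled_on_cmdline(cmdline):
    """True if the kernel command line contains the amd_iommu=off argument."""
    return "amd_iommu=off" in cmdline.split()


def amd_vi_messages(dmesg_text):
    """Return the kernel log lines that mention the AMD IOMMU driver (AMD-Vi)."""
    return [line for line in dmesg_text.splitlines() if "AMD-Vi" in line]


def current_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()
```

Calling `iommu_disabled_on_cmdline(current_cmdline())` shows whether the argument took effect, and feeding saved `dmesg` output to `amd_vi_messages()` shows whether the IOMMU driver still initialised.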

Yeah, sounds like it is Navi card teething issues then. The Linux kernel can and does handle stuff like that gracefully, but you need to ensure you have the modules compiled and/or loaded for your kernel, and you need to set up a watchdog timer if your distro does not already do that.

For example, my Poor-dozer 990FX chipset has a 90-second hardware watchdog that will try to reset the hardware. In my Debian install the default watchdog is like 120 or 200 seconds. Debian will do a reboot if the hardware cannot be recovered.

With the Vega and newer cards I think the AMDGPU reset bug is still a thing until Kernel 5.5 or so. The auto reboot in Windows is probably an MS Windows watchdog that realizes the fault and that it is not recoverable and then just goes for a power cycle.

Again, I am on a 990FX chipset with GCN 1.1 GPU. But my ASUS TUF board has the builtin hardware watchdog.

alright, i do believe it’s my cpu… on windows i just received a critical cache hierarchy fault kernel whea error. well this is very unfortunate. can’t believe my 3900x is kicking the bucket after 6 months… i’ve done nothing but baby it and cool it with a D15… never overclocked it or used pbo at all. i’ll be contacting amd…

Could just be a firmware issue, or possibly a cache coherency issue due to all of the Intel mitigations that are deemed necessary for all x86 CPUs (I don’t know if Intel is still pulling those shenanigans though).