I’ve just picked up an XFX RX 7900 XTX and am having a bit of fun trying to get it running on my main system with Fedora 37.
I’ve bumped to the rawhide kernel (6.1.0-65.fc38.x86_64), but I seem to be running into an issue with loading the firmware.
I noticed Wendell mentioned day 1 Linux support in his review, but I’m beginning to wonder if that was sarcasm!
Relevant portion of the logs:
kernel: amdgpu 0000:4d:00.0: vgaarb: deactivate vga console
kernel: amdgpu 0000:4d:00.0: enabling device (0006 -> 0007)
kernel: [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1002:0x0E3B 0xC8).
kernel: [drm] register mmio base: 0xB1B00000
kernel: [drm] register mmio size: 1048576
kernel: [drm] add ip block number 0 <soc21_common>
kernel: [drm] add ip block number 1 <gmc_v11_0>
kernel: [drm] add ip block number 2 <ih_v6_0>
kernel: [drm] add ip block number 3 <psp>
kernel: [drm] add ip block number 4 <smu>
kernel: [drm] add ip block number 5 <dm>
kernel: [drm] add ip block number 6 <gfx_v11_0>
kernel: [drm] add ip block number 7 <sdma_v6_0>
kernel: [drm] add ip block number 8 <vcn_v4_0>
kernel: [drm] add ip block number 9 <jpeg_v4_0>
kernel: [drm] add ip block number 10 <mes_v11_0>
kernel: amdgpu 0000:4d:00.0: amdgpu: Fetched VBIOS from VFCT
kernel: amdgpu: ATOM BIOS: 113-D7020100-102
kernel: [drm] VCN(0) encode/decode are enabled in VM mode
kernel: [drm] VCN(1) encode/decode are enabled in VM mode
kernel: amdgpu 0000:4d:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
kernel: amdgpu 0000:4d:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
kernel: [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
kernel: amdgpu 0000:4d:00.0: amdgpu: VRAM: 24560M 0x0000008000000000 - 0x00000085FEFFFFFF (24560M used)
kernel: amdgpu 0000:4d:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
kernel: [drm] Detected VRAM RAM=24560M, BAR=256M
kernel: [drm] RAM width 384bits GDDR6
kernel: [drm] amdgpu: 24560M of VRAM memory ready
kernel: [drm] amdgpu: 32071M of GTT memory ready.
kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
kernel: amdgpu 0000:4d:00.0: Direct firmware load for amdgpu/psp_13_0_0_sos.bin failed with error -2
kernel: amdgpu 0000:4d:00.0: amdgpu: failed to init sos firmware
kernel: [drm:psp_sw_init [amdgpu]] *ERROR* Failed to load psp firmware!
kernel: [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <psp> failed -2
kernel: amdgpu 0000:4d:00.0: amdgpu: amdgpu_device_ip_init failed
kernel: amdgpu 0000:4d:00.0: amdgpu: Fatal error during GPU init
kernel: amdgpu 0000:4d:00.0: amdgpu: amdgpu: finishing device.
kernel: amdgpu: probe of 0000:4d:00.0 failed with error -2
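For anyone hitting the same wall: error -2 in that log is -ENOENT, i.e. the kernel could not find the requested firmware file. Here is a rough Python sketch that pulls the failed firmware names out of a log and checks whether a matching file exists on disk. The function names are made up for illustration, and the /lib/firmware root and the .xz/.zst suffixes are assumptions about how a distro ships firmware, so adjust them for your system.

```python
import errno
import re
from pathlib import Path

# Matches kernel lines such as:
#   ... Direct firmware load for amdgpu/psp_13_0_0_sos.bin failed with error -2
FIRMWARE_FAILURE = re.compile(
    r"Direct firmware load for (?P<name>\S+) failed with error (?P<code>-?\d+)"
)


def failed_firmware_loads(log_lines):
    """Return (firmware_name, error_code) for every failed firmware load."""
    failures = []
    for line in log_lines:
        match = FIRMWARE_FAILURE.search(line)
        if match:
            failures.append((match.group("name"), int(match.group("code"))))
    return failures


def firmware_candidates(name, root="/lib/firmware"):
    """List files under root that could satisfy a request for name.

    Assumption: firmware may be stored uncompressed or with a .xz/.zst suffix.
    """
    base = Path(root) / name
    candidates = [base] + [
        base.with_name(base.name + suffix) for suffix in (".xz", ".zst")
    ]
    return [path for path in candidates if path.exists()]


sample_log = [
    "kernel: amdgpu 0000:4d:00.0: Direct firmware load for "
    "amdgpu/psp_13_0_0_sos.bin failed with error -2",
    "kernel: amdgpu 0000:4d:00.0: amdgpu: failed to init sos firmware",
]

for name, code in failed_firmware_loads(sample_log):
    symbol = errno.errorcode.get(abs(code), "unknown")
    found = firmware_candidates(name)
    status = "present on disk" if found else "not found on disk"
    print(f"{name}: error {code} ({symbol}), {status}")
```

If the file turns out to be missing, that points at the firmware package rather than the kernel itself.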
I should note that while my (also installed) 6900XT appears to initialize correctly, I can’t get display out of either when both are installed without resorting to nomodeset.
Switching to a text console works fortunately, and the system shuts down cleanly.
I haven’t had much luck with amdgpu-pro either. I can get the RHEL 9 packages installed, and ROCm and HIP report they’re working (via rocminfo and hipconfig), but they segfault as soon as I try to use them, at least via PyTorch.
Updating mesa to 22.3.0-2 from rawhide has resolved the issues with vulkan. All the games I’ve tested seem to run fine (Stray, Valheim & CP2077) and show between 40 and 70 percent improvement over my 6900XT.
I still haven’t had any success with compute, and I’ve noticed a regression with 2D graphics: Firefox frequently (3 times in the last couple of hours) causes hard lockups, necessitating a forced reset. I haven’t noticed any common trigger such as video playback, but anecdotally it seems to happen just after switching tabs.
The last logs recorded prior to lockup are:
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=426303, emitted seq=426305
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 10128 thread firefox:cs0 pid 10249
kernel: amdgpu 0000:4d:00.0: amdgpu: IP block:gfx_v11_0 is hung!
kernel: amdgpu 0000:4d:00.0: amdgpu: GPU reset begin!
kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:67:crtc-0] hw_done or flip_done timed out
gnome-shell[2735]: Could not release device '/dev/input/event2' (13,66): Timeout was reached
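In case it helps anyone comparing lockups, here is a small Python sketch for pulling the numbers out of the ring timeout line. The function name is made up, and reading the gap between the emitted and signaled sequence numbers as "work submitted but never completed" is my assumption about what those counters mean, not something confirmed by the logs.

```python
import re

# Matches lines such as:
#   ... *ERROR* ring gfx_0.0.0 timeout, signaled seq=426303, emitted seq=426305
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return ring name and sequence numbers, or None if the line does not match."""
    match = RING_TIMEOUT.search(line)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    return {
        "ring": match.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        # Assumption: the difference is the number of jobs still outstanding.
        "pending": emitted - signaled,
    }


line = (
    "kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, "
    "signaled seq=426303, emitted seq=426305"
)
print(parse_ring_timeout(line))
```

Running this over a few lockups would at least show whether it is always the same ring that hangs.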
After quickly working out that my Pop!_OS install was dead in the water after the switch to my 7900 XTX, I tried Arch Linux, having read that the 6.0 kernel should support the card.
Any time I start GDM: blinking cursor…
Tried installing the amdgpu driver per the Arch Wiki page on AMDGPU: blinking cursor.
Compiled 6.1: blinking cursor.
Seems when people try Linux with these cards, it’s either a flawless boot or an unsuccessful one.
Well, I ended up getting GNOME loaded after installing the testing branch of linux-firmware, which contains the latest amdgpu firmware. But it’s definitely not ready for prime time. Something is very wrong with the image buffer: lots of screen flickering and image ghosting. I’m guessing the configuration is going to take time to mature.
No dice so far… The fedora provided rocm/hip drivers on mainline and rawhide are both borked - they give me this:
$ rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: Resource temporarily unavailable
sythezn is member of render group
$ rocm-clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3513.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
$ clinfo
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (3513.0)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback
Platform Extensions function suffix AMD
Platform Host timer resolution 1ns
Platform Name AMD Accelerated Parallel Processing
Number of devices 0
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) AMD Accelerated Parallel Processing
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No devices found in platform [AMD Accelerated Parallel Processing?]
clCreateContext(NULL, ...) [default] No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No devices found in platform
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.1
ICD loader Profile OpenCL 3.0
I’ve been trying to get amdgpu-pro set up correctly, but I’m not having much luck so far.
Keep running into either hard lockups or just no platform reported at all.
mesa 22.3 is in mainline @updates now.
The firmware on rawhide has been updated as well: dnf update --enablerepo=rawhide amd-gpu-firmware linux-firmware
Seems a lot more stable - haven’t had any lockups from firefox since updating.
Edit: scratch that. Still getting lockups from Firefox, but they’re more graceful (the app goes unresponsive rather than a hard system lockup).
Was there a Linux video on the L1 channel for these GPUs? It’s mentioned in the main review but I found nothing. Still not much luck getting this usable.
I’ve seen some reports of the drivers working with kernels as far back as 5.16, but I haven’t had any luck at all below 6.1. I’m on 6.2 at the moment and it seems fairly stable for opengl and vulkan on Wayland at least (apart from ongoing issues with Firefox specifically).
I don’t use Pop!_OS myself, but from what I understand it’s still Xorg by default? Maybe support just isn’t there yet for the older renderer?
Bit of an update: I’ve been able to get ROCm working with my 6900XT at least, by setting the ROCR_VISIBLE_DEVICES environment variable so the 7900XTX is excluded.
For the moment that at least brings my workstation back to functional.
Still no dice with HIP/ROCm, or OpenCL via HIP, on the 7900XTX, but judging by the support list in AMD’s docs that’s expected, as there’s no RDNA3/CDNA3 support yet.
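For reference, the ROCR_VISIBLE_DEVICES workaround can be sketched in Python roughly like this. The helper names are hypothetical, and which index maps to which card is an assumption (I use 0 for the 6900XT here), so check the device order rocminfo reports on your own system first.

```python
import os
import subprocess


def rocm_env(visible_devices, base_env=None):
    """Return a copy of the environment with ROCR_VISIBLE_DEVICES restricted.

    visible_devices is a list of device indices (or UUID strings) that the
    ROCm runtime is allowed to see; everything else is hidden from it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCR_VISIBLE_DEVICES"] = ",".join(str(d) for d in visible_devices)
    return env


def run_with_devices(command, visible_devices):
    """Run a command with only the given devices exposed to ROCm."""
    return subprocess.run(
        command,
        env=rocm_env(visible_devices),
        capture_output=True,
        text=True,
        check=False,
    )


# Assumption: index 0 is the 6900XT, so the 7900XTX is left out.
print(rocm_env([0], base_env={})["ROCR_VISIBLE_DEVICES"])
```

Something like run_with_devices(["rocminfo"], [0]) would then launch rocminfo with only the first device exposed. The variable has to be set before the ROCm runtime initializes in a process, which is why the sketch sets it for a child process rather than changing it after the fact.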
Firefox is still exhibiting weird symptoms. With hardware acceleration disabled I get long (5+ seconds) delays where the app goes unresponsive whenever I open a context menu via right click or the (meta?) key.
There are also long delays when rendering more complex pages, though I somewhat expect that.
Watching the logs I see this at the time of a context menu command, though it’s not clear to me whether it’s related: