It was a hardware issue. My replacement is running perfectly.
EDIT 6: I boosted the DRAM & CPU voltage by +0.1V. Hope it makes the system a little more stable.
EDIT 5: Opened a report on the FreeDesktop GitLab page.
EDIT 4: Nope. Never mind.
EDIT 3: OK… I think it's resolved with a combination of a MESA PPA and the Pipewire Upstream PPA (the latter of which I think was the one that truly fixed it).
EDIT 2: Now the issue is that the displays don't wake if I leave them alone (not asleep).
EDIT: Using Ubuntu's 5.13 as opposed to Pop!_OS's 5.15 seems to have fixed it, but I'll keep an eye on it and update this post.
There’s a weird issue with my 6900XT where it doesn’t display anything after being asleep for a few hours.
I took a nap earlier, then had dinner (so about 6 hours asleep).
If I simply press the reset button, the display doesn't come back. The only thing that shows up is the boot screen with the Gigabyte Aorus logo, then nothing. I have to turn it off and then turn it back on.
Unfortunately I don't have a solution for you, but I am going to state that dealing with an ASUS G-Sync monitor under Linux with all-AMD hardware has been a not-OK experience.
I had to manually hack in my EDID to avoid getting something like a 640 x 360 resolution on a 27" display, and it still can't reach the full refresh rate (120/144).
I had a similar issue with my AMD W6800 on Ubuntu 20.10. Sadly, I couldn't figure out the cause. I set the machine so it didn't sleep or turn off the displays, and that worked for a while until I returned the card for an NVIDIA one. I hope there is a fix available to help you out with your issues.
It is not clear to me what you are doing exactly. Why are you pressing the reset button? Normally, when the computer goes into suspend or hibernate mode, you would use the power button to wake it up. Pressing the reset button in these cases would force an immediate restart of the entire machine, which can't be what you intend to do.
I understand now. There is still something unclear to me. Is the 6900XT that is not waking up connected for the host to use, and you don't get output on the host? Or do you get output on the host and only the guest does not wake up anymore?
I had a similar problem. My host system was locked in the evening, and when I got up in the morning I was unable to wake it … there was no screen output. I found out that the entire system does tend to crash when idle for prolonged periods of time. I fixed it by changing the Power Supply Idle Control setting in the UEFI. Some people refer to that as the Ryzen idle bug. There could be a million things wrong, but you might as well give it a try.
[ 4.474380] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
...
[ 49.635022] amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 49.635100] [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
[ 49.635215] [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v10_0> failed -110
[ 49.635321] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
[ 49.635323] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
[ 49.635325] amdgpu 0000:0c:00.0: amdgpu: amdgpu: finishing device.
[ 49.933058] [drm:dc_dmub_srv_cmd_queue [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[ 49.933210] [drm:dc_dmub_srv_cmd_queue [amdgpu]] *ERROR* Error queuing DMUB command: status=2
[ 49.933412] amdgpu 0000:0c:00.0: amdgpu: Fail to disable thermal alert!
[ 52.038705] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[ 54.143981] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[ 56.249192] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[ 58.354469] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[ 60.459742] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[ 60.459770] [drm] free PSP TMR buffer
[ 62.565018] [drm] psp gfx command DESTROY_TMR(0x7) failed and response status is (0x0)
[ 62.871241] [drm:psp_v11_0_ring_destroy [amdgpu]] *ERROR* Fail to stop psp ring
[ 62.872704] amdgpu: probe of 0000:0c:00.0 failed with error -110
[ 62.872856] BUG: unable to handle page fault for address: ffffbc373ecfe000
[ 62.872858] #PF: supervisor write access in kernel mode
[ 62.872859] #PF: error_code(0x0002) - not-present page
[ 62.872860] PGD 100000067 P4D 100000067 PUD 120846067 PMD 0
[ 62.872863] Oops: 0002 [#1] SMP NOPTI
[ 62.872864] CPU: 7 PID: 328 Comm: systemd-udevd Not tainted 5.15.15-76051515-generic #202201160435~1642693824~21.10~97db1bb
[ 62.872866] Hardware name: Gigabyte Technology Co., Ltd. X570S AORUS MASTER/X570S AORUS MASTER, BIOS F3c 10/01/2021
...
[ 62.873396] RIP: 0010:vcn_v3_0_sw_fini+0xcb/0x120 [amdgpu]
Are you doing GPU passthrough? It looks like it may be hanging while trying to reinitialize VCN on the card, but that could also be a passthrough issue. I don't see multiple GPUs on this system, so if you are doing single-GPU passthrough, you are in uncharted territory. Also, start your own thread, because the issues that you are having seem to be unrelated to OP's issues.
Okay. Then something is up with your AMD GPU. Are you still on the 5.13 kernel? I thought 5.15 was the current LTS kernel. I would try 5.15 or go to 5.16/5.17, since that should have better baked-in support for your card.
Specifically, it looks like your card is having a hard time waking from sleep. You could try to unplug the monitor and plug it back in to see if you get something different in your dmesg. Unfortunately, I am still rocking GCN 1.1 and my card is pretty much stable.
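If you want to compare dmesg before and after replugging the monitor, something like the rough sketch below might save some scrolling. It assumes you have saved two captures yourself (for example with `sudo dmesg > before.txt` and `sudo dmesg > after.txt`; the file names are just placeholders), and the function names `gpu_errors` and `new_errors` are made up for this example. It only does simple text matching on lines like the ones posted above, so treat it as a starting point rather than a proper diagnostic tool.

```python
import re

# Matches a dmesg line such as "[ 49.635100] [drm:...] *ERROR* KCQ enable failed".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def gpu_errors(dmesg_text):
    """Return (timestamp, message) pairs for amdgpu/drm lines that look like failures."""
    hits = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip "..." separators and anything without a timestamp
        timestamp, message = float(match.group(1)), match.group(2)
        is_gpu = "amdgpu" in message or "[drm" in message
        looks_bad = "*ERROR*" in message or "fail" in message.lower()
        if is_gpu and looks_bad:
            hits.append((timestamp, message))
    return hits


def new_errors(before_text, after_text):
    """Return failure messages present in the second capture but not in the first."""
    seen = {message for _, message in gpu_errors(before_text)}
    return [message for _, message in gpu_errors(after_text) if message not in seen]


if __name__ == "__main__":
    # Placeholder file names; adjust to wherever you saved the two captures.
    with open("before.txt") as f_before, open("after.txt") as f_after:
        for message in new_errors(f_before.read(), f_after.read()):
            print(message)
```

Timestamps are ignored when comparing, so a message that repeats with a different timestamp after the replug will not be reported as new.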