[RESOLVED] RX 6900XT not displaying anything after being asleep for a few hours

Resolved

It was a hardware issue. My replacement is running perfectly.

EDIT 6: I boosted the DRAM & CPU voltage by +0.1V. Hope it makes the system a little more stable.
EDIT 5: Opened a report on the FreeDesktop GitLab page.
EDIT 4: nope. Never mind.
EDIT 3: OK… I think it’s resolved with a combination of a Mesa PPA and the PipeWire upstream PPA (the latter of which I think was the one that truly fixed it).
EDIT 2: Now the issue is that the displays don’t wake if I leave them alone (not asleep).
EDIT: using Ubuntu’s 5.13 as opposed to Pop!_OS’s 5.15 seems to have fixed it, but I’ll keep an eye on it, and update this post.


There’s a weird issue with my 6900XT where it doesn’t display anything after being asleep for a few hours.

I took a nap earlier, then had dinner (so about 6 hours asleep).

If I simply press the reset button, the display doesn’t come back. The only thing that shows up is the boot screen with the Gigabyte Aorus logo, then nothing. I have to turn the PC off and then turn it back on.

Here’s my hardware list:

PCPartPicker Part List

| Type | Item | Price |
|---|---|---|
| CPU | AMD Ryzen 9 5900X 3.7 GHz 12-Core Processor | Purchased For $599.99 |
| CPU Cooler | Deepcool ASSASSIN III 90.37 CFM CPU Cooler | Purchased For $113.99 |
| Motherboard | Gigabyte X570S AORUS MASTER ATX AM4 Motherboard | Purchased For $490.00 |
| Memory | G.Skill Ripjaws V 64 GB (2 x 32 GB) DDR4-3600 CL18 Memory | Purchased For $364.99 |
| Storage | Samsung 860 QVO 1 TB 2.5" Solid State Drive | Purchased For $0.00 |
| Storage | Samsung 980 Pro 2 TB M.2-2280 NVME Solid State Drive | Purchased For $369.99 |
| Storage | Samsung 870 Evo 2 TB 2.5" Solid State Drive | Purchased For $270.00 |
| Storage | Seagate Barracuda 2 TB 3.5" 7200RPM Internal Hard Drive | Purchased For $0.00 |
| Storage | Western Digital WD Black 4 TB 3.5" 7200RPM Internal Hard Drive | Purchased For $0.00 |
| Video Card | ASRock Radeon RX 6900 XT 16 GB Phantom Gaming D OC Video Card | Purchased For $2107.98 |
| Case | Lian Li O11D XL-X ATX Full Tower Case | Purchased For $270.00 |
| Case Fan | Thermaltake Pure Plus RGB TT Premium Edition 70.32 CFM 140 mm Fans 3-Pack | Purchased For $80.99 |
| Monitor | BenQ EW3270U 31.5" 3840x2160 60 Hz Monitor | Purchased For $400.00 |
| Monitor | Asus TUF Gaming VG27AQ1A 27.0" 2560x1440 170 Hz Monitor | Purchased For $369.99 |
| Custom | ZOTAC GTX 1060 6GB | Purchased |
| Custom | Antec Signature Series ST1000, 80 PLUS Titanium Certified, 1000W Full Modular with OC Link Feature, PhaseWave Design, Full Top-Grade Japanese Caps, Zero RPM Mode, 135 mm FDB Silence & 10-Year Warranty | Purchased For $250.00 |
| Custom | LIAN LI Bora Digital Series RGB BR 120mm 3 Fans Pack - Silver Frame | Purchased For $64.00 |
| Custom | Lian Li Bora Digital Series RGB BR 120mm 3 Fans Pack - Black Frame | Purchased For $83.04 |

My OS:
Operating System: Kubuntu 21.10
KDE Plasma Version: 5.23.5
KDE Frameworks Version: 5.90.0
Qt Version: 5.15.2
Kernel Version: 5.15.11-76051511-generic (64-bit)
Graphics Platform: X11
Processors: 24 × AMD Ryzen 9 5900X 12-Core Processor
Memory: 62.7 GiB of RAM
Graphics Processor: AMD Radeon RX 6900 XT

Unfortunately I don’t have a solution for you, but I am going to state that dealing with an ASUS G-Sync monitor under Linux with all-AMD hardware has been a not-OK experience.

I had to manually hack in my EDID to avoid getting something like a 640 x 360 resolution on a 27" display, and it still can’t reach the full refresh rate (120/144).
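For anyone else who ends up editing an EDID by hand: the post above doesn't say how the override was built, so this is only a rough sketch of the sanity check I would run on a modified blob. It assumes a plain 128-byte EDID base block (no extension blocks), and the helper names `check_edid_block` and `fix_checksum` are made up for illustration. The two facts it relies on are the fixed 8-byte header and the rule that all 128 bytes must sum to 0 modulo 256.

```python
# Fixed 8-byte pattern that starts every EDID base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def check_edid_block(block):
    """Return a list of problems found in a 128-byte EDID base block."""
    if len(block) != 128:
        return ["expected 128 bytes, got %d" % len(block)]
    problems = []
    if block[:8] != EDID_HEADER:
        problems.append("bad header")
    # The last byte is a checksum chosen so the whole block sums to 0 mod 256.
    if sum(block) % 256 != 0:
        problems.append("bad checksum")
    return problems


def fix_checksum(block):
    """Recompute the final byte after the first 127 bytes were edited."""
    body = bytes(block[:127])
    return body + bytes([(-sum(body)) % 256])
```

On Linux the currently detected blob can usually be read from the connector's `edid` file under `/sys/class/drm/`, so one possible use is to read it, patch it, run `fix_checksum`, and confirm `check_edid_block` returns an empty list before loading it as an override.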

I don’t have a G-Sync monitor. My ASUS is a FreeSync (officially G-Sync-compatible).

I had a similar issue with my AMD W6800 on Ubuntu 20.10. Sadly, I couldn’t figure out the cause. I set the machine to never sleep or turn off the displays, and that worked for a while until I returned the card for an NVIDIA one. I hope there is a fix available to help you out with your issues.

Ah my mistake, good to hear. My heartrate is coming back down.

It is not clear to me what you are doing exactly. Why are you pressing the reset button? Normally, when the computer goes into suspend or hibernate mode, you would use the power button to wake it up. Pressing the reset button would in these cases force an immediate restart of the entire machine, which can’t be what you intend to do.

I meant to restart the desktop because nothing is showing.

I understand now. There is still something unclear to me. Is the 6900XT that is not waking up connected for the host to use, and you don’t get output on the host? Or do you get output on the host, and only the guest does not wake up anymore?

The 6900XT was never used on the guest. I have a GTX 1060 for that.

I had a similar problem. My host system was locked in the evening, and when I got up in the morning I was unable to wake it … there was no screen output. I found out that the entire system does indeed tend to crash when idle for prolonged periods of time. I fixed it by changing the Power Supply Idle Control setting in the UEFI. Some people refer to that as the Ryzen idle bug. There could be a million things wrong, but you might as well give it a try.

Trying to find that on the Gigabyte X570S Aorus Master. Is that related to ErP (Energy-related Products)?

Edit: found “Power Loading”, which I enabled.

It is called Power Supply Idle Control in the Tweaker submenu. You need to set it to Typical Current Idle.

Not sure if this is related, but my displays started to freak out, then shut off.

Jan 22 06:17:30 Y4M1-II kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
Jan 22 06:17:30 Y4M1-II kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
Jan 22 06:17:30 Y4M1-II kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
Jan 22 06:17:30 Y4M1-II kernel: [drm] PSP is resuming...
Jan 22 06:17:30 Y4M1-II kernel: [drm] VRAM is lost due to GPU reset!
Jan 22 06:17:30 Y4M1-II kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000753000).
Jan 22 06:17:30 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 22 06:17:26 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 06:17:19 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU smu mode1 reset
Jan 22 06:17:19 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset
Jan 22 06:17:19 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: MODE1 reset
Jan 22 06:17:19 Y4M1-II kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <psp> failed -22
Jan 22 06:17:19 Y4M1-II kernel: [drm:psp_suspend [amdgpu]] *ERROR* Failed to terminate ras ta
Jan 22 06:17:19 Y4M1-II kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
Jan 22 06:17:16 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 06:17:16 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 06:17:15 Y4M1-II kernel: [drm] REG_WAIT timeout 1us * 200 tries - hubp2_set_blank line:950
Jan 22 06:17:15 Y4M1-II kernel: [drm] REG_WAIT timeout 1us * 200 tries - hubp2_set_blank line:950
Jan 22 06:17:15 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to disable gfxoff!
Jan 22 06:17:15 Y4M1-II kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:80:crtc-1] flip_done timed out
Jan 22 06:17:15 Y4M1-II kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:77:crtc-0] flip_done timed out
Jan 22 06:17:10 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for s_job:18e3f, as another already in progress
Jan 22 06:17:10 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1688 thread Xorg:cs0 pid 1731
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=112513, emitted seq=112515
Jan 22 06:17:10 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=7058, emitted seq=7059
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Jan 22 06:17:05 Y4M1-II kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
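In case it helps anyone reading a log like the one above (note it is pasted newest-first), here is a small sketch that pulls out the `ring ... timeout` lines and shows how far each ring had fallen behind. My understanding, which is an assumption on my part and not something stated in this thread, is that `emitted` is the last fence sequence number submitted to the ring and `signaled` is the last one the GPU reported as finished, so the difference is the number of jobs still outstanding when the timeout fired. The function name `ring_timeouts` is made up for this example.

```python
import re

# Matches e.g. "ring gfx_0.0.0 timeout, signaled seq=112513, emitted seq=112515"
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(lines):
    """Return (ring, signaled, emitted, outstanding) for each timeout line."""
    results = []
    for line in lines:
        match = RING_TIMEOUT.search(line)
        if match:
            signaled = int(match["signaled"])
            emitted = int(match["emitted"])
            results.append((match["ring"], signaled, emitted, emitted - signaled))
    return results
```

Fed the two timeout lines from the log above, it reports `gfx_0.0.0` with 2 outstanding jobs and `sdma1` with 1, which at least shows that both the graphics ring and an SDMA ring stalled in the same second.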

I also notice this more frequently. Seems to happen just before it flickers or the display crashes:

(alsa_output.pci-0000_0f_00.4.iec958-stereo-41) XRun! rate:256/48000 count:1 time:30425802 delay:8874055 max:8874055

Currently using Pipewire. I just reset the config, in case that’s the cause.

Jan 22 08:07:58 Y4M1-II kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 22 08:07:58 Y4M1-II kernel:       Tainted: G           OE     5.15.11-76051511-generic #202112220937~1640185481~21.10~b3a2c21
Jan 22 08:07:58 Y4M1-II kernel: INFO: task Xorg:1692 blocked for more than 120 seconds.
Jan 22 08:07:58 Y4M1-II kernel:  </TASK>
Jan 22 08:07:58 Y4M1-II kernel:  ret_from_fork+0x22/0x30
Jan 22 08:07:58 Y4M1-II kernel:  ? set_kthread_struct+0x50/0x50
Jan 22 08:07:58 Y4M1-II kernel:  ? process_one_work+0x3d0/0x3d0
Jan 22 08:07:58 Y4M1-II kernel:  kthread+0x11e/0x140
Jan 22 08:07:58 Y4M1-II kernel:  worker_thread+0x53/0x420
Jan 22 08:07:58 Y4M1-II kernel:  process_one_work+0x22b/0x3d0
Jan 22 08:07:58 Y4M1-II kernel:  drm_sched_job_timedout+0x6f/0x110 [gpu_sched]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_job_timedout+0x14f/0x170 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_device_gpu_recover.cold+0x6ec/0x8f8 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  ? drm_fb_helper_set_suspend_unlocked+0x33/0xa0 [drm_kms_helper]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_device_pre_asic_reset+0xdd/0x480 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_device_ip_suspend_phase1+0xa3/0x180 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  ? amdgpu_device_set_cg_state+0x12f/0x280 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  ? nv_common_set_clockgating_state+0x9f/0xb0 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  dm_suspend+0xaa/0x270 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  mutex_lock+0x34/0x40
Jan 22 08:07:58 Y4M1-II kernel:  __mutex_lock_slowpath+0x13/0x20
Jan 22 08:07:58 Y4M1-II kernel:  __mutex_lock.constprop.0+0x263/0x490
Jan 22 08:07:58 Y4M1-II kernel:  schedule_preempt_disabled+0xe/0x10
Jan 22 08:07:58 Y4M1-II kernel:  schedule+0x4e/0xb0
Jan 22 08:07:58 Y4M1-II kernel:  __schedule+0x23d/0x590
Jan 22 08:07:58 Y4M1-II kernel:  <TASK>
Jan 22 08:07:58 Y4M1-II kernel: Call Trace:
Jan 22 08:07:58 Y4M1-II kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Jan 22 08:07:58 Y4M1-II kernel: task:kworker/12:1    state:D stack:    0 pid:  246 ppid:     2 flags:0x00004000
Jan 22 08:07:58 Y4M1-II kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 22 08:07:58 Y4M1-II kernel:       Tainted: G           OE     5.15.11-76051511-generic #202112220937~1640185481~21.10~b3a2c21
Jan 22 08:07:58 Y4M1-II kernel: INFO: task kworker/12:1:246 blocked for more than 120 seconds.
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for s_job:1123, as another already in progress
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for s_job:43c, as another already in progress
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=4303, emitted seq=4305
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma3 timeout, signaled seq=1084, emitted seq=1086
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=4379, emitted seq=4381
Jan 22 08:05:20 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:20 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: amdgpu_cs_ioctl: 59 callbacks suppressed
Jan 22 08:05:14 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset end with ret = -62
Jan 22 08:05:14 Y4M1-II kernel: snd_hda_intel 0000:0c:00.1: CORB reset timeout#2, CORBRP = 65535
Jan 22 08:05:14 Y4M1-II kernel: snd_hda_intel 0000:0c:00.1: refused to change power state from D3hot to D0
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
...
Jan 22 08:05:14 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(2) failed
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
Jan 22 08:05:14 Y4M1-II kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
Jan 22 08:05:14 Y4M1-II kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
Jan 22 08:05:14 Y4M1-II kernel: [drm] PSP is resuming...
Jan 22 08:05:14 Y4M1-II kernel: [drm] VRAM is lost due to GPU reset!
Jan 22 08:05:14 Y4M1-II kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000753000).
Jan 22 08:05:14 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 22 08:05:03 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:0c:00.0
Jan 22 08:05:03 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset failed
Jan 22 08:05:03 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: SMU: I'm not done with your previous command!
Jan 22 08:04:58 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU smu mode1 reset
Jan 22 08:04:58 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset
Jan 22 08:04:58 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: MODE1 reset
Jan 22 08:04:58 Y4M1-II kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <psp> failed -22
Jan 22 08:04:58 Y4M1-II kernel: [drm:psp_suspend [amdgpu]] *ERROR* Failed to terminate ras ta
Jan 22 08:04:58 Y4M1-II kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
Jan 22 08:04:56 Y4M1-II kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
Jan 22 08:04:56 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Fail to disable dpm features!
Jan 22 08:04:56 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to disable smu features.
Jan 22 08:04:51 Y4M1-II kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jan 22 08:04:51 Y4M1-II kernel: amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 22 08:04:50 Y4M1-II kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Jan 22 08:04:50 Y4M1-II kernel: amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 22 08:04:50 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:04:50 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1692 thread Xorg:cs0 pid 1745
Jan 22 08:04:50 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=570767, emitted seq=570769
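The negative numbers scattered through these messages (`-62`, `-110`, `-22`, `-125`) are Linux errno values. A quick sketch for decoding them is below. The table is hard-coded from the standard Linux errno numbering so that it gives the same answer on any OS, and it only covers the four codes that appear in this thread. The regex and the function name `decode_errnos` are my own choices for illustration, not part of any tool.

```python
import re

# Linux errno numbers seen in the logs above (name, usual description).
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    62: ("ETIME", "Timer expired"),
    110: ("ETIMEDOUT", "Connection timed out"),
    125: ("ECANCELED", "Operation canceled"),
}

# A minus sign followed by digits, not glued to a preceding word or version
# string (so "5.15.11-76051511-generic" is ignored).
ERRNO_IN_LOG = re.compile(r"(?<![\w.])-(\d+)\b")


def decode_errnos(line):
    """Return (code, name) pairs for every negative errno found in a log line."""
    decoded = []
    for digits in ERRNO_IN_LOG.findall(line):
        number = int(digits)
        name = LINUX_ERRNO.get(number, ("unknown", ""))[0]
        decoded.append((-number, name))
    return decoded
```

Read this way, most of the failures above are timeouts (`ETIME` on the PSP and SMU suspend/resume steps, `ETIMEDOUT` on the `kiq_2.1.0` ring test), which fits the picture of a card that stops responding and not a single bad command.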

When I SSHed into this PC, I couldn’t reboot it that way. I had to manually shut it off.

And I set Power Supply Idle Control to “Typical Current Idle”.

This Bugzilla report seems to resemble my own issue.

Is it related to the OP’s post? Are you running an RX 6900XT?

I think I solved that one by increasing the DRAM & CPU voltage, so far so good.

Here’s my dmesg (136.2 KB) output. I got this by SSHing to my desktop when no output appeared on my display.

[    4.474380] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
...
[   49.635022] amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[   49.635100] [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
[   49.635215] [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v10_0> failed -110
[   49.635321] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
[   49.635323] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
[   49.635325] amdgpu 0000:0c:00.0: amdgpu: amdgpu: finishing device.
[   49.933058] [drm:dc_dmub_srv_cmd_queue [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[   49.933210] [drm:dc_dmub_srv_cmd_queue [amdgpu]] *ERROR* Error queuing DMUB command: status=2
[   49.933412] amdgpu 0000:0c:00.0: amdgpu: Fail to disable thermal alert!
[   52.038705] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[   54.143981] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[   56.249192] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[   58.354469] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[   60.459742] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
[   60.459770] [drm] free PSP TMR buffer
[   62.565018] [drm] psp gfx command DESTROY_TMR(0x7) failed and response status is (0x0)
[   62.871241] [drm:psp_v11_0_ring_destroy [amdgpu]] *ERROR* Fail to stop psp ring
[   62.872704] amdgpu: probe of 0000:0c:00.0 failed with error -110
[   62.872856] BUG: unable to handle page fault for address: ffffbc373ecfe000
[   62.872858] #PF: supervisor write access in kernel mode
[   62.872859] #PF: error_code(0x0002) - not-present page
[   62.872860] PGD 100000067 P4D 100000067 PUD 120846067 PMD 0 
[   62.872863] Oops: 0002 [#1] SMP NOPTI
[   62.872864] CPU: 7 PID: 328 Comm: systemd-udevd Not tainted 5.15.15-76051515-generic #202201160435~1642693824~21.10~97db1bb
[   62.872866] Hardware name: Gigabyte Technology Co., Ltd. X570S AORUS MASTER/X570S AORUS MASTER, BIOS F3c 10/01/2021
...
[   62.873396] RIP: 0010:vcn_v3_0_sw_fini+0xcb/0x120 [amdgpu]
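With a 136 KB dmesg it can help to boil the output down to just the DRM `*ERROR*` lines and see which driver function complained first and most often. The following is a minimal sketch of that idea, assuming the default dmesg format with a `[seconds.micros]` timestamp at the start of each line. `drm_errors` and `errors_by_function` are hypothetical helper names.

```python
import re
from collections import Counter

# "[   49.635100] [drm:func_name [module]] *ERROR* message", optionally with a
# device prefix such as "amdgpu 0000:0c:00.0: " before the "[drm:" part.
DRM_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s.*?"
    r"\[drm:(?P<func>\S+) \[(?P<module>\w+)\]\] \*ERROR\* (?P<msg>.*)$"
)


def drm_errors(lines):
    """Return (timestamp, function, message) for each DRM *ERROR* line."""
    found = []
    for line in lines:
        match = DRM_ERROR.match(line.rstrip("\n"))
        if match:
            found.append((float(match["ts"]), match["func"], match["msg"]))
    return found


def errors_by_function(lines):
    """Count DRM *ERROR* lines per reporting function."""
    return Counter(func for _, func, _ in drm_errors(lines))
```

On the excerpt above, the earliest entry is the `dc_dmub_srv_wait_idle` error at about 4.47 s, well before the `gfx_v10_0` init failure at 49.6 s, so the first sign of trouble shows up early in boot.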

Are you doing GPU passthrough? It looks like it may be hanging while trying to reinitialize VCN on the card, but that could also be a passthrough issue. I don’t see multiple GPUs on this system, so if you are doing single-GPU passthrough, you are in uncharted territory. Also, start your own thread, because the issues that you are having seem to be unrelated to OP’s issues.

Yes, but it’s my GeForce GTX 1060 that’s being used for passthrough.

Okay. Then something is up with your AMD GPU. Are you still on the 5.13 kernel? I thought 5.15 was the current LTS kernel. I would try 5.15 or go to 5.16/5.17, since that should have better baked-in support for your card.

Specifically, it looks like your card is having a hard time waking from sleep. You could try to unplug the monitor and plug it back in to see if you get something different in your dmesg. Unfortunately, I am still rocking GCN 1.1 and my card is pretty much stable.
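If you try the unplug/replug test over SSH, it is handy to check what the kernel thinks each connector is doing before and after. A small sketch, assuming the usual sysfs layout where each connector directory under `/sys/class/drm` (for example `card0-DP-1`) has a `status` file containing `connected` or `disconnected`. The function name `connector_status` and its `drm_root` parameter are just for this example.

```python
from pathlib import Path


def connector_status(drm_root="/sys/class/drm"):
    """Map each DRM connector name to the contents of its status file."""
    status = {}
    # Connector directories look like "card0-DP-1"; plain "card0" has no status file.
    for status_file in sorted(Path(drm_root).glob("card*-*/status")):
        status[status_file.parent.name] = status_file.read_text().strip()
    return status


if __name__ == "__main__":
    for name, state in connector_status().items():
        print(name, state)
```

Running it once with the monitor attached and once after replugging, then comparing against the new dmesg lines, should show whether the driver even notices the hotplug.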