Vega hotspot magic fix?

It looks common to have trouble keeping Vega hotspot temps from going insane. I'm hoping somebody has sorted out exactly what is happening, so I don't have to just brute force it with water or liquid metal.
Is it just normal for hotspot to be the first thing that runs away with too little cooling?
Help me minimize how much I have to bash my head against this problem.

The problem is that everyone thinks there’s a problem. Before you could see the hot spot temp, you didn’t care. Now that you can see it, you’re applying some “conventional wisdom” about what the “right” temps are without having any actual knowledge about what certain logic blocks can handle within an IC.

You’d be amazed at how much dense logic happily chugs along at 120C+ for years without you ever knowing it, and that temp is never a concern for the longevity of the chip. The hot spot temperature sensors were introduced purely to help the clock gating mechanisms achieve the best balance of performance and acoustics, since hot spot is a more accurate (and timely) signal to base adjustments on.

If you’re determined to decrease temps, undervolt and pull the power limit down. Easy as that.

Yes, because that is quite literally what “hot spot” is: the highest temperature currently reported by any of the on-die sensors.
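To put that in concrete terms, here is a minimal Python sketch, assuming the hot spot value is simply the maximum over a set of on-die sensor readings. The sensor names and temperatures are made up for illustration; they are not read from a real card.

```python
# Hypothetical readings in degrees C; names and values are illustrative only.
SAMPLE_READINGS = {
    "edge": 75.0,
    "shader_array_0": 88.0,
    "shader_array_1": 95.0,
    "memory_controller": 82.0,
}

def hotspot(readings):
    """Return (sensor_name, temp_c) for the hottest sensor."""
    name = max(readings, key=readings.get)
    return name, readings[name]

def hotspot_delta(readings, edge_sensor="edge"):
    """Return how far the hottest sensor sits above the edge sensor."""
    _, hottest = hotspot(readings)
    return hottest - readings[edge_sensor]

if __name__ == "__main__":
    print(hotspot(SAMPLE_READINGS))
    print(hotspot_delta(SAMPLE_READINGS))
```

With a definition like this, hot spot is by construction the first number to run away when cooling is marginal, and a gap of 20 degrees over the edge reading is not strange in itself.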

Yeah, artifacting isn’t a concern, right?

Before you reply with dumb shit, it only happens when it overheats.

Artifacting is a symptom of many different electrical problems. Thermals can expose that there is a problem, but they are very seldom the root cause. Your card’s got other issues buddy, especially since on Vega and later the SoC that reports the various sensor data will intervene if any value exceeds safe parameters, generally by initiating a shutdown.

Power limit and undervolt. I bet the factory 1.2 V tune is what’s got your chip acting up. Vega had to ship, and they didn’t treat it kindly.
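For a rough sense of why the undervolt matters: dynamic power in CMOS logic scales roughly with V² × f, so at the same clock a small voltage drop gives a disproportionate power cut. A back-of-envelope Python sketch follows. The 1.2 V figure is the one mentioned above, while the 1.1 V target is a hypothetical example, not a tested setting for any particular card, and leakage power is ignored.

```python
def relative_dynamic_power(v_new, v_old, f_new=1.0, f_old=1.0):
    """Approximate ratio of new to old dynamic power, using P ~ V^2 * f.

    Ignores leakage, so treat the result as a rough estimate only.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)

if __name__ == "__main__":
    # Hypothetical undervolt from 1.2 V to 1.1 V at an unchanged clock.
    ratio = relative_dynamic_power(1.1, 1.2)
    print(f"dynamic power ratio: {ratio:.2f}")
```

Under those assumptions the ratio works out to about 0.84, i.e. roughly a 16% cut in dynamic power before the clock is touched at all.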

I asked for potential solutions to a problem, not solutions to other problems that I’d have to explain why it’s not an option.


Or keep insisting that I change things this card’s firmware makes a hassle, if not practically impossible, to change. Good job. You show me how to change that on a WX9100 (without flashing the vBIOS) and maybe I’ll bother.

This supports the “just not enough cooling” theory.

It seemed odd to me that “GPU temp” was 20 °C lower.

It should be sorted well enough for a 170 W limit.

As I made this post, it black screened and rebooted.

Not my fault you didn’t specify what your hardware was beforehand. I’m not here to read your mind nor follow every post you’ve ever made to determine what hardware you’re using. You said Vega, the obvious assumption is Vega 64, 56, or VII, the most common variants that a person is likely to be using.

Sounds like you don’t need any help, you seem to already have all the answers.

