Computer Overheating and Crashing after 45 minutes of gaming

I have a friend with a bit of an overheating issue.

He has two 34" 4K 120Hz monitors for his PC. He is running a Ryzen 5950X and an Nvidia 3090 with a custom loop (2x 360 rads) in an InWin 925. After about 45-60 minutes of gameplay, one of the monitors will shut off. The system will either keep running like this for another hour or two, or shut down completely shortly thereafter. The GPU is running at about 80-90C with the liquid at about 45C. The CPU will also be quite toasty at around 90C. He is running Windows 10.

He does not use it for particularly CPU-intensive things during the normal course of his day (occasionally he will have to render something in Blender, or pull up Vectorworks), so I think that he should sell the 5950X and get himself a 5700X3D, or a 5800X3D if he can find one. It is a drop-in replacement, and he won’t have to ENTIRELY redo his custom loop. I don’t know how much this will actually help or if this is a sign of some other known issue with Nvidia 3090s (though I couldn’t find anything relevant and recent online).

I also find it curious that the GPU seems to just shut down one of the displays, then fully crash about an hour or two later. Is this known behavior for a 3090?

I am rocking air-cooled and way less high-end hardware, so I have no idea.

Assuming the waterblocks are making good contact, my next guess is maybe a dead pump. Event Viewer should be giving clues.

The pump is known good (the old one died). The new one is installed, flowmeter and all.

I was just informed of these shenanigans during work today. I’ll have him see if his event viewer is saying anything.

The water blocks SHOULD be fine, but I’ll have him double-check.

Just to be sure, check the power supply too. I assume the 3090 is using the first gen of the 12VHPWR connector?

Results?

When was the last time the loop was torn down, scrubbed, and flushed?

My bets are on water blocks being gunked up

To me it looks like the loop is clogged or airflow to the radiators is very limited. I ran a full custom loop for 6 years, but I did full maintenance teardowns every 12-18 months. I also used premixed fluid, which contained all the bio inhibitors and anti-corrosion additives.

2x 360mm radiators should be able to keep up with a 3090 and a 5950X - at full tilt they generate around 500-550W?
Since the rise in temp is slow but steady, I think block contact is not an issue here, as the temp would skyrocket in an instant with poor block contact. Still, it should be checked during maintenance.
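
To put a rough number on that heat budget, here is a minimal Python sketch. The 500-550W total is the estimate from above; the 1 L/min flow rate and the plain-water properties are my assumptions (read the real flow off the flowmeter), and the function name is just for illustration. It estimates how much the coolant warms up in one pass through the blocks.

```python
# Rough heat-budget sketch. The flow rate and fluid properties are assumptions,
# not measurements from this system.
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # plain water; premixes differ somewhat
WATER_DENSITY_KG_PER_L = 1.0


def coolant_temp_rise_k(heat_w: float, flow_l_per_min: float) -> float:
    """Temperature gained by the coolant in one pass: dT = P / (m_dot * c_p)."""
    mass_flow_kg_per_s = flow_l_per_min * WATER_DENSITY_KG_PER_L / 60.0
    return heat_w / (mass_flow_kg_per_s * WATER_SPECIFIC_HEAT_J_PER_KG_K)


if __name__ == "__main__":
    for heat_w in (500.0, 550.0):
        rise = coolant_temp_rise_k(heat_w, flow_l_per_min=1.0)
        print(f"{heat_w:.0f}W at 1 L/min: coolant gains about {rise:.1f} K per pass")
```

The per-pass rise only says how much heat the water carries; whether the liquid settles or keeps creeping up depends on how well the radiators dump that heat into the air, which is where clogged fins or poor airflow would show.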

Swapping the 5950X for a 5800X3D is a band-aid at best, since you will get at most a 100W reduction overall (less in gaming), and the 3090 will still be the furnace that it is.
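
As a quick sanity check on the band-aid claim, this small Python sketch works out what share of the loop's heat load a 100W saving would remove. It uses only the 500-550W and 100W estimates from this thread, and the function name is made up for illustration.

```python
def share_of_load_removed(total_w: float, saved_w: float) -> float:
    """Fraction of the total heat load that disappears with the CPU swap."""
    return saved_w / total_w


if __name__ == "__main__":
    saved_w = 100.0  # best-case saving from the estimate above
    for total_w in (500.0, 550.0):
        share = share_of_load_removed(total_w, saved_w)
        print(f"{total_w:.0f}W total: removes {share:.0%}, leaves {total_w - saved_w:.0f}W")
```

Even in the best case that is roughly a fifth of the load, so the radiators would still have 400-450W to get rid of.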

I really recommend doing full loop teardown and maintenance on it. This means opening the blocks and clearing channels on them, opening the pump, cleaning/replacing tubes (depends on the state), cleaning radiators (very important, since they can hide a lot of gunk), using a premix or distilled water with additives.

If any gunk is found in the loop, then cleaning every part of it is very important, as it will grow back otherwise.

Results are in:

He cleaned the loop and checked the blocks. This is the only event viewer log from when it crashed again on retesting after 90 minutes.

Warning 10/3/2024 9:49:40 AM DistributedCOM 10016 None

Log Name: System

Source: Microsoft-Windows-DistributedCOM

Date: 10/3/2024 9:49:40 AM

Event ID: 10016

Task Category: None

Level: Warning

Keywords: Classic

User: ZR-01[Redacted]

Computer: ZR-01

Description:

The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID

{2593F8B9-4EAF-457C-B68A-50F6B8EA6B54}

and APPID

{15C20B67-12E7-4BB6-92BB-7AFF07997402}

to the user ZR-01[Redacted] SID (S-1-5-21-3644006685-2871440048-3140683947-1001) from address LocalHost (Using LRPC) running in the application container Unavailable SID (Unavailable). This security permission can be modified using the Component Services administrative tool.

Event Xml:

<Provider Name="Microsoft-Windows-DistributedCOM" Guid="{1B562E86-B7AA-4131-BADC-B6F3A001407E}" EventSourceName="DCOM" />

<EventID Qualifiers="0">10016</EventID>

<Version>0</Version>

<Level>3</Level>

<Task>0</Task>

<Opcode>0</Opcode>

<Keywords>0x8080000000000000</Keywords>

<TimeCreated SystemTime="2024-10-03T13:49:40.7119754Z" />

<EventRecordID>205307</EventRecordID>

<Correlation ActivityID="{16cd249b-e814-4a01-b4ee-f84296f5f93f}" />

<Execution ProcessID="1072" ThreadID="59728" />

<Channel>System</Channel>

<Computer>ZR-01</Computer>

<Security UserID="S-1-5-21-3644006685-2871440048-3140683947-1001" />
<Data Name="param1">application-specific</Data>

<Data Name="param2">Local</Data>

<Data Name="param3">Activation</Data>

<Data Name="param4">{2593F8B9-4EAF-457C-B68A-50F6B8EA6B54}</Data>

<Data Name="param5">{15C20B67-12E7-4BB6-92BB-7AFF07997402}</Data>

<Data Name="param6">ZR-01</Data>

<Data Name="param7">[Redacted]</Data>

<Data Name="param8">S-1-5-21-3644006685-2871440048-3140683947-1001</Data>

<Data Name="param9">LocalHost (Using LRPC)</Data>

<Data Name="param10">Unavailable</Data>

<Data Name="param11">Unavailable</Data>

He does use a premix, and when he replaced the pump he did a full, thorough clean. Then he cleaned and reseated the card and the CPU.

There should be loads of entries in Event Viewer. Clicking on Event Viewer (Local) to bring up the summary helps me sift through most of them, along with a lot of googling on my part.
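
For the sifting part, a minimal Python sketch like the one below can triage pasted event XML. The lookup tables are my own short, non-exhaustive list of provider/ID pairs that tend to show up around hard crashes, and `summarize_event` is a hypothetical helper, not part of any tool. As far as I know, DCOM 10016 is a permissions warning that Microsoft documents as safe to ignore, so the entry posted above is probably unrelated to the crash.

```python
import xml.etree.ElementTree as ET

# Provider/ID pairs often seen around hard crashes (illustrative, not exhaustive).
CRASH_RELATED = {
    ("Microsoft-Windows-Kernel-Power", 41): "rebooted without a clean shutdown",
    ("EventLog", 6008): "previous shutdown was unexpected",
    ("Microsoft-Windows-WHEA-Logger", 18): "fatal hardware error reported",
    ("Display", 4101): "display driver stopped responding and recovered",
}
# Entries that are usually background noise rather than a crash cause.
KNOWN_NOISE = {("Microsoft-Windows-DistributedCOM", 10016)}


def summarize_event(fragment: str) -> dict:
    """Classify a pasted run of event XML elements (no root element needed)."""
    root = ET.fromstring(f"<Fragment>{fragment}</Fragment>")
    provider_elem = root.find("Provider")
    id_elem = root.find("EventID")
    provider = provider_elem.get("Name") if provider_elem is not None else "unknown"
    event_id = int(id_elem.text) if id_elem is not None else -1
    key = (provider, event_id)
    if key in CRASH_RELATED:
        verdict = CRASH_RELATED[key]
    elif key in KNOWN_NOISE:
        verdict = "known noise"
    else:
        verdict = "unclassified"
    return {"provider": provider, "event_id": event_id, "verdict": verdict}


if __name__ == "__main__":
    posted = (
        '<Provider Name="Microsoft-Windows-DistributedCOM" EventSourceName="DCOM" />'
        '<EventID Qualifiers="0">10016</EventID>'
    )
    print(summarize_event(posted))
```

Anything that comes back "unclassified" around the crash timestamp is where the googling starts.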

To help with minidump/crashdump analysis you can use the “WhoCrashed” software, which is free for home use. It will find dumps (if any were created), try to analyze them, and provide as much info as possible in a human-readable form.

After servicing the loop, did temperatures improve? Because if the loop is saturating with heat that’s not being removed and both the CPU and GPU go to 90C+, they will eventually shut down to protect themselves.
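
One way to answer that with data is to log the liquid temperature during a gaming session and check whether it ever levels off. Below is a minimal Python sketch, assuming samples of (minutes since start, coolant temperature in C) taken from whatever monitoring tool he uses. The 15-minute window and the 0.05 C/min threshold are arbitrary assumptions, and the function names are made up for illustration.

```python
def slope_c_per_min(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (minutes, temp_c) samples, in C per minute."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def still_climbing(samples, window_min=15.0, threshold_c_per_min=0.05) -> bool:
    """True if the coolant is still warming over the most recent window."""
    latest = samples[-1][0]
    recent = [(t, c) for t, c in samples if t >= latest - window_min]
    return slope_c_per_min(recent) > threshold_c_per_min


if __name__ == "__main__":
    # Made-up example log: coolant rising steadily for 45 minutes.
    log = [(0, 30.0), (15, 35.0), (30, 40.0), (45, 45.0)]
    print("still climbing:", still_climbing(log))
```

If the liquid is still climbing an hour in, the radiators are not shedding heat as fast as the blocks add it. If it flattens out and the system crashes anyway, heat saturation is a less likely explanation.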