AMD Threadripper 3970X under heavy AVX2 load: Defective design? (No, but there is an issue)

This happens with a clean install of Windows?

Haven’t tried it, I am trying to avoid a dual boot setup. Windows to go is like a persistent live Linux.

I’ve plugged the windows to go drive into multiple systems without issue.

It might be time to get a new ISO for Windows To Go, or just do a fresh install of it with the TR system as the starting machine.

Anybody else have Windows instability?
Seeing as no-one else has posted any complaints, seems like the problem is unique to you as of now.

I am also running Windows To Go, off a Samsung T5 external drive. Microsoft isn’t developing Windows To Go anymore. I think it’s just unique to me. I ended up using OpenRGB in Linux to set my RAM and motherboard colors.

An update on our investigation with AMD: [SOLVED] 3970X - Prime95 stability?

It would still be nice if ASRock and MSI boards could also be tested for this issue. But there seem to be enough complaints for AMD to take a close look at it, and they seem to have figured out the issue with Gigabyte boards.

So what was the fix in the end? I don’t see it in the thread here.

Was it just a powerstage problem with the motherboard specifically?

@Jimster480 I’m preparing a conclusion post.

To add some information for anyone searching for it…

I had issues on one core with the 16k AVX2 Prime95 load when running the CPU at stock. Running PBO or auto OC made the problem go away.

I updated my BIOS (Zenith II Extreme Alpha) to 0902 last night, and it’s completely stable at stock now.

EDIT:
Looks like that BIOS update included “02. [Q][E] Update CastlePeakPI1.0.0.3 Patch B”

I apologize for the lack of recent updates on this topic. Obviously the current health crisis has not sped up the process.

On March 7th I wrote in this thread:

It turns out that is not the case, at least on my system: I recently switched back to the GIGABYTE TRX40 Aorus Xtreme motherboard (after a few weeks on the ASUS Zenith II Extreme Alpha) and the fact is that GIGABYTE’s latest BIOS version (“F4d”, AGESA 1.0.0.3 B) does not fix the instability under Prime95.

As can be witnessed in this thread, AMD has been extremely responsive and helpful. They do have a fix for the instability that works on my system, but either GIGABYTE screwed up when merging it into their F4d BIOS, or they introduced another issue.

That’s the current situation. At this point, and per my current understanding of the situation, I believe the pressure should be put on GIGABYTE, not AMD.

Ideally GIGABYTE would wake up, get in touch with us and join our conversation with AMD. Unfortunately there’s no sign that they’re willing to do that.

I’m personally done with switching motherboards. I’ve spent far too much time on this issue.

(I’m marking this topic as unsolved again.)

Hello @FranzB, I saw this thread, and since I have a 3960X with an Aorus Master board I went ahead and tested Prime95.

I currently can’t reproduce any issue with Prime95 under Windows 10 or Server 2019 (I have a dual boot system). My BIOS version is F5c.

However, I tried a live Ubuntu Linux 20.04 and downloaded Prime95. I started the torture test, but after all threads had started I got a “killed” message in the terminal window. It doesn’t seem to be related to the issue you have; it may be an issue with using Ubuntu Live.

Unfortunately, Ubuntu 20.04 runs kernel 5.4, and only the newer kernel 5.6 fully supports the Ryzen 3000 series power states and PBO. I can see that Ubuntu currently defaults to the lower P-state (2,200 MHz), so I decided not to install Ubuntu on one of my hard drives. I can also see a lot of ACPI errors under Linux on boot. (Oddly, the Ubuntu support declared on AMD’s CPU page appears to be false.)
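For anyone wanting to check the same two things on their own machine, here is a minimal Python sketch. It takes the 5.6 threshold mentioned above at face value (that number comes from this post, not from any official source), and the function names are made up for illustration. It only parses text, so it can be fed the output of `uname -r` and the contents of `/proc/cpuinfo`.

```python
import re


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '5.4.0-26-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=(5, 6)):
    """True if the kernel is at least `minimum` (5.6 is the thread's assumption)."""
    return parse_kernel_version(release) >= minimum


def max_reported_mhz(cpuinfo_text):
    """Highest 'cpu MHz' value found in /proc/cpuinfo-style text, or None."""
    values = [
        float(v)
        for v in re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    ]
    return max(values) if values else None
```

On a Linux box, something like `kernel_at_least(platform.release())` (after `import platform`) and `max_reported_mhz(open("/proc/cpuinfo").read())` should be enough to see whether the cores ever leave the low clock under load.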

I had the same issue on my TRX40 Designare - Prime95 seemed to work on Windows and was dying like that on Linux.
This was fixed by the latest firmware, F4C I think.
EDIT: maybe not the same issue - I had some torture threads failing, like what started this discussion.

I would say, unless you get a clear notification from Prime95 about a fatal numerical error, it’s not the same issue.
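To make that distinction a bit more mechanical, here is a small Python sketch that scans saved Prime95/mprime output for fatal-error lines. The marker strings are an assumption based on messages commonly reported from Prime95 torture tests, not an official list, and the function name is hypothetical.

```python
# Assumed marker strings; adjust to whatever your Prime95 build actually prints.
FATAL_MARKERS = ("FATAL ERROR", "Hardware failure detected")


def find_fatal_lines(log_text, markers=FATAL_MARKERS):
    """Return the lines of log_text that contain any marker (case-insensitive)."""
    lowered = [m.lower() for m in markers]
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(m in line.lower() for m in lowered)
    ]
```

A bare “killed” in the terminal matches none of the markers, so a run that only shows that message would come back as an empty list.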

@maximlevitsky, @FranzB. Yesterday I tested under the latest Manjaro Linux (with the new stock kernel 5.6) and can confirm that Prime95 (mprime) works fine on my Aorus Master (BIOS F5c), as it did under Windows 10/Server 2019. I can also confirm that PBO works fine under the new Linux release. However, ACPI errors are still present in the Linux boot messages. I also discovered ACPI “15” errors in the Windows Event Viewer.

These ACPI errors?

[ +0.000001] ACPI BIOS Error (bug): Failure creating named object [_SB.I2CC.WT4C], AE_ALREADY_EXISTS (20200326/dswload2-326)
[ +0.000036] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20200326/psobject-220)
[

This should be harmless according to my investigations.
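If you want to compare boot logs between boards or BIOS versions, a rough Python sketch like the one below can tally these messages by ACPICA status code. The function name is made up, and the patterns are only modeled on the two lines quoted above, so other ACPI message formats may slip through.

```python
import re
from collections import Counter

# Matches both "ACPI Error" and "ACPI BIOS Error" as seen in the quoted log.
ACPI_LINE = re.compile(r"ACPI (?:BIOS )?Error")
# ACPICA status codes look like AE_ALREADY_EXISTS, AE_NOT_FOUND, ...
STATUS_CODE = re.compile(r"\bAE_[A-Z_]+\b")


def summarize_acpi_errors(dmesg_text):
    """Count ACPI error lines in dmesg-style text, keyed by status code."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        if ACPI_LINE.search(line):
            codes = STATUS_CODE.findall(line)
            counts.update(codes or ["(no status code)"])
    return counts
```

Feed it the text of a saved `dmesg` dump; for the two lines quoted above, both would be counted under `AE_ALREADY_EXISTS`.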

These “ACPI 15” errors are quite interesting, I get them too.

I’ve been wondering in the past few weeks if they were related to the following issue:

Occasionally, a fifth blank menu entry in the Power menu will appear (blank in the sense that it’s missing its label).

In the screenshot below, I don’t have the problem:

[Screenshot: Power menu without the blank entry]

However what will happen sometimes (maybe in the first few minutes after boot?) is that there will be a fifth blank menu entry, and the display of that entry is causing the ACPI 15 errors (maybe because it’s looking for the label in some database and that label is missing?)

EDIT: In my case the “ACPI 15” errors are those:

The description for Event ID 56 from source Application Popup cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

ACPI
15

The message resource is present but the message was not found in the message table

Yes, I have the same error in my Linux boot log.

Yes, that is the error I have in the Event Viewer.
I believe all 3rd gen TR systems have this error. Before the Aorus I had an ASRock TRX40 with the same error, and I returned it to the dealer because it was unstable and randomly dropping memory modules. The Aorus has been rock solid, apart from those ACPI errors, which have no visible impact.

I get the same ACPI errors as well, running Debian sid with Linux kernel 5.6.7, and everything is stable. The only issue is that audio over S/PDIF doesn’t work, but I believe that’s being addressed.