Threadripper 3970X, 256GB RAM speed?

Howdy, folks – first-time poster here!

I recently picked up a very nice machine: a Thelio Major by System76 with a Threadripper 3970X. It comes with an Aorus TRX40 Pro Wifi motherboard. I opted to max out the RAM after the machine arrived, because why not? I ordered 256GB of Corsair Vengeance LPX rated at 3200MHz (model number: CMK256GX4M8E3200C16). This model isn’t on Aorus’s QVL for this motherboard, and I knew that ahead of time. I figured it’s a supported speed for this motherboard and CPU, and a reputable RAM brand. After installing the RAM and running it at 3200MHz for a while, all seemed well: I experienced no problems in Linux, which is what I use most of the time.

However, when I dual booted into Windows 10, I noticed a few issues. Most often this manifested as a failure to load the display driver. Sometimes a game would refuse to launch, reporting that some of its files had become corrupted. Other times, only some apps would fail to see an active network connection. Device Manager would also sometimes show several unrecognized PCI devices. I never knew which of these problems would show up when booting into Windows, but there was about a 50% chance at least one would.

My first thought was that maybe it was a bad Windows install. (Is that still a thing? Honestly not sure; this is my first time using Windows in over 10 years.) After reinstalling, the issues remained. Next, I ran memtest86 on the RAM. Side note: it takes a looooong time to test 256GB of RAM, and I didn’t really have 2 days free to let it run, so I cancelled it. Then I tried upgrading the BIOS to see if that fixed things. Unfortunately, the Windows-only issues remained.

Finally, I did some reading and YouTube video watching and realized that the fastest officially supported RAM speed for 8 DIMMs on 3rd-gen Threadripper is 2666MHz. So I downclocked it, and so far it seems to be running OK (knock on wood).

So, my question(s): does this solution make sense? Am I just fooling myself thinking RAM speed alone might be causing these symptoms? Is anyone else rocking 256GB of RAM on a 3970X and running it at 3200MHz, or faster? I’m not super bothered by having to run the RAM at a slower speed, but obviously I’d like to run it as fast as I can. Is there a chance that a future BIOS update resolves it? Is it possible that this is bad RAM?
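For a rough sense of what the downclock costs, here is the theoretical peak-bandwidth arithmetic. It's a minimal sketch assuming the 3970X's four DDR4 channels, each 64 bits (8 bytes) wide, and it treats the usual "2666" label as exactly 2666 MT/s; real throughput is lower, and latency changes are ignored. Note that 8 DIMMs on 4 channels means two DIMMs per channel, which is the configuration the lower supported speed applies to.

```python
# Back-of-the-envelope peak DRAM bandwidth (theoretical, not measured).
# Assumptions: quad-channel DDR4 (sTRX4 Threadripper 3000), 64-bit channels.
CHANNELS = 4
BYTES_PER_TRANSFER = 8  # one 64-bit channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(mt_per_s: int, channels: int = CHANNELS) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for a DDR4 speed in MT/s."""
    # MT/s * bytes per transfer * channels = MB/s; divide by 1000 for GB/s
    return mt_per_s * BYTES_PER_TRANSFER * channels / 1000


if __name__ == "__main__":
    fast = peak_bandwidth_gb_s(3200)
    slow = peak_bandwidth_gb_s(2666)
    print(f"DDR4-3200: {fast:.1f} GB/s")
    print(f"DDR4-2666: {slow:.1f} GB/s")
    print(f"Peak bandwidth given up: {100 * (1 - slow / fast):.1f}%")
```

That works out to roughly 102.4 vs 85.3 GB/s, about 17% of theoretical peak. It's a ceiling, not a prediction: many workloads, games included, may see a smaller difference.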

It is possible to run 256GB at 3200. I run a G.Skill kit at its rated speed just by setting the profile to 3200. It did take weeks of testing, tracking down bad RAM sticks and waiting on BIOS updates, to get there. I dual boot Windows 10 and Linux with no issues.

Maybe talk to System76? I think they make some modifications to their laptop BIOS; maybe they do on the Thelio too? If not, check for a BIOS update?

Should be out soon

@ChuckH what mobo are you using? What were your symptoms? I’m trying to figure out when to conclude my issues are likely caused by not having full BIOS support for the RAM configuration versus bad RAM itself.

:+1: looking forward to this

Errors in memtest86, mostly in Test 7, I think. I would run each stick through 32 passes of Test 7 to check whether it was OK. I think you need memtest86 Pro to do that properly: the test uses moving inversions with a 32-bit pattern and shifts the starting bit by one on each pass, and the free version only allows 4 passes.

I used Prime95 set to small FFTs to test CPU stability.

It seems odd that you’re only having issues in Windows. I would try Prime95 in both Windows and Linux. Maybe the default “on demand” performance profile in Ubuntu is keeping issues at bay.
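To make the 32-pass point concrete, here is a toy sketch of a moving-inversions test with a rotating 32-bit pattern, in the spirit of the memtest86 Test 7 described above. It is not memtest86’s actual code: the simulated memory, the stuck-bit fault model, and all names are hypothetical. The pattern shifts left one bit per successive address, and each pass starts one bit further along, so 32 passes cover every starting bit position.

```python
# Toy model of a "moving inversions, 32-bit pattern" memory test.
# Hypothetical: SimulatedRAM and its stuck-bit fault stand in for real DIMMs.
MASK32 = 0xFFFFFFFF


def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by `shift` bits."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


class SimulatedRAM:
    """A list of 32-bit words; optionally one bit at one address is stuck at 1."""

    def __init__(self, words, stuck_addr=None, stuck_bit=None):
        self.cells = [0] * words
        self.stuck_addr = stuck_addr
        self.stuck_bit = stuck_bit

    def __len__(self):
        return len(self.cells)

    def write(self, addr, value):
        if addr == self.stuck_addr:
            value |= 1 << self.stuck_bit  # the faulty bit always reads back as 1
        self.cells[addr] = value & MASK32

    def read(self, addr):
        return self.cells[addr]


def moving_inversions_pass(mem, pass_index):
    """Run one pass; return a list of (addr, expected, got) mismatches."""
    errors = []
    start = rotl32(1, pass_index)  # starting bit moves one position per pass

    def pattern(addr):
        return rotl32(start, addr)  # shifted left one bit per successive address

    for addr in range(len(mem)):  # fill with the pattern
        mem.write(addr, pattern(addr))
    for addr in range(len(mem)):  # ascending: verify pattern, write its complement
        if mem.read(addr) != pattern(addr):
            errors.append((addr, pattern(addr), mem.read(addr)))
        mem.write(addr, ~pattern(addr) & MASK32)
    for addr in reversed(range(len(mem))):  # descending: verify complement, restore pattern
        expected = ~pattern(addr) & MASK32
        if mem.read(addr) != expected:
            errors.append((addr, expected, mem.read(addr)))
        mem.write(addr, pattern(addr))
    return errors


def run_test(mem, passes=32):
    """Return {pass_index: errors} for every pass that found at least one error."""
    results = {}
    for p in range(passes):
        errs = moving_inversions_pass(mem, p)
        if errs:
            results[p] = errs
    return results


if __name__ == "__main__":
    print("healthy:", run_test(SimulatedRAM(64)))
    faulty = run_test(SimulatedRAM(64, stuck_addr=10, stuck_bit=3))
    print("faulty: errors in", len(faulty), "of 32 passes")
```

A hard stuck bit like this one gets caught on every pass, because each word is checked against both the pattern and its complement. The real argument for running many passes is intermittent, pattern- or neighbor-dependent faults, which this simple fault model doesn’t capture.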

Thanks, I’ll definitely give this a shot.

It took me weeks of testing: about 14 hours per stick, and then a lot more hours on multiple sticks. Oh, and I’m using an ASUS Zenith II Extreme Alpha. The BIOS it shipped with was horrible for RAM stability, with errors even at 2666 and only 4 sticks.

I’ve thought about bumping my 3200/C16 kit to 3600/C18 to see how it runs. I got tired of testing RAM, though; I just wanted to use it!

Edit:
@argentum, another thought: are your Windows and Linux installs on separate drives? Maybe try swapping sockets/cables/ports? I also remember getting a warning about PCIe device stability at one point in my tweaking. Maybe the PCIe bus frequency is accidentally getting changed at the higher RAM frequency?

Yep, each is on a separate NVMe drive. I have a third NVMe slot unused; maybe I’ll move the Windows drive onto that one.

Different operating systems hit memory in different patterns.

By the way, I’d be cautious about trusting memtest fully. On my 3900X system, memtest would pass my HyperX 3600 RAM perfectly, but during 24-thread Rust or C++ compiles there were obvious memory errors, like misspelled symbol names at link time. It only appeared under heavy CPU load.

What counts as “passed” in memtest86? One pass of all tests? 4 passes? An hour of running with no errors? Memtest86 isn’t 100% certain, but the more passes, the less likely you have bad RAM and the more likely the issue is something else.

I ran several different RAM testing programs; memtest86 caught more errors than anything else. Most errors showed up within 3 or 4 passes, but some took up to 20 passes.
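One way to reason about “more passes, more confidence”: if an intermittent fault independently shows up on any given pass with probability q, the chance it survives n clean passes is (1 − q)^n. That independence is an idealization (real faults can depend on temperature, load, and data pattern), and the q values below are made-up illustrations, not measurements from anyone’s RAM.

```python
import math


def miss_probability(q: float, passes: int) -> float:
    """Chance an intermittent fault (hit rate q per pass) goes unseen for `passes` passes."""
    return (1 - q) ** passes


def passes_needed(q: float, confidence: float = 0.95) -> int:
    """Smallest n with (1 - q)**n <= 1 - confidence, assuming independent passes."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - q))


if __name__ == "__main__":
    for q in (0.5, 0.25, 0.1):  # hypothetical per-pass detection rates
        print(f"q={q}: 4 passes miss it {miss_probability(q, 4):.1%} of the time; "
              f"{passes_needed(q)} passes for 95% confidence")
```

Under this model, a fault that shows up on only 1 pass in 10 needs about 29 clean passes before you can be 95% sure it isn’t there, which fits the experience of some errors taking up to 20 passes to appear.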

I think that might be an indicator it’s not a RAM issue. The CPU/IMC (integrated memory controller) could be causing it.

I dare to ask the price :slight_smile: but I’d like one someday.

No idea, probably like $1000 or some BS (didn’t actually watch the video, as I have no desire for that).


Not sure if any of those are actually the correct ones, but yeah, > $1k.

128GB should be enough for a 3970X.
I’m mostly fine with 64GB; OOM only happens with a 64-thread make and -flto=64 in GCC on heavy compiles like Clang or QtWebKit.

I would be more than fine with 32GB in my next build, but I loved it when RAM was cheap.

As reported above, it was over $1,000 for the RAM alone. But as with every other aspect of this purchase, it was a business expense.

As for my original RAM issues: I moved my Windows NVMe drive around from one slot to another. The weird Windows issues described previously persisted. Again, no perceivable issues on Linux, regardless of configured RAM speed.

I then ran Prime95 on both Linux and Windows using 90% of the RAM in my system, and it found no errors. Running much more would cause the test to crash, for obvious reasons.

I then hunted around for any chipset driver updates from Gigabyte and AMD. I found that AMD’s chipset drivers were more up-to-date than Gigabyte’s. After installing that and a few other Aorus utilities, I found no difference in stability on Windows. The same issues were present (sporadically, as ever).

At this point I suppose I’m likely looking at bad RAM or poor support for this amount of RAM from the motherboard. My next step will be to run memtest86 Pro on the RAM, one DIMM at a time. That’s gonna take a while, unfortunately.

Thankfully, all these issues appear to be isolated to Windows, which is exclusively being used for gaming, so nothing critical.

Happy to hear any more suggestions!

Ultimately I was unsatisfied with my situation and opted to return the Corsair RAM (which was not QVL certified for the Gigabyte AORUS TRX40 Pro Wifi) – hey, you live and you learn. In its place, I purchased 256GB of G.SKILL Trident Z Neo (F4-3200C16Q2-256GTZN) running at 3200MHz. This kit actually is QVL certified, at 8 DIMMs.

It’s still not perfect, at least as far as Windows 10 is concerned. There are still times when I boot up and certain things don’t work properly (display driver, individual apps without network connectivity, etc.). However, it’s far better than it was with the non-QVL Corsair RAM, and the difference is very noticeable. Instead of rebooting 4-5 times to get a working system, it’s now 0-1 times. Yay for small victories.

In summary, pay close attention to the QVL certification document for your TRX40 motherboard, and do yourself a favor: don’t stray from what’s recommended there.

@argentum Something I noticed on my machine: enabling the automatic DOCP settings on the Z2E Alpha sets the BCLK frequency to 100.

Might be worth checking what your board is running. I’m pretty sure the PCIe frequency is tied to BCLK, so if it’s running at anything other than 100, that could be causing instability in everything PCIe.

Just a thought.

Thanks – I’ll check on this!

Checked it several times and it’s been 100 each time, so this wasn’t it. Thanks for the suggestion.