Desperate help needed from Threadripper owners

I’ll keep this one short. Just got around to building the TR system I bought from Sarge. Yesterday we managed to debug it and confirmed all RAM slots and RAM DIMMs are working.

Every time I reboot, the system detects a different amount of RAM in different slots. I shake the cooler a bit and power on, the system then detects the full 8 slots. A reboot later, only a couple of the slots. I have reseated the CPU in the socket 6 times already, it’s ridiculous.

Question: do any of you TR or EPYC owners know a trick for getting this piece of c**p properly pressure mounted? I’m almost certain it’s something with the CPU mount, especially because I have the most success detecting all RAM slots when the system is lying on its side, though even that isn’t consistent. I tightened until the wrench clicked and went no further.

I even tried not tightening it all the way, just enough to make it sit properly, and the system booted, but RAM detection was no different: still flaky and random. I also left the Noctua NH-U14S-TR4-SP3 not fully tightened, so that only the pressure from the springs on the screws held it in place, but still no dice.

As you can probably tell, I’m very frustrated right now after 2 days of troubleshooting this thing. Please tell me you have a silver bullet for fixing this. I even ran out of thermal paste, which is not hard given the size of the IHS; I used it all up in 2 mounts. At this rate, if I have to change the paste every time until I get it right, I’ll need a pack of 10 thermal pastes. I wonder if the Thermal Grizzly Carbonaut will be good enough for TR.

Nothing beyond having the stock install tool and torquing until the click and no further.

This is exactly what I did with my TR build as well, just a 2950X build on an ASUS motherboard with a Noctua cooler.

However, I can’t say whether all 8 DIMM slots work, as I only have 4 sticks of RAM installed.

So just as a thing to test, have you tried shutting the PC down, then removing power by flipping the switch on the PSU, waiting 3-5 s, and then starting again?

Yes, I tried that too. Still random results, no different from just shutting down without unplugging.

I only use 4 sticks in each of my two systems, but I have not had detection errors when tightening the CPU until the wrench clicks the first time.
Gen 1 TR. IIRC, both of the sockets are Lotes ones. I seem to remember something about one of the two socket manufacturers being iffy, but I’m not sure which.

Some sockets had issues with the threads not catching properly on screws 2 and 3 once screw 1 was pushed all the way to the bottom, so you had to do a quarter turn on each before starting to screw them all the way down. And the other one had issues with requiring a lot of downward pressure to catch the screw threads.

This chip is really finicky with detection, I even set the RAM to 2400 and 2133, but it made no change, so I’m sure it’s not the sticks or the CPU, but the socket. What I haven’t tried is to just slap the Noctua on top without screwing it down and just testing the system like that, to see if it’s an issue with overtightening the cooler, as it’s weird that just shaking the cooler makes the RAM detection sometimes work properly.

Also makes “Random Access Memory” live up to its name.

I hope you get this fixed without using stones or anvil to get uniform pressure on it. This bumping at the case and altering memory detection is an unusual behaviour indeed.

Quick news after yet another remount. I think the UEFI is bugged. Like, really buggy, infested cockroach nest.

The main page only shows random sticks at each reboot, but if I go to the OC tab and check the DRAM details, I can see all slots and all the RAM serial numbers, and that all of them are dual rank. However, when booting into an OS, only the memory that was detected at start is actually usable. For example, right now I happen to see 64GB of usable memory in a Linux live boot environment.
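For anyone who wants to track this from the OS side rather than the UEFI main page, here is a minimal sketch that reads the `MemTotal` line from `/proc/meminfo` text and estimates how many sticks are actually usable. The function names are made up for illustration, the 16 GiB default is just the stick size in this build, and the round-up assumes that kernel and firmware reservations eat less than one whole stick, which may not hold on every system.

```python
import math
import re


def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo text, converted from kB to GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) / (1024 * 1024)


def estimate_usable_sticks(total_gib: float, stick_gib: int = 16) -> int:
    """Estimate how many sticks the OS can actually use.

    MemTotal is always somewhat below the installed capacity because of
    kernel and firmware reservations, so round up to the next whole stick.
    Assumption: reservations are smaller than one stick.
    """
    return math.ceil(total_gib / stick_gib)
```

On a live boot you could pass it the contents of `/proc/meminfo` (for example `open("/proc/meminfo").read()`) after each reboot and note the result, which gives a quick record of how many sticks survived that particular boot.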

So, it’s either the UEFI or the firmware on the chip. I don’t know what to say. The UEFI on the X399 Taichi is at 3.90, which is the latest, and this was released in Jan 2020, so I doubt ASRock will release a new UEFI after 2 years, unless someone has some strings to pull around this forum.

Is anyone aware of any memory limitation on the TR 1950X? I got 8 sticks of 16GB dual-rank 2667 MT/s UDIMMs. At this point, I might just return half the RAM and run with “only” 64GB of RAM, if that means stability and guaranteed capacity on each boot. In fact, I’ll be testing this shortly: I’ll keep the RAM sticks from slots C1 to D2, which were made in 2020, move them to A1 to B2 and probably return the older ones made in 2018. I bought two quad-channel kits of the same brand and model, but obviously, this doesn’t seem to work well in my case.

I feel like this is a firmware limitation on the chip itself or memory controller though. I wonder if there are any firmware updates for this chip.

I thought I might have made a rookie mistake and could miraculously make this work. Still no dice. I had populated A1 to B2 with the 2018 sticks and C1 to D2 with the 2020 sticks. But then I RTFM and saw that the supported configurations are A2 + B2, A2 + B2 + C2 + D2, or all populated, even though I had them half-working some of the time despite this.
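As a quick sanity check before booting, the supported layouts quoted above can be written down as a small lookup. This is only a sketch: `is_supported_layout` is a made-up helper, and the eight slot names (A1 through D2) are assumed from the slot labels mentioned in this thread, so check them against your own board manual.

```python
# Supported layouts as described in the post above.
# Assumption: "all populated" means these eight slot names.
SUPPORTED_LAYOUTS = [
    {"A2", "B2"},
    {"A2", "B2", "C2", "D2"},
    {"A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2"},
]


def is_supported_layout(populated_slots) -> bool:
    """Return True if the populated slots exactly match a supported layout."""
    return set(populated_slots) in SUPPORTED_LAYOUTS
```

With this, four sticks in A1 to B2 come back as unsupported, while the same four sticks in A2, B2, C2 and D2 come back as supported.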

So I thought: let’s remove the 2018 sticks, boot with the 2020 sticks. Every reboot was constant, 64GB of RAM always detected, even after power cycling it. Added the 2018 sticks back aaaaand… C1 and C2 weren’t detected. TF.

So, that’s it, I think I’ll be returning the older kit tomorrow. Not worth the hassle and if I ever upgrade my system, I can sell the TR and get a 5000 series Ryzen and use those 4 sticks with it. But in all honesty, if this puppy is going to be stable, then I probably won’t be upgrading in a long time.

Also, @here, I still haven’t gotten an answer for this:

@PhaseLockedLoop it’s my turn to spam your user. Also + @HaaStyleCat.

I suggest you just reuse the paste for now. It’s not really going to go bad or anything unless you dunk the HSF in your coffee mug or something, and who cares if you lose 5% of cooling capability by introducing a few bubbles for the duration of your troubleshooting? Once you get it running like you want and know you will not reseat the CPU further, then do a final clean paste application.

I have a 1950X in a Zenith Extreme and I have issues with it as well, but it’s a deterministic issue of the second PCIe x16 slot only connecting at x8. Other than this it did run rock solid, which I am grateful for. I was very annoyed with it earlier but I don’t even use the system anymore these days. Which reminds me I should try to think of something to employ it for.

What happens if you boot with the 2018 sticks? So you have a quad channel kit from 2018 and a quad channel kit from 2020?

Scrap that, I just got 48GB of RAM after the 6th reboot. I absolutely hate this. So even with 4 sticks from a kit, this piece of **** still misbehaves. How can it show quadruple channel when only 3 sticks are detected, that’s so ridiculous! And again, all 4 sticks are detected in the DRAM settings, but the capacity is not.

It may be time to go ahead and progressively try all the UEFI versions available for the board.

I’m not sure I want to run an older UEFI version.

Ryzen, and particularly first-generation Threadripper, was extremely memory picky. I’m not sure what to tell you about the platform. It’s down to more than just speed. It’s down to memory timings as well.

Please don’t make me do that, it’d be such a pain… pff… I guess I’ll lose another 2 hours of my life debugging this.

Try the embedded stuff first.

If command rate 1 fails, try 2.

Welcome to all modern platforms

By embedded, you mean built-in? I didn’t see any profiles other than for 2x4GB and for 4x8GB something or other; I think one was Corsair.

I mean the XMP profiles, aka D.O.C.P.