
My new 3960X

So, how did this come about?
Well, we were using a 1950X with 128GB RAM at work as a build server, to speed up our local builds before sending stuff to the BCI, since everybody uses laptops. It served one team, then two, then three, then four…

Then it started crashing. And we were all at home, and I was 300km away so I could have help with the kids. Sometimes it crashed after security had already left, and no one had a key to the server room.
Then they took half the DIMMs because something was failing, and with 64GB we no longer had enough RAM to compile at -j32 (had to max at -j24).
And then I started ordering an Epyc server, except that now, in a biggish company, it was no longer a matter of picking up the phone and talking to the supplier, but rather a long chain of emails, taking more than a month to complete the order.
And we got ripped off pretty badly, being quoted 5400€ for something I could build from equivalent Supermicro parts off the Internet for 3300€. So I gave up on the 7502P, then the 7402P, and settled for the 7302P. Welcome to Portugal!
And then it would take 2 weeks to deliver, but it actually took 4, I guess because of COVID.

So, I decided I wanted one for myself. This Threadripper is all I ever wanted!
All that I could never get in the 90s, when I used to read about the MIPSes and the Alphas and multi-processor programming. I remember writing a multithreaded program in college, then testing it on a big SPARC and watching it crash immediately because of a race condition that wouldn’t happen on a single CPU. Then I built a dual Pentium III in 1999 with a Supermicro case and motherboard.

Then for some time we had “multi” cores to play with, which allowed for some testing, but they were never the real thing: how could you plot your speedup over multiple cores if you only had 2-4? Or your multi-processor efficiency? Or observe super-linear speedups?
Then I started learning about auto-parallelizing compilers, and Xeon Phi came along, so we could test some ideas on 32 very lousy cores, but I would never buy one of those things for myself.

This is the real thing. Threadripper is the first accessible real multi-processor.

Worn out by the 1950X experience (but it had nvidia drivers, vmware kernel modules…) I thought about building an Epyc with a Supermicro motherboard. By chance I found Level1Techs videos on YouTube: “hey, people actually do this, it actually makes sense”, reliability over clock speeds, ECC memory, and even the Supermicro chassis seemed better to me at the time. Then I saw there were videos about ZFS, which even mentioned FreeBSD in passing, and I thought “I’m not alone in the world!”

But then, I would only receive Supermicro’s H12 motherboard in October, and I couldn’t wait. Then I also found Gamers Nexus, and came to find that Supermicro’s chassis layouts are a bit outdated and old fashioned (I had to let go of the hot-swappable bays…). So I went for the 3960X.

The motherboard was an easy choice. How could it be that the cheapest motherboard I could find in my local store had 10GbE and the more expensive ones didn’t? How could they justify those high prices? It had to be the ASRock Creator.

Nothing was easy, though. I got the CPU and memory in two days. I could buy the memory directly from a global Crucial store cheaply (the first time I’ve bought something feeling I was on almost the same footing as an American buyer).
The motherboard was out of stock at ASRock Portugal. I managed to find the last one in stock at a retailer, and switched my order.
The PSU wasn’t easy. I tried ordering a Seasonic, but they couldn’t give me a delivery date, and I couldn’t find an efficient, powerful PSU below 200€. I bought an FSP 1200 for 219€, but was unimpressed: it appeared to be a 2014 model… then I found out it was missing a lot of cables, and returned it. After repeating my searches many times I found a less efficient Gold 1000W unit, a Phanteks Revolt Pro, for only 160€.

First I ordered an NZXT X63 AIO. Then I saw the GN video showing that only the Enermax did better than the Noctua U14S on Threadripper, and switched my order to the Enermax. Then I found the other videos about Enermax and switched to the Noctua…

Finally, when the motherboard arrived (the delivery was delayed!), I mounted the CPU on the board, applied the paste, and when I tried to screw on the cooler, it wouldn’t screw in. It turned out that one of the screws had no thread!

I left home again and went to several stores, and what’s more, the cooler was out of stock at the distributor too! Fortunately, the last store I visited, the one closest to home, just at closing time, had one Noctua U14S TR4… and I could finish my initial build on September 14th.

To be continued with practical usage results, and airflow improvements…
I’m also requesting advice for improvement… it seemed better to me to use the Gigabyte NVMe drive’s own heatsink…

Total cost: CPU+memory=2k, chassis+PSU+NVMe+cooler=1k, total 3k

No GPU yet! I was waiting for a 3080 or 3070, then maybe a Biggish Navi, and now I don’t know any longer… it’s using a spare Sapphire R7 250 low profile with a fan that is whining…


I have no room to complain and don’t take this as criticism.

I love that you included the picture of your thermal paste application. A squirt here. A squirt there. A couple bits over there and there.

No pretty Xs or anything. :slight_smile:

It’s kind of how I would do it.


I pretty much did the same to my 2950x build, also using the same cooler too.

I did purchase some Thermal Grizzly when I ordered the CPU and Cooler.

I was following the pattern in Noctua’s manual: first the 9 small dots, but I made the center one larger. I guess those first 9 small dots are meant to frame the 4 big dots, so that the 4 big dots land correctly on top of the dies.

This was the application before I saw the screw with the missing thread. The final application was a bit cleaner than this.

For what it’s worth, that paste application pattern is what Noctua recommends on Threadripper. In the Gamer’s Nexus video on different paste patterns for Threadripper, their better performers were pretty similar (“X with dots” did well, if I recall correctly).

What did I gain here? Well, the binary I’m working on takes a long time to compile on our high-end (32GB RAM but 15W TDP) company laptops.
This is a C++ project, and C++ is one of the slowest-compiling languages I know (Haskell may be worse). On our laptops, compiling at -j8 takes:

  • 1h15’ to build the binary in Release mode
  • 4h++ (5h?) to build “all” in Debug mode with --coverage

Numbers with gcc or llvm are similar. This is the application layer of the navigation program of some European car brand (one of those that makes its own navi).

So this is embedded programming, and everything is moving. We need to change the “sdk” almost every week because the interfaces above (to the HMI) and below keep changing. So clean builds happen frequently… And then you get tickets from testers for software that is 2 weeks old, and you need to go back and forth, change branches…

With the 1950X build server we had:

  • 7’20" to build the main binary in Release at -j32

With the Epyc 7302P we get:

  • 7’01" at -j32

The most surprising thing is memory consumption. Compiling a single file doesn’t need a lot of memory, but the problem comes when you are running 32 simultaneous instances of gcc.
The old 1950X had no swap space and -j32 usually went fine. But once they took half the DIMMs, after a minute or so we would get a lot of “internal compiler error” and had to reduce to -j24 to be able to compile.
That allowed me to determine that for a C++ project of this kind, with current compilers, you need about:

  • -j24 – 64GB
  • -j32 – some 96GB perhaps
  • -j48 – 128GB

And that’s why I aimed at the 3960X with 128GB of RAM.
And the timings I get are these:

  • 3’50" to build the main binary in Release
  • ~11’ to build “all” in Debug with --coverage

That’s a hell of an improvement from 5h. I also observe that for compiling with --coverage the 128GB is almost enough, but not quite. I added 32GB of swap, and I typically use ccache to reduce the numbers further.
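For anyone curious, a minimal ccache setup looks something like this (the cache size and the CMake flags here are my choices as an illustration, not from the build above). ccache wraps the compiler and reuses earlier object files, which helps a lot when switching back and forth between branches:

```shell
# Allow a generous cache; --coverage builds produce large object files.
ccache --max-size=50G

# With a CMake project, route compiler invocations through ccache:
cmake -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ..

# After a rebuild, check the hit rate:
ccache --show-stats
```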



I like to go slow. Initially I bought an additional Noctua NF-A15, intending to run dual fans on the CPU cooler, but didn’t install it immediately because I wanted to evaluate its benefit. I had only the 3 fans that come preinstalled in the Fractal case.

The Phanteks PSU has a button for Hybrid mode, which apparently shuts down the fan, but it works contrary to what I expected.

My first CPU temperatures while compiling were 82-84 degrees C and I wasn’t happy. The weather was still a bit hot, but it wouldn’t be above 27-28 degrees in this cooler room. I was particularly unhappy that even at idle my temperatures were above 40C.

The first surprise was the PSU fan. When I discovered that the button was reversed and turned the fan on, the temps went from 83 to 76.

I then started setting fans to “Performance”, etc. I put the Noctua fan at the back of the case and moved the Fractal fan to the front (not that it is needed there, I just like to feel that the whole front of the case has a homogeneous airflow).
It settled at 74C at full load, and the idle temperatures came down from 40C.

So I bought another Noctua NF-A15 for the CPU cooler, and this was the final step.
My temperatures came down to 62-63 degrees C at full load with fans set to “Silent”.

This is the final result:

For me, the second fan in the CPU cooler had a benefit of at least 10C.


Overclocking and GPU

I was never much into overclocking. I had a 486 on which you could overclock the ISA bus, and that was excellent because it sped up the VGA card in particular, but since then, not much…
I don’t change settings a lot and I don’t want to put expensive hardware at risk.

However, this time, I don’t know… a safe overclock…

I was a bit cheap on the RAM. I bought 3200MHz Crucial Ballistix CL16 (BL2K32G32C16U4B).
I was cheap because I wasn’t sure that faster RAM in big amounts, 128GB, would give me the same benefit as it does in the small amounts, 32GB, found in gamers’ PCs. There was also a video in which Wendell mentions some limitation with quad-channel memory…

I know that it’s the four channels that count the most, and I prefer parallelism to higher clock speeds.
But the truth is that I didn’t research enough the right memory for this system.

Recently I saw a buildzoid video where he mentions that these Ballistix could have some overclocking headroom.
I also see here and there that 3600MHz is optimal here, since DDR4-3600 puts the memory clock at 1800MHz, matching the Infinity Fabric clock 1:1.
I wonder if I could overclock my RAM to 3600 keeping the same timings…

Also, the CPU: it’s almost disappointing to see all those fans spin up so little… 63 degrees? Maybe I could overclock a tiny bit, without compromising power efficiency and reliability.

Then, the GPU, I’m desperate…
I’m almost ready to buy an RX 5500 XT or 5600 XT while I wait for a more powerful one.
(GPU programming is something that I care about, I worked many years processing LiDAR data).

So, these are the open topics where I would appreciate some advice.

I see it now. When I first commented it didn’t look like a pattern to me because of a couple of smears and offset blobs made it look a bit random.

This is exactly why I built myself a Ryzen 3900 system. Work gave us Lenovo T580 laptops which are nice and all, but they are 15W TDP and a 12 core Ryzen 3900X is literally 8 times faster at C++ builds.

I am considering a Threadripper 5000 build with Zen-3 when AMD gets around to it.

Yes, T400s and T500s used to have 35W processors.

In any case, the context I’m working in lends itself so well to remote development (our toolchains are always inside some sort of container) that for me it killed the concept of the powerful laptop.

What I want from a laptop is a nice, colorful screen, a good keyboard, and for it to be light enough to work in unorthodox sitting positions on the couch.