Building a Threadripper 7960 based Linux workstation

Hello everyone, this is my first post here.

I’m a software engineer by profession (backend, to be specific) and most of my day is spent compiling code. I do not render videos much, but I do work with RAW image files (Rawtherapee) quite often.

I’m currently using an AMD 3950X with a B650 motherboard, and while the performance has been satisfactory, it isn’t anything to write home about. Besides, I’m having some sort of issue with the CPU/motherboard or the PSU: the computer randomly crashes when using an application like Rawtherapee (which causes load spikes), without anything in the logs. It’s as if someone pulled the power cord, except that the fans keep spinning. I’ve tried swapping the RAM sticks to no avail, so I suspect the PSU (it’s 750W, pretty beefy for my HW).

This brings me to a convenient excuse to build a new system. And I’m finally thinking of treating myself to a 7960X.

I use NixOS as my Linux distro of choice, and the current uname output is:

Linux desk 6.1.79 #1-NixOS SMP PREEMPT_DYNAMIC Fri Feb 23 08:12:53 UTC 2024 x86_64 GNU/Linux

Since the TR seems pretty “bleeding edge” from my cursory Google searches, I’d like to know how the platform is on Linux before I spend the money. Unfortunately, the TR also means that I need to buy a new case, a new PSU, etc., so it’s pretty darn expensive.

My highest priority is a system that will help me work and be stable underneath. If Linux is unstable on the latest TRs, I’m also open to picking something else (still high end due to my performance demands, like the 7950X).

My question here is: how stable is Linux on the latest TR processor & motherboards? And which motherboards are a good fit for Linux?

I’ve been hearing that we need the latest kernels for TR processors too, is that true?

I’d mainly like to hear from people who’re using the processors on Linux and what their experience has been like so far.

Thanks!

Lucky you! Wendell did a video on the TR7000 series and Linux just a few months back:

HTH!

PS: welcome :hugs:

Thanks for replying!

I’ve seen the video, but it seems like there are reports of kernel panics, missing drivers, etc. with this motherboard on older kernels, which Wendell’s video seems to refute.

I don’t mind bleeding edge hardware per se, but in the end, I’d like to use the system to work rather than experiment with hardware. The performance is what’s attractive to me.

Probably nothing a kernel update can’t fix? NixOS offers easy reproducible rebuilds, right? So no harm in trying a more recent kernel; 6.1 is over a year old.
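If it helps, here is a minimal sketch for checking whether a running kernel is at least some minimum version before blaming the hardware. The 6.5 threshold is purely an assumption for illustration, not a documented requirement for Threadripper, and the helper names are made up. On NixOS, as far as I know, moving to a newer kernel is a matter of setting `boot.kernelPackages = pkgs.linuxPackages_latest;` and rebuilding.

```python
import platform
import re

# Assumption for illustration only: treat 6.5 as the oldest kernel worth testing.
MINIMUM = (6, 5, 0)


def parse_kernel_release(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string such as '6.1.79'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '6.5') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_at_least(release: str, minimum: tuple[int, int, int] = MINIMUM) -> bool:
    """True when the release is the same as or newer than the minimum."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # "6.1.79" is the release from the uname output posted earlier in the thread;
    # platform.release() reports the kernel of the machine running this script.
    for release in ("6.1.79", platform.release()):
        try:
            verdict = "new enough" if is_at_least(release) else "older than the threshold"
        except ValueError:
            verdict = "not a Linux-style release string"
        print(f"{release}: {verdict}")
```

Comparing tuples rather than strings avoids the classic trap where "6.10" sorts before "6.5".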

Sure, I’m fine with that, but sometimes you end up with unstable hardware due to incompatibilities and then you have to wait on kernel patches (or make one yourself) to get the system to be stable, both of which will be quite the hassle for me.

I’ve been running kernel 6.5 on the Asus WRX90 for a little over a month now. It has been mostly stable. The two main issues I have yet to diagnose and resolve are:

  • All cores will randomly throttle back to 545 MHz for anywhere between 10 seconds and 2 minutes. This happens up to several times a day. The current job I’m running seems to trigger it a lot less; the last occurrence was 6 days ago.

  • When the system is idle, I’ve had it randomly freeze up on me, requiring a reboot. It hasn’t spent much time idle since I got it.
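For anyone wanting to catch the throttling in the act, a rough sketch along these lines could log when every core drops to the floor at once. It assumes an x86 Linux system where /proc/cpuinfo exposes "cpu MHz" lines. The 600 MHz threshold is my own guess, picked to sit just above the 545 MHz figure mentioned above, and the function names are hypothetical.

```python
import re
import time
from pathlib import Path

CPUINFO = Path("/proc/cpuinfo")
# Assumption: anything below this counts as "throttled" (545 MHz was the reported floor).
THRESHOLD_MHZ = 600.0


def core_frequencies(cpuinfo_text: str) -> list[float]:
    """Return the 'cpu MHz' value of every logical core listed in the text."""
    values = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(v) for v in values]


def all_throttled(freqs: list[float], threshold: float = THRESHOLD_MHZ) -> bool:
    """True when at least one core is listed and every core is below the threshold."""
    return bool(freqs) and all(f < threshold for f in freqs)


def sample(path: Path = CPUINFO) -> list[float]:
    """Read the current per-core frequencies, or an empty list if unavailable."""
    try:
        return core_frequencies(path.read_text())
    except OSError:
        return []


if __name__ == "__main__":
    # Takes a single sample; run it from a timer or a shell loop to build a log.
    freqs = sample()
    if not freqs:
        print("no 'cpu MHz' lines found (not an x86 Linux system?)")
    else:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        state = "ALL CORES THROTTLED" if all_throttled(freqs) else "ok"
        print(f"{stamp} min={min(freqs):.0f} max={max(freqs):.0f} MHz {state}")
```

Timestamps from a log like this can then be lined up against job start times, temperatures, or kernel messages to see what the throttling correlates with.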

Since I have the Asus WRX90, BT/wifi compatibility wasn’t an issue (because the board doesn’t include it.) I simply installed an AX210NGW M.2 BT/wifi module and it has worked flawlessly.

Of the two issues I listed above, the random freezes are the more serious concern. Until I can diagnose and either resolve or work around that, I cannot recommend this platform to anybody looking for a stable system. My system won’t be idle for at least another two weeks, so I won’t be able to troubleshoot until then.


That’s a shame and it sounds like a deal-breaker for me, since I’ll be using this as a workstation with plenty of idling.
The one silver lining I see is that you’re speaking of the Pro series, while I’m considering the non-pro 7960 (and hence a different motherboard family).
These crashes could be motherboard related too, but I’m picking up some serious stability issues when researching this on the web.

Interestingly enough, I also had a 2950x suffer from a very similar problem: it would randomly freeze when idle.

I now have a 3950X and I’ve swapped the:

  1. Memory
  2. GPU
  3. Motherboard

And now I have a different issue: when dealing with “spikey” load, like loading a directory of RAW files in Rawtherapee, the system just shuts down: screens go black, the fans keep spinning and the lights stay on, and the motherboard’s debug light warns me of a CPU fault. Only the reset switch brings the system back.

This leads me to think the PSU could have been the culprit all along, but hearing your issue, I also suspect another component may have been at fault with the 2950X.


My fallback option would be the 7950X3D which will get me the next best thing to the Threadripper.
Perhaps I’ll revisit an upgrade a few years down the line.

Yeah, it seems like people have had a better experience in terms of stability with the non-Pro platform. I knew what I was getting into when I built this, and overall it has exceeded my expectations both in terms of performance and stability. (But based on the problems I read about, my stability expectations were pretty low to begin with.)

I do think the freezing is something I’ll be able to work around. Based on a few web searches I did, it seems to be an issue that has existed on Ryzen for quite some time, possibly related to the lowest-power processor states (C6). I already enabled the “Typical Current Idle” BIOS setting, but I just haven’t let the machine idle long enough to know if it helped or not.
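To see which idle states the kernel actually exposes, a small sketch like the following reads the standard cpuidle sysfs entries for one core. This is only an inspection aid under the assumption that the cpuidle directory exists on the machine. The state names depend on the idle driver, so the deepest state may not literally be labelled C6, and the helper names are my own.

```python
from pathlib import Path

# Standard cpuidle location for the first logical core on Linux.
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def parse_state(dir_name: str, name_text: str, disable_text: str) -> tuple[str, str, bool]:
    """Turn the raw contents of one stateN directory into (dir, name, disabled)."""
    return dir_name, name_text.strip(), disable_text.strip() != "0"


def idle_states(cpuidle_dir: Path = CPUIDLE_DIR) -> list[tuple[str, str, bool]]:
    """List every idle state under the directory, ordered by state number."""
    state_dirs = sorted(cpuidle_dir.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    return [
        parse_state(
            d.name,
            (d / "name").read_text(),
            (d / "disable").read_text(),
        )
        for d in state_dirs
    ]


if __name__ == "__main__":
    if not CPUIDLE_DIR.is_dir():
        print(f"{CPUIDLE_DIR} not found; cpuidle is not exposed on this system")
    else:
        for dir_name, name, disabled in idle_states():
            flag = "disabled" if disabled else "enabled"
            print(f"{dir_name}: {name} ({flag})")
```

If the freezes stop after a deep state shows up as disabled (or after the BIOS setting takes effect), that would at least point towards the low-power states rather than something else.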

I wish I could find someone who was using the platform with a similar setup.

Do you have any resources I could read up on for the non-Pro line on Linux? I get all the benchmarks people are interested in, but there really ought to be more people who review things like stability and hardware compatibility. That would save a lot of headaches. Perhaps I can do that, iff I pull the trigger on the 7960X.

Missing drivers for Bluetooth etc. are perfectly tolerable, but stability is non-negotiable for workstation-class processors.


In your case, yeah, it sounds a lot like the processor attempts to lower its power draw when it’s idling, and somehow gets starved of juice. I distinctly remember my 2950X also displaying very similar issues, and not to discourage you, but in the two years I had it, I never managed to fix it completely. In the end, I ran out of patience and just sold the processor and motherboard and switched to a 3950X, which has been my workhorse but does display odd behaviour with Rawtherapee (I’m guessing load spikes are the culprit, so it could’ve been my PSU all along).

Just wondering here: does the system crash while it’s idling or when entering/leaving that state? Any logs that can provide clues?

I’d expect the crashes to happen at state change rather than when in a given state.
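On the logs question, one way to look for clues after a freeze is to pull the kernel messages from the previous boot out of the journal. The sketch below assumes a systemd-based system with persistent journal storage, otherwise there is no previous boot to read. A hard freeze may well leave nothing behind, so an empty result is not proof of anything. The function name is hypothetical.

```python
import shutil
import subprocess


def previous_boot_kernel_log_cmd(lines: int = 50) -> list[str]:
    """Build a journalctl call for the last kernel warnings of the previous boot."""
    return [
        "journalctl",
        "--dmesg",             # kernel messages only
        "--boot=-1",           # the boot before the current one
        "--priority=warning",  # warnings and anything more severe
        f"--lines={lines}",    # only the tail, closest to the freeze
        "--no-pager",
    ]


if __name__ == "__main__":
    cmd = previous_boot_kernel_log_cmd()
    if shutil.which(cmd[0]) is None:
        print("journalctl not found; is this a systemd-based system?")
    else:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        # journalctl reports problems (e.g. no previous boot recorded) on stderr.
        print(result.stdout or result.stderr)
```

The last few lines before the log ends are the interesting part: if they mention an idle-state or power transition, that would support the state-change theory.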

Yeah, there are definitely stability concerns with the latest Threadrippers. If rock-solid stability is your thing, then I would suggest waiting another year or so, at least.

You have two alternatives the way I see it:

  1. Build two systems: one EPYC server to which you offload heavy workloads, and one top consumer system. The drawback here is that the consumer system will suffer in the PCIe and I/O department, even though there are pretty good pro motherboards out there like the Asus ProArt X670E.

  2. Switch to Intel’s offerings. They are not as good and not as good value, but they offer better stability and still come with plenty of PCIe ports.

https://www.intel.com/content/www/us/en/products/details/processors/xeon/w/products.html

… Or you could just wait another year and see if TR heals, though that’s probably not an option either.

I’ve also seen more positive reports like this and I’ve reached out to the user concerned.

So far, in my research it seems like the stability issues are mostly plaguing the “Pro” line of TR processors.

Another option would be to get an OEM machine with the TR Pro platform, which seems pretty stable according to Wendell, like the Lenovo Thinkstation P8 for example.

Depending on your needs, there might be compromises, such as limited IO or bundled OEM parts you might not want or need (included dGPU / SSD / memory). It might not be suitable for your use case at all (like putting a couple of consumer GPUs inside this small case), and dealing with OEM configuration shenanigans and non-standard parts might also be a dealbreaker.

Yeah, I tend to keep away from OEM parts; my previous experiences with Lenovo Thinkpads have not been great (discrete Nvidia GPUs, overall underpowered platform, etc.)

Besides, there’s probably some pleasure in building and then using your own system.

I’ve also heard from Michael in the meantime, which seems promising.

But I also have limited time for the research. Given my workload, I don’t have the time to debug what’s wrong with my 3950X system. It keeps turning off when dealing with load spikes, e.g. in Rawtherapee, and the last crash wasn’t even that “spikey”: it happened when loading another image. I cannot put much faith in this system and need to arrive at a decision fairly soon.

Ah the excitement of picking out new hardware.

My god, why didn’t I think of that? If you live in the US there is a vendor that delivers just that with guaranteed Linux support:

Unfortunately, I’m based in Western Europe and our options here are limited. But yes, System76 is very attractive.