Multi Purpose Threadripper Workstation (or - All Your PCIe Lanes Are Belong To Us!)

Hi Community! I am writing my first post here to ask for your feedback and ideas/advice.

I am looking to build a 7970X-based high-end Linux workstation to serve as a number-crunching and data analysis machine (mostly physics-based particle simulations and ML/deep-learning models - 90% of the time) and also a virtualization-capable machine (mostly Windows/Adobe/Blender to create educational and scientific publication material - 10% of the time). Simulations require fast GPUs, and data analysis and visualization require lots of RAM and fast NVMe storage.

The first thought is: building another machine for image manipulation that will only be active at most 15% of my time seems terribly wasteful in terms of space, time, and $$, given that I will already have a very powerful machine on my desk. So, I am considering adding a dedicated GPU and NVMe drive for virtualization.

The second thought is: I have a variety of use cases and want this workstation to be future-proof, but without having to wear noise-canceling headphones to be able to work next to it. So, I am choosing low-power “quadro” GPUs for particle simulations, and an “entry-level” RTX for virtualization, so they use less power and produce less noise and heat.

Final thought: This should cost around $12K, which seems feasible per Micro Center prices.

This is the configuration I have in mind:

  • Asus TRX50-SAGE Motherboard
  • Threadripper 7970X (32 core)
  • Silverstone SST-XE360-TR5 (for CPU AIO Cooling) or Noctua (NH-U14S TR5-SP6)
  • v-color 512GB or 256GB RAM 4-module kit (depending on price)
  • GPU 1 and 2: RTX 4000 Ada (for simulation and AI)
  • GPU 3: RTX 4060 Ti (dedicated for virtualization and PCI passthrough)
  • 1 * Crucial T705 1TB PCIe Gen5 x 4 NVMe (OS and user data)
  • 1 * Crucial T700 4TB PCIe Gen5 x 4 NVMe (simulation and AI cache)
  • 1 * Crucial P3 Plus 500GB PCIe Gen4 x 4 NVMe (for virtualization)
  • M.2 PCIe adapter w/ 4 * 4TB NVMe (local data storage for data analysis)
  • PSU: Corsair AX1600i 80 Plus Titanium
  • Case options: Fractal North XL (if using CPU AIO) or Torrent (for air cooling)

PCIe Lanes considerations:

  • This system has a total of (per Motherboard manual) 48 Gen 5 lanes (CPU only), 32 Gen 4 secondary CPU lanes, and 8 Gen 4 chipset lanes.
  • I will have 10Gb and 2.5 Gb connections active in this machine (2 PCIe Gen 4 lanes for the 10Gb connection and 1 PCIe Gen 4 lane for the 2.5Gb LAN)
  • The RTX 4060Ti is PCIe Gen4 x 8, and the 4000 Ada is PCIe Gen4 x 16.
  • In this motherboard, two NVMe drives are PCIe Gen 5 x 4 connected directly to the CPU, and the third is PCIe Gen 4 x 4 connected to the chipset.

Physically, I would set it up on the board like this:
— PCIe Slots (ASUS TRX50-SAGE)

  1. PCIEC16(G5)_1 (x16 - PCIe 5.0) – RTX 4000 Ada (single slot)
  2.         ---                    -- empty slot
    
  3. PCIEC16(G5)_2 (x16 - PCIe 5.0) – RTX 4000 Ada (single slot)
  4.         ---                    -- empty slot
    
  5. PCIEC16(G5)_3 (x8 - PCIe 5.0) – Gigabyte NVIDIA GeForce RTX 4060 Ti Eagle (dual slot - PCIe 4.0 x8)
  6. PCIEC16(G4)_1 (x4 - PCIe 4.0) – (covered by RTX Dual Slot)
  7. PCIEC16(G4)_2 (x16 - PCIe 4.0) – M.2 adapter

And the PCIe usage would look like this (still trying to follow Motherboard manual):
CPU PCIe 5 - 16+16+4+4 (4000Ada “1” + 4000Ada “2” + 1TB NVMe + 4TB NVMe)
CPU PCIe 4 - 8+16+2+1 (RTX for Virtualization + M.2 adapter + 10G LAN + 2.5G LAN)
Chipset PCIe 4 - 4 (0.5TB NVMe drive)
Total CPU PCIe Lanes: Gen 5 = 40; Gen 4 = 27
Total Chipset PCIe Lanes: Gen 4 = 4
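
The tally above can be checked with a few lines of Python. This is only a sketch of the arithmetic as written in the post: the pool names and the `lane_summary` helper are made up for illustration, the available-lane figures are the ones quoted from the motherboard manual, and each device is assigned to the pool the post puts it in (which may not match how the board actually routes each slot).

```python
# Lanes available per pool, as quoted in the post from the motherboard manual.
AVAILABLE = {"cpu_gen5": 48, "cpu_gen4": 32, "chipset_gen4": 8}

# Planned devices per pool, following the post's own assignment.
PLANNED = {
    "cpu_gen5": {"RTX 4000 Ada #1": 16, "RTX 4000 Ada #2": 16,
                 "1TB NVMe": 4, "4TB NVMe": 4},
    "cpu_gen4": {"RTX 4060 Ti": 8, "M.2 adapter": 16,
                 "10G LAN": 2, "2.5G LAN": 1},
    "chipset_gen4": {"500GB NVMe": 4},
}


def lane_summary(available, planned):
    """Return used and free lane counts for each pool (hypothetical helper)."""
    summary = {}
    for pool, total in available.items():
        used = sum(planned.get(pool, {}).values())
        summary[pool] = {"used": used, "free": total - used}
    return summary


summary = lane_summary(AVAILABLE, PLANNED)
for pool, counts in summary.items():
    print(f"{pool}: {counts['used']} used, {counts['free']} free")
```

Under these assumptions no pool is oversubscribed. Moving a device between pools (for example, counting the 4060 Ti against the Gen 5 pool because it sits in a Gen 5 slot) only requires editing `PLANNED`.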

Main questions:

  1. The two RTX 4000 Ada GPUs are PCIe Gen 4 x 16, so even though they are mounted on Gen 5 slots, they will downgrade the connection to Gen 4. Will this eat into the 32 Gen 4 CPU lane “budget”, or will the CPU downgrade the Gen 5 PCIe lanes into Gen 4 PCIe lanes? In other words, if I only connect one RTX to the PCIe Gen 5 slot (and no other card or NVMe), will I still have 32 PCIe Gen 4 (CPU) lanes available, 4 Gen 4 chipset lanes, and another 32 Gen 5 CPU lanes?
  2. CPU cooling should be quiet under load, since this box will sit next to me in my office. Would the Noctua and Silverstone CPU cooling options be similar in noise level under load?
  3. I am thinking the North XL would be better for AIO (top mount) and Torrent better for air cooling. I worry that mounting the AIO by the front 180mm fans of the Torrent would bring in hot air for the whole system. Does that make sense?
  4. v-color RAM was in the list of compatible modules, but I have never used v-color before. Any details to be mindful of here?
  5. The chosen PSU was based on HW Busters reviews, which indicated this has the lowest power fluctuation among similar choices. Power fluctuation seemed to be a big problem for others in this forum when using Threadripper systems. Seasonic Prime PX-1600 v3 was another option. Was there ever a conclusion to what would be a “maximum” power fluctuation level for a stable system?
  6. I read here that for PCI passthrough, a “non-quadro” GPU is necessary. Is that still the case? This is the main reason for having a 4060 in this configuration.

I am open to any feedback, from “none of this makes sense” to “sounds cool, send me a picture”. What do you think?

There’s a lot there, but one thing I will answer real quickly: in practical use the T700 and T705 NVMe drives are mostly the same, but the T700 runs a lot hotter than the T705. If you can swing it, I would stick with the T705s.

Also, I have an eight-stick 768 GB kit from v-color that’s been solid for the month I’ve had it, for what that’s worth. It did require some ducting to keep it cool, though, as it also runs fairly hot (96 GB sticks).

The PCIe speed will adjust to the slowest device, so Gen 4 in your case.
The lane-width will be what is supported, so 16x per card.
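
As a rough illustration of that rule, here is a small Python sketch. The `negotiate` function is a made-up helper, not a real tool: it takes the lower generation and the narrower width of slot and device, then estimates one-direction bandwidth from the raw per-lane transfer rate and the 128b/130b encoding used by PCIe Gen 3 through Gen 5. Real-world throughput will be lower because of protocol overhead.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
# Gen 3, 4, and 5 all use 128b/130b line encoding.
ENCODING = 128 / 130


def negotiate(slot_gen, slot_width, dev_gen, dev_width):
    """Return (gen, width, approx GB/s per direction) for a slot/device pair."""
    gen = min(slot_gen, dev_gen)          # link runs at the slower generation
    width = min(slot_width, dev_width)    # and at the narrower lane count
    gb_per_s = GT_PER_S[gen] * ENCODING / 8 * width
    return gen, width, gb_per_s


# RTX 4000 Ada (Gen 4 x16) in a Gen 5 x16 slot.
ada = negotiate(5, 16, 4, 16)
# RTX 4060 Ti (Gen 4 x8) in the Gen 5 x8 slot.
ti = negotiate(5, 8, 4, 8)

for name, (gen, width, bw) in {"RTX 4000 Ada": ada, "RTX 4060 Ti": ti}.items():
    print(f"{name}: Gen {gen} x{width}, about {bw:.1f} GB/s per direction")
```

The lanes themselves are the same physical Gen 5 lanes; they simply train at the Gen 4 rate for that link.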

I have an Arctic U-4M in my machine, all that air movement just makes noise. Only way around that is to have more surface area and slower air, in other words watercooling.

I am passing a Quadro RTX4000 (non-Ada) through in my machine. Works fine.

I wouldn’t bother. Even Noctua only claims the NH-U14S TR5-SP6’s good for 179 W, plus my experience is their ratings are optimistic if you want a quiet build. The Storm Peak parts are all 350 W.

It might be possible to drill mounting holes for four of a 3x120 rad’s screws in GP-18s or AL-18s but, from memory, I’m skeptical. And, longer term, fan frame warp might be an issue. The more reliable option’s to use the Torrents’ 120/140 mm mounting brackets.

Top exhaust is the default for avoiding elevating system temperatures by an intake rad’s ΔT, yes. Another difficulty in the layout considered is x4/x4/x4/x4 M.2 breakout of PCIEC16(G4)_2 blocks roughly half the 4060 Ti’s intake on its hot side.

Not that I’ve seen but, as there’s only ~420 W of GPU load here, if an AX1600i’s not fine I’d be pretty comfortable concluding it’s Asus’s problem. (Given Asus in general and the number of TRX50-SAGE issues reported I’d be hesitant to use the board anyways.) A dual supply build’s another option to consider.
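
For a back-of-the-envelope view of the headroom, a quick Python sketch using only figures from this thread: 350 W for the CPU, ~420 W for the three GPUs, and the AX1600i's 1600 W rating. Drives, RAM, fans, and transient spikes are deliberately left out, so treat the result as a floor on sustained load rather than a full power budget. The variable names are just for illustration.

```python
# Sustained-load figures quoted in the thread (watts).
CPU_W = 350          # Storm Peak rated power
GPU_TOTAL_W = 420    # approximate combined load of the three GPUs
PSU_RATED_W = 1600   # Corsair AX1600i rating

# Assumption: ignores drives, RAM, fans, and short transient spikes.
total_w = CPU_W + GPU_TOTAL_W
load_fraction = total_w / PSU_RATED_W

print(f"CPU + GPU load: {total_w} W")
print(f"PSU utilization: {load_fraction:.0%}")
print(f"Remaining headroom: {PSU_RATED_W - total_w} W")
```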

I’m not sure I agree. Two or three Zen 5 builds totaling US$ 12k offer more total compute, IO, and cooling with probably less hassle and plausibly less noise. Main issues I see are continued lack of 64 GB UDIMM availability means 192 GB max in any one machine for a while and potential for any one workload to be DDR bound by two channels.

If that forces 85% Storm Peak + 15% Granite Ridge, that actually doesn’t look that bad to me. Though, typically, what I do is work on one machine during the day and run jobs on it overnight, while the other one crunches as much as I can keep it busy. Usually the bottleneck’s how fast I can write code but, if the work’s in a compute-bound phase, ~100% utilization of two machines is no problem.

@eousphoros, thanks for that note! I can definitely stick to the T705s to reduce heat output.
Do you know what the determining factor is here? I read that both use the same Phison E26 controller, but different memory modules. Is the 3D memory just that much more efficient, so that it runs at cooler temperatures?