Need advice for a 1TB RAM and dual-GPU system for parallel processing

Good evening all!

Basically, I’m trying to build a 1TB ECC RAM system with dual 4090s to meet my client’s specifications. Their organization handles massive amounts of data processing (mostly parallel processing) and has a decent budget for this build.

Currently, I’m considering the following parts:

MOBO: ASUS Pro WS WRX80E-SAGE SE WIFI motherboard (since it supports up to 2TB of RAM and only requires a single CPU);

CPU: AMD Ryzen Threadripper PRO 3995WX, 64 cores / 128 threads (it supports up to 2TB of DDR4 ECC server-grade LRDIMM RAM, so I thought it’d be a good pick for both high core count and memory capacity);

RAM: 8 x Micron 128GB DDR4-3200 LRDIMM 4Rx4 CL22 (model: MTA72ASS16G72LZ-3G2R);

PSU: Unknown (may need a dual PSU setup, but I’d like advice if this is both possible and reliable);

CPU Cooler: Unknown (Custom water loops are unfortunately not an option for maintenance reasons);

Case: considering the Phanteks Enthoo Pro 2 (since it allows for a dual PSU setup if the wattage exceeds 2000W for titanium rating);

GPU: 2 x MSI RTX 4090 liquid-cooled w/ the 240mm AIO (even though SLI isn’t possible, is NVLink reliable for linking both GPUs for a distributed workload?). I’d like to somehow make 4 x liquid-cooled MSI RTX 4090s possible, but I fear that board-side dimensions, temperature problems, etc. will become issues if I try putting four of them in the build;

Storage: 2 x 1TB Samsung 980 Pro NVMe drives (in RAID 1 for mirroring purposes);
Storage: 1 x 4TB 5400RPM Seagate IronWolf NAS HDD (for redundant system-image backups as an extra layer of data protection);

Lastly, the system must run on Windows 10, not Windows 11. I’m not sure whether Windows 10 can fully utilize the modern hardware, so I’d like advice on whether this requirement will be an issue with the hardware I’m currently planning on;
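
One thing that can be checked up front for the Windows 10 requirement is the per-edition physical memory limit. Below is a minimal sketch, assuming the limits Microsoft has published for 64-bit Windows 10 editions (listed here from memory, so verify them before buying a license). The dictionary and function names are made up for illustration.

```python
# Physical-memory limits for 64-bit Windows 10 editions, in GB.
# ASSUMPTION: values recalled from Microsoft's published memory limits;
# double-check them against the official documentation.
WIN10_X64_RAM_LIMIT_GB = {
    "Home": 128,
    "Pro": 2048,
    "Education": 2048,
    "Pro for Workstations": 6144,
    "Enterprise": 6144,
}

def editions_that_fit(installed_gb):
    """Return the editions whose memory limit covers the installed RAM."""
    return [
        edition
        for edition, limit_gb in WIN10_X64_RAM_LIMIT_GB.items()
        if installed_gb <= limit_gb
    ]

installed_gb = 8 * 128  # the planned 8 x 128GB LRDIMMs = 1024 GB
print(f"{installed_gb} GB installed, usable on: {editions_that_fit(installed_gb)}")
```

Under those assumed limits, the planned 1TB fits on every edition except Home, and a later upgrade to the board's full 2TB would still sit at the Pro limit rather than above it.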

That’s as far as I got in terms of components…

Definitely need to figure out:
-The PSU issue; it must be Titanium-rated
-The CPU cooler (it will probably need to be air-cooled, but I’m afraid the dimensions won’t fit)

I estimate that all these components, assuming only 2 x RTX 4090s, will draw ~2500W at most, based on the TDPs specified by the manufacturers;
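
The power estimate can be sanity-checked by simply adding up nominal figures. This is a rough sketch, not a measurement: the CPU and GPU numbers are the commonly quoted 280W TDP for the 3995WX and 450W board power for a reference RTX 4090, while the RAM and "everything else" figures and the 30% headroom are pure assumptions. Factory-overclocked cards and transient GPU spikes can push real draw higher.

```python
# Rough power budget. Every wattage is an assumed nominal figure.
COMPONENT_WATTS = {
    "Threadripper PRO 3995WX (TDP)": 280,
    "RTX 4090 #1 (reference board power)": 450,
    "RTX 4090 #2 (reference board power)": 450,
    "8 x LRDIMM (assumed ~10 W each)": 80,
    "Motherboard, drives, fans, pumps (assumed)": 150,
}

def psu_target(component_watts, headroom=0.3):
    """Return (summed draw, summed draw plus a headroom margin) in watts."""
    total = sum(component_watts.values())
    return total, total * (1 + headroom)

total, target = psu_target(COMPONENT_WATTS)
print(f"Estimated sustained peak draw: {total} W")
print(f"PSU target with 30% headroom:  {target:.0f} W")
```

With these assumed numbers the two-GPU build lands well under 2500W, so it may be worth rechecking the per-component figures before committing to a dual PSU setup. Adding two more 4090 entries to the dictionary shows how quickly a four-GPU build changes that picture.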

Any advice would be great, and criticism of my current plan is welcome as well!

Thank you for your time!

If you want NVLink you’ll have to go with a professional card like the RTX 6000 - the 4090 doesn’t support it.

^^This. I messed up and ordered 2x 4090s without realising they don’t support NVLink, so I went back to my dual A5000s. I wish they still supported NVLink on xx90 cards.

I would go with bigger drives like 4TB so you have more space left if you hibernate.
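
To put a number on the hibernation point: the hibernation file scales with installed RAM, so on a 1TB RAM machine it eats a big chunk of a 1TB boot drive. This is a quick sketch, assuming the file is sized at 40% of RAM (which I believe is the Windows 10 default for full hibernation, but treat it as an assumption) and ignoring the pagefile. The function names are made up for illustration.

```python
def hiberfil_gb(ram_gb, fraction=0.4):
    """Approximate hibernation file size. ASSUMPTION: 40% of installed RAM."""
    return ram_gb * fraction

def free_after_hibernation(drive_gb, ram_gb, fraction=0.4):
    """Drive space left once the hibernation file is reserved (pagefile ignored)."""
    return drive_gb - hiberfil_gb(ram_gb, fraction)

ram_gb = 8 * 128  # 1024 GB of RAM as planned
for drive_gb in (1000, 4000):
    left = free_after_hibernation(drive_gb, ram_gb)
    print(f"{drive_gb} GB drive: ~{hiberfil_gb(ram_gb):.0f} GB hiberfil, ~{left:.0f} GB left")
```

Under that assumption roughly 410 GB goes to the hibernation file alone, before Windows, the pagefile, or any applications are installed.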

That mainboard should come with an extra PCIe x16 NVMe SSD carrier card. Check that on the product page.

If you want to use every slot you need PCIe riser cables that support PCIe 4.0 and a different case or something DIY.

These are suggestions for the EU, the prices in the US region might differ.

The Big Gen4 SSD Roundup - Best SSDs for PC & Playstation 5 in 2022

4TB SSDs Worth Buying - Transcend 250

You might have to go with an AIO or custom loop water cooling.

Custom loops are expensive and AIOs are cheaper.

Use quick disconnects if you use a custom loop.

The custom loop water cooling can be very reliable if you carefully choose components and clean them before you use them.
They might still contain residues from the factory.
There are dual pump cases for redundancy and several sensors.
But it is a hassle when it comes to maintenance.

Do not use those pressure equalizer filters for the loop. You will get algae or other goop in your loop. Seal off the loop.

Leave enough air in the reservoir or the pressure can burst your loop.
Have a look at the reservoir and the volume the water takes at different temperatures. The bigger the loop the bigger the change in volume.
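
To get a feel for how much air gap that means, here is a rough sketch that estimates how much plain water expands between a cold loop and a hot one. It uses textbook densities for pure water at 20, 40 and 60 °C. Real coolant mixes will differ somewhat, and the loop volumes are just example values.

```python
# Density of pure water in g/mL at a few temperatures (textbook values).
# ASSUMPTION: the coolant behaves like plain water.
WATER_DENSITY_G_PER_ML = {20: 0.9982, 40: 0.9922, 60: 0.9832}

def expansion_ml(loop_volume_ml, t_cold=20, t_hot=60):
    """Extra volume the coolant takes up when heated from t_cold to t_hot (deg C).

    The mass of coolant stays the same, so volume scales with the density ratio.
    """
    ratio = WATER_DENSITY_G_PER_ML[t_cold] / WATER_DENSITY_G_PER_ML[t_hot]
    return loop_volume_ml * (ratio - 1)

for loop_ml in (500, 1500):  # example loop sizes, not measured values
    print(f"{loop_ml} mL loop: ~{expansion_ml(loop_ml):.1f} mL extra at 60 C")
```

The change is only about 1.5% over that range, but in a sealed loop with no air left in the reservoir there is nowhere for it to go, and it grows in proportion to the loop size.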

Don’t mix the wrong metals or you speed up corrosion.
Like copper and aluminum if I recall correctly.
Check the galvanic properties / tables.

Don’t use non-clear fluids or you’ll gunk up your CPU and GPU coolers.
No fancy liquids.

Maybe it would be better if you get an AMD and an NVIDIA card so you can run apps on the hardware they are fastest on.
It might be a compatibility nightmare. I never tested it so beware.

I guess only the datacenter GPUs support SR-IOV.

Linux might not be the best option for virtualization if you don’t want to constantly tinker, and don’t get me started on platform support.
Just have a look at lm-sensors (Linux) and HWiNFO (Windows).
You’re gonna love lm-sensors.
That’s sarcasm, in case you couldn’t tell.
lm-sensors is about as useless as it gets. If you’re lucky, it finds your CPU temps.