Asus TRX50 + 7960X, 128GB, dual Optane, quad Crucial T700s, dual 990s, dual GPU - dev machine

This build was primarily about creating an optimal all-round, mixed-stack development machine, with enough PCIe lanes for 2-3 GPUs, plus the mixed storage types, bandwidth, and cores to support development on Windows, in WSL2, and via a dual boot into Ubuntu for non-virtualized access to the hardware.

Specifying and building this machine was by far the most enjoyable build I’ve done in 20+ years. I’d already replaced my dev machine with my standard go-to desktop setup, but came unstuck as soon as I added a second GPU; that lack of PCIe lanes on desktop processors is what triggered this build.

Specs:

  • Basics:
    • Motherboard: Asus TRX50 Sage
    • Processor: Threadripper 7960X
    • DDR5: Kingston 6400/32 (2Rx8) RDIMM 128GB
  • Chassis
    • Case: Fractal Design Define 7 XL
    • AIO: Silverstone XE360-TR5
    • PSU: be quiet! Straight Power 1500W (2x 600W connectors)
    • Fans: 5x 140mm static-pressure PWM
  • GPUs
    • Asus ProArt 4080 Super (2.5 slot)
    • Founders 4070 Super (2 slot)
  • Storage:
    • 2x Intel Optane P5801x 400GB
    • 2x Samsung 990 Pro 4TB
    • 4x Crucial T700 1TB in an Asus Hyper M.2 Gen5

Potato Photo

Basic Benches:

[benchmark screenshot]

Infinity Fabric manually set to 2133 to keep the ratios, no overclock, just the EXPO II profile, PBO at board stock - this is a work machine I sit beside all day, and dealing with heat, noise and instability for minor gains isn’t worth it to me.

Storage Details:

First: Optane P5801X (Windows Boot Drive - Selected for IOPS)

NTFS 4K

[benchmark screenshots]

Second: Optane P5801X (Ubuntu Boot Drive - Selected for IOPS)

ext4 4K - this drive is also mounted under WSL2, and is primarily optimized for many-small-file workloads / builds.

fio results: rand 4K Q1 at 548 MB/s (has to be a record!) with 140k IOPS on ext4.

```
randread_4k_q1: (groupid=0, jobs=1): err= 0: pid=22315: Tue Oct 29 22:05:56 2024
read: IOPS=140k, BW=548MiB/s (574MB/s)(16.0GiB/30000msec)
randwrite_4k_q1: (groupid=0, jobs=1): err= 0: pid=22446: Tue Oct 29 22:08:04 2024
write: IOPS=102k, BW=400MiB/s (420MB/s)(11.7GiB/30000msec); 0 zone resets

read_1m_q8: (groupid=0, jobs=1): err= 0: pid=23170: Tue Oct 29 22:19:03 2024
read: IOPS=6803, BW=6803MiB/s (7134MB/s)(199GiB/30002msec)
write_1m_q8: (groupid=0, jobs=1): err= 0: pid=23299: Tue Oct 29 22:21:09 2024
write: IOPS=4459, BW=4460MiB/s (4676MB/s)(131GiB/30002msec); 0 zone resets
```
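For reference, a roughly equivalent fio invocation for the 4K Q1 jobs looks like the sketch below; the file path, test size and io_uring engine are assumptions, so adjust them to your own mount point:

```bash
# 4K random read, queue depth 1, 30-second time-based run (use --rw=randwrite for the write job)
fio --name=randread_4k_q1 --filename=/mnt/optane/fiotest --size=16G \
    --rw=randread --bs=4k --iodepth=1 --direct=1 \
    --ioengine=io_uring --runtime=30 --time_based --group_reporting
```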

Crucial T700s (Mounted in Asus Hyper M.2 Gen 5)

T700s Partition 1, Striped Storage Pool

Optimized for storing and loading AI models, which typically require sequential reads and writes of multiple 4-5GB model files.

[benchmark screenshots]

T700s, Partition 2, Mirror Storage Pool

Optimized for redundancy and storing work critical files, code, etc.

[benchmark screenshots]

Samsung 990 Pro 4TBs - mounted in the Gen 5 CPU-lane M-key slots.

990s, Partition 1, Striped Storage Pool (4K Blocks)

Optimized for storing VHDX and random files, downloads etc.

[benchmark screenshots]

990s, Partition 2, Mirror Storage Pool (64K Blocks)

Optimized for storing rarely used but important videos and images with redundancy - photography, drone footage, etc.

[benchmark screenshots]

Useful Notes:

Partition Manager vs Storage Pools.

Under all tests, Storage Pools provided noticeably better performance than the equivalent partition setups created with the stock partition manager. There are also no resyncs on mirrors, and in practice they’re much more flexible to use.

ReFS does give a slight speed-up (~15%) for many-small-file workloads like building programs and libraries, but it’s slower for larger files, so in real-world usage it’s a net wash.

WSL2 mount types.

If you use WSL2, bare-mounting a VHDX and formatting it with ext4 or XFS gives much faster real-world performance than using native NTFS drives for things like builds (cmake, npm, etc.). I tested all the possible combinations in depth; a short summary:

Setup: i7 12700KF, 64GB 3200, 990 Pro / Optane P5801x

TypeScript build (compiling a TypeScript library from source via npm; a rough command sketch follows the list)

  • 30s: 990 Pro NTFS (4K, 64K)
  • 30s: Optane NTFS (4K)
  • 26s: ReFS 64K on Storage Pool
  • 26s: ReFS 4K on partition
  • 25s: Optane ReFS 4K on partition
  • 14.5s: Optane XFS, --bare mount
  • 14.3s: VHDX, bare-mounted, formatted XFS or ext4
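For context, the benchmark is roughly the shape below; the repo URL and build script name are hypothetical placeholders, since the post doesn’t name the library being compiled:

```bash
# Clone a TypeScript library, install dependencies, then time only the compile step.
# Repo URL and build script name are placeholders.
git clone https://github.com/example/some-ts-lib.git && cd some-ts-lib
npm ci                    # dependency install, not part of the timed step
time npm run build        # or: time npx tsc -p tsconfig.json
```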

This 2x speed-up on the same hardware, from using either a VHDX or a --bare-mounted drive formatted natively, was also observed for Bitcoin compilation (-j $(nproc): 5m4s down to 2m35s).

Short version: if you’re using WSL2, just create a second VHDX, mount it --bare, format it with ext4, and enjoy 2x faster real-world virtualized Linux.
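A minimal sketch of that workflow, assuming the VHDX already exists (Hyper-V’s New-VHD or diskpart’s create vdisk will make one); the paths and device names below are placeholders:

```bash
# From Windows (elevated PowerShell/CMD): attach the VHDX to WSL2 without auto-mounting it
wsl --mount C:\wsl\dev.vhdx --vhd --bare

# Inside the WSL2 distro: find the newly attached disk, format it, mount it
lsblk                               # identify the new device, e.g. /dev/sdd
sudo mkfs.ext4 /dev/sdd             # or mkfs.xfs, per the timings above
sudo mkdir -p /mnt/devdisk
sudo mount /dev/sdd /mnt/devdisk
```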

New TRX50 Threadripper build comparison:

Bitcoin build: 1m 20s under WSL2 (2-4x faster than the i7 12700KF/64GB), 52 seconds on Optane under the Ubuntu boot (5x faster!).
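For anyone reproducing the timing, a rough sketch of the measurement, assuming Bitcoin Core’s standard autotools flow; the exact version and configure flags are assumptions:

```bash
# Time a clean parallel build of Bitcoin Core (version and configure options assumed)
git clone https://github.com/bitcoin/bitcoin.git && cd bitcoin
./autogen.sh && ./configure
time make -j"$(nproc)"
```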

Custom image benchmarking script (resize 5,000 images, convert to WebP, tar.gz the results - a rough sketch of the script follows the numbers)

i7 12700KF / 64GB / Optane XFS: Conversion 30s, Archive 16s, Total: 46s

7960x / 128GB / Optane XFS: Conversion 10s, Archive 7s, Total: 17s (2.7x faster)
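The script itself isn’t posted here, but it’s roughly of this shape; the paths, the 1920px target size, and the choice of ImageMagick for the WebP conversion are all assumptions:

```bash
#!/usr/bin/env bash
# Resize a folder of images, convert them to WebP, then tar.gz the results, timing each stage.
set -euo pipefail
src=./images out=./out
mkdir -p "$out"

resize_one() {
  # ImageMagick (with the WebP delegate); caps the long edge at 1920px
  convert "$1" -resize '1920x1920>' "$out/$(basename "${1%.*}").webp"
}
export -f resize_one
export out

time find "$src" -type f \( -iname '*.jpg' -o -iname '*.png' \) -print0 |
  xargs -0 -P "$(nproc)" -n1 bash -c 'resize_one "$0"'

time tar -czf images.tar.gz -C "$out" .
```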

Case / Motherboard.

If you forfeit the USB front-panel headers, you can use the last slot on the Asus TRX50 Sage for a 2-3 slot GPU in the Define 7 XL. In practice that means you can fit four GPUs in this setup: 3x 3-slot GPUs (1x Gen 5 x16, 1x Gen 5 x8, 1x Gen 4 x16) plus 1x 2-slot GPU (Gen 5 x16), or three GPUs if you don’t use a PCIe extender.

The stock fans in the Define 7 XL are not great; I suggest swapping them out for PWMs - your ears will thank you. With 5x 140mm static-pressure PWMs the machine runs considerably quieter and cooler, even under benchmarking and stress testing.

The M.2 slot nearest the GPU on the Asus TRX50 Sage is a chipset-linked slot, and if you put an M.2 drive in it whilst trying to install Windows, you’ll get stuck in an infinite loop. Install Windows to a PCIe-mounted drive, or to one of the other two Gen 5 M.2 slots, to avoid this.

Final general thoughts.

In practice, this workstation is on average 4x faster than my last setup across all tasks, and it feels instant whatever I’m doing. I’m a developer with a very mixed workload: coding one minute, building the next, spinning up large databases, cycling batches of data through a couple of AI models and saving the results back. It’s not just the performance while a task is running, it’s the ramp-up times - if you’re iterating on code for some LLM tasks and need to load the models every few minutes while testing, that near-instant load from storage to GPU really makes a difference to your working day.

With the exception of huge LLM tasks on instances, this machine is as fast as or faster than all of our bare-metal servers, each set up for different tasks - it can just do everything without breaking a sweat.

Final tech thoughts.

  • The TRX50 with a 7000-series Threadripper is just about perfect; the PCIe lane allocation and quad-channel DDR5 setup are very well matched. With my setup I’m using (per-lane arithmetic below):
    • 16x Gen 5 lanes for the Hyper M.2: 63 GB/s
    • 32x Gen 4 lanes for 2x GPUs: 63 GB/s
    • 16x Gen 4 lanes for 4x NVMe: 31.5 GB/s
    • Total: 157 GB/s
    • Future: swap in a Gen 5 GPU: 189 GB/s
    • Memory bandwidth (measured): 182 GB/s
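For the arithmetic behind those numbers: PCIe Gen 4 is roughly 1.97 GB/s per lane and Gen 5 roughly 3.94 GB/s per lane, so 16 × 3.94 ≈ 63 GB/s, 32 × 1.97 ≈ 63 GB/s, and 16 × 1.97 ≈ 31.5 GB/s, giving 63 + 63 + 31.5 ≈ 157 GB/s in total; moving one GPU’s x16 from Gen 4 to Gen 5 adds another ~31.5 GB/s, which is where the ~189 GB/s figure comes from.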

I chose the ASUS TRX50 Sage board very specifically because of how it allocates these lanes; running three x16 slots at the same time just wins out for me, whereas others like the ASRock run 2x x16 and 2x x8.

DDR5 RDIMMs: I found actually getting RDIMMs very hard here in Europe, and the choice between going 7200+ V-Color with an imbalanced IF, or trying to get 6000/30 (G.Skill) for the sweet spot with IF at 2000, was all very hard to work out and then even harder to purchase. I ended up going for Kingston 6400/32 so I could test running at both 6000 and 6400; 6400 worked out to be the best balance. I’d have preferred more RAM, but in practice, given the codebase and model sizes I’m using in development / locally, I’ll rarely run above ~98GB of RAM usage, so the 128GB just fits everything I personally do in my mixed workload.

Traditional benchmarks: considering there’s no overclock, this machine just flies on them all. The build ranks in the all-time top 40 on Novabench (Novabench - Benchmark Result) at 9544 - and it doesn’t even have a 4090 and wasn’t using the fast mirror when benched; I think it’d hit the top 5 easily if I could be bothered. PassMark is 19350 (PassMark Software - Display Baseline ID# 2233425) - remember this is stock: no overclock, not even PBO turned on, and no trickery to get it to bench faster. Cinebench R24 is 2893 multi-core.

Temperatures: I’ve had this open all day, and done loads of benching and heavy work, so the max values are the maximums encountered so far. It just hasn’t broken a sweat, and as I sit beside it now there’s no noise.

Hope something in here helps someone!


Now twin 1500W Straight Power 12s, in a Lian Li O11D XL (not the EVO).

The other perhaps interesting bit: 6x Noctua 15mm fans in a push-pull config on the Silverstone XE360-TR5, and a PTM7950 thermal pad - all in, it barely hits 50°C under use.


Which slot did you put the M.2 card into?

Tonight I’m going to test it in all three x16 slots, because I’d like to move it onto an extender and a vertical mount for airflow optimization (it’s like a wall of metal wherever you put it).

Currently slot 1, with the 4090 in slot 2 - however I’m changing that up as I want dual 5090 FEs in it, when they eventually actually release properly.

So I’ll reply with more useful information tomorrow.

We’ve seen this as well, but ReFS is the only production-ready solution for metadata-validated storage on Winders. And the resilience and recoverability are surprisingly good.

Still wish I could implement it on Windows 11 Pro workstations, but such is life. If you need ReFS, run a server…

THAT is what I am always telling people. Don’t wait for computers, make them work for you.

Very impressive - a 32-core EPYC Turin is 32k, and a TON more expensive and a pain in the ass to… everything.


Total, or CPU? (This CPU is 89k.)

improved a bit with config: PassMark Software - Display Baseline ID# 2389656

Over on Novabench it currently ranks #4; it should shoot to #1 after the 7980X + 5090 upgrade.

Whoops, I meant 64k… EPYC 9334.

Highest I hit with the 9334 was 68k.

This is an interesting build, and it has got me considering something similar. How is your second optane drive connected? I cannot see it in the photo, and that adapter card only appears to take one drive.

I have three adapters, plus a dual one which doesn’t function - the second Optane drive is currently sitting in a drawer doing nothing. Because of the size of the heatsinks they’re double-slot, so it’s a trade-off of GPUs vs storage (and arguably I have enough).

V-Color 8000 RDIMMs coming tomorrow to test; interested to see how they perform.