This build was primarily meant to create an optimal all-round, mixed-stack development machine, with enough PCIe lanes for 2-3 GPUs and mixed storage types, plus the bandwidth and cores to support development on Windows, in WSL2, and dual-booted into Ubuntu for non-virtualized access to resources.
Building and specifying this machine was, by far, the most enjoyable build I’ve done in 20+ years. I’d already replaced my dev machine with my standard go-to desktop setup, but came unstuck as soon as I added a second GPU. That lack of PCIe lanes on desktop processors is what triggered this build.
Specs:
- Basics:
  - Motherboard: Asus TRX50 Sage
  - Processor: Threadripper 7960X
  - DDR5: Kingston 6400/32 (2Rx8) RDIMM 128GB
- Chassis:
  - Case: Fractal Design Define 7 XL
  - AIO: Silverstone XE360-TR5
  - PSU: Be Quiet Straight Power 1500W (2x 600W connectors)
  - Fans: 5x PWM Static 140mm
- GPUs:
  - Asus ProArt 4080 Super (2.5 slot)
  - Founders 4070 Super (2 slot)
- Storage:
  - 2x Intel Optane P5801X 400GB
  - 2x Samsung 990 Pro 4TB
  - 4x Crucial T700 1TB in an Asus Hyper M.2 Gen5
Potato Photo
Basic Benches:
IF manually set to 2133 to keep the ratios; no overclock, just the EXPO II profile, with PBO left at board stock. This is a work machine I sit beside all day, so dealing with heat, noise and instability for minor gains isn’t worth it for me.
Storage Details:
First: Optane p5801x (Windows Boot Drive - Selected for IOPS)
NTFS 4K
Second: Optane p5801x (Ubuntu Boot Drive - Selected for IOPS)
ext4 4K - This drive is also mounted under WSL2, primarily optimized for many small files / building.
fio results: random 4K read at Q1 hits 548 MiB/s (got to be a record!) with 140k IOPS on ext4
randread_4k_q1: (groupid=0, jobs=1): err= 0: pid=22315: Tue Oct 29 22:05:56 2024
read: IOPS=140k, BW=548MiB/s (574MB/s)(16.0GiB/30000msec)
randwrite_4k_q1: (groupid=0, jobs=1): err= 0: pid=22446: Tue Oct 29 22:08:04 2024
write: IOPS=102k, BW=400MiB/s (420MB/s)(11.7GiB/30000msec); 0 zone resets
read_1m_q8: (groupid=0, jobs=1): err= 0: pid=23170: Tue Oct 29 22:19:03 2024
read: IOPS=6803, BW=6803MiB/s (7134MB/s)(199GiB/30002msec)
write_1m_q8: (groupid=0, jobs=1): err= 0: pid=23299: Tue Oct 29 22:21:09 2024
write: IOPS=4459, BW=4460MiB/s (4676MB/s)(131GiB/30002msec); 0 zone resets
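As a quick sanity check on the fio output above, bandwidth should equal IOPS multiplied by block size. The sketch below redoes that arithmetic and the MiB/s to MB/s conversion; the helper names are made up for illustration, and the only inputs are the figures fio reported.

```python
# Sanity-check fio's numbers: bandwidth = IOPS x block size.
# Helper names are hypothetical; the inputs are the figures fio printed above.

MIB = 1024 ** 2  # binary unit, fio's "MiB"
MB = 1000 ** 2   # decimal unit, the "MB" fio shows in parentheses


def bandwidth_mib_s(iops: float, block_bytes: int) -> float:
    """Throughput in MiB/s implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes / MIB


def mib_to_mb(mib_s: float) -> float:
    """Convert MiB/s to MB/s."""
    return mib_s * MIB / MB


if __name__ == "__main__":
    runs = [
        ("randread_4k_q1", 140_000, 4 * 1024),
        ("randwrite_4k_q1", 102_000, 4 * 1024),
        ("read_1m_q8", 6803, MIB),
        ("write_1m_q8", 4459, MIB),
    ]
    for name, iops, block in runs:
        mib_s = bandwidth_mib_s(iops, block)
        print(f"{name}: {mib_s:,.0f} MiB/s = {mib_to_mb(mib_s):,.0f} MB/s")
```

fio rounds IOPS to "140k" and "102k" in its summary, so the recomputed 4K figures land within a MiB/s or two of the reported ones rather than matching exactly.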
Crucial T700s (Mounted in Asus Hyper M.2 Gen 5)
T700s Partition 1, Striped Storage Pool
Optimized for storing and loading AI models, which typically means sequential reading and writing of multiple 4-5GB model files.
T700s, Partition 2, Mirror Storage Pool
Optimized for redundancy and storing work critical files, code, etc.
Samsung 990 Pro 4TBs - mounted in the Gen 5 CPU-lane M-key M.2 slots.
990s, Partition 1, Striped Storage Pool (4K Blocks)
Optimized for storing VHDX and random files, downloads etc.
990s, Partition 2, Mirror Storage Pool (64K Blocks)
Optimized for sinking rarely used but important videos and images with redundancy - photography/drone footage etc.
Useful Notes:
Partition Manager vs Storage Pools.
In all tests, storage pools performed noticeably better than the equivalent partition setups made with the stock partition manager. There are also no resyncs on mirrors, and in practice pools are much more flexible to use.
ReFS does give a slight speed-up (~15%) for workloads with many small files, like building programs and libs, but it’s slower for larger files, so in real-world usage there’s no net gain.
WSL2 mount types.
If you use WSL2, bare mounting a VHDx and formatting it with ext4 or XFS is much faster than using native NTFS drives for real-world things like builds (cmake, npm etc). I tested all the possible combinations in depth; a short summary is:
Setup: i7 12700KF, 64GB 3200, 990 Pro / Optane P5801x
Typescript build (compiling typescript lib from source, npm)
- 30s: 990 Pro NTFS (4k, 64k)
- 30s: Optane NTFS (4k)
- 26s: ReFS 64k on Storage Pool
- 26s: ReFS 4k on partition
- 25s: Optane ReFS 4k on partition
- 14.5s: Optane XFS, --bare mount
- 14.3s: VHDx, bare mounted, formatted XFS or ext4
This 2x speed-up on the same hardware, from using either a VHDx or a --bare mounted, raw formatted drive, also showed up when compiling Bitcoin (with -j $(nproc): 5m4s down to 2m35s).
Short: if you’re using WSL2, just create a second VHDx, mount it --bare, format it with ext4, and enjoy 2x faster real-world virtualized Linux.
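For anyone wanting to try that, here is a rough sketch of the commands involved, written as a small Python helper that only builds and prints them. `wsl --mount --vhd <path> --bare` is the documented WSL way to attach a VHDx without mounting it; the VHDx path, the `/dev/sdX` device name and the mount point are placeholders I made up, so check `lsblk` inside WSL for the real device before formatting anything.

```python
# Builds (but does not run) the commands for a bare-mounted VHDx in WSL2.
# The path, device and mount point below are placeholders, not values from my setup.

SUPPORTED_FS = ("ext4", "xfs")


def bare_mount_command(vhd_path: str) -> list[str]:
    """Windows side: attach the VHDx to WSL2 without mounting a filesystem."""
    return ["wsl.exe", "--mount", "--vhd", vhd_path, "--bare"]


def format_command(device: str, fs: str = "ext4") -> list[str]:
    """Inside WSL: format the attached block device. This destroys its contents."""
    if fs not in SUPPORTED_FS:
        raise ValueError(f"unsupported filesystem: {fs}")
    return ["sudo", f"mkfs.{fs}", device]


def mount_command(device: str, mount_point: str) -> list[str]:
    """Inside WSL: mount the freshly formatted device."""
    return ["sudo", "mount", device, mount_point]


if __name__ == "__main__":
    steps = [
        bare_mount_command(r"D:\wsl\build.vhdx"),  # placeholder path
        format_command("/dev/sdX", "ext4"),        # find the real device with lsblk
        mount_command("/dev/sdX", "/mnt/build"),   # placeholder mount point
    ]
    for step in steps:
        print(" ".join(step))
```

Printing instead of executing is deliberate: the formatting step is destructive, so it is safer to read the commands and run them by hand once the device name is confirmed.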
New TRX50 Threadripper build comparison:
Bitcoin build: 1m 20s under WSL2 (2-4x faster than the i7 12700KF/64GB), and 52 seconds on Optane when booted into Ubuntu (5x faster!)
Custom image benchmarking script (resize 5000 images, convert to WebP, tar.gz):
i7 12700KF / 64GB / Optane XFS: Conversion 30s, Archive 16s, Total: 46s
7960X / 128GB / Optane XFS: Conversion 10s, Archive 7s, Total: 17s (2.7x faster)
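To make the speed-up figures above easy to recheck, here is a small sketch that parses timings in the format used in this post and divides them. The function names are illustrative, and which old timing to use as the baseline for the Threadripper comparison is my assumption in the last two rows.

```python
# Recompute speed-ups from the timings quoted in this post.
# Function names are illustrative only.


def to_seconds(text: str) -> float:
    """Parse timings like '5m4s', '2m35s', '52s' or '14.3s' into seconds."""
    cleaned = text.strip().lower().replace(" ", "").rstrip("s")
    minutes, _, seconds = cleaned.rpartition("m")
    return (float(minutes) if minutes else 0.0) * 60 + (float(seconds) if seconds else 0.0)


def speedup(before: str, after: str) -> float:
    """How many times faster 'after' is than 'before'."""
    return to_seconds(before) / to_seconds(after)


if __name__ == "__main__":
    comparisons = [
        ("TypeScript build, NTFS vs bare VHDx", "30s", "14.3s"),
        ("Bitcoin build on the i7, NTFS vs bare mount", "5m4s", "2m35s"),
        ("Image script, i7 vs 7960X", "46s", "17s"),
        # Baseline choice below is an assumption: slowest i7 result vs new machine.
        ("Bitcoin build, i7 NTFS vs 7960X WSL2", "5m4s", "1m 20s"),
        ("Bitcoin build, i7 NTFS vs 7960X Ubuntu", "5m4s", "52s"),
    ]
    for label, before, after in comparisons:
        print(f"{label}: {speedup(before, after):.2f}x")
```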
Case / Motherboard.
If you forfeit the front USB headers, you can use the last slot on the Asus TRX50 Sage for a 2-3 slot GPU in the Define 7 XL. In practice this means you can fit four GPUs in this case with a PCIe extender: 3x 3-slot GPUs (1x Gen 5 x16, 1x Gen 5 x8, 1x Gen 4 x16) and 1x 2-slot GPU (Gen 5 x16). Without an extender, it’s three.
The stock fans in the Define 7 XL are not great; I suggest swapping them for PWM fans, your ears will thank you. With 5x 140mm static-pressure PWM fans the machine runs considerably quieter and cooler, even under benchmarking and stress testing.
The M.2 slot nearest the GPU on the Asus TRX50 Sage is a chipset-linked slot, and if you have an M.2 drive in it while trying to install Windows, you’ll get stuck in an infinite loop. Install Windows to a PCIe-mounted drive, or to one of the other two Gen 5 M.2 slots, to avoid this.
Final general thoughts.
In practice, this workstation is on average 4x faster than my last setup in all tasks, and it feels instant whatever I’m doing. I’m a developer with a very mixed workload: coding one minute, building the next, spinning up large databases and cycling batches of data through a couple of AI models, then saving the results back. It’s not just the performance while a task is running, but even more the ramp-up times. If you’re getting some code right for some LLM tasks and need to load the models every few minutes while testing, that near-instant load from storage to GPU really makes a difference to your working day.
With the exception of huge LLM tasks on instances, this machine is as fast as or faster than all of our bare metal servers, each set up for different tasks, at everything. It can just do it all without breaking a sweat.
Final tech thoughts.
- The TRX50 with a 7000 series Threadripper is just perfect; the PCIe lane allocation and 4-channel DDR setup are so well matched. With my setup I’m using:
  - 16x gen 5 lanes for hyper m2: 63 GB/s
  - 32x gen 4 lanes for 2x GPUs: 63 GB/s
  - 16x gen 4 lanes for 4x NVMe: 31.5 GB/s
  - Total: 157 GB/s
  - future: swap in a gen 5 GPU: 189 GB/s
  - Memory Bandwidth (measured) 182 GB/s
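The lane figures above can be reproduced from the PCIe per-lane transfer rates (16 GT/s for Gen 4, 32 GT/s for Gen 5) and the 128b/130b line encoding. The sketch below does that arithmetic and also works out the theoretical peak for 4-channel DDR5-6400, assuming 64 data bits per channel, to put the measured 182 GB/s in context. These are one-direction theoretical maximums, not measurements, and the function names are just for illustration.

```python
# Theoretical one-direction bandwidth for the lane allocation listed above.
# Function names are illustrative; results are peak figures, not measurements.

GT_PER_S = {4: 16.0, 5: 32.0}  # per-lane transfer rate by PCIe generation
ENCODING = 128 / 130           # 128b/130b line encoding (PCIe 3.0 and later)


def pcie_gb_s(gen: int, lanes: int) -> float:
    """Peak GB/s in one direction for a link of the given generation and width."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


def ddr_peak_gb_s(mt_per_s: int, channels: int, bytes_per_transfer: int = 8) -> float:
    """Peak GB/s for DDR memory, assuming 64 data bits (8 bytes) per channel."""
    return mt_per_s * bytes_per_transfer * channels / 1000


if __name__ == "__main__":
    hyper_m2 = pcie_gb_s(5, 16)  # 16x gen 5 lanes
    gpus = pcie_gb_s(4, 32)      # 2x GPUs on gen 4 x16 each
    nvme = pcie_gb_s(4, 16)      # 4x NVMe on gen 4 x4 each
    total = hyper_m2 + gpus + nvme
    # Swapping one GPU for a gen 5 card upgrades one x16 link from gen 4 to gen 5.
    future = total - pcie_gb_s(4, 16) + pcie_gb_s(5, 16)
    memory_peak = ddr_peak_gb_s(6400, 4)

    print(f"Hyper M.2: {hyper_m2:.1f} GB/s, GPUs: {gpus:.1f} GB/s, NVMe: {nvme:.1f} GB/s")
    print(f"Total: {total:.1f} GB/s, with one gen 5 GPU: {future:.1f} GB/s")
    print(f"DDR5-6400 x4 theoretical peak: {memory_peak:.1f} GB/s")
    print(f"Measured 182 GB/s is {182 / memory_peak:.0%} of that peak")
```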
I chose the ASUS TRX50 Sage board very specifically because of how it allocates these lanes. Running three x16 slots at the same time just wins out for me; others, like the ASRock, are 2x x16 and 2x x8.
DDR5 RDIMMs: I found actually getting RDIMMs very hard here in Europe. The choice was between going 7200+ (V-Color) and having an imbalanced IF, or trying to get 6000/30 (G.Skill) for the sweet spot with IF 2000. It was all very hard to work out and then even harder to purchase. I ended up going for Kingston 6400/32 so I could test running it at both 6000 and 6400, and 6400 worked out as the best balance. I’d have preferred more RAM, but in practice, given the code base and model sizes I’m using in development and locally, I’ll rarely go above ~98GB of RAM usage, so the 128GB just fits everything I personally do in my mixed workload.
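The two pairings mentioned in this post (6000 with IF 2000, and 6400 with IF 2133) both work out to the memory data rate divided by three. A tiny sketch of that relationship follows; the 3:1 ratio is inferred from those two pairings only, not taken from AMD documentation, and the function name is made up.

```python
# IF clock that keeps the same ratio as the pairings in this post.
# The divide-by-three rule is inferred from 6000 -> 2000 and 6400 -> 2133 only.

DATA_RATE_TO_IF_RATIO = 3


def matched_if_mhz(ddr_mt_per_s: int) -> float:
    """IF clock in MHz that keeps the 3:1 data-rate-to-IF ratio."""
    return ddr_mt_per_s / DATA_RATE_TO_IF_RATIO


if __name__ == "__main__":
    for rate in (6000, 6400, 7200):
        print(f"DDR5-{rate}: IF {matched_if_mhz(rate):.0f} MHz to keep the ratio")
```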
Traditional benchmarks: considering there’s no overclock, this machine just flies on them all. The build ranks in the top 40 of all time on Novabench with a score of 9544, and it doesn’t even have a 4090 and wasn’t using the fast mirror when benched; I think it’d hit top 5 with ease if I could be bothered. PassMark is 19350 (PassMark baseline ID# 2233425). Remember this is stock: no overclock, not even PBO turned on, and no trickery to get it to bench faster. Cinebench R24 is 2893 on multi-core.
Temperatures: I’ve had this open all day and done loads of benching and heavy work, so the max values are the max encountered so far. It just hasn’t broken a sweat, and as I sit beside it now there’s no noise.
Hope something in here helps someone!