I found this forum while looking at motherboard reviews and I love the channel! I'm far from an expert on this; I'm an applied/computational mathematician by trade, currently in the midst of configuring a robust workstation aimed mainly at agent-based simulations across a massive network.
I'm specifically after advice on the choice of RAM. I need as much and as fast as I can afford, and I'm pretty sure I'll need ECC DDR4.
However, I'm also open to any feedback on overall compatibility and potential bottlenecks within my setup. This is just my best attempt given the month or so I've been researching (very new at this).
Budget:
Total Cost of Selected Components so far: $4,391.19 (from pangoly)
Maximum Budget: $6,000
Location & Currency:
Country: USA
Currency: USD
Preferred Retailer:
No particular preference, though I'm leaning towards new parts.
Peripherals:
Already in possession of a monitor, keyboard, and mouse. Nothing fancy. I'll be getting a big curved monitor and a mechanical keyboard once I have money again (many moons after this purchase), but I don't need crazy refresh rates or color accuracy or anything.
Intended Use:
This workstation is primarily tailored for rigorous agent-based simulations on expansive networks. If it's also usable for run-of-the-mill daily stuff, that would be nice.
Overclocking:
No plans to overclock.
Custom Water-Cooling:
Would prefer not to
Operating System:
Linux (Ubuntu) would be nice, but I'm good with Windows too.
Current Build List:
Motherboard: ASUS Pro WS WRX80E-SAGE SE WIFI WRX80
CPU: AMD Ryzen Threadripper PRO 5965WX 24-core 48-thread
CPU Cooler: be quiet! Dark Rock Pro TR4
GPU: NVIDIA GeForce RTX 4070 Founders Edition
Memory: ???
Power Supply: Super Flower Leadex Titanium 1600W Semi-Modular
SSDs: (x2) Western Digital WD_BLACK AN1500 2TB PCIe
RAM: Given the motherboard and CPU’s specs, what would you recommend for fast, reliable RAM with a generous capacity? I’m veering towards DDR4 ECC.
Overall Compatibility: Can you identify any potential hitches or slowdowns in this lineup? Might I have missed any glaring incompatibilities?
Storage: Swift access to storage is pivotal for my simulation tasks. Are my chosen SSDs up to the mark, or are there superior alternatives fitting within my motherboard’s constraints?
Cooling: With an open-air design in mind, should I mull over more cooling solutions, especially targeting the SSDs?
This community’s expertise is invaluable to me, and I’d be deeply appreciative of any insights, feedback, or recommendations you can share. Many thanks in advance for your support and counsel!
Thank you!
note: the ASRock WRX80 Creator would work as well. I am going with ASUS on the assumption documentation/help will be more available.
note: I'm leaning towards an open-air case, potentially the Open Bench Table.
Not a direct response to your question. Have you looked at Apple M2 Ultra Studio?
Sounds like a fit for your area of work. I assume your toolchain can be ported to its BSD-derived macOS environment.
Memory bandwidth is superb, 800GB/s if I remember right, although it maxes out at 192GB at the moment and there's no board-level ECC. The memory is stacked LPDDR5 chips packaged very close to the CPU, so board-level ECC is perhaps not a huge need yet.
Also, the $6,000 budget fits well.
EDIT:
The base model of the M2 Ultra outperforms the AMD 5965WX on both single-core and multi-core by a handsome margin.
My first thought is that you would want to size your memory capacity to the largest simulation you intend to run.
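I don't know your actual numbers, but here's the kind of back-of-envelope I mean. Everything in it (agent count, state size, edge count) is a made-up placeholder, not anything from your post:

```python
# Rough memory-footprint estimate for a graph-based ABM.
# All numbers are made-up placeholders -- plug in your own.

n_agents = 50_000_000        # agents (nodes) in the network
state_bytes = 64             # bytes of mutable state per agent
n_edges = 500_000_000        # directed edges in the network
edge_bytes = 16              # e.g. 8-byte neighbor index + 8-byte weight

agent_gb = n_agents * state_bytes / 1e9
edge_gb = n_edges * edge_bytes / 1e9
total_gb = agent_gb + edge_gb

print(f"agent state: {agent_gb:.1f} GB")
print(f"edge list:   {edge_gb:.1f} GB")
print(f"total:       {total_gb:.1f} GB (before scratch buffers and OS headroom)")
# With these placeholders: 3.2 GB + 8.0 GB = 11.2 GB, then double or triple it
# for working copies and framework overhead when deciding on RAM capacity.
```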
I don't have much exposure to ABM, only what I overhear from colleagues, but my understanding is that one of its quirks is that arithmetic intensity can vary greatly from iteration to iteration.
If your workload is more compute-limited then Threadripper is a fine choice, but if memory bandwidth is the bottleneck I would not recommend it.
For example, I'm solving some very large matrices with the generalized minimal residual method (GMRES), which is very dependent on memory bandwidth, and a 16-core Xeon W-3400 is 60% faster than a 64-core Threadripper 5000 CPU, almost entirely because of how much faster the Intel's memory subsystem is.
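For anyone curious why GMRES behaves that way: the dominant kernel is a sparse matrix-vector product, and a quick arithmetic-intensity estimate shows it sits far below what's needed to keep the cores busy. This is just a sketch with generic CSR assumptions and placeholder hardware numbers, not figures from my actual setup:

```python
# Roofline-style estimate for CSR sparse matrix-vector products, the kernel
# GMRES spends most of its time in. Generic placeholder assumptions only.

nnz_per_row = 27              # e.g. a 3D 27-point stencil
flops_per_nnz = 2             # one multiply + one add per nonzero
bytes_per_nnz = 8 + 4         # 8-byte double value + 4-byte column index
vector_bytes_per_row = 2 * 8  # read an x entry, write a y entry (optimistic)

flops = flops_per_nnz * nnz_per_row
dram_bytes = bytes_per_nnz * nnz_per_row + vector_bytes_per_row
intensity = flops / dram_bytes          # FLOPs per byte of memory traffic

mem_bw_gbs = 200      # placeholder sustained memory bandwidth, GB/s
peak_gflops = 2000    # placeholder peak FP64 throughput, GFLOP/s

print(f"arithmetic intensity ~ {intensity:.2f} FLOP/byte")
print(f"bandwidth-limited ~ {intensity * mem_bw_gbs:.0f} GFLOP/s "
      f"vs {peak_gflops} GFLOP/s compute peak")
# ~0.16 FLOP/byte: the cores are starved for data long before they run out of
# FLOPs, which is why extra memory channels beat extra cores for this kernel.
```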
One more thing I'd suggest changing is the SSDs: the AN1500s are Gen 3 SSDs at heart, which is getting a little long in the tooth at this point; a Gen 4 SSD like a Samsung 990 Pro would likely be a better choice.
That is the number Apple is quoting, and it is technically true, but misleading. The 800GB/s figure is what the CPU plus GPU combined can pull; the CPU can only do ~250GB/s of memory bandwidth by itself, which isn't bad, but the SIMD performance that all modern HPC code relies on isn't very good on ARM (more specifically Apple's implementation of ARM), which leads to worse performance than the 250GB/s memory bandwidth number would suggest.
You aren't all wrong… the 800GB/s is split among the performance CPU core cluster, the efficiency CPU core cluster, and the GPU, so there is some sort of hard/soft limit that keeps any single one of them from grabbing all the bandwidth.
Is that ~250GB/s number from AnandTech? That's the M1 Max, which has a memory bandwidth of 400GB/s. I would say more than 50% of the total going to the CPU clusters is better than most of what's out there.
Since the downfall of AnandTech, I haven't seen any measurements beyond the M1 Ultra and the M2 family of SoCs.
Hopefully I don't sound like I'm defending Apple. Software, especially HPC software, needs more time to catch up with the hardware. OP's domain might rule out Apple right away because it's not Linux and because it's ARM.
You bring up a good point about there being big and little cores; that could cause problems if the code isn't aware of the distinction.
All of the HPC code I run on M1/M2 knows to completely ignore the little cores, because they drag down performance: they won't complete their portion of the solution in a time frame close enough to the big cores' solutions, which causes solution assembly to stutter.
I’m getting that 250GB/s figure from passmark’s memory benchmark (so take with maybe two grains of salt).
Yeah, it's sad what has become of AnandTech since they've been hollowed out. They still do decent PSU reviews though.
I feel like it’s the other way around. Almost all the code I use takes advantage of the latest SIMD instructions as soon as they’re implemented in hardware, and I’d love to have the problems I deal with run on GPU but the hardware isn’t capable because of lack of memory capacity.
I know CUDA can keep memory coherency between multiple GPUs now but that is not a very good solution.
I think we're going to see performance gains for HPC code stagnate in the future because of the (IMO counterproductive) fixation on the low-precision compute performance that NN and ML require, it being the current fad… just like all that crypto-mining development years ago (which I also think was a waste), only this time it seems like all the companies are betting the ranch on NN and ML being the future.
Passmark… I'll take it with a huge pinch of salt. You should not have quoted the number to begin with, IMO.
It's because people have spent many years of effort on x86-64's SIMD instruction extensions, so software support is better there for sure.
Apple Silicon has only been around for slightly more than two years. Software support needs time to pick up the hardware features; that's what I meant by taking time.
I also don't think Apple is specifically targeting HPC markets, especially since they decided not to design server SoCs and let that team leave the firm years ago. However, their SoC architecture as-is is very "HPC-like". That's why I think it's a good fit for smaller problems or individual-level HPC exploration. A mathematician sounds like a fit to me.
Hahaha, Passmark's memory benchmark numbers do line up with AIDA64's numbers on most platforms… although I'm not sure how much that really helps Passmark's case, because AIDA64's numbers are kind of ridiculous too.
But Chips and Cheese benched an M1 Max CPU and found it to do 127GB/s read and 50GB/s write, which would make the M1 Ultra line up well with Passmark's numbers, so it seems valid enough.
My jab at Apple's SIMD implementation was about the lack of SVE support at the hardware level, instead opting for NEON, which is going to be deprecated soon enough.
I can‘t speak for OP obviously, but for a simulation workstation a mac studio is a hard pass from me.
No expandable memory
If you‘ll be running your code on a cluster you‘ll have a different platform locally, not just OS but the whole instruction set.
If you need more storage/fast network you are stuck with thunderbolt, which will be expensive or limiting.
If you want to go with GPU, you‘re limited to Apple‘s solution.
The new Xeon W-3400s or EPYCs with DDR5 could be worth considering though. You'll get more memory bandwidth. The question is whether the amount of memory you need fits your budget.
I don't know your storage requirements, but if you won't be hammering storage heavily, a couple of high-end PCIe 4 NVMe drives would be fine.
Hello, thanks so much for your recommendation! I've been told it's better to buy RAM in "kits" to make sure the sticks all work right together. Will I run into issues if I were to just get, say, 8 of these individually from that list?
Hello! Thanks so much for your response. My main concerns with Apple would be the lack of expandability and being locked into their ecosystem/form factors (Macs in general). Though I think you've raised some good points to consider that I hadn't thought of.
This is a very interesting point. The simulation is definitely CPU-heavy, though I expect the read/write to be the main bottleneck. I had not dug very deeply into the differences in memory performance between the CPUs themselves.
It's a difficult trade-off, but especially given the price difference and the ability to upgrade over time, that's a pretty compelling avenue to consider. Though I'd imagine I'd have to switch out the motherboard.
Thank you for the point about the SSD as well! I will look into it; is the M.2-2280 form factor going to be slower than a straight PCIe add-in card? Using that would free up some slots though, which is good.
edit: It definitely seems like the Intel Xeon might have the edge memory-wise?
Thank you! Here's the thing: every agent's update is essentially "read, compute, write updated state". So if I can't fit the entire DB in memory (which is basically not happening given my budget and the size of the graph DB), then I'm pretty sure I'll be hammering that SSD pretty hard.
This is going to be a bottleneck, and given my budget I'm trying to limit it as much as possible. So my thought was to lean toward PCIe NVMe storage, as that's probably the best I can afford. Can I ask, if I were going to hammer the storage, what would the best option be?
I've heard tell of burst buffers but have no idea how to set them up.
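To make the access pattern concrete, each step is basically the loop below. It's purely schematic: the file name, sizes, and "update rule" are made up, and I'm using a flat memory-mapped array to stand in for the actual graph DB:

```python
import numpy as np

# Schematic only: agent state as a flat fixed-size-record file on the NVMe
# drive, streamed in big sequential chunks instead of random DB lookups.
n_agents, n_fields, chunk = 1_000_000, 8, 100_000
state = np.memmap("agent_state.bin", dtype=np.float64,
                  mode="w+", shape=(n_agents, n_fields))  # creates a ~64 MB file

for start in range(0, n_agents, chunk):
    block = np.array(state[start:start + chunk])   # sequential read into RAM
    block[:, 0] += 0.1 * block[:, 1]               # stand-in for the real update
    state[start:start + chunk] = block             # sequential write back

state.flush()
```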
U.2, M.2, and PCIe add-in cards all use PCIe lanes and the NVMe protocol. Usually enterprise SSDs come with U.2 and consumer ones with M.2 interfaces.
The best would be Optane. They are basically indestructible and have low latency. But expensive.
Otherwise, an enterprise drive is preferable to a consumer one; they are usually slower on paper but more durable and offer more consistent performance. Consumer SSDs tend to slow down after writing large amounts of data. I don't have specific recommendations, but maybe others do. Look for high endurance (TBW) and high IOPS.
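To put numbers on "high endurance": here's the kind of quick TBW burn-rate check worth doing before picking a drive. The write rate and TBW rating below are placeholders, not any specific drive's spec:

```python
# How long a drive's rated endurance lasts at a sustained write rate.
# Both numbers below are placeholders -- use the TBW from the datasheet and
# your own measured/estimated average write rate.

tbw_rating_tb = 1200        # rated terabytes written for the drive
sustained_write_mbs = 500   # average MB/s written over a full simulation run

tb_per_day = sustained_write_mbs * 1e6 * 86_400 / 1e12
days = tbw_rating_tb / tb_per_day

print(f"{tb_per_day:.1f} TB/day -> rated endurance gone in ~{days:.0f} days "
      f"({days / 365:.1f} years)")
# At 500 MB/s around the clock that's ~43 TB/day, so a 1200 TBW rating lasts
# under a month -- which is why enterprise drives quote much higher TBW/DWPD.
```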
That is another angle to look at: Threadripper 5000 is a dead-end platform, whereas the Xeon W790 platform has one more future CPU release on it.
ASUS makes a SAGE W790 motherboard for the Xeon that's pretty close to the price of the WRX80 SAGE motherboard.
I'm not exactly recommending this, but if you wanted to save thousands of dollars, engineering-sample versions of these Xeons are plentiful right now and are confirmed to work on the ASUS W790 motherboard: ~200 USD for a 48-core CPU. QYFQ is the CPU confirmed to work on the ASUS W790 board, but I suspect most if not all of the other engineering-sample CPUs work as well.
If it were me, I'd probably get the W790 setup with an engineering sample CPU, and then when the Emerald Rapids CPU update comes out, pull the trigger on that, since I saved so much money on the initial CPU.
Engineering Sample CPUs are going to have lower clocks than a retail CPU and aren’t overclockable.
The M.2 slots will perform exactly the same as the normal PCIe slots… assuming the particular motherboard has the M.2 slot routed straight to the CPU rather than through the PCH chipset, and assuming we're comparing apples to apples on lane width, since M.2 is only x4 PCIe lanes wide.
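If you end up on Ubuntu, you can sanity-check how a drive actually linked up without opening the case. A quick sketch, assuming the kernel exposes the standard PCI sysfs attributes and the drives enumerate as nvme0, nvme1, etc.:

```python
from pathlib import Path

# Each NVMe controller's sysfs entry links to its PCI device, which exposes
# the negotiated link speed and width.
for ctrl in sorted(Path("/sys/class/nvme").glob("nvme*")):
    dev = ctrl / "device"
    try:
        speed = (dev / "current_link_speed").read_text().strip()
        width = (dev / "current_link_width").read_text().strip()
    except OSError:
        continue  # attribute not exposed on this system
    print(f"{ctrl.name}: x{width} @ {speed}")
# Something like "nvme0: x4 @ 16.0 GT/s PCIe" confirms a Gen 4 x4 link;
# a narrower or slower link hints at chipset routing or lane sharing.
```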
Yes, assuming you get the 8-memory-channel version of it, it completely blows Threadripper 5000 out of the water.
There is also a 4-memory-channel version of the Xeon that you'd likely want to stay away from, since it only has half the memory bandwidth; both run on the W790 platform. The ASUS SAGE W790 motherboard is an 8-channel variant.
You only need to worry about matched "kits" when you are purchasing overclocked memory.
It's never an issue for any kind of stick running at stock JEDEC frequencies, no matter the DDR generation.
RDIMMs take any drive-compatibility issue out of the equation: (R) = registered = buffered RAM, which dramatically lessens the load on the CPU's IMC driving the memory. The lower clocks and buffering mean you never have to worry about matched kits.
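And once you're running ECC RDIMMs under Linux, you can actually watch the error counters. A quick sketch, assuming the EDAC driver for the platform is loaded so the counters show up under sysfs:

```python
from pathlib import Path

# The Linux EDAC subsystem exposes per-memory-controller ECC error counters.
# If nothing prints, the EDAC driver for the platform isn't loaded.
for mc in sorted(Path("/sys/devices/system/edac/mc").glob("mc[0-9]*")):
    ce = (mc / "ce_count").read_text().strip()  # corrected errors
    ue = (mc / "ue_count").read_text().strip()  # uncorrected errors
    print(f"{mc.name}: corrected={ce} uncorrected={ue}")
# A slowly ticking corrected count is ECC quietly doing its job; any
# uncorrected errors are worth chasing down immediately.
```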
This seems like a solid option. The more I read about the Intel Xeon processors, the more it seems like it might be the right direction. Goddamn those prices though, haha. The ES CPUs seem like a good temporary compromise before a larger investment. Even at half the GHz, I think given the memory bottleneck I'd turn out alright. (If I choose to simulate 6 months rather than a year, my very napkin calculations say I should be able to handle it in like 24 hrs.)
My only hang-up would be that I don't know where to purchase an Engineering Sample CPU with any reliability. Do you have any recommendations on that?
Normally I wouldn't recommend engineering sample CPUs because they are the wild west, but because they are available from within the US (buying from China is semi-sketchy), they are confirmed working on this motherboard, and they are much cheaper than normal, it's probably worth a look.
Here’s the STH thread with a little more info:
As to where to get them, search on ebay for “intel QYFQ”, you should get a couple hits and they should be less than 200USD.
I've done some research since and I think I'm going to go with this solution. In fact, I think I'm just gonna copy the build in the link, lol. It fits within my budget and it's about as close to a "here's how" as I'll get with the components I'll probably buy.
Given that the next-gen Threadrippers are almost definitely going to be on a different socket, it seems like a bad idea to invest that much in a setup I'll have to replace entirely, even if the 7000s do what the hype says they might do, and that's an if. Memory-performance-wise, Intel has it right now, and the W790 board should serve for at least a few upgrades over time. It makes the most sense. This has been super valuable, so thank you!