I’m a homelabber as a hobby and hardware developer by day.
I’ve been using consumer hardware at work for quick builds and compiles for my workloads. I’ve been running an overclocked 12900K (yes, I’m crazy) with 128GB of DDR4 at 3400MT/s for about a year now and it’s been great. The only issue is that some of my workloads start to get unhappy when I run out of dual-channel memory bandwidth.
My workloads are extremely single thread performance bound. The faster the core the better, the lower the latency of memory the better.
I’m going to upgrade to a 7950X3D because I’m very interested in seeing if the 3D V-Cache will help with my specific workload. It would also be nice to lift the memory bandwidth limit a little with DDR5.
I’m looking at buying an Asus Prime X670-P and a 192GB DDR5 kit, as it’s on the QVL for the motherboard: Corsair CMK192GX5M4B5200C38.
What I’m wondering is if there are any workstation motherboards that support this CPU that might have built in IPMI?
I do use PiKVMs as a workaround, but if there’s anything with native workstation features available, an integrated solution would be better.
Also, having multiple full-size PCIe slots for putting in GPUs would be a bonus. I’m jumping on the AI train and I’ve got a couple of 4090s to pop into the system soon.
So far I’ve only pushed up to about 90GB of memory for my workloads, but that’s because I would run out of memory bandwidth and cores on my old CPU before I could run more. I had to disable the efficiency cores because I run a slightly older version of Red Hat to support my software, and it didn’t support the hybrid core architecture of my CPU.
Latency is still a big deal for memory in my workload, which is why I went with DDR4. It was only after I built the machine that I found the roughly 40GB/s of memory bandwidth would hamper me more.
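For reference, here is a rough sketch of the theoretical peak bandwidth arithmetic, assuming a 64-bit (8-byte) data bus per channel and two channels on both platforms. The function name is just for illustration. It gives about 54.4 GB/s for dual-channel DDR4-3400 and about 83.2 GB/s for dual-channel DDR5-5200; measured numbers land below the theoretical figure, which would be consistent with the roughly 40GB/s mentioned above.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes).

    Assumes each channel moves `bus_bytes` bytes per transfer (8 bytes
    for a standard 64-bit channel). Real workloads see less than this.
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    # Current box: dual-channel DDR4 at 3400 MT/s
    print(f"DDR4-3400 x2: {peak_bandwidth_gbs(3400, 2):.1f} GB/s")
    # Candidate kit's rated speed: dual-channel DDR5 at 5200 MT/s
    print(f"DDR5-5200 x2: {peak_bandwidth_gbs(5200, 2):.1f} GB/s")
```

Keep in mind the DDR5 figure assumes the kit actually runs at its rated 5200 MT/s on the board in question.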
Oh sorry, I should have stated why I didn’t go with Threadripper 5000/7000. With Threadripper Pro I lose out on some single-core performance; that’s really the main reason for going with consumer stuff.
But also the 3D vcache may make a huge difference but I don’t see my workplace paying 10x my request just so I can get a larger cache. Part of this is learning if the 3D vcache will make a difference.
This doesn’t seem to be the case as much (or maybe at all) with TR 7000, it really is fast at single core unlike TR 5000.
If your workload accesses a lot of memory regularly, 3D V-Cache will likely net you worse performance than the normal 7950X due to the reduced clocks on the 3D V-Cache part.
If your workload has a lot of memory-intensive single-threaded portions, I’d definitely look at the single-threaded memory performance of processors too. The memory bandwidth numbers touted for processors are always what can be achieved with multiple threads hitting the memory controllers.
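As a very rough way to see what a single thread can pull from memory, something like the sketch below could work. It times a large in-memory copy from one Python thread using only the standard library. The function name, buffer size, and repeat count are arbitrary choices for illustration, and this is not a substitute for a proper tool such as the STREAM benchmark; it only gives a ballpark figure to compare between machines.

```python
import time


def single_thread_copy_gbs(size_mb: int = 128, repeats: int = 5) -> float:
    """Estimate single-threaded copy bandwidth in GB/s (decimal).

    Copies a buffer much larger than typical L3 caches and reports the
    best run. Counts both the bytes read and the bytes written.
    """
    n = size_mb * 1024 * 1024
    # Fill the source with non-zero bytes so its pages are really allocated.
    src = bytearray(b"\x01") * n
    dst = bytearray(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment, done as a bulk copy in C
        best = min(best, time.perf_counter() - start)
    return 2 * n / best / 1e9


if __name__ == "__main__":
    print(f"single-thread copy: {single_thread_copy_gbs():.1f} GB/s")
```

Taking the best of several runs hides the page-fault cost of the first pass over the destination buffer. Comparing this number with the platform's theoretical peak shows how much of the headline bandwidth one core can actually use.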
Get a cloud instance for an AMD Milan-X node. Great lab to test the impact of 3D-VCache on your workload. Saves you from committing a lot of money for something that just isn’t working the way you expect or hope for.
This doesn’t seem to be the case as much (or maybe at all) with TR 7000, it really is fast at single core unlike TR 5000.
Oh, that’s good to know. I should look at some more 7000 series benchmarks compared to 5000 and correlate those with the 7950X3D. A lot of the benchmarks for the consumer stuff are targeted at gaming, and I haven’t yet found which common standard benchmark is closest to my workload so that I can look things up. I often have to just guess.
I can see Puget actually has some 7000 series parts but only the 4 slot DDR5 model boards.
Get a cloud instance for an AMD Milan-X node. Great lab to test the impact of 3D-VCache on your workload. Saves you from committing a lot of money for something that just isn’t working the way you expect or hope for.
What’s the easiest way to do this? I tried to go through some of the Intel Developer Cloud, but it was so hand-wavy it left a bad taste in my mouth and I couldn’t get through it easily.
Boards for Threadripper Pro will come “soon”. They just released the normal TR.
Intel doesn’t have 3D V-Cache. But you may check out a node with Xeon Max and on-package HBM. The clocks won’t be high enough, but memory latency and bandwidth should be the best you can get.
I was using Intel as an analog for how to actually get a cloud account inexpensively without jumping through so many hoops. I’m even fine with using a personal account to just experiment on my own time. I assume the AMD cloud is similarly marketing-heavy and I will have to jump through hoops. But these are just basic questions I should start researching to really try stuff.
The Xeon Max was actually what I wanted to try demoing, because I do like the idea of using HBM.
OK, I think you folks have convinced me to move to HEDT instead of pushing on consumer platforms any more. The PCIe lane count for multiple GPUs is pretty much killing any chance of staying on the consumer side.
That being said, now I need to figure out whether to wait for TR PRO or just go with regular TR. I think the main difference is the memory channels, and I’d be well off with just a 512GB system. Are there any more motherboards coming that would be able to hit 512GB with non-PRO TR?
The main driving factor now is cost. I was looking at the non-PRO costs and those seem attainable for my business use case. The TR PRO might be hard to justify, but I’m also wondering if the single-core speed will take a hit on the PROs. MORE RESEARCH TIME!
Create an Amazon account, supply your credit card info and you’re in. Start an EC2 instance with the right type and Bob’s your uncle.
The issue is that I could not clearly identify instance types based on EPYC CPUs with 3D V-Cache. Well, simply stand up small instances for each of the three types mentioned in the above announcement and check if any of them have 3D V-Cache (log in and look at /proc/cpuinfo, or run CPU-Z or CPU-X depending on your liking). It should not cost more than a couple of bucks and an hour or two.
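For the "log in and check" step on a Linux instance, here is a small sketch. It makes two assumptions that may not hold on every cloud: that the standard Milan-X retail parts are reported with model names like "EPYC 7773X" (7373X, 7473X, 7573X, 7773X), and that the L3 size the guest sees in sysfs reflects the real hardware. Custom cloud SKUs may use different names, and hypervisors can hide or alter cache details, so treat a negative result as inconclusive. The function names are made up for illustration.

```python
import re
from pathlib import Path


def summarize_cpuinfo(text: str) -> dict:
    """Pull the first 'model name' out of /proc/cpuinfo-style text."""
    model = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            model = value.strip()
            break
    # Heuristic (assumption): retail Milan-X parts are named EPYC 7xx3X.
    looks_like_milan_x = bool(model and re.search(r"EPYC 7\d{2}3X\b", model))
    return {"model_name": model, "looks_like_milan_x": looks_like_milan_x}


def l3_size_mb(size_field: str) -> float:
    """Convert a sysfs cache size string such as '98304K' to MiB."""
    m = re.fullmatch(r"(\d+)([KMG])", size_field.strip())
    if not m:
        raise ValueError(f"unrecognized cache size: {size_field!r}")
    factor = {"K": 1 / 1024, "M": 1, "G": 1024}[m.group(2)]
    return int(m.group(1)) * factor


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    l3 = Path("/sys/devices/system/cpu/cpu0/cache/index3/size")
    if cpuinfo.exists():
        print(summarize_cpuinfo(cpuinfo.read_text()))
    if l3.exists():
        # L3 visible to cpu0; a V-Cache CCD has 96 MiB versus 32 MiB on regular Milan.
        print(f"L3 seen by cpu0: {l3_size_mb(l3.read_text()):.0f} MiB")
```

If the model name is ambiguous, the L3 size is the more telling number, as long as the hypervisor passes it through unchanged.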