Single motherboard + gpus vs cluster

I’m migrating from a desktop to a server/cluster.

use case: siemens nx(nastran), comsol(multiphysics), local llm(that can read/train on pdf’s, ocr), anything it needs to do my financials.

after some reading around I got down to two options:

A
single or dual sp3 epyc 7742
asrock rack romed8-2T(more gpus, single cpu) or gigabyte mz72-hb2(less gpus, dual cpu)
512gb ddr4 3200 rdimm
ebay rtx 3090

upgradability : keep adding gpus

B
epyc 4545p
asrock rack epyc4000d4u
96gb ddr5 5600 cudimm(ecc)
pro 2000 blackwell

upgradability : keep adding nodes

I’ve never dealt with servers before, I don’t even know the right questions to ask.

maybe there’s a third better option?

also apparently clusters are more suited for multi-user inference and offer no benefit for single-user. Can anyone confirm this?

Tell us more about your current setup and why you want to migrate from it. What problem do you want to solve with the upgrade / migration?
Also describe your workload in a bit more detail: do you want to run all of this in parallel? Are you the only user running these apps?

Do you know how to set up any of your use-case applications on a server OS, e.g. Linux? Are they even able to run in some sort of cluster and/or mGPU mode? What OS do you want them to run on? What would your expectations / definition of a cluster be? What benefits do you want to gain from it?

I would definitely recommend that you don’t buy anything until you can at least answer the above questions.

thanks for the questions mietzen, this is exactly what I was hoping for

My current setup is:

ryzen 8600G
b650m pg riptide
2*48gb ddr5 6000
1tb sk hynix p41
seasonic focus gx 750 atx 3.0

I want to migrate because my comsol motor flux simulation keeps crashing before it completes. It needs about 2 days to complete but crashes after a few hours (I let it run on two separate instances for 3 days to confirm it wasn’t just an unresponsive ui). I had the same issue with nastran on my laptop and once I changed over to this desktop, the issue disappeared. I also want to speed it up to a few hours at most.

It’s single user, just for me. I’ve been working on an iron core linear motor for my machining center. I don’t want to start building it or open source the design before I optimize it via simulation.

I want to run siemens nx and comsol on either windows server 2019 or windows 11 pro for workstations. My current license is locked to a single node for both, but I can get the HPC add-on, which would enable running the simulation on a cluster.

The llm on whatever OS is best for it. I want it to take scanned documents, use ocr, then take that data and ideally fill financial statements, tax returns etc. Right now I do this manually with grok, I feed it a single pdf at a time, then I get answers on what I need to write in which field etc. I would like to streamline this and avoid completely exposing my business data to a company that has no business knowing it.

I was leaning toward building a cluster mostly because of cooling efficiency, the low power draw of the components (4545p + pro 2000 blackwell) and the ability to add nodes if I need them. That is also what I would expect: an almost linear increase in performance the more nodes I plug in, granted I use nic’s with sufficient bandwidth. If the dual sp3 7742 is the baseline, I was considering getting 2 nodes of the 4545p initially and seeing how things go from there; I expected a x1.8 performance gain from adding the second node. It’s also more elegant.
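As a rough sanity check on that x1.8 expectation, here is a minimal sketch using Amdahl's law. This is a textbook model I am swapping in for illustration, not anything COMSOL or NX reports. It ignores network latency and licensing limits entirely, so real scaling would likely be worse. The function names are hypothetical.

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
# p = fraction of the run time that parallelises across nodes (assumed, not measured)
# n = number of nodes

def amdahl_speedup(parallel_fraction, nodes):
    """Ideal speedup on `nodes` nodes, ignoring all communication overhead."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes)

def required_parallel_fraction(target_speedup, nodes):
    """Parallel fraction needed to reach `target_speedup` on `nodes` nodes."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / nodes)

needed = required_parallel_fraction(1.8, 2)
print(f"parallel fraction needed for x1.8 on 2 nodes: {needed:.3f}")

# Example values of p are placeholders, chosen only to show the trend.
for p in (0.5, 0.8, 0.9, 0.95):
    print(f"p={p:.2f}: 2 nodes -> x{amdahl_speedup(p, 2):.2f}, "
          f"4 nodes -> x{amdahl_speedup(p, 4):.2f}")
```

Under this model, x1.8 from a second node needs roughly 89% of the solve to distribute cleanly, before any interconnect cost is counted.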

My mains can only support up to 3000W for the system/cluster as a whole.
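To see what that 3000W ceiling means for option A, here is a back-of-the-envelope sketch. The wattages are assumptions taken from spec sheets (225 W TDP for the epyc 7742, 350 W board power for an rtx 3090); the headroom, PSU efficiency and "everything else" figures are hypothetical placeholders, and GPU transient spikes are not modelled.

```python
def max_gpus(wall_limit_w, cpu_w, cpu_count, gpu_w,
             other_w=150.0, psu_efficiency=0.92, headroom=0.8):
    """Estimate how many GPUs fit under a wall-power limit.

    headroom:        fraction of the circuit to load continuously (assumed)
    psu_efficiency:  DC output / AC input (assumed)
    other_w:         RAM, drives, fans, NICs (assumed)
    """
    usable_dc_w = wall_limit_w * headroom * psu_efficiency
    remaining_w = usable_dc_w - cpu_w * cpu_count - other_w
    return max(0, int(remaining_w // gpu_w))

# Option A, single vs dual socket, with 3090s at stock power limits.
print("single 7742:", max_gpus(3000, 225, 1, 350), "x rtx 3090")
print("dual 7742:  ", max_gpus(3000, 225, 2, 350), "x rtx 3090")
```

Swapping in the numbers for option B or a power-limited 3090 only needs different `cpu_w` and `gpu_w` arguments.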

This might be true for COMSOL Motor Flux simulations, which I haven’t personally used, but I don’t think the same applies to Siemens NX. Do you also run simulations in NX, and if so, can the workload be distributed?

Could you describe your workflows and software setup in a bit more detail?
Does COMSOL Motor Flux benefit from GPU computing?
What do you primarily use Siemens NX for: just CAD, or do you run simulations there as well? If so, do those simulations utilize or benefit from GPUs?

If COMSOL Motor Flux is the only software that gains from distributed computing, my next question would be: will you continue using it for future projects, or is this iron-core linear motor a one-off?

Regarding LLMs: I’m no expert, but Ollama works pretty much out of the box with NVIDIA GPUs on Windows and Linux.
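For what it's worth, a minimal sketch of sending already-OCR'd text to a local Ollama instance over its HTTP API, using only the Python standard library. It assumes Ollama is running on its default port 11434; the model name `llama3.1` and the prompt are placeholders, and the OCR step itself is not shown.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed unchanged)

def build_request(model, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(model, prompt, timeout=120):
    """Send the prompt and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

# Usage, with a placeholder model name and OCR output:
#   text = "...text extracted from a scanned invoice..."
#   print(ask("llama3.1", "List the invoice number, date and total:\n" + text))
```

Everything stays on the local machine, which is the point if the goal is to keep business data away from a hosted service.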

Have you heard of paperless-ngx? There are AI plugins like paperless-gpt or paperless-ai that can help with tasks like naming, tagging, and organizing documents. It won’t do your taxes for you, but it can give you a solid basis to gather everything you need for the relevant year.

Do you have enough memory for the simulation? If it reaches into the pagefile for compute, the behavior you described is typical.

I’d look into your mesh and element discretization since Comsol doesn’t really benefit from cluster compute outside of sweeps.

I did simplify the mesh so it fit into the 96gb. At this point I assume errors in ram piled up (due to overheating?) and that’s what crashes the system.

If comsol doesn’t benefit from clusters, then the case is pretty much solved. LLM’s prefer multiple gpu’s on a single system over clusters from what I’ve read. If clusters aren’t a solution to either, then I have to stick to a single motherboard.

Maybe I should go with 9115 or a pro 9955wx for the pcie 5.0 lanes instead of the 7742.

high core count + ddr4 3200 + 7x pcie 4.0 or low core count + ddr5 6400 + 7x pcie 5.0

I guess I have some more reading to do.

Ahh, that is another possibility. The denser RAM we run nowadays is getting challenging to cool when it is actually stressed.

Comsol is getting a general purpose (not just for acoustics) CUDA solver in the next version (it remains to be seen if it will actually be faster than CPU solver however), so that might sway your decision too.

Only simple structural, which my current pc does just fine. I used it extensively when I was comparing different structures in terms of stiffness for the machining center. It is my go-to CAD though, mainly due to the convergent body feature.

I have a wip radial direct drive motor for the rotary table so I can also do turning on the machine.

that’s a great looking tool, thanks

Yesterday I found this peculiar board: GENOA2D24G-2L+ / TURIN2D24G-2L+

Once I saw it I knew this was “it” and had to get it. I’ve never seen a more beautiful motherboard! I can fit so many pro 2000/4000 blackwells in it, all while maintaining proper cooling.

Both are Dual Socket Boards. Are you sure your application can profit from it?

I would’ve suggested buying/building a Ryzen Threadripper PRO 9000 Pro Workstation. Aren’t 96 cores enough? :wink:

It’s mostly about the memory. With the additional 12 slots from the second cpu, I’m more likely to hit the required amount without needing higher capacity sticks, prices of which can get quite insane the higher you go. And since all server motherboards are expensive anyway, I might as well get the most out of it.

There are a few good deals locally($700-$750):

epyc 9115 (16 core, 2 CCD’s, DDR5 6400) vs epyc 9334 (32 core, 4 CCD’s, DDR5 4800)

What about any of the QS/ES cpu’s from china, are they worth the risk?

What do you guys think?

The AMD Epyc ES/QS chips have lower clock speeds. Ones I have bought in the past also had non-functional memory channels so your mileage may vary.

Also, don’t buy the low end epyc with high power ram. It won’t get you anything. The 9115’s are bottlenecked to about 200GB/s due to the low connection count between CCD and IOD.
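To put that 200GB/s figure in context, here is a small sketch of theoretical peak memory bandwidth per socket for the CPUs mentioned in this thread. The channel counts are assumptions from the platform specs (8 DDR4 channels on SP3, 12 DDR5 channels on SP5), the memory speeds are the ones quoted above, and the 200 GB/s cap is simply the rough number from this post, not a measurement.

```python
def peak_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Theoretical peak in GB/s: channels * MT/s * 8 bytes per 64-bit transfer."""
    return channels * mt_per_s * bus_bytes / 1000.0

def effective_bandwidth_gbs(peak_gbs, fabric_limit_gbs=None):
    """Cap the DRAM peak by a CCD-to-IOD limit when one is given."""
    if fabric_limit_gbs is None:
        return peak_gbs
    return min(peak_gbs, fabric_limit_gbs)

# name: (channels, MT/s, assumed CCD/IOD cap in GB/s or None if not discussed here)
configs = {
    "epyc 7742, ddr4 3200": (8, 3200, None),
    "epyc 9334, ddr5 4800": (12, 4800, None),
    "epyc 9115, ddr5 6400": (12, 6400, 200.0),
}

for name, (channels, speed, cap) in configs.items():
    peak = peak_bandwidth_gbs(channels, speed)
    usable = effective_bandwidth_gbs(peak, cap)
    print(f"{name}: peak {peak:.1f} GB/s, usable at most {usable:.1f} GB/s")
```

If the cap holds, the 9115 with ddr5 6400 would land at about the same usable bandwidth as the old 7742's theoretical peak, which is why the faster sticks buy nothing on that chip.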