Help building a system for machine learning

My son (20 years old), who is more tech savvy than I am, is thinking of learning AI models and machine learning. I would like to foster this and support him. I have lots of spare hardware, and I am about to upgrade my main server, so I will have that motherboard, CPU, and RAM. I also have several 1TB M.2s available and about 30TB of HDDs collecting dust. Also, my internet is 1Gb symmetrical fiber.

Here are the specs I’m thinking of building with (all this hardware is already owned):
24-core Threadripper 2970WX
ASRock X399 Taichi motherboard: (2) x16, (2) x8, and (3) M.2 slots
64GB 3200MHz DDR4
1TB M.2 for the OS
(4) 1TB M.2s on a carrier card in RAID 0 for data sets

Last night I bought an MSI Gaming 3090; it should be here next week.

I don’t know exactly what his plan is yet, but I want to build a general-purpose setup that he can remote into from his place and do whatever he wants. Does this seem like a reasonable build? Anything I should change? What would be the best operating system to install? Any advice?

Thanks

Quite a good build for ML things. I'd say something like a 3060 and 32 GB of RAM is plenty for getting into ML; it does not hurt to have more, of course. The 4000 series is technically better, but AFAIK some things like TensorFlow still do not support the latest version of CUDA, so it's a fancy brick in the cases where it's not supported.

The 3090 is of course better, the main benefit being that its 24 GB of VRAM lets you fit larger models; you'd really rather fit the model fully into VRAM than spill over into system RAM. Whether he'll benefit at all from the larger VRAM depends on what he will be building. I built an object detection model at work in TensorFlow that used around 10 GB of VRAM. Training on the CPU of a 5900X, one iteration took over 2 seconds; on a 1070 it took 0.27 seconds. I got a 3060 at work after showing my boss, and that one does it in roughly 0.2 seconds. So that's really not that big of a difference compared to the 1070, given that the 1070 is a 6-7 year old card that predates dedicated AI cores. But the 1070 sometimes fluctuates up to 0.5-1 s with the final model, because the model needs about 10 GB and the card only has 8 GB, so it has to fetch data from system RAM occasionally, which slows it down.
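If anyone wants to reproduce that kind of per-iteration comparison on their own hardware, here is a minimal sketch in TensorFlow (the toy model, batch size, and shapes are placeholders, not my actual detection network):

```python
import time
import tensorflow as tf

# Placeholder model standing in for the real object-detection network.
def make_model():
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])

x = tf.random.normal((32, 224, 224, 3))                  # one dummy batch
y = tf.random.uniform((32,), maxval=10, dtype=tf.int32)  # dummy labels

for device in ("/CPU:0", "/GPU:0"):
    with tf.device(device):
        model = make_model()
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )
        model.train_on_batch(x, y)   # warm-up step: graph tracing + allocation
        start = time.perf_counter()
        model.train_on_batch(x, y)   # the step we actually time
        print(f"{device}: {time.perf_counter() - start:.3f} s per iteration")
```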

Really, I think GPU core speed does not matter that much; you're mainly looking for a large amount of VRAM. How much you need depends on the workload, but you can do a lot with 12/16 GB already. Using the GPU gives a huge improvement: at 2 seconds per iteration, 10k iterations equates to about 5.5 hours, while at 0.2 seconds it's more like half an hour.
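As a quick sanity check that TensorFlow actually sees the GPU (and to watch how much VRAM it grabs), something like this works; it assumes a standard GPU-enabled TensorFlow install:

```python
import tensorflow as tf

# An empty list here means training will silently fall back to the CPU.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible:", gpus)

if gpus:
    # Allocate VRAM on demand instead of grabbing it all upfront; handy when
    # the model is close to the card's limit (like ~10 GB on an 8 GB 1070).
    tf.config.experimental.set_memory_growth(gpus[0], True)
    # Current and peak bytes in use on the first GPU.
    print(tf.config.experimental.get_memory_info("GPU:0"))
```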

NVIDIA cards are definitely easier to use for ML, as most things are written for CUDA. Theoretically, ROCm (via HIP) can translate CUDA calls to whatever AMD uses, but I can't tell how good that is.
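For what it's worth, the ROCm builds of PyTorch reuse the regular CUDA API surface, so CUDA-style code is supposed to run unchanged on AMD cards. A quick introspection sketch to see which backend you actually got:

```python
import torch

# On ROCm builds of PyTorch, HIP sits behind the regular torch.cuda API,
# so code written against torch.cuda runs unchanged on AMD cards.
print(torch.cuda.is_available())             # True on both CUDA and ROCm builds
print(torch.version.cuda)                    # CUDA version string, or None on ROCm
print(getattr(torch.version, "hip", None))   # HIP/ROCm version string, or None on CUDA
```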

Thanks for the input. The VRAM is the main reason I went with the 3090. I thought about using the 3080 that’s in my gaming system, but it’s only 10GB.

I look forward to seeing what kind of projects he (we) come up with.

The 3060 is one of the best-value cards for ML work, I think, as you get 12 GB of VRAM on it. NVIDIA charges quite a premium for more VRAM than that. I'm 100% sure ML is the main reason they are so “cautious” about how much VRAM they put on their gaming cards.

I would recommend a downsize. Your son is not going to dabble and become an AI god overnight, and it will take years of learning before he even starts to figure out what to do with a really powerful rig. Even then, you can rent CPU and GPU cycles from Amazon, which is less costly than an AI rig.

I would go with an AM5 system: a B650 board coupled with a 7900, plus a 3060 or 4070. VRAM is important here. Here is a quick PCPartPicker list of the core:

PCPartPicker Part List

Type | Item | Price
CPU | AMD Ryzen 9 7900 | $429.00
Motherboard | MSI MAG B650M MORTAR WIFI | $199.95
Memory | TEAMGROUP T-Create Expert 2x32 GB DDR5-6000 CL34 | $164.99
Storage | Kingston KC3000 4TB M.2-2280 PCIe 4.0 | $365.75
Video Card | Zotac Twin Edge GeForce RTX 4070 12 GB | $599.99

Total: $1759.68

As always, the above is a starting point; feel free to add, alter, or remove from the rig as necessary. Since you already own all the parts, though, there's no real need to change anything. But do you need 3090s or Threadrippers for AI? No, not by a long shot.

That’s plenty overkill for someone who’s just starting; free Colab would have been more than enough. But since you had most of the stuff lying around, why not?

Just throw Ubuntu at it, since that’s what most of the tutorials online are built on top of.
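Once Ubuntu and the NVIDIA driver are in place, a one-minute check from Python confirms the 3090 is usable before he remotes in; this sketch assumes a CUDA build of PyTorch is installed:

```python
import torch

# Post-install sanity check: driver loaded, card visible, VRAM as expected.
print(torch.cuda.is_available())                  # should print True
print(torch.cuda.get_device_name(0))              # e.g. "NVIDIA GeForce RTX 3090"
props = torch.cuda.get_device_properties(0)
print(f"{props.total_memory / 1e9:.1f} GB VRAM")  # roughly 24 GB on a 3090
```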


That seems to be outdated by now: it works since TensorFlow 2.12, which needs at least CUDA 11.8. I thought the 4000 series needs CUDA 12, though.
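If you want to settle it for your own install, the TensorFlow wheel records what it was built against:

```python
import tensorflow as tf

# The build info dict records which CUDA/cuDNN versions the installed
# wheel was compiled against (keys are absent on CPU-only builds).
info = tf.sysconfig.get_build_info()
print(info.get("cuda_version"), info.get("cudnn_version"))
# e.g. "11.8" and "8.6" on a TensorFlow 2.12 GPU wheel
```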

The 30 series is also preferred IMO because NVIDIA removed NVLink from the 40 series.
It's possible to buy two 3090s paired with NVLink for less than a 4090 on eBay.

NVLink is not that relevant for just a pair of 3090s doing ML; best-case scenario, you’ll notice a 5-10% perf uplift.

I already owned all the parts except the 3090. I don’t think it would be worth it to buy all these parts when I have a full system’s worth sitting on a shelf.
Thanks for the feedback.

Fully agree that if you already have all the parts, there's no reason to buy new ones. The TR will do the trick, but it is very overspecced for the purpose, kinda like taking an F1 car to church.

My post is mainly aimed at others looking to order a new system as an AI rig 🙂 Why spend an extra $1k-$2k on a Threadripper system if you do not have to?

This is what I am using:

I am not sure I would recommend Threadripper when regular AM4 / AM5 offerings are fine. Having an RTX 3090 is likely gonna be more important.

However, I strongly encourage using the cloud instead. It’s just better in pretty much every way for these purposes: more cost-efficient, you get access to better hardware, and you gain more valuable skills using cloud than bespoke local hardware. There’s really just no reason to build for this purpose unless you already 100% know that you must build something.

Additionally, if your son is really, really interested in learning AI and ML, then I would also highly recommend investing the money in college courses instead! You will get much better bang for your buck this way. AI/ML is a very complex subject, so he is better off paying for education (which will likely supply a compute environment) than paying for hardware that will go out of date rather quickly.

Maybe on PyTorch, yeah. I'm too poor to try it out myself, but I know there's code in TensorRT which utilizes memory across two NVLinked GPUs.

Edit: but yes, it's diminishing returns, much like SLI for gaming. But if you NEED 48 GB of VRAM cheap, it's a method of accomplishing that.

You don’t need NVLink for that; just plopping in two GPUs will give you 48 GB of VRAM already. What NVLink provides is a fast way for one GPU to get data from the other, but the same can be accomplished over regular PCIe.

Using NVLink for ML won’t suddenly make your dual GPUs appear as a single 48 GB GPU, be it in TensorRT, PyTorch, or TensorFlow.
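To make that concrete: what does let one model span both cards is splitting its layers across them, and that works over plain PCIe. A naive PyTorch sketch (layer sizes made up):

```python
import torch
import torch.nn as nn

# Naive model parallelism: put half the layers on each card so the weights
# don't have to fit in one GPU's VRAM. The activation tensor crosses the
# GPU-to-GPU link; NVLink would only make that hop faster, not change the code.
class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))   # hop over PCIe (or NVLink if present)

model = SplitModel()
out = model(torch.randn(8, 4096))
print(out.device)   # cuda:1
```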
