Beginner building a workstation for ML/Data Science $5k US budget

Hi Techs, I’m new to the forum and am thinking about doing an ML/data science build. I’m pretty familiar with building gaming rigs (though I’m not a gamer), but I have no idea what the best set of components for this use would be. I have four PCPartPicker builds made, each a bit more expensive and capable than the last. My budget is $5k US. I’m leaning towards the “Moderate” or “12/24 TR” builds, and I’m not sure if I even need a gaming graphics card or if I should spring for a Founders Edition or something. I’ll just post the builds; thanks for any input.

This is the bare “Minimum” build I’d need to upgrade from my 955BE build from 2008.

This is a better build, the “Moderate”, more in line with what I’d make regardless of use.

This is the cheap “12/24” TR build in case I need more horsepower.

This is the “up to 11” 16/32 TR build, just ’cause I can.

I know that for ML more threads is better and that’s about it.

GPUs make a lot more sense to put money into – not many modern libraries are restricted to the CPU any more. Grab a 1080 Ti or a Titan Xp or two plus a quad core and call it a day.

Of course, this depends on your specific ML application. What frameworks and data manipulation do you plan on using?

Yeah, I’d agree with this. If you really still need a bit of CPU power, an overclocked R7 1700 should do the trick. Unless of course you want to spend some money, in which case the 12-core TR is a good choice too.

Well, for data science my current workstation is an FX-8320E with 32 GB of RAM, the OS booting from an SSD, and lots of HDDs for data storage. For my use cases I need lots of CPU cores and a lot of memory; the GPU in it was a GT 630 until a few months ago.

Some things that you do need to take into account:
-Case: Choose one with good airflow and lots of fans. You will need space for the HDDs, and it would be good if they could get some fresh air, especially if you will keep the workstation running 24/7.

-Power Supply: Go for at least 850W. Keep room for some expansion (GPUs, more SSDs or HDDs).

-Processor/Motherboard/RAM: I think the R7 1700 is a good choice, but you will need to check how much memory the motherboards can handle. If you need more than 64GB I would go for the TR. Also, be aware of the number of SATA ports and M.2 (NVMe) slots.

-GPU: It depends on your use. Do you work with any algorithm that needs a GPU? If not, buy a 1050 Ti or 1060 and you are good to go. You can upgrade it when prices start to fall.

-Monitors: I’m using some from Dell and they are very nice. The 27" 1440p are my favorites.

About the operating system: I do not know if you will need the Pro version of Windows. You can save some money buying the Home edition.

Also, spend some money on a good keyboard and mouse.

If you have a B&H, Best Buy, or Micro Center nearby, I would check out the new monitors they have now.
Honestly, I would go all out on a drop-dead gorgeous monitor and then decide what else you want. I love my 27-inch 1080p that replaced an old 19-inch; it was the best money I ever spent.

Are you into overclocking?

Ah, gotcha. I’ll mostly be working with R and Python. I’m just getting started, so I’m not sure about specific libraries; I’ve only done random forest training so far.
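For reference: if that random forest training is happening in Python, it’s probably scikit-learn (an assumption; the post doesn’t say which library), and scikit-learn’s forests will already happily use every core via `n_jobs`. A minimal sketch:

```python
# Training a random forest across all CPU cores with scikit-learn.
# The library and parameters here are assumptions -- the original
# post doesn't say which implementation was used.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset so the example runs anywhere.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# n_jobs=-1 builds the individual trees in parallel on every core,
# which is where extra CPU threads actually pay off for forests.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy, close to 1.0
```

With `n_jobs=-1` each tree is built in a separate worker, so this is one of the few common Python workloads where more CPU threads directly translate into faster training.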

OK, that’s pretty much my 12/24 TR build; maybe I’ll just upgrade the graphics card. I didn’t really wanna waste money on a crazy overkill system not knowing what I might need, so thanks for the insight.

I’m thinking either dual 25" or 27" Dell or an ultrawide 34" and 27". All 1440p cause I don’t see any need for 4k.

I shy away from overclocking unless there is a specific need for it. If I can’t run something well without it, I’ll do it, but it’s not something I do just for the sake of it.

The cost isn’t really a barrier I just didn’t wanna buy a TR and never utilize it properly. But I guess too much is better than too little.

If none of your use cases benefit from GPGPU (doubtful), there’s no reason not to go with TR. If your budget’s $5k, you can throw whatever overhead is left towards Titan Xps without a problem.

I get what you mean. Also, I don’t want to be that guy, but waiting for Volta may not be a bad idea if you can get by on your current GPU for now. Don’t hold me to it, but I think Volta may have dedicated cores designed for things like TensorFlow and other ML/data science workloads. I’ll link to the Nvidia site below and you can decide for yourself. That said, two Titans or two 1080 Tis will also be great. I just don’t know how close Volta is, and with the release of Vega I wouldn’t be surprised to see it Soon™. I only mention this because I don’t know how deep into this you are, but if you’re willing to drop $5k on a machine I’d imagine you’re pretty into it!

Do let us know what you end up choosing. I’ve been doing a little ML/DS with my machine as well but I’m on a more “budget” solution than what you have planned (R7 1700 oc’d and 1080).

Volta has dedicated tensor cores, which are basically ultra-wide vector units optimized for fused multiply-add on half-precision floats only. If this very specific use case is what you are doing, expect big improvements with Volta.
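To make “fused multiply-add” concrete, here’s the matrix operation a tensor core computes, sketched in plain Python (the 4×4 tile size matches Nvidia’s public description, but the values and the use of regular Python floats instead of half precision are purely illustrative):

```python
# What a tensor-core-style fused multiply-add computes: D = A @ B + C
# on small matrix tiles. Plain Python floats stand in for the
# half-precision values the hardware actually uses; everything here
# is illustrative, not a hardware spec.

def matmul_accumulate(A, B, C):
    """Return D = A @ B + C for square matrices given as lists of lists."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) + C[i][j] for j in range(n)]
        for i in range(n)
    ]

I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity
B = [[float(i + j) for j in range(4)] for i in range(4)]
C = [[1.0] * 4 for _ in range(4)]

D = matmul_accumulate(I4, B, C)
print(D[0])  # first row of I4 @ B + C -> [1.0, 2.0, 3.0, 4.0]
```

The hardware does the multiply and the accumulate as one fused step over a whole tile per clock, which is why it’s so much faster than issuing the same math as individual scalar instructions.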

Nvidia has been secretive about the card’s release dates but it looks like consumer GPUs aren’t coming before early to mid 2018. Still a long wait…

Neither R nor Python is any good at utilizing lots of cores efficiently, which is a shame considering that the work is often embarrassingly parallel. Anything better than an R7 1700 would be a waste IMHO. Consider switching to C++/Go if you want to do lots of work on the CPU, in which case get all the cores you can; you can easily saturate a 1950X or even a single-socket Epyc system (32c/64t).

If you’re considering TensorFlow: it works with CUDA on a 1080 Ti, but it’s not as efficient as it could be… You could theoretically put four 1080 Ti cards in there, and it would use all four to get the speedup. It’s definitely cheaper to get those same cycles on a GPU than on a CPU.

With your kind of budget, I’m thinking 1950X + 1080 Ti so you can mix and match. Later on, as GPU ML support matures and cards get faster (or specialist cards show up), keep the 1950X and just upgrade GPUs or add coprocessors. In the meantime, a 1080 Ti should be plenty to help you get your feet wet.

Sounds like the OP intends to use frameworks which are merely controlled by Python/R, just like TensorFlow.
Nobody in their right mind would do anything computationally expensive in pure Python.

Pure Python code uses exactly one core at a time (the GIL sees to that). If Python itself is doing the heavy lifting, get something with high single-thread performance and forget about core counts.

Gotcha, thanks for clearing that up a bit. I would have waited for Volta, but I built my system back in June and found a 1080 for a decent deal.

I also don’t know if I’d wait for Volta if it won’t be available until mid-2018. Not to mention that for personal use (in the OP’s case), two Titan Xps would be pretty much all you’d need. My 1080 is solid, and I can only imagine the Titan is even better.

@Qalmana I skimmed through the posts, and although I am not familiar with your use case, no one mentioned co-processors.

I have seen builds for number crunching running 70+ cores using Xeon Phis and the like.

Would a co-processor be of any use to you?

I’m not sure it would fit into my use case, maybe into OP’s. Looks like it would be great for that type of work. However my system is a bit more general purpose. I use it for everything from recording audio, to gaming, data science, and programming.


Oh geez, you are not OP. Ugh, I am groggy.

No worries, I absolutely am as well, still early…


I’d only use a Xeon Phi as a last resort, because they can’t keep up with a graphics card performance-wise. Xeon Phis are useful for massively parallel workloads where the developer is too lazy to port the program to a GPU and just executes CPU code on the card instead. If your program doesn’t support GPUs but does support Phis, go for it. Otherwise get a proper graphics card or two.
