I am a long-time YouTube channel follower, and this is my first time posting on the forum.
I am in the process of setting up a workstation for a new data scientist position that I will begin soon. I have been asked to suggest a configuration that would be suitable for my new job.
Currently, I have Ubuntu installed on my machines (Lenovo T495 and Ryzen 3600), and I mostly work with
Matlab,
Python (pandas, NumPy, etc.), and
R.
With my upcoming job, I will mostly be dealing with environmental datasets, including
cleaning,
statistics,
some machine learning, and
data presentation.
I am considering a configuration based on either the Ryzen 9 7900X or the i7-13700K, as they seem to meet my requirements. Solutions with more "pro" CPUs (Threadripper and Xeon) exceed my budget.
Intel Platform
| PC Component | Product |
| --- | --- |
| Motherboard | ASUS ROG STRIX B760-G GAMING WIFI |
| CPU | Intel Core i7-13700K |
| RAM | Corsair Vengeance 2x32GB |
| Video Card | Zotac RTX 4060 Ti 8GB Twin Edge OC |
| Storage HD1 | Kingston FURY Renegade 2TB |
| Storage HD2 | Kingston NV2 2TB |
| Case | Fractal Define 7 Mini |
| Power Supply | be quiet! Straight Power 11 Platinum |
| CPU cooling | Noctua NH-U12A |
| Additional Cooling | Fractal Dynamic X2 GP-14 |
AMD Platform
| PC Component | Product |
| --- | --- |
| Motherboard | ASUS TUF GAMING B650M-E |
| CPU | AMD Ryzen 9 7900X |
| RAM | Corsair Vengeance AMD EXPO 2x32GB |
| Video Card | Zotac RTX 4060 Ti 8GB Twin Edge OC |
| Storage HD1 | Kingston FURY Renegade 2TB |
| Storage HD2 | Kingston NV2 2TB |
| Case | Fractal Define 7 Mini |
| Power Supply | be quiet! Straight Power 11 Platinum |
| CPU cooling | Noctua NH-U12A |
| Additional Cooling | Fractal Dynamic X2 GP-14 |
Budget-wise, the two solutions are close in price, with AMD being slightly more expensive at 1690 EUR compared to Intel at 1610 EUR.
I have a few questions and concerns regarding the CPU and GPU choices.
First, I am wondering how Linux handles Intel CPUs with Performance and Efficiency cores. Is there a risk that I will have issues because of this "new" architecture if I pick Intel?
Additionally, I am curious to know how AMD compares to Intel in scientific benchmarks.
For the GPU, I have opted for an RTX 4060 Ti, assuming that CUDA will handle some of the workload (CuPy). However, I have two considerations here:
Professional GPUs from NVIDIA are way above my budget, with an RTX A4000 costing around 1000 EUR.
I am not sure if AMD GPUs have good support in this field. The Radeon Pro W7500 and W7600 have an interesting price, but I am still determining how I can benefit from having them onboard.
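As a hedged illustration of the CuPy idea mentioned above: CuPy mirrors much of the NumPy API, so the same function can run on an NVIDIA GPU when one is usable and fall back to the CPU otherwise. This is only a sketch; `xp`, `zscore`, and `to_host` are hypothetical names chosen for the example, not part of either library.

```python
import numpy as np

try:
    import cupy as cp
    # Probe for a usable CUDA device; this raises if the driver or GPU is missing.
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
    xp = cp
except Exception:
    # CuPy not installed or no GPU available: fall back to NumPy on the CPU.
    xp = np


def zscore(values):
    """Standardize values on whichever backend (CuPy or NumPy) was selected."""
    a = xp.asarray(values, dtype=xp.float64)
    return (a - a.mean()) / a.std()


def to_host(a):
    """Return a NumPy array, copying back from the GPU when CuPy is in use."""
    return a if xp is np else cp.asnumpy(a)


if __name__ == "__main__":
    print("backend:", xp.__name__)
    print(to_host(zscore([1.0, 2.0, 3.0])))
```

Whether the GPU actually helps depends on the array sizes; for small arrays the host-to-device copy tends to dominate.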
I welcome any thoughts, experience, or suggestions. Thanks!
My first inclination would be 64 GB of memory, just to give yourself some room for processing the data sets and machine learning.
The other idea is a 1 TB primary SSD for your OS and apps, and a 4 TB drive for your data storage. This is just a personal preference: I like to keep my data separate, so that in case the OS or an app goes south I can just reimage the primary drive without having to worry about data loss.
Sooner or later you will encounter situations where 64 GB doesn't cut it. I think you are wise to use 32 GB modules, giving you headroom to go to 128 GB.
That said, I am frequently running into this problem with 128 GB. I am hesitant to "upgrade" to a TR/Epyc/Xeon system, because a lot of my work relies on single-threaded performance. I don't think there is currently a solution to this situation.
EDIT: I've been rethinking this, and I think I failed to recognize that even for single-threaded tasks, the additional memory should give me greater headroom for embarrassingly parallel tasks. Scoping out a 5975WX upgrade now.
Keep in mind memory bandwidth matters more and more as the size goes up.
I've got a pretty heavy FEA workload that is very parallelizable, and a 16-core Xeon W will outperform a 64-core Threadripper Pro by ~60%. The Intel cores are only about 20% faster than the Threadripper cores single-threaded; this just goes to show how much more memory bandwidth is available on the current HEDT Intel platform.
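For anyone who wants a rough feel for the memory bandwidth of their own machine, here is a minimal sketch of a copy test with NumPy. It is not the FEA workload described above and not a proper STREAM benchmark; it runs on a single thread, so it will understate what a many-channel platform can deliver with all cores busy. The function name and default sizes are assumptions made for the example.

```python
import time

import numpy as np


def copy_bandwidth_gbs(n=50_000_000, repeats=5):
    """Estimate single-threaded copy bandwidth in GB/s (best of `repeats`)."""
    src = np.ones(n, dtype=np.float64)   # 8 bytes per element
    dst = np.empty_like(src)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.copyto(dst, src)              # reads src once, writes dst once
        best = min(best, time.perf_counter() - start)
    bytes_moved = 2 * src.nbytes         # one read plus one write
    return bytes_moved / best / 1e9


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

The default arrays are 400 MB each, which should be well beyond the last-level cache of the CPUs discussed here, so the timing reflects main memory rather than cache.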
I'm not sure how constrained you are in terms of your software stack, but if you're looking at needing more than 64 GB of memory, there should be efficient ways to parallelize. Apache Arrow and Polars, for example, are things to look into. Convert your CSV files to a more efficient format such as Parquet or HDF5.
When it comes to hardware, I agree with others to look at the 48 GB modules, as you could upgrade to 192 GB down the line.
Thanks for this. I have since reconsidered the Zen 3 route, given that Zen 4 Epyc is available, as is Xeon W, as you say.
Can you comment any more on what your setup is?
Quite constrained, unfortunately. I work with single-cell transcriptomic datasets that are reaching more than 1 million cells, and there are only a handful of reputable tools out there for analysis.
Most Linux distros will work on a variety of CPU platforms without any problems.
But there are some with specialized versions customized for different architectures.
These can be found at DistroWatch.
Genoa Epyc is likely even faster than the current Intel HEDT, at least in my workload. There's also the 7000 series Threadripper that should be out in ~6 months, but it likely won't be faster than current Intel HEDT in memory performance.
The situation I described is for a 100M DoF FGMRES problem. The Intel system is a w5-3435X with 512 GB of memory at 5958.4MHz; the CPU runs at 5 GHz when lightly threaded and 4.8 GHz all-thread. The Threadripper system was a non-OC'd 5995WX running 3200MHz memory. It's a tiny bit apples to oranges, since the Intel system can overclock but the Threadripper 5000 platform isn't really conducive to overclocking like the Xeon W-3400 platform is.
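As a back-of-the-envelope check on the memory speeds quoted above, theoretical peak bandwidth can be estimated as channels × transfers per second × bytes per transfer. The sketch below assumes both platforms run 8 memory channels with a 64-bit (8-byte) data path each, and treats the quoted "MHz" figures as MT/s; neither assumption is stated in the post.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Assumed: 8 channels on each platform, quoted speeds taken as MT/s.
xeon_w = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=5958.4)
threadripper = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)

if __name__ == "__main__":
    print(f"Xeon w5-3435X:      {xeon_w:.1f} GB/s")
    print(f"Threadripper 5995WX: {threadripper:.1f} GB/s")
    print(f"ratio:               {xeon_w / threadripper:.2f}x")
```

Under those assumptions the figures come out at roughly 381 GB/s versus 205 GB/s, a ratio of about 1.86. Real workloads reach only a fraction of the theoretical peak, but the gap is at least in line with a bandwidth-bound solver running ~60% faster on far fewer cores.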
That 100M DoF FGMRES problem is actually just a standalone benchmark I cooked up to represent my workload, in case anyone else wants to run it. I'd be really interested to see someone with a Genoa system run it.
As an aside, I spoke to someone who works at our local university, and they are buying a Dell w9-3475X workstation for ~£5k. The university must have a beefy service contract with Dell to be getting that "discount".
The Epyc Genoa series has my interest (probably the 9274F), and I can source a motherboard (H13SSL-NT) from the UK. The thing that is putting me off is cooling. I don't have much time to spend tinkering and need something that just works.
The gotcha with cooling the high-end platforms nowadays is often the RAM and CPU VRM. The motherboard manufacturers seem to design on the assumption that the boards are mounted in rack cases with massive amounts of airflow going over the components.