TLDR
I need to buy a workstation or server for a Data Scientist with a ~$10K budget. How do I get my money’s worth? Is it worth spending the extra money on an A6000, or an A40 (which requires a rack-style enclosure)? If so, what’s the best way to go about virtualizing the hardware?
Background
I’m a Software Engineer who started working on a startup that uses some “AI” about a year ago. We recently raised some money and are on track to make our first hire in around two weeks.
This is also the first time I’ve ever worked with CV or ML. I was able to cobble together some prototypes that impressed investors, customers, and some experts in our domain. I also fully realize I know just enough to be dangerous, so we’re putting this into some actually competent hands before we roll it out to production customers.
We’re bringing on a Senior Data Scientist in the next two weeks and looking at the possibility of bringing on a slightly less senior person to work with them in the next 1-2 months.
I need to give these people the resources they need to do their job.
My Current Development Hardware / Workflow
I managed to make use of my Gaming PC (8700K, 16GB RAM, 2080Ti, 500GB SSD – for Linux) for the work I’ve done so far. I’ve also experimented with some machines on Lambda Labs and AWS EC2.
I use a MBP as my primary machine (work tasks, development) and then use my Linux “workstation” to actually run my experiments / pipelines. I want to keep this paradigm (i.e. MBP as the “work” laptop, plus a remote server / workstation that does the heavy lifting).
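For what it’s worth, the main thing that makes this split workable is keeping the experiment code device-agnostic, so the same script runs on the laptop for smoke tests and on the GPU box for real runs. A minimal sketch, assuming PyTorch (the model and batch below are just placeholders):

```python
# Minimal device-agnostic setup (assuming PyTorch): the same script runs on a
# CPU-only laptop for quick smoke tests and on the GPU workstation for real runs.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(128, 10).to(device)   # placeholder model
batch = torch.randn(32, 128, device=device)   # placeholder input batch
print(f"Running on {device}: output shape {tuple(model(batch).shape)}")
```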
The Hardware Dilemma
When we developed our financial models (before the silicon shortage) I assumed we could just go with high-end consumer hardware: a Threadripper or 5950X and an RTX 3090 for every Data Scientist to use as their workstation. We were leaning towards having somebody like Lambda Labs or System76 build the actual machines. We were also considering the Lambda Labs GPU Cloud service as an alternative.
Then the GPU, CPU, Storage, Everything Shortage hit.
This is problematic from both angles: I’ve been having a hard time getting the cheap ($2.50/hr) Lambda Labs GPU Cloud instances, and an RTX 3090 is now $3090 (if you’re lucky!).
Which puts me in a weird position. I would have considered an A6000 at ~$6,000 expensive before, but now it’s starting to make some sense, since it hasn’t been hit too badly by the market insanity yet. It also has significantly more memory (48GB vs the 3090’s 24GB), which could give us more flexibility to experiment on larger models locally without going to the cloud.
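To put rough numbers on that (a back-of-envelope sketch, assuming FP32 training with Adam and ignoring activations, which usually add a lot on top):

```python
def training_memory_gb(n_params: float, bytes_per_param: int = 4,
                       optimizer_states: int = 2) -> float:
    """Rough lower bound on GPU memory needed to train a model:
    weights + gradients + optimizer states (Adam keeps two extra copies).
    Activations are ignored, so treat this as a floor, not an estimate."""
    copies = 1 + 1 + optimizer_states  # weights, gradients, Adam moments
    return n_params * bytes_per_param * copies / 1e9

# A ~1B-parameter model needs at least ~16 GB before activations, which
# already crowds a 24 GB RTX 3090 but leaves headroom on a 48 GB A6000.
print(f"{training_memory_gb(1e9):.1f} GB")  # -> 16.0 GB
```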
On the flip side, an A6000 for everybody would blow out the budget. So the ideal case would be to build one much beefier host machine on server / workstation grade hardware (ECC seems to be the main benefit there) and share it between the Data Scientists. I’d need to virtualize this. We have a few weeks to get this right, but I’m looking for guidance on the best way to do vGPU (i.e. hypervisor choice).
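For concreteness, one low-tech alternative to full virtualization (a sketch, not a plan) would be to skip the hypervisor, give everyone accounts on the shared host, and have each job grab whichever GPU currently has the most free memory via NVML:

```python
# Sketch: on a shared multi-GPU host, pick the GPU with the most free memory
# and pin this process to it. Assumes the nvidia-ml-py ("pynvml") package.
# This is a low-tech alternative to proper vGPU partitioning, not a replacement.
import os
import pynvml

pynvml.nvmlInit()
free_by_index = {}
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    free_by_index[i] = pynvml.nvmlDeviceGetMemoryInfo(handle).free
pynvml.nvmlShutdown()

best = max(free_by_index, key=free_by_index.get)
# Restrict this process (and anything it launches) to the chosen GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = str(best)
print(f"Using GPU {best} with {free_by_index[best] / 1e9:.1f} GB free")
```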
Closing
I’m aware of my vast ignorance here, and I’m not going to make a decision without talking to our hire about their preferences / recommendations. I’m just trying to approach that conversation from a more informed position.