Analysis Paralysis on an AI/ML, Personal/Work, do a little bit of all the things PC - $10k budget - Midwest, USA

Intro and Relevant Information

Hello everyone! First forum post, so be gentle. 🙂 I’m working on upgrading my entire home office, and a core part of that upgrade is switching back to a desktop PC after almost a decade of operating off a laptop and docking station. Professionally I’m a consultant focused primarily on the Microsoft Cloud. We’re talking Azure and M365, for those who are familiar. I have a home lab for a lot of testing and demos as well as running multiple services for home. I’m quite comfortable in the Debian and RPM-based Linux OSes, though admittedly, almost exclusively on the server side. I’ve never run Linux as the desktop OS on my primary machine. My plan for this workstation is to run Pop!_OS or Fedora 40 as the main OS with a Windows VM for the apps I need for work (Office, Teams, etc.) so that I can stay off Windows for my primary OS. I will also keep my current MacBook on standby in case something doesn’t work and I need to pivot quickly while I work through the transition.

Existing Hardware/Software

The only things of real importance are my audio interface, a Scarlett Solo Gen 4 with an XLR mic and studio monitors, and my USB capture card, the Elgato Cam Link 4K. Everything I can find indicates the Scarlett Solo works just fine with JACK, PipeWire, and PulseAudio. The Cam Link 4K seems fine as well from what I can find. Everything else in my environment is either native Linux or otherwise fully compatible, as I’m already using it with other Linux hosts.

Capabilities for the new system
With the background out of the way, here are the main things I need the new system to do comfortably, with plenty of headroom for me to grow into and expand in the future:


Software Development
This is probably the most vague and poorly defined of all the requirements. I dabble in lots of different things, but I’m working on getting a bit deeper into “AI,” which could mean LLMs, ML, or generative AI depending on what catches my interest in any given week. I would guess that 80%+ of the time I’m running existing models for various experiments/tasks for my own learning and fun. I have used OpenAI’s APIs for integration with software, but the bills tend to grow quickly while I’m figuring things out, so I want to run something local with 7-8 billion parameters, which should fit comfortably in 24 GB of VRAM based on what I can find. I also do a bit of web development with Node.js, React, and similar. I write scripts for various purposes in Python and PowerShell, and I make judicious use of Docker containers and VMs for testing/validation of lots of different technologies.
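To sanity-check the “7-8B fits in 24 GB” claim, here’s a rough sizing sketch. This is back-of-the-envelope math, not a precise planner: it assumes weights dominate memory use and pads a flat ~20% for KV cache, activations, and runtime overhead (that overhead figure is my assumption, and real usage grows with context length).

```python
# Rough VRAM sizing for local LLM inference.
# Assumption: weights dominate; add ~20% flat overhead for KV cache,
# activations, and the runtime itself.

def vram_gb(params_billions: float, bytes_per_param: float, overhead: float = 0.2) -> float:
    """Estimated VRAM in GB to serve a model of the given size."""
    weights_gb = params_billions * bytes_per_param  # 1B params at 1 byte each ~= 1 GB
    return weights_gb * (1 + overhead)

# Common precisions: FP16 = 2 bytes/param, INT8 = 1, 4-bit quant = 0.5
for quant, bpp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"8B model @ {quant}: ~{vram_gb(8, bpp):.1f} GB")
```

By this estimate an 8B model at FP16 lands around 19 GB, so it squeezes into 24 GB with modest context, and quantized variants leave far more headroom.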

Virtualization
Windows/Linux VMs are the primary use case. Most of these machines (with the exception of the persistent Windows machine) are spin-up, test-for-a-couple-of-hours, then-delete workloads. Persistent VMs will continue to run on my Proxmox server. I plan to have a dedicated network port for the workstation itself and a second port, on a separate VLAN, dedicated to VM traffic. Given the hardware below, I’ll probably use the 2.5G port for the workstation and the 10G port for VMs. Most of my work is done in the browser, so I don’t really need the 10G connection for my day-to-day requirements, and the VLAN configuration is much simpler if the VMs feed directly into the 10G switch rather than the 2.5G.

Other workloads
I do a little 2D/3D modeling to create parts/art for 3D printing, virtual TTRPGs (Foundry VTT), and misc other things that come up. I use Autodesk Fusion 360 for all of my 3D modeling; I’ll test it in a VM, but will most likely keep a second boot device to boot into Windows directly to run the software. Same with Photoshop and other Adobe tools. I plan to move away from Adobe as soon as my subscription runs out and will probably play around with Krita, Inkscape, and GIMP ahead of time to relearn shortcuts and tools and figure out a new workflow. I also use DaVinci Resolve for some light video editing, and OBS for recording “How-To” type videos and as a virtual camera for MS Teams.

Front-runner system
Falcon Northwest Talon

  • Power Supply: Seasonic PRIME TX 1600 Watt
  • Motherboard: Asus Pro WS TRX50-SAGE WiFi 7
  • Processor: AMD Threadripper 7960X (24-Core)
  • Memory: Kingston FURY Renegade Pro 128 GB (4x32GB) - 6000 MT/s ECC
  • Video Card: NVIDIA GeForce RTX 4090 Founders Edition - 24 GB
  • Primary System Drive: Crucial T705 2TB - PCIe Gen 5.0 M.2 SSD
  • Windows System Drive: Crucial T705 2TB - PCIe Gen 5.0 M.2 SSD
  • Total with Tax (5.5%) and shipping: $9558.21

Is this overkill for what I need given the above specs? I’m 99% sure the answer is yes. My primary concern is making sure that I can add/upgrade GPUs down the line as I do more in the AI/ML space and improve my capabilities in all of the above areas.

Here’s the main questions I’m trying to answer for myself before I click buy:

  1. Am I making any blatantly dumb decisions?
  2. Is a 4090 a poor use of money for the described AI use cases? Would I be better suited getting a Zen 5 Ryzen 7 or 9 instead of the Threadripper and getting a different GPU? Should I have a second GPU to drive my display so that the 4090 can be dedicated to the AI workloads?
  3. Am I going about this all wrong? Should I get a more modest desktop PC and spend the rest of the budget on a server platform with GPU for these workloads?
  4. Are there other things that I’m missing that would offer better value overall?

I haven’t built my own computer since 2011, and honestly, I don’t really want to put in the effort to get myself up to speed on what’s changed since then and deal with the hassle of sourcing all the parts, assembling, cable managing, testing, tweaking and the whole song and dance. I’m happy to pay the premium for someone else to assemble and warranty the machine for 3 years.

Hope that’s enough information to start the conversation, I’m open to whatever crowd-sourced wisdom you’re willing to bestow upon me. I’d prefer to order by September 6th as I have a dedicated week off to get everything dialed in Sept 30-Oct 5 and that should (hopefully) give enough time for build/shipping of the machine.

I think you are on the right track. What you are essentially doing is building a virtual computer inside a physical one, so you need enough system resources to split between the two machines. With that said, I might opt for the 7970X over the 7960X, but that is just my preference, and it is the CPU I built my workstation around. Outside of that, I might opt for a larger SSD for the secondary drive and go for a 4TB+ size, so your Windows VM gets its 2TB and a second partition provides additional storage on that drive.

In all honesty, I was thinking two drives so I could do a full passthrough of the Windows drive and dedicate it entirely to Windows under QEMU. I’m on a 500 GB drive right now in my MBP, dual-booting Fedora 40 and macOS: 350 GB goes to the Mac partition with almost 100 GB free, and 150 GB to Fedora with ~100 GB free there too. I store next to nothing locally; it’s all on the NAS. I considered a large 2.5" SSD like the 7.5 TB Micron, as I’ve seen multiple places talk about fast bulk storage for AI training data, but I don’t think I’m at that stage yet, so I ultimately decided to skip it for now and add one later if that becomes a need. Going from 500 GB to 4 TB locally already feels like an unfathomably large amount of storage at this point.

On the processor, I debated that same choice for at least a few weeks, but I’m not sure most of my workloads will really see much benefit from the extra cores; most test machines I spin up get 2, maybe 4, cores and run for 6 hours at most. I’m typically demoing detection rules for customers, so it’s usually a cloned Windows machine: run a demo attack, discuss with the customer, and shut down, all within ~2 hours. I usually do that on my Proxmox server, but some of those demos break RDP, so I end up showing it in the browser, and I figured if I have the hardware, I may as well do it locally to make sharing a touch easier and nicer looking. Virtualization is the primary reason I considered the 7970X, but I figured the slightly higher base clock of the 7960X would be better for most of what I’m going to use it for. Not sure if I’m off-base with that thought, though.

I understand not storing much locally. Like you, there is a reason I have 28TB of network storage. But I learned a long time ago the value of local storage. Any number of applications can require it for any number of reasons. At the moment, I am doing a lot of Blu-ray/DVD/CD ripping and encoding to get that onto my NAS. Consider that a Blu-ray disc can hold 30GB of data that has to be ripped in one format and encoded to another (temporarily doubling the footprint on disk), and that ripping and encoding run simultaneously, which eats into system resources. My point is you never know what workloads you are going to have in the future, and no one has ever complained about having too much storage.

I think the machine is overkill for your specified workload.

A 16-core/32-thread Ryzen 7950X/9950X or the equivalent EPYC 4xxx platform should suffice. The main reasons to upgrade to Threadripper are a lack of PCIe lanes/slots and/or memory bandwidth/capacity. I don’t see a strong indicator here that would prevent you from potentially saving $5-6k.

The main risk is memory capacity: you are looking for 128GB, which can be done on Ryzen, but only at slower speeds with four DIMMs (around 3600 MT/s). Alternatively, you can run 96GB (2x48GB) at full speed.
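To put numbers on that capacity-vs-speed tradeoff, here’s the theoretical peak bandwidth math for dual-channel DDR5 (two 64-bit channels, 8 bytes per transfer each). These are ceiling figures, not benchmarks; real throughput depends on the kit, board, and memory controller.

```python
# Theoretical peak bandwidth for dual-channel DDR5:
# transfers/s * 8 bytes per 64-bit channel * 2 channels.

def ddr5_peak_gbps(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Peak memory bandwidth in GB/s for a given DDR5 transfer rate."""
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9

print(f"2x48GB @ 6000 MT/s: ~{ddr5_peak_gbps(6000):.0f} GB/s peak")
print(f"4x32GB @ 3600 MT/s: ~{ddr5_peak_gbps(3600):.0f} GB/s peak")
```

So dropping to four DIMMs at 3600 MT/s gives up roughly 40% of peak bandwidth in exchange for the extra 32GB of capacity.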

The system you spec’ed looks awesome, but also costs a pretty penny. If you can/want to afford it go for it.


Totally agree. The primary reason I moved up to the Threadripper was to have more than one PCIe slot that supports the full x16 bandwidth. There’s a strong possibility that I’m overthinking this, but it seems like I would be losing a significant amount of performance if I put two GPUs on a board that supports either x16 or x8/x8 when both slots are populated. I’d prefer to go with fewer cores and a significantly less expensive CPU/motherboard if I can still run two GPUs at their full speed. Am I misunderstanding the requirements for two GPUs, or could I run, say, two 4090s at full performance on the ROG CROSSHAIR X670E HERO with a 9950X CPU?

I think the Threadripper is overkill as well, but I understand the bandwidth concern.

PCIe bandwidth is only important if you want to train models. For running larger models it doesn’t make much difference: the weights are loaded into VRAM once and computed from there.

So for running models you are better off with a 9950X and two 4090s in x8 slots, or maybe even three of them with the third in an x4 slot.
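To put rough numbers on why slot width matters little for inference: PCIe bandwidth mostly affects the one-time load of weights into VRAM. The per-lane throughput figures below are my approximations of usable bandwidth after encoding overhead (~2 GB/s/lane for Gen4, ~4 GB/s for Gen5), and the 16 GB model size is illustrative (an 8B model at FP16).

```python
# Approximate usable PCIe throughput per lane, after overhead (assumed figures).
GEN4_GBPS_PER_LANE = 2.0
GEN5_GBPS_PER_LANE = 4.0

def load_seconds(model_gb: float, lanes: int, gbps_per_lane: float) -> float:
    """Time to copy a model's weights over the PCIe link, in seconds."""
    return model_gb / (lanes * gbps_per_lane)

model_gb = 16  # e.g. an 8B model at FP16
print(f"Gen4 x16: ~{load_seconds(model_gb, 16, GEN4_GBPS_PER_LANE):.2f} s to load")
print(f"Gen4 x8:  ~{load_seconds(model_gb, 8, GEN4_GBPS_PER_LANE):.2f} s to load")
```

Halving the lanes roughly doubles the load time, but we're talking fractions of a second to a couple of seconds either way, which is why x8 slots are fine for pure inference.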


Thank you for the explanation/confirmation. I’m glad that I wasn’t completely off-base both in thinking that the Threadripper was overkill for my workloads, as well as thinking that the slots would offer less bandwidth by adding multiple GPUs.

I think that I’d prefer the flexibility and upgrade options of having more PCIe lanes even if it is overkill for my current workloads. I have a tendency to find ways to use resources if I have them and having the flexibility to add multiple GPUs at full bandwidth may just encourage me to start learning about model training even if I’m mostly clueless right now.

I started this whole thing with a budget I was comfortable spending on the upgrade. The savings would have been nice, and might have let me sit on the excess to scoop up a second GPU when prices eventually come down after the next generation releases, but I think I’ll stick with the TRX50 platform and find new and exciting ways to use the extra cores. That’s not at all to devalue the advice provided here; I genuinely appreciate everyone taking the time to explain their entirely valid reasoning, which I agree with.

Knowing my own tendencies, the 2 PCIe 5.0 x16 slots and the 4.0 x16 slot are going to provide me more flexibility for upgrades and changes in the coming years without having to worry about upgrading the CPU/Motherboard at the same time. Plus, if I were to go with either of the AM5 options that Falcon has available, I’d likely want to add a 10G network card to separate the VM traffic anyway which will put me at 20 lanes consumed of the 24 total lanes on the Ryzen 9. Any upgrades past that will require taking half the lanes from the 4090. Having the room to grow and expand feels like a better choice right now, and admittedly, some of this may be spurred on by regrets from my last desktop build where I thought I had plenty of room to grow, only to realize 2 years down the road that I needed a CPU/Motherboard upgrade to add the connectivity I ended up needing.
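A quick tally of the lane budget described above, the way I’m counting it (this allocation is my own rough accounting, and actual lane routing varies by motherboard):

```python
# Rough lane-budget tally for the hypothetical AM5 build discussed above.
# Assumption: 24 usable CPU lanes, GPU at x16, a typical x4 10GbE NIC.

budget = 24
allocation = {
    "RTX 4090 (x16)": 16,
    "10G NIC (x4)": 4,
}
used = sum(allocation.values())
print(f"Used: {used} of {budget} lanes; remaining: {budget - used}")
```

That leaves only four lanes of headroom before any further addition has to start borrowing from the GPU's x16, which is the crux of the TRX50-vs-AM5 decision for me.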

Knowing that, does anything come to mind in terms of weird “gotchas” with Linux on this platform that I should prepare for?


I’m in a similar boat. The only thing I’d ask is: how sure are you that you won’t be fine-tuning or training in a few months? Having to do a new build because you’ve found you need 2-3 cards at x16 could really cost you, either in a rebuild or in on-demand instance costs. Getting a dev AI machine to test and set things up before a production run on cloud instances is a tricky business.

This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.