96GB RAM on 48GB VRAM?

I have read there is a rule of thumb for machine learning that system RAM should be twice the VRAM, but with AM5 and my desire to run multiple VMs simultaneously, this leaves me with a dilemma.

My upcoming PC build includes dual 3090s so that I have more options for LLMs. I have a CS background and am interested in exploring many areas of AI, including but not limited to backtesting and modelling, or anything I can relate to the passion I have developed for algorithmic trading, economics, and finance.

The same system will ideally run a host and 3 VMs simultaneously:

  • main daily driver
  • basic, light, for isolation and some web use
  • Windows gaming

It has been reported in the thread "DDR5 4 dimms on am5 – what's working, what's not?" that 4 x Kingston 32GB ECC [KSM48E40BD8KM-32HM] can achieve 4800 MT/s after tuning, but with 2 x 48 GB there are accounts of speeds exceeding 6000 MT/s.

Will a faster 96GB total be enough for what I want to do, or should I prioritise having slower but more RAM @ 128GB?

To help you better, we'll need some more details about your intended setup and usage scenarios. Specifically, what hypervisor do you plan on using for your VMs? What games will you be playing, and at which resolution? Which CPU are you leaning towards?

The myth that RAM should be twice the VRAM is just that – a myth. There’s no hard rule here. The right amount of RAM depends on your specific needs.

As for ECC memory, it’s really only necessary if you’re running workloads that are sensitive to errors, like database servers or scientific computing. For most users, non-ECC is sufficient and more cost-effective.

When it comes to RAM speed and capacity, bandwidth and latency are key considerations. You can achieve higher bandwidth even with DDR4 on a 4- or 8-channel platform than with the standard 2 channels on AM5.
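
As a rough illustration of what channel count and transfer rate mean for peak bandwidth, here is a back-of-the-envelope sketch (theoretical peak figures only; real workloads see noticeably less):

```python
# Theoretical peak memory bandwidth: channels x 64-bit bus width x transfer rate.
def mem_bandwidth_gbs(channels: int, mt_per_s: int, bus_bits: int = 64) -> float:
    """Peak bandwidth in GB/s for a given channel count and transfer rate."""
    return channels * (bus_bits / 8) * mt_per_s / 1000

print(mem_bandwidth_gbs(2, 4800))   # AM5 dual channel @ 4800 MT/s -> ~76.8 GB/s
print(mem_bandwidth_gbs(2, 6000))   # AM5 dual channel @ 6000 MT/s -> ~96.0 GB/s
print(mem_bandwidth_gbs(8, 3200))   # 8-channel DDR4 server        -> ~204.8 GB/s
```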

To summarize, I’d recommend you prioritize a balance between total RAM capacity and speed. Aim for a middle ground that meets your needs without overspending on excessive amounts or overly fast memory that won’t provide much additional benefit. Let me know if you have any other questions!

2 Likes

Sure, and thanks for the input. I am going to answer partly by quoting from a PC Build thread I started a few days ago for my first VFIO/ML system, and link to it for reference: Black Friday 9950X3D ML, LLM, & gaming build

Or at least not at an inordinate cost to productivity generally. What I am trying to say is that I would like to avoid bottlenecks as much as possible and get the best possible gaming performance from a single 3090 and X3D combination while gaming, as long as it does not come at the cost of disproportionately impairing productivity - e.g. to strike a balance where 1% or 2% extra FPS does not cost me a much greater loss in productivity.

quilt has explained that the third GPU is not required and I could pass the iGPU to that VM instead, but this helps explain the use case:

I am interested in using Proxmox, not least for the option to delta/incremental backup and sync VMs across systems in future.

At least for everything other than gaming:

I found this article useful for comparing frequency across workloads, including ML:

Glad to hear that. I am unsure exactly how much RAM is normally required for the host or for a basic VM3 as I describe, but I assume 16GB is normal for the Windows gaming VM2. The remainder of the 96GB could be used for the main VM1 and ML.
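
As a rough sketch of the split I have in mind (all figures are my assumptions, not measured requirements):

```python
# Hypothetical split of 96 GB across the host and the three VMs (assumed figures).
allocation_gb = {
    "host (Proxmox)":           8,
    "VM3 (light / trading)":    8,
    "VM2 (Windows gaming)":    16,
    "VM1 (daily driver + ML)": 64,   # whatever remains
}

assert sum(allocation_gb.values()) == 96
for name, gb in allocation_gb.items():
    print(f"{name:26s}{gb:>4d} GB")
```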

I would prefer faster RAM, but I had thought I should safeguard against an error from development on my desktop - perhaps while working with a large dataset or something used for a trading algorithm - being propagated to what is later deployed with capital in the cloud. Or am I thinking about this wrong?

Also from the build thread supporting non-ECC:

I would like to stick with AM5 for the upcoming 9950X3D.

I have heard 6000 is a good target for AM5 DDR5 even when the RAM is rated higher, at least for gaming. Does that apply for ML, and would a good strategy be to aim for the best priced >=6000-rated 2 x 48GB kit from the QVL?

Are there any special considerations for the motherboard? Originally I was considering the ProArt B650 as it seemed widely lauded for VFIO, but when I thought I needed 3 GPUs, and also that 2 x 3090s would run better at PCIe 5.0 x8/x8 than PCIe 4.0 x8/x8, I was leaning towards the ProArt X670E. In the build thread some other boards were mentioned, including the ASRock Taichi X670E, MSI Carbon X670E, and the ASRock B650 Steel Legend.

In terms of storage I was considering 3 x 4TB WD BLACK SN850 or Samsung 990 Pro, whichever is better priced: one each for VM1 (main, ML, daily driver) and VM2 (Windows, gaming), and another just for model storage. I was considering that the host and VM3 (manual trades etc. only) could share something like a slower, cheaper Solidigm P41 Plus 1TB, unless the host running on a slower drive would slow the VMs it launches.

There is so much to unpack here. You want a lot from one machine, and unfortunately there are some misunderstandings about how things work. I don't have time now but will try to unpack this for you tomorrow. It's 2 AM now :sleeping:.

Proxmox doesn’t require a GPU at all, nor do LLMs. In fact, both can be run headless with access through a web interface. I doubt you can run gaming and AI simultaneously effectively; even if you could, it would probably cause inconsistent performance due to CPU bottlenecks.

A 3090 Ti should be more than enough for 1440p unless you require ray tracing (RT).

A PCIe 4.0 x8 link shouldn't hold the card back, but RAM bandwidth on AM5 might still be a limiting factor. All these factors need some calculations that I don't have time to do right now.
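
For a rough sense of the link speeds involved, here is the same kind of back-of-the-envelope arithmetic for PCIe (approximate per-direction figures after encoding overhead; note the 3090 is a PCIe 4.0 device, so a 5.0 slot adds nothing for it):

```python
# Approximate usable PCIe bandwidth per direction, in GB/s per lane.
PER_LANE_GBS = {"3.0": 0.985, "4.0": 1.969, "5.0": 3.938}

def pcie_bandwidth_gbs(gen: str, lanes: int) -> float:
    return PER_LANE_GBS[gen] * lanes

print(pcie_bandwidth_gbs("4.0", 16))  # ~31.5 GB/s - one 3090 in a full x16 slot
print(pcie_bandwidth_gbs("4.0", 8))   # ~15.8 GB/s - each 3090 when bifurcated x8/x8
```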

In the meantime, consider looking into Threadripper CPUs; they might be better suited if you want to do so much at once (which you likely won't). Try to understand CPU PCIe lanes and CPU memory channels—what your motherboard provides isn't always what the CPU can handle.

Running games on a Windows VM might trigger some anticheat systems; there may be ways around it, but they are not easy.

You might find it better to split this into two machines: one for gaming (which doesn’t need to be as powerful) and another dedicated LLM server using older business-class parts and a few P100 GPUs. For the gaming machine, you could dual-boot with Linux for privacy.

2 Likes

Sorry, yeah I know it is quite complex.

Running LLMs headless is something I considered some weeks ago when I was trying to work out the best use of resources. I really would like to make full use of the 2 x 3090s for everything: whether one is just providing display output to VM1, which will be my daily driver; both are assigned to the same VM because I want to load an LLM or run some other intensive ML task; or one is temporarily assigned to the Windows VM2 for gaming.

I should make clear I do not expect to perform any resource-intensive AI tasks while gaming, but I do want to be able to switch seamlessly between VM1, VM2, and VM3, whether at a glance to my second monitor or through a key combination.

Can I assume that in the same operation where I move one 3090 from VM1 to VM2 for gaming, I can just as easily remove the 275W power limit so that the Ti version's advantage is not negated? Shame about RT; if I were just building for gaming I would go for a single used 4090, but alas, VRAM. Getting a good deal on the used EU market is not easy, but if I can get 1 or 2 Ti versions I will. If the market is really bad I could import from the US and pay customs.
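
For what it's worth, my assumption is that the power limit would simply be re-applied from whichever guest owns the card at the time, along these lines (nvidia-smi inside the guest, run with admin/root rights; the GPU index and wattage are placeholders):

```python
# Sketch: query and set the power limit of GPU 0 via nvidia-smi from inside the guest
# that currently owns the passed-through card. Requires admin/root privileges.
import subprocess

# Show the current and maximum power limits.
subprocess.run(["nvidia-smi", "-i", "0",
                "--query-gpu=power.limit,power.max_limit",
                "--format=csv"], check=True)

# Raise the limit to 350 W (value must be within the card's supported range).
subprocess.run(["nvidia-smi", "-i", "0", "-pl", "350"], check=True)
```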

I had assumed that when using a flagship CPU like the 9950X3D, as long as the board's PCIe lanes and speeds are enough to support the GPUs, that would be enough - but please tell me about the other calculations.

I am aware of Threadripper, but I really want to stick with a single AM5 system for X3D. When I thought I might need 3 GPUs, I was considering a fallback of dropping VM3 (trading) to run on a light MiniPC. From the discussion in the other thread and what you have said, should I still be able to manage 3 VMs running simultaneously with just 2 GPUs and the iGPU?

Right now I am considering the ASRock Taichi Carrara X670E as apparently it provides adequate spacing for the GPUs, ease of use with VFIO, and great IOMMU groups, but if something like the Taichi Lite B650 has the same benefits apart from the 2 x PCIe 5.0 slots, I am happy to save money with it.
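
I assume I could also verify the groups myself from a live Linux environment with IOMMU enabled, using something like this generic sysfs walk (not board-specific):

```python
# List IOMMU groups and the PCI devices in each, as the kernel exposes them in sysfs.
# Devices that sit in the same group have to be passed through together.
from pathlib import Path
import subprocess

groups = Path("/sys/kernel/iommu_groups")
for group in sorted(groups.iterdir(), key=lambda p: int(p.name)):
    print(f"IOMMU group {group.name}:")
    for dev in sorted((group / "devices").iterdir()):
        # lspci -nn gives the human-readable name plus [vendor:device] IDs
        info = subprocess.run(["lspci", "-nns", dev.name],
                              capture_output=True, text=True).stdout.strip()
        print(f"  {info}")
```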

Yeah, I am aware of this. Setting aside how far native Linux gaming support has come in recent years, right now if something requires Windows I do not reboot into it, because I want access to everything else in Linux, so VFIO will be a big upgrade to my experience - even if many games have anticheat issues.

Edit: I guess block diagrams like this one for the ASRock X670E Taichi are what you are suggesting I consider in terms of lanes:

I think you’re heading a bit off track with your current approach.
Trying to figure out what you actually need instead of just what you want might help us get somewhere.

A few assumptions about you (feel free to correct me) :

  • You’re not a masochist ; and want a smooth computing experience :wink:.
  • This isn’t just a flex and money is somewhat important to you.

From your posts, here are my thoughts on what you really need:

  1. A secure way to manage your finances online.
  2. A powerful machine for work and playing around with Large Language Models (LLMs).
  3. The ability to play non-competitive games at 1440p with frame rates around 140Hz.

Let’s start by saying the path you’re on is going to be tough and painful. VFIO, Hypervisor, multi-monitors, multiple GPUs, multi CCD CPUs, and different OSes all mixed up? It’s not impossible, but it will likely mean months of troubleshooting and reconfiguration. You might even end up scrapping your entire build!

So let’s try a simpler approach:

  • For Trading:

    • A nice laptop locked down with a BIOS password, Secure Boot enabled, full disk encryption, and a secure, long-term support (LTS) Linux distribution.
  • Main PC/Workstation:

    • High availability is not necessary here since your trading is secured on another machine. You can have fun here without worry.
    • Run VMs and containers from the main Linux system, or even set up Proxmox if you want to go all-in, but it's probably overkill. (24/7 operation is what Linux is built for.)
  • Build:

    • Go for a Ryzen 9800X3D with all cores on one CCD for ease and smooth operation (better single-thread performance, and I do not see a significant need for more multi-threaded performance).
    • For gaming, consider an RX 7900 XTX (24GB). It's faster, more efficient, easier to get, and less expensive than the 3090 Ti, and it has full ROCm support for AI tasks. If you need more VRAM for AI, get another 7900 XTX or a couple of 16 GB RX 7600 XTs.
      Gaming on Linux is extremely easy nowadays, with over 10k titles.
  • Storage:

    • 4TB is plenty unless you’re storing media. For that, consider 2x16TB spinners in mirror mode for speed and redundancy with half the total capacity.
    • Use Btrfs snapshots for incremental backups; they’re super fast and easy to set up.
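
A minimal sketch of the snapshot-plus-incremental-send idea, assuming a subvolume at /home, a snapshot directory at /.snapshots, and a backup Btrfs filesystem mounted at /mnt/backup (all of those names are placeholders):

```python
# Sketch: create a read-only Btrfs snapshot and send only the delta against the
# previous snapshot to a backup filesystem. Paths, snapshot names, and the presence
# of a previous snapshot are assumptions for illustration; run as root.
import subprocess
from datetime import date

src        = "/home"                              # subvolume to protect
snap_dir   = "/.snapshots"                        # where snapshots live
prev_snap  = f"{snap_dir}/home-2024-11-01"        # last snapshot already on the backup
new_snap   = f"{snap_dir}/home-{date.today()}"    # today's snapshot
backup_dst = "/mnt/backup"                        # mounted backup Btrfs filesystem

# 1. Read-only snapshot of the subvolume.
subprocess.run(["btrfs", "subvolume", "snapshot", "-r", src, new_snap], check=True)

# 2. Send the incremental stream and receive it on the backup filesystem.
send = subprocess.Popen(["btrfs", "send", "-p", prev_snap, new_snap],
                        stdout=subprocess.PIPE)
subprocess.run(["btrfs", "receive", backup_dst], stdin=send.stdout, check=True)
send.wait()
```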

This way, you keep your finances secure (and maybe even store the laptop in a fireproof safe), while having more time to play with AI and games!

So how does this simpler, more convenient, fun, and possibly cheaper approach sound?

2 Likes

On that diagram, you can see why I mentioned Threadripper. AM5 only has 2 channels for memory and a mere 16 PCIe lanes shared for multi-GPU setups. It’s great for general computing and single-threaded tasks, but once you step over those boundaries, you’ll quickly hit a bottleneck. That CPU just doesn’t have enough pins/connections to cut it in advanced (workstation-level) scenarios. And that’s why the 9950X3D doesn’t make much sense either.

4 Likes

Not a flex - I can afford this, but despite my interest in finance, this build will still be a significant spend for me, which is not least why I feel under pressure to take advantage of Black Friday savings before I have even sourced the used cards and CPU I was considering.

Please correct me where my understanding is wrong:

Can you please explain which part of the system I propose would be limited by dual-channel memory? Is it that there are 3 VMs, and 2 VMs would actually be okay?

For normal use, my main VM1 (daily driver, any ML etc. work) will use both 3090s across 2 x PCIe 5.0 x8 slots (in the case of the Taichi X670E); when gaming, one of the cards and slots will be assigned to the Windows VM2. VM3 for trading (unless I instead use a MiniPC or laptop as you have suggested) does not use any of the lanes, just the iGPU.

Is there a rule of thumb to compare requirements with what is available, or a metric where we can say my proposed system requires X but the hardware only supports Y? This would really help me understand exactly why, say, this AM5 setup could not work but Threadripper would, or why a modified version of my AM5 concept could work.
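
To make the question concrete, this is the kind of crude demand-versus-supply tally I was imagining, with illustrative numbers (the lane counts are my assumptions about a typical AM5 board, not a measurement):

```python
# Crude "demand vs. supply" tally for CPU-attached PCIe lanes on an assumed AM5 build.
# Roughly: 16 lanes for graphics, 8 for NVMe, plus a separate x4 link to the chipset.
cpu_lanes_available = {"gpu": 16, "nvme": 8}

# What the proposed build would ideally hang directly off the CPU (illustrative).
devices = [
    ("3090 #1",         "gpu",  16),   # would prefer x16, gets x8 when bifurcated
    ("3090 #2",         "gpu",  16),
    ("NVMe for VM1",    "nvme",  4),
    ("NVMe for VM2",    "nvme",  4),
    ("NVMe for models", "nvme",  4),   # likely ends up behind the chipset instead
]

demand: dict[str, int] = {}
for name, pool, lanes in devices:
    demand[pool] = demand.get(pool, 0) + lanes

for pool, want in demand.items():
    have = cpu_lanes_available.get(pool, 0)
    verdict = "fits" if want <= have else f"oversubscribed by {want - have} lanes"
    print(f"{pool:6s} want {want:2d}, have {have:2d} -> {verdict}")
```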

I would have thought a lot of people are using dual 3090s on AM5 for ML and other intensive tasks, so I am trying to understand why running one other VM (trading, light basic web use) while doing so, or two VMs while not doing so (limiting VM1 to basic everyday tasks when gaming with VM2), causes a problem. Same question on the rule of thumb and comparing X to Y.

With you so far.

I definitely want to avoid scrapping the entire build and in fact I fear if I order the wrong components that could happen.

This is a sound plan and I appreciate you taking the time to create it, but it is a ways away from what I had in mind. Perhaps there is a workable compromise incorporating some of it?

As mentioned, I am open to migrating the trading VM3 workload to a MiniPC or, better, a laptop - even my existing laptop setup with dual boot if it cannot manage virtualisation. Though, unless the answers to what I have asked above make clear that VM3 will just not be possible on the new build, I would like the laptop to be what I use for trading in the beginning and as a fallback option if VM3 does not work out.

As you mentioned 24/7, I should clarify: where I refer to some of the VMs being always on, I just mean while the system is powered on - i.e. whenever I am using it, VM1 and VM2 are running - but I am not intending the system to always be powered on. Maybe WOL would be cool though!

I know there were issues with the 7950X3D - V-Cache on just one CCD, and Infinity Fabric delays - but the 9800X3D, although just a single CCD, and the 9950X are a step forward. Is it not the case that if I were to pin the V-Cache cores to the gaming VM2 and split the remaining cores between the host, the main VM1, and the light VM3, there should be no issue with scheduling etc.? I had hoped with the 9950X3D I would benefit from the V-Cache for gaming and the core count for running multiple VMs.
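
For example, I assume I could confirm which cores share the V-Cache L3 before deciding what to pin, with a quick sysfs check like this (generic Linux, independent of the hypervisor; the actual pinning would then go in the Proxmox VM config):

```python
# Group logical CPUs by the L3 cache they share, as reported in sysfs.
# On a dual-CCD X3D part, one of these groups is the V-Cache CCD you would pin
# to the gaming VM; the rest can go to the host and the other VMs.
from pathlib import Path
from collections import defaultdict

l3_groups = defaultdict(set)
for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
    idx = cpu / "cache" / "index3" / "shared_cpu_list"   # L3 is usually index3
    if idx.exists():
        l3_groups[idx.read_text().strip()].add(cpu.name)

for shared, cpus in l3_groups.items():
    print(f"L3 shared by CPUs {shared}: {sorted(cpus)}")
```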

I really would prefer to stick with CUDA for AI. Although I only have very limited experience using it and none using AMD, I have read through many discussions comparing the two, and although there are always some recommending AMD for AI, the overwhelming majority say it is not as compatible, that where it does work it can take a lot of extra configuration, and that it is just easier to work with CUDA as that is what most tools were designed for.

I have dabbled with BTRFS and Timeshift and despite some past problems it does interest me.

The compromise I am suggesting comes down to taking this in steps. I do not need to get all of this working immediately, and again this is only if your answers to my previous questions do not completely tear down any possibility of this working.

I have an old system to use. While waiting until I have all the components for the new build, I can start by configuring my laptop for trading. When I am ready to build, I could start by just getting Proxmox set up for the host and the main VM1. Then, when I am happy with how that is working, I could get the Windows VM2 running and see how well I can make it dynamically use one of the 3090s normally pinned to VM1. Then finally, only when everything is working smoothly, I would broach using the iGPU for the VM3 trading setup.

So I see you’re go with the laptop—great choice! It’s not about being unable to do it on a main workstation; it’s all about keeping your primary machine secure and dependable while you experiment with the other.

I'd like to warn you about dual-booting. :warning: Microsoft periodically tends to "ACCIDENTALLY" destroy dual-boot configurations. Not to mention, it generally isn't as secure or reliable, and Windows is a very limiting OS (you get my bias here :sweat_smile:).

Now, speaking of virtualization: unless you have a top-of-the-line :face_with_monocle: server CPU with some fancy features, the host system has full access to all VMs. That’s why you don’t want to run anything unsavoury on your host. If it gets compromised, game over.

I’d recommend checking out :tinfoil: Qubes OS and its security model. You can implement parts of it in your setup or workflow if you’re serious about privacy and security.

Even on a 9800X3D, you could run hundreds of VMs (it’s not a joke). The limiting factor here isn’t the number of VMs but rather the connections between components.

THE BOTTLENECK :bangbang:
When you start cramming high-end hardware into your machine, you'll eventually hit one. Servers have up to 8 memory channels because bandwidth is crucial. Think of RAM as a Level 4 cache; almost everything that happens on a PC goes through it (unless the chipset offloads some lesser tasks). If you want performance, you need fast connections between components.

  • CPU and CCD: Connections between CPU cores can become bottlenecks.
  • CCD and I/O Controller: Both CCDs go through an I/O controller, adding another bottleneck.
  • Confused process scheduling will just make it worse.

Any of those 9-series CPUs will suffice. Most computing tasks will be offloaded to the GPUs anyway. If you're gaming at 1080p, the CPU maxes out around 300 FPS; at 1440p, you're GPU-limited (the bottleneck changes depending on what you do). Those GPUs are PCIe 4.0, so the whole link will be limited to that speed. If you go for two GPUs at PCIe 4.0 x8, they'll likely bottleneck each other.

:face_exhaling: In reality, it's a catch-22: something will always bottleneck. The idea is to create a system where you don't spend money on features you can't fully utilize. You can redirect that budget toward removing bottlenecks, and the madness just continues. I'd love to help you with these issues, but I'm reaching the limits of my knowledge. In real life, it's all about testing and more testing…

About CUDA vs ROCm: the field has changed a lot, so most info out there is just an echo from the past. Running ROCm on a 6950 XT was seamless; just a couple of lines in the Fedora terminal—the Ollama install script and the ROCm script—and it works flawlessly with no issues at all.

With some testing, I'm running 30B models on my 16GB card with a small offload to my 7800X3D (that fat lazy bastard mostly idles anyway :rofl:). It generates text faster than I can read it!
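
For reference, once Ollama is serving a model locally you can drive it from a few lines against its HTTP API on the default port 11434 (the model name below is just a placeholder):

```python
# Sketch: stream a completion from a locally running Ollama instance.
# Assumes `ollama serve` is running and the model has already been pulled.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5:32b",          # placeholder model name
          "prompt": "Summarise the trade-offs of dual-channel DDR5 for local LLMs.",
          "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
```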

To wrap it all up:
With what you’re going for, I don’t believe you will need to scrap the whole build. If you’re adamant about AM5, go for it. The biggest problem will be multi-GPU performance; at least you’ll have VRAM and it will work! Just not at 100% load.
If you plan on using VFIO and IOMMU, you might want to invest in hardware KVM switches. Wendel has a great one for that one-button switching (monitor, mouse, keyboard). You might need USB sound for easy switching.
And with a dependable laptop, you’ll be able to mess around with configs and testing, changing and braking stuff as needed.

:smile: It’s an exciting build; make a decision and pull the trigger. Don’t let analysis paralysis kick in. Finding ways to make it all work will be a lot of fun!

2 Likes