Fantasy GPU memory upgrade… (inventing new hardware)

So I'm sitting here seeing the limits of a GPU that's perfectly decent in all things bar memory…
Which led to the thought:
is there a way to increase GPU memory externally from the card?
Basically, turn the GPU's RAM into a cache and store streaming assets in another block of memory on another plug-in card.

I get that latency would be an issue if you were to directly access that RAM and use it for compute.
But as an external buffer, where soon-to-be-used assets can be stored rather than kept in the GPU's RAM? Might be doable…

Thoughts?

You want to increase the amount of memory on the graphics card beyond what's put in there? Shared memory has existed since at least the days of AGP: system RAM used as additional memory for the GPU.

Latency and throughput will be a tragedy compared to what modern GPU VRAM offers, though.
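
For what it's worth, this "system RAM behind the VRAM" idea already exists in CUDA as unified memory: on Pascal and newer cards you can allocate more than physical VRAM and let the driver page data in over PCIe. A minimal sketch (the 12 GB size is just an arbitrary figure bigger than an 8 GB card):

```cpp
// Oversubscribing an 8 GB card via CUDA unified memory: the allocation
// lives in system RAM and pages migrate to the GPU on demand over PCIe.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void touch(float *buf, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1.0f;   // touching an element faults its page onto the GPU
}

int main() {
    const size_t n = 3ULL << 30;              // 3Gi floats = 12 GB, more than VRAM
    float *buf = nullptr;
    if (cudaMallocManaged(&buf, n * sizeof(float)) != cudaSuccess) return 1;
    touch<<<(unsigned)((n + 255) / 256), 256>>>(buf, n);
    cudaDeviceSynchronize();                  // completes, but slows to PCIe speed once paging starts
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(buf);
    return 0;
}
```

Which is exactly the tragedy: the moment the working set spills past VRAM, you're running at bus speed instead of memory speed.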

Like AMD with its WX Pro cards, but with DIMM sockets instead of M.2 NVMe slots?

I guess, but it seems to me that when a card is held back by its RAM, one would probably appreciate a whole new card.

Like, I don’t wanna be all about binning anything old, and for sure an older card might still be good even if its RAM is small, but I would rather just make do with fewer FPS until one can get a new card than take on the added complication of a possible memory extension, and all that goes with it…

AMD version:

Nvidia also had low-level SSD access plans.

But of course you are on about larger buffers for screen frames, not ML / large data crunching like they were on about.

For sure… but if all you really need is more RAM and the compute side is good enough,
it would be nice if we could just plug in more as needed :wink:

No mate, I am on about large data set crunching, specifically AI image generation.
The card I have will max out at 512×512 with 8 GB of RAM, but the GPU usage is barely 20% while the RAM is maxed. :frowning:

Yes, exactly this :slight_smile:

thanks…

Ahh, then they are already thinking the same, hence the cards.

Which means you are deffo correct, else they wouldn’t bother with the complications.

That’s not the only issue, unfortunately. If you start treating the GPU RAM as a cache you get into a whole new rat’s nest of problems. The major one is cache hit ratio: a caching system needs an algorithm that picks the right data at the right time, predicting what’s going to be needed and what can be flushed. Since a GPU workload has a much narrower computational scope, it might be possible to engineer a simpler algorithm than the ones used for CPUs, but the sheer volume of data that would need to be analyzed would negate the speed advantages.
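
To make the "pick the right data, flush the rest" problem concrete, here's a toy recency-based eviction policy of the sort a VRAM-as-cache layer would need. The asset IDs and byte sizes are hypothetical, and plain LRU is the naive stand-in for the prediction problem described above:

```cpp
// Toy LRU eviction for a hypothetical VRAM-as-cache layer. A real scheme
// would need actual prediction; recency is the simplest possible proxy.
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

class VramLru {
    using Entry = std::pair<int, std::size_t>;      // (asset id, size in bytes)
    std::size_t capacity_, used_ = 0;
    std::list<Entry> order_;                        // most recently used at the front
    std::unordered_map<int, std::list<Entry>::iterator> index_;
public:
    explicit VramLru(std::size_t bytes) : capacity_(bytes) {}

    // Returns true on a hit; on a miss, evicts least-recent assets until
    // the new one fits, and the caller must stream it in over PCIe.
    bool request(int asset, std::size_t bytes) {
        auto it = index_.find(asset);
        if (it != index_.end()) {
            order_.splice(order_.begin(), order_, it->second);  // refresh recency
            return true;
        }
        while (used_ + bytes > capacity_ && !order_.empty()) {
            index_.erase(order_.back().first);                  // evict the coldest asset
            used_ -= order_.back().second;
            order_.pop_back();
        }
        order_.emplace_front(asset, bytes);
        index_[asset] = order_.begin();
        used_ += bytes;
        return false;
    }
};
```

Every `false` return here is a stall at PCIe speed, which is why the hit ratio dominates everything else in a design like this.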

There has been something similar: PhysX by Nvidia running on a different card. A less powerful GPU handled all the physics calculations needed for the game (the compute part you were referring to) while the main GPU had “just” the task of rendering it.
On a side note, there has even been a piece of software that used the Intel iGPU to handle vsync and frame output to speed up performance in games.

You need PCIe P2P, which Nvidia blocks on all cards except their pro cards (Quadro/Tesla/etc.); on the AMD side my impression is that it is mostly untested, barely documented, and possibly buggy, although that might not be the case anymore.
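
For reference, the check itself is a couple of CUDA runtime calls; on GeForce cards the query typically just reports 0 (no P2P), as described above. A minimal sketch assuming two GPUs at device IDs 0 and 1:

```cpp
// Probing and enabling PCIe P2P between two GPUs via the CUDA runtime.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);   // can device 0 read device 1's VRAM?
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    printf("P2P 0->1: %d, 1->0: %d\n", can01, can10);
    if (can01 && can10) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);    // device 0 may now dereference device 1 pointers
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        // From here, a kernel on device 0 could treat device 1's VRAM
        // as the "extra memory card" this thread is imagining.
    }
    return 0;
}
```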

Introducing the MSI RTX 4090 Ti SUPRIM X - USB

Eight additional USB 3.2 Gen 1 ports to expand your GPU RAM to new heights. Perfect if you need an extra bit of room to run those massive LLMs.

I was just thinking about something like this today. For Nvidia, I would think they could take advantage of their NVLink/SLI links, which work in addition to the PCIe bus, to give faster access.

There are already ML frameworks that let you use multiple cards to pool memory and compute; pulling that away from the CPU, which is being accessed by every device, seems like it could be a win.
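
As a sketch of what "one GPU as a memory pool for another" could look like in plain CUDA, with device 1 standing in for the hypothetical memory card (the sizes are arbitrary):

```cpp
// Device 1 acts as a dumb VRAM pool; device 0 does the actual compute.
// cudaMemcpyPeer uses NVLink/PCIe P2P when available, else bounces via host RAM.
#include <cuda_runtime.h>
#include <cstddef>

int main() {
    const size_t bytes = 1ULL << 30;   // a hypothetical 1 GiB asset
    void *pool = nullptr, *work = nullptr;
    cudaSetDevice(1);                  // the "memory card": only stores assets
    cudaMalloc(&pool, bytes);
    cudaSetDevice(0);                  // the GPU that actually renders/computes
    cudaMalloc(&work, bytes);
    // Stage the asset onto the compute GPU just before it's needed.
    cudaMemcpyPeer(work, 0, pool, 1, bytes);
    cudaDeviceSynchronize();
    cudaFree(work);
    cudaSetDevice(1);
    cudaFree(pool);
    return 0;
}
```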

It may not make much sense if you are only going to use it for one card, but what if you have 3 cards? You could put on a more general-purpose CPU, like the Jetson or whatever the latest Nvidia ARM chip is, and let that handle the memory caching for all the GPUs.

And I wouldn’t do sticks; I would stick with the high-performance GPU memory they are using today. I would just make one card that could go back to the Turing generation.

It’s not like this isn’t being done now with the latest accelerators; that is what direct access to GPU memory from across the network is all about.
