3090 vs newer 16 GB GPU options: performance in Stable Diffusion

I’m an absolute beginner in AI image generation. My current GPU is a GTX 1060 6 GB, so I’m looking to upgrade my system.

I see the 3090 as the best budget option, but there is so little info. It’s mostly just gaming reviews, or “get the 5090, peasant!”

The thing is, there are loads of 16 GB cards on the market now. Is the 3090’s 24 GB, despite its power use, a better choice than the newer 16 GB cards?

I’d appreciate any input, as my knowledge is limited.

Hi, the 3090 is a great GPU with lots of potential, but it always depends on what you’re going to use it for. To start, I recommend this site and their YouTube channel; there’s a lot of information about GPUs there, especially regarding the 3090.
Digital Spaceport – Homelab Insanity – Ai, Software and Hardware Tutorials – Servers and Home Datacenters
Best Budget Local Ai GPU
The Perfect Local AI GPU? NVIDIA’s 5060 Ti 16GB Tested!


Very simple:

I have a 3090 and have used sd-forge for a very long time, and believe me when I say that I am SO freaking glad I opted for more memory rather than speed when selecting a GPU.

VRAM > speed any day of the week!
You would not even be able to run FLUX models properly with less than 20 GB.
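A rough back-of-envelope check supports that 20 GB figure. FLUX.1-dev’s transformer is around 12 billion parameters (a commonly cited figure; real usage also adds the text encoders, VAE, and activations, so treat these numbers as a floor):

```python
# Rough VRAM needed just to hold FLUX.1-dev's ~12B transformer weights.
# Actual usage is higher: text encoders, VAE, and activations come on top.
def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Gigabytes required for the model weights alone."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

for label, nbytes in [("fp16/bf16", 2), ("fp8", 1)]:
    print(f"{label}: {weight_vram_gb(12, nbytes):.1f} GB")
# fp16/bf16 weights alone land around 22 GB, which is why 16 GB cards
# need aggressive quantization or offloading to run FLUX at all.
```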


Oh, thank you. I’ve been looking into a 3090 and have bid on one, but the 3090 is from 2020 and the 9070 XT is from 2025.

The 9070 XT smokes the 3090 in rasterisation; do you have the numbers for AI?

I said to the other commenter on this post…
I’ve been looking into a 3090 and have bid on one, but the 3090 is from 2020 and the 9070 XT is from 2025.

The 9070 XT smokes the 3090 in rasterisation; do you have the numbers for AI?

No, but it’s common knowledge that CUDA beats the crap out of any other framework, like ROCm. The gap is shrinking, but the last time I looked into it, CUDA was about double the speed of ROCm.

If you want anything other than just chatting with a pre-trained model, CUDA is more or less a requirement.
Sucks, but it is what it is…

The ~2x gap is mostly the lack of an effective tensor core competitor in RDNA3 and earlier. ROCm has supported much of CUDA’s functionality for some time.
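One practical upside of that compatibility: PyTorch’s ROCm builds expose the same `torch.cuda` API as the CUDA builds, so the same script runs on either vendor. A minimal sketch to check which backend you actually got (the `backend_name` helper is mine, not a library function):

```python
# PyTorch's ROCm builds reuse the torch.cuda namespace, so device code
# is portable across vendors; torch.version.hip tells the builds apart.
def backend_name() -> str:
    try:
        import torch
    except ImportError:
        return "no torch installed"
    if not torch.cuda.is_available():
        return "cpu only"
    # torch.version.hip is a version string on ROCm builds, None on CUDA builds.
    return "rocm" if torch.version.hip else "cuda"

print(backend_name())
```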

GPU perf comparisons tend to be more variable across workloads and driver versions than CPU comparisons are, so it really depends on the specifics of what you want to do and how you’re doing it. An obvious starting point for this thread’s Navi side would be AMD’s Stable Diffusion guidance.

I don’t know of any current comparison, and the early difficulty of getting 9070s means launch data is almost entirely absent as well. GPGPU performance typically tracks with rasterization, though, while deep learning tends to follow tensor capabilities.

In the absence of good comparisons between Ampere and RDNA4, the answer’s basically buy both and measure. For workloads requiring more than 16 GB of VRAM, the R9700 is not cost competitive with the 3090, 7900 XT, or 7900 XTX, but it is quite competitive with the 4090 and 5090. B60s are unobtainium here but, if you can find one, Battlemage might be of interest as well.
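If you do end up measuring both cards yourself, a small timing harness goes a long way. This is a sketch of my own (the `bench` helper is hypothetical; the commented pipeline call assumes Hugging Face `diffusers`): warm up first so one-time compilation and allocation costs don’t skew the numbers, then take a median.

```python
import time

def bench(fn, warmup: int = 1, runs: int = 3) -> float:
    """Median wall-clock seconds per call, after warmup iterations."""
    for _ in range(warmup):
        fn()  # absorb one-time costs (kernel compilation, allocator growth)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

# Hypothetical usage with a diffusers pipeline:
#   pipe = StableDiffusionPipeline.from_pretrained(...).to("cuda")
#   print(bench(lambda: pipe("a photo of a cat", num_inference_steps=20)))
```

Run the same prompt and step count on each card, and compare medians rather than single runs, since driver and thermal variance can easily swamp a one-shot measurement.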