AI R9700 Pro - 32GB AMD GPU for AI and creative workloads -- Benchmarks

I would also like to see benchmarks for larger models. It’s very curious that most of the content I’ve been able to find about these GPUs so far seems to test smaller models.

I suppose it could be, but they don’t have it up on their certified hardware list (URL slug /hardware-certification) yet, so YMMV.

I did gpt-oss 120B in the video, using Windows, and every byte of VRAM on both cards. Worst-case scenario, and with an 8k context, it was still 115 tokens/sec.


(perfect for high-speed endless seahorse emoji hallucination!)

…and that’s the problem with “watching” videos while doing 3 other things XD. Thanks for the call out


@wendell I would like to play!

I have just started an AI agent company. I would be very interested in building a local AI system with 128 GB of VRAM if it could accomplish my goals. Right now I would need it to host n8n for myself and my customers, run WAN2.2 in ComfyUI, and run Ollama for LLMs, and I would also like to train small language models that I can deploy, preferably locally, for myself and my clients.
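For the Ollama piece specifically, most of the integration work is just pointing HTTP calls at Ollama's local API, whether that's from n8n's HTTP Request node or from custom code. A minimal sketch, assuming Ollama is already running on its default port and a model has been pulled (the model name here is just a placeholder):

```python
import json
import urllib.request

# Ollama's default local REST endpoint; n8n or any other client
# would hit the same API on the box hosting the GPUs.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3.1",  # placeholder: any model already pulled with `ollama pull`
    "prompt": "Summarize this support ticket in two sentences: ...",
    "stream": False,      # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```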

The hardware I would be trying to use includes a Threadripper 1920X (12 cores / 24 threads) on a Gigabyte DESIGNARE EX (rev 1.1) with a Seasonic Noctua Prime 1600 W Platinum PSU.

I have AT&T fiber and an acceptable UPS that can keep things running while the backup generator clicks on. And considering I have been watching you since you were hiding behind monitors on The Tek, I would not have an issue giving you remote access if you need it to run tests or play with it for a Level1 episode.

Let me know what you think.

@wendell To frame this a different way: these cards are 2x the price for 2x the RAM. So why not just buy two whole 9070 XT 16 GB cards, with 2x the cores and everything? :smiley:

But, in practice, which is faster for that same $1300 spend? That would be an interesting real-world test.


It has “PRO” in its name.

Yeah, US$ 650 is a steep upcharge for +16 GB GDDR and a power connector downgrade, closer to +$750 with an Ampinel. And pin balancing comes with awkward cable routing and an inability to slot multiple cards. Even at Nvidia pricing +16 GB is +$200.

64 GB of GDDR from two R9700s on x8+x8 X870E is an interesting step up from the 32 GB of 2x 9070 XT but, even if you’re willing to take the melt risk, it still wouldn’t be a very good buy at 35% below launch pricing.


Has anyone been able to buy one? I’m seeing them out of stock basically everywhere

It’s the cheapest new single-GPU 32 GB card you can buy right now, so I’m not surprised.

Same reason why you’d buy a 16TB hard drive instead of 2x 8TB hard drives. There’s a nonzero cost of adding sockets to the system, especially once you go past 4 GPUs. If you’re targeting 128GB VRAM, it’s a lot easier to do with 4 GPUs than 8!
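To put rough numbers on that trade-off (napkin math, not anything from the video), here's the card count and lane budget for a 128 GB target at 16 GB vs 32 GB per card, assuming each GPU gets an x8 Gen5 link:

```python
# Napkin math: cards and PCIe lanes needed to reach a 128 GB VRAM target.
TARGET_VRAM_GB = 128
LANES_PER_GPU = 8  # assumption: every card runs at x8

for name, vram_per_card in [("9070 XT (16 GB)", 16), ("R9700 (32 GB)", 32)]:
    cards = TARGET_VRAM_GB // vram_per_card
    print(f"{name}: {cards} cards, {cards * LANES_PER_GPU} Gen5 lanes, "
          f"{cards} slots/risers to power and cable")
```

Eight x8 slots pushes you into Threadripper/Epyc territory, while four is at least conceivable on a big workstation board.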

@wendell I wonder how two of these compare to a single 5090 in Metashape.


Just bought one on Newegg business, they showed out of stock last night and I signed up for stock notifications for all models. Newegg business is enforcing 1 per account limit right now though.


Valid points, of course. All else being equal, more memory per card is better. But here a closer analogy is a 16 TB HDD vs. 2x 8 TB SSDs. For a hypothetical low-end $1299 GPU budget, 2x 9070 XT does have twice the compute resources of a single R9700.

Is that an actual advantage given the segmented memory and, on AM5 at least, dropping to an x8 Gen5 interconnect? Plus the headaches of more parts?

The answer will certainly be workload dependent. Hence the benchmark request. :smiley:

@wendell Are you allowed to benchmark the R9700 vs. the 9070 XT, or is there a “suggestion” from your AMD reps not to cross-compare against the lower-cost consumer parts?

I am sure I could benchmark 2x 9070 vs. 1x R9700, but what precisely do you want to see? A MoE AI model is likely to perform okayish where the layers can run on each card more or less independently, but other AI models will run like crap because the PCIe Gen 5 bandwidth isn’t that much (only 64 gigabytes/sec), which is glacial compared to ‘local’ on the card.
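If you want to poke at the layer-split case yourself, this is roughly what it looks like through llama-cpp-python (a sketch; the model filename and split ratio are placeholders). With a layer split, each card holds its own share of the layers and only small activations cross PCIe per token, which is why this mode tolerates the narrow link far better than anything that shuttles whole tensors between cards:

```python
from llama_cpp import Llama

# Sketch: split one large GGUF model across two GPUs by layers.
llm = Llama(
    model_path="some-large-moe-model.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],  # layer split ratio across card 0 and card 1
    n_ctx=8192,               # the 8k context mentioned earlier in the thread
)

out = llm("Explain MoE expert routing in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```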


FWIW I’m mainly interested in

  • CNN hyperparameter tuning and layer-structure optimization, where training different topologies in parallel on x8+x8 X870E should come reasonably close to a 2x speedup (rough sketch after this list)
  • how well 24-pin and EPS grounds can handle the potential ~630 W of return current from 2x 9070 XT (pretty close to the same load as 3x 5090)
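On the first point, the near-2x expectation is because each candidate topology trains entirely on its own card, so essentially nothing crosses PCIe during the runs. A minimal PyTorch sketch of that pattern (model and data are stand-ins; under ROCm the AMD cards still show up as cuda:0/cuda:1):

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def train_topology(device_idx: int, hidden_channels: int) -> None:
    """Train one candidate CNN topology entirely on its own GPU."""
    device = torch.device(f"cuda:{device_idx}")  # ROCm exposes AMD GPUs via the cuda device API
    model = nn.Sequential(
        nn.Conv2d(3, hidden_channels, 3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(hidden_channels, 10),
    ).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(100):  # stand-in for a real dataloader loop
        x = torch.randn(64, 3, 32, 32, device=device)
        y = torch.randint(0, 10, (64,), device=device)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"GPU {device_idx}, {hidden_channels} channels: final loss {loss.item():.3f}")

if __name__ == "__main__":
    mp.set_start_method("spawn")
    # One candidate topology per GPU, so no gradient traffic between cards at all.
    procs = [mp.Process(target=train_topology, args=(i, ch)) for i, ch in enumerate([32, 64])]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```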

Not that I can’t run torch on an H100 or H200, but local workload capacity is useful too, and RDNA4 might be at or near a price low point if GDDR prices also explode.

What kind of specs would be required to let 1 or 2 of these cards stretch their legs in AI inference and AI image generation workloads? I’d love to slap 1-2 of these in my current homelab setup but I’m not sure it’s up to par (i9-10900K; I can always buy more RAM). I wouldn’t want to have to buy a Threadripper system (yet). At a minimum I’d guess you need enough RAM to be able to load MoE models into memory.
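For sizing that RAM guess, the napkin math is roughly parameter count times bytes per weight at your quantization, plus some headroom for KV cache and runtime overhead (the parameter counts and quant levels below are illustrative assumptions, not measurements):

```python
def model_footprint_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough footprint in GB: parameters x bits/8, plus ~20% for KV cache and runtime overhead."""
    return params_billion * (bits_per_weight / 8) * overhead

# Illustrative sizes only
for name, params_b, bits in [("~30B dense @ ~4.5 bpw", 30, 4.5), ("~120B MoE @ ~4.5 bpw", 120, 4.5)]:
    print(f"{name}: ~{model_footprint_gb(params_b, bits):.0f} GB to hold across RAM + VRAM")
```

So a 1-2 card setup mostly comes down to whether system RAM can absorb whatever doesn't fit in the 32/64 GB of VRAM.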

For the 2x 9070 XT vs. 1x R9700 tests on AM5, a similar mix of AI/LLM inferencing and creative workloads to the original video would be ideal, but a subset is fine too. Let’s see what works and what falls over, plus what the overall system cost/value looks like.

As an aside, I noticed a couple of gfx12-related fixes hitting the ROCm repos. Dunno if they all made it into the 7.1.0 release.

A follow-up, and possibly painful, side quest might look at how well 9070 XT/R9700 cards scale when all connected to an external 100-lane Gen5 MCIO switch, such as this or similar. Talk about a chonky external GPU rig. :rofl:

Really interesting results — thanks for sharing all the charts. The most surprising part (for me at least) is how well the dual-R9700 setup scales for AI workloads when using the Vulkan backend. Hitting ~150 tokens/sec and saturating both cards’ VRAM definitely puts it into practical territory, especially considering the price gap vs 5090-class hardware.

The creative-app benchmarks make sense too: great performance until the compute-heavy nodes (like Nuke’s Defocus) hit the architectural limits. Synthetic benchmarks still favor Nvidia as expected, but the real-world tests here show AMD’s newer cards are finally becoming a viable option for mixed AI + creative workflows.

Also nice to see that ComfyUI/ROCm setup has gotten smoother — the last few weeks of updates seem to have made a big difference.

Appreciate the deep dive and the dual-GPU scaling numbers. Very helpful.
