Which GPU should I try with Stable Diffusion?

I have two GPUs sitting here and a desire to jump into the AI/ML world with them. My two main goals are to set up my own chatbot using an LLM as well as Stable Diffusion to have some fun with images.

I have rack servers in case that matters, so airflow shouldn't be an issue. I also plan to run the Linux instance that will do the AI/ML work under Proxmox with PCI passthrough. I am very familiar with passthrough and not worried about any of that setup.

To the point, the two GPUs I'm trying to decide between are an NVIDIA Tesla P4 and a Radeon Instinct MI25. The P4 I believe has 8 GB of VRAM, but only 7 or 7.5 GB are usable due to how they designed the card. The MI25 has 16 GB of VRAM, but I believe it's a slower card; that may be a false assumption though.

Size: I will need to be selective in the server I choose for the MI25, as size can be an issue. The P4 is so tiny it fits in anything.

Power: The MI25 will be a power hog compared to the P4. Again, this is an assumption; as I'm new to this, maybe the P4 will take longer and consume more power for the same task.

Memory: I'm new to all this AI stuff, but from my initial reading, it sounds like VRAM is a major concern. This alone may force the decision to be the MI25 for the flexibility the 16 GB will provide.
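To put rough numbers on that VRAM concern, here is a back-of-the-envelope sketch. The parameter counts below are approximate public figures for Stable Diffusion 1.5 (my assumption, not something from this thread), just to show the order of magnitude:

```python
# Rough VRAM estimate for Stable Diffusion 1.5 weights alone.
# Parameter counts are approximate public figures (assumption, not measured here).
params = {
    "unet": 860_000_000,          # the diffusion UNet, the bulk of the model
    "vae": 84_000_000,            # image encoder/decoder
    "text_encoder": 123_000_000,  # CLIP text encoder
}

bytes_per_param = 2  # fp16 weights

weights_gib = sum(params.values()) * bytes_per_param / 2**30
print(f"~{weights_gib:.1f} GiB just for the weights at fp16")
# Activations, attention buffers, and upscaling add several GiB on top,
# which is why 8 GB cards get tight and 16 GB gives real headroom.
```

So the weights alone are around 2 GiB; the rest of the budget goes to intermediate activations, which grow with resolution and batch size.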

Brand: NVIDIA seems to be supported by everything and everyone, while AMD can cause some additional hassle, from my initial research. Personally I love AMD cards compared to NVIDIA, since they just work in Linux without installing third-party drivers; this is why I abandoned NVIDIA as my primary video card years ago. But in the world of AI, I understand that NVIDIA is still king.

Any feedback, experience, suggestions, or even ideas you all could share would be very welcome.

Thank you!

The Tesla P4 is basically a GTX 1080 limited to 75 W. Mine idles at 21 W (according to nvidia-smi), which is surprisingly high imho.

The Radeon Instinct MI25 is limited to 110 W in the stock BIOS (I've seen it spike to 130 W during AI workloads), and mine idles at 3 W (according to rocm-smi). If you are doing Stable Diffusion you will want more VRAM, so you can run bigger models or upscale to higher resolutions.

A benefit of the Radeon Instinct MI25 is that it has a fan header you can get an adapter for, and use

`rocm-smi -d 0 --setfan [0-255]`

to control the fan, unlike the Tesla P4, where you have to use a motherboard fan header or an external one.
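For example, a simple temperature-to-fan curve could be scripted around that command. A minimal sketch (the temperature thresholds are made up for illustration, and the actual rocm-smi call is commented out so it runs without a GPU):

```python
import subprocess

def fan_value(temp_c: float) -> int:
    """Map a GPU temperature to the 0-255 fan range rocm-smi expects."""
    if temp_c < 40:
        return 80   # quiet floor
    if temp_c > 80:
        return 255  # full blast
    # linear ramp between 40 C and 80 C
    return int(80 + (temp_c - 40) / 40 * 175)

def set_fan(value: int) -> None:
    # Uses the command from the post above; -d 0 targets the first GPU.
    subprocess.run(["rocm-smi", "-d", "0", "--setfan", str(value)], check=True)

# set_fan(fan_value(65))  # uncomment on a machine with rocm-smi installed
print(fan_value(65))
```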

Getting the Radeon Instinct MI25 working with things like ComfyUI is fairly simple these days. The biggest part is getting ROCm installed properly, which is basically running a single command if you use Ubuntu, and remembering the environment variable required to get ROCm working properly:

`HSA_OVERRIDE_GFX_VERSION=9.0.0`
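If you launch things from Python rather than a shell, the same variable can be set in-process. One caveat worth knowing (a sketch; the torch import is commented out so this runs anywhere):

```python
import os

# The MI25 is Vega 10 (gfx900); this tells the ROCm runtime to treat it as
# the 9.0.0 ISA. It must be set BEFORE any ROCm-backed library initializes,
# or it has no effect.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "9.0.0"

# import torch  # only import the ROCm build of torch after the variable is set
print(os.environ["HSA_OVERRIDE_GFX_VERSION"])
```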

What's your budget? There are some "cheap" RTX A4000 16 GB VRAM cards on eBay… I bought my RTX A4500 for 670€.

An RTX 3060 12 GB is quite good too for what you have in mind.

Thanks for the response. I do not have the MI25 plugged in right now, but I have been running the P4 in the server over the last 24 hours, and it's only consuming an additional 5 watts, based on the power meter I have hooked up.

If you had these two cards on hand, which one would you go with?

Thanks for the suggestions, but I'm mostly trying to use what I have on hand right now. I got this MI25 card about 6 months ago for about $50 and the P4 card for not much more than that.

Trying to play/experiment on a budget here. :slight_smile:

Then give the MI25 a chance; there is a thread here: mi25-stable-diffusions-100-hidden-beast

I bet you will find comrades there :smiley:

The P4 will just work; the MI25 will have more power.

Both will need an additional active cooler via a blower.

I asked Google Bard the question: For the following GPU: NVidia Tesla P4 and a Radeon Mi25, which would be theoretically better for Stable Diffusion?

Answer:

Theoretically, for Stable Diffusion, the NVIDIA Tesla P4 would be the better choice compared to the Radeon MI25. Here's why:

NVIDIA Tesla P4:

  • Ampere architecture: Designed for AI and scientific computing, offering better FP32 and mixed-precision performance, crucial for Stable Diffusionā€™s calculations.
  • Higher Tensor Cores: Possesses 4096 Tensor Cores compared to the MI25ā€™s 2560, which significantly accelerates tensor operations used in AI tasks like image generation.
  • Larger L2 cache: Boasts 48MB L2 cache compared to the MI25ā€™s 8GB, improving data access speed and potentially increasing performance.
  • Better CUDA and TensorRT support: NVIDIA enjoys wider industry adoption and has more optimized libraries for AI workloads like Stable Diffusion.

Radeon MI25:

  • RDNA 2 architecture: While capable, not specifically tailored for AI like Ampere, potentially leading to lower performance in AI tasks.
  • Lower Tensor Core count: Fewer Tensor Cores translate to slower tensor operations, impacting Stable Diffusionā€™s performance.
  • Smaller L2 cache: Limited L2 cache might result in more frequent data access from slower memory, potentially slowing down processing.
  • Limited software support: While improving, Radeon GPUs currently have less optimized AI libraries compared to NVIDIA, potentially affecting performance with Stable Diffusion.

However, itā€™s important to consider context:

  • Specific implementations: Different software implementations of Stable Diffusion may perform better on specific GPUs. Check benchmarks for the implementations you plan to use.
  • Price and availability: The Tesla P4 is typically more expensive and harder to find, while the MI25 might be more accessible.
  • Other workloads: If you need the GPU for other tasks outside Stable Diffusion, consider its performance in those areas as well.

Overall, while the NVIDIA Tesla P4 has strong theoretical advantages for Stable Diffusion due to its architecture, Tensor Cores, and software support, consider your specific needs and budget before making a decision.

This is literally a completely wrong answer. First off:

  • FP32 performance is higher in the Radeon Instinct MI25 by like 100%
  • Both have 0 Tensor Cores; it is referring to shader units, of which the Radeon Instinct MI25 has 4096 vs the Tesla P4's 2560
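The FP32 gap is easy to sanity-check from those shader counts. A quick sketch (the peak/boost clocks below are approximate published figures, my assumption):

```python
def fp32_tflops(shaders: int, clock_ghz: float) -> float:
    # 2 FLOPs per shader per clock (one fused multiply-add)
    return shaders * 2 * clock_ghz / 1000

mi25 = fp32_tflops(4096, 1.5)    # MI25: ~1500 MHz peak clock (approx.)
p4 = fp32_tflops(2560, 1.114)    # P4: ~1114 MHz boost clock (approx.)

print(f"MI25: {mi25:.1f} TFLOPS, P4: {p4:.1f} TFLOPS, ratio {mi25 / p4:.2f}x")
```

Which lands at roughly 12 vs 6 TFLOPS, i.e. the MI25 really is about twice the P4 in raw FP32, the opposite of what Bard claimed.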

It is honestly impressive how bad the answer is

EDIT: and it honestly only gets worse as you read more of the answer.

AI: you need to know the answer before you ask the question, to know whether it is accurate or not.