Running local AI on AMD Instinct MI50 16GB, can it be done?

Hello everyone!

I was browsing eBay the other day and saw that AMD MI50 16GB cards are going for peanuts these days, and it occurred to me that these cards might be a good way to play around with AI. The only thing is I can’t really find any reports of anyone using one of these cards.

Has anyone here had experience with these cards, and would they be worth the $160 for something like AI or maybe a homelab?

Here is a Reddit post from someone building an MI50 server for AI and homelab use: https://www.reddit.com/r/LocalAIServers/comments/1il5cde/new_8_card_amd_instinct_mi50_server_build_incoming/

They previously built an MI60 server, which you can see in their post history. They’ll probably make a more detailed post about their experience with the MI50s in a few days.

2 Likes

There are people doing this; I’m currently testing out a config like this myself.
I’ve documented testing ROCm on my blog: Running LLama.cpp on ROCm on AMD Instinct MI50
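
If you just want a quick smoke test once the ROCm build of llama.cpp is in place, the llama-cpp-python bindings are enough. Here’s a minimal sketch; it assumes the bindings were installed with the HIP/ROCm backend enabled and that you already have a GGUF model on disk (the model path below is just a placeholder):

```python
# Minimal sketch: run a GGUF model on an MI50 via llama-cpp-python.
# Assumes llama-cpp-python was installed with its ROCm/HIP backend enabled
# and that ROCm can actually see the card (e.g. it shows up in rocm-smi).
from llama_cpp import Llama

llm = Llama(
    model_path="/models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=4096,        # context window
)

out = llm("Q: What is an AMD Instinct MI50?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```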

Works pretty well, even if AMD is already kind of winding down support for the MI50 (there’s at least a warning triangle next to it on their compatibility matrix…)

The more end-user-friendly vLLM doesn’t work with these cards, though, as far as I understand.

Online I see people reflashing them, and there are people discussing this on the KoboldCpp Discord, as per: Performance of llama.cpp with Vulkan · ggerganov/llama.cpp · Discussion #10879 · GitHub

2 Likes

What are people reflashing them to?
I know they use the same die as the Radeon Pro VII and those work great in Mac/Hackintosh.

I’m trying to get my hands on one to see if it’s worth it for inference at a low price. Idle power draw is a big deal for me as well, because the card would sit idle with models loaded 90% of the time and only fire up when I send some HA voice assistant requests.

1 Like

I am running two of them at the moment to play around with some of the distilled DeepSeek and Gemma models using Ollama. They work great, just don’t try to virtualize them: they’re bare metal only unless you really want to try to fix the GPU reset bug. I am also running them re-flashed as Radeon Pro VIIs.
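
For anyone curious, driving them through Ollama’s local HTTP API is straightforward. A minimal Python sketch; the model tag is just an example of a distilled DeepSeek build, so adjust it to whatever `ollama list` shows on your box:

```python
# Minimal sketch: query a model served by a local Ollama instance.
# Assumes Ollama is running on its default port and the model tag below
# has already been pulled (change it to whatever you actually have).
import json
import urllib.request

payload = {
    "model": "deepseek-r1:32b",  # example tag, not necessarily yours
    "prompt": "Why are old datacenter GPUs interesting for homelab AI?",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["response"])
# eval_count / eval_duration (nanoseconds) give a rough tokens-per-second figure
print(body["eval_count"] / (body["eval_duration"] / 1e9), "tok/s")
```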

As for idle power draw, each card sits at around 20-25 W while idle. I get ~18 tokens per second running the 32B-parameter DeepSeek model on Ollama.
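
If you want to check the idle draw yourself, the amdgpu driver exposes a power sensor through hwmon in sysfs. A quick Python sketch, assuming the cards are on the amdgpu driver and report power1_average (newer kernels may expose power1_input instead); values are in microwatts:

```python
# Rough sketch: read GPU power draw from the amdgpu hwmon interface.
# Assumes the MI50s are driven by amdgpu and expose a power sensor in sysfs.
import glob

def gpu_power_watts():
    readings = []
    for name in ("power1_average", "power1_input"):
        for path in glob.glob(f"/sys/class/drm/card*/device/hwmon/hwmon*/{name}"):
            with open(path) as f:
                readings.append(int(f.read().strip()) / 1_000_000)  # µW -> W
    return readings

print(gpu_power_watts())
```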

I haven’t integrated it into my HA setup yet; that’s on the to-do list. But for $110 per card, I’ve been having some fun.

2 Likes