Necessary hardware to train or modify an LLM

I’m aware that a previous thread regarding this topic already exists.

I don’t plan on training an LLM from scratch, so I’m hoping I’ll be able to train/adjust the parameters of an existing LLM slightly without needing 40+ Nvidia A100 GPUs.

Here’s the game plan. Get an LLM to sound like me (should be relatively doable) using the same tech as Dave (see link below), and then have the LLM listen to my IT calls. The end goal would be to have the LLM handle easy calls (like password resets or locked accounts).

Could such a thing be done using consumer-level hardware: a 9800X3D, 64GB of non-ECC RAM, and an AMD GPU (7900 XT or the newest 8800 series)? I know Nvidia would be the better choice for this, but I don’t want to give Nvidia any $. They have more than enough.

Was thinking of potentially getting ECC RAM to reduce the chance of errors while data is stored in memory.

All input is welcome

definitely no way this ends poorly

Feeding it ChatGPT answers?

As for actual hardware, you just need a ton of VRAM and system memory.
Reboots will literally take an hour at best, even from an M.2 drive, just to reload the data sets.

But it is doable.

Biggest issue will be parsing client input.

The LLM would need to be able to understand different accents and deduce which account needs to be reset. ChatGPT won’t be of use here.


I’ve got 64GB of DDR5 RAM 6000 MT/s. It’s not ECC, but I’m not certain if that will have an impact on training/modifying the LLM.

For GPUs, I’m considering AMD, possibly the newest 8800 XT when it’s released.

Also, this is a long-term project; I don’t plan on having anything built and working until the end of next year.

I could however start collecting the calls and process them later on.

Thoughts?

Just like RAM is an order of magnitude faster than NVMe, VRAM is an order of magnitude faster than CPU RAM.

You need gobs and buckets of VRAM to parse inputs using LLMs in pseudo real time.

Home automation is one thing as there’s 1-4 training sets (each household member) but what you are attempting will take an order of magnitude more hardware.

Would love to see it, but understand you’ll effectively need a crypto mining rig with more system RAM and processing power to pull it off.


You seem to be talking about 3 different parts in there.
One would be a model that does Automatic Speech Recognition (ASR, use those keywords to google a bit more about it), this would be used to actually convert those calls of yours into text.

This would be the part that’s relevant on understanding accents and whatnot. It should be doable to fine-tune one of the ASR models on consumer hardware, although I’m not sure how hard it’d be on your AMD GPU.

The second part would be the LLM, and you can pick the one you like most. You should be able to run models up to ~40B parameters in 4-bit with that GPU of yours (maybe less depending on your context size), and you can either do some LoRA fine-tuning on it, or go for a smaller model (in the 5~10B range) and do a full fine-tune.
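As a rough sanity check on those sizes: a quantized model's weight footprint is roughly params × bits/8, and the overhead factor below (for the KV cache and runtime buffers) is an assumption for illustration, not a measured number.

```python
# Back-of-envelope VRAM estimate for running a quantized LLM.
# overhead=1.2 is an assumed fudge factor for KV cache and buffers.

def estimate_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Approximate VRAM in GB for `params_b` billion parameters at `bits` per weight."""
    weight_bytes = params_b * 1e9 * bits / 8
    return weight_bytes * overhead / 1e9

# A 40B model in 4-bit: ~20 GB of weights, ~24 GB with overhead,
# which is why it just about fits on a 20-24 GB consumer card.
print(round(estimate_vram_gb(40, 4), 1))   # ~24.0
# The same 40B model in fp16 would need ~96 GB -- hence quantization.
print(round(estimate_vram_gb(7, 16), 1))   # a 7B model in fp16: ~16.8
```

This is only a weights-plus-fudge-factor estimate; real usage varies with context length and runtime.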

Lastly, you’d need a text-to-speech (TTS) model that you would fine-tune on your voice.
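Wired together, the three parts above form a simple loop. Every function here is a hypothetical stub standing in for the real models (e.g. Whisper for ASR, your fine-tuned LLM, and a TTS model); only the shape of the pipeline comes from the thread.

```python
# Sketch of the ASR -> LLM -> TTS call-handling loop. All three stages
# are placeholder stubs; real versions would wrap actual models.

def transcribe(audio: bytes) -> str:
    # ASR stage (stub): real code would run e.g. a Whisper model here.
    return "caller asks for a password reset"

def decide_reply(transcript: str) -> str:
    # LLM stage (stub): real code would prompt the fine-tuned model.
    if "password reset" in transcript:
        return "Sure, I can reset that. Can you confirm your username?"
    return "Let me transfer you to a human."

def speak(text: str) -> bytes:
    # TTS stage (stub): real code would synthesize audio in your voice.
    return text.encode("utf-8")

def handle_call(audio: bytes) -> bytes:
    return speak(decide_reply(transcribe(audio)))
```

The point is just that each stage has a narrow text interface, so the three models can be swapped or fine-tuned independently.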

ECC RAM wouldn’t be that useful here, to be honest.

Your RAM speed is not that relevant either. The hardware you have is fine.
The only thing you could improve would be moving to a 3090/4090, or grabbing more GPUs.

I’d be genuinely surprised if a single 4090 could do what he’s attempting in near real time.
I’d say a 4090 or larger for ASR
Any high end card for the LLM
A mid to high end card for TTS

But we’re already talking $3-5k in GPUs alone.
Now you need something with enough PCIe lanes to keep data flowing

and if you’re replacing an employee, it had better have ECC, a stable enterprise-grade processor, and all the stability of a server.

Quite the investment, but cheaper than a single employee per annum.

I’d say try to load it up for proof of concept on a high end gaming rig now then decide if it’s what you want.


We are talking about inference for this case.

ASR and TTS are not that hard.
I’ve worked quite a lot with ASR, and you can achieve 10~20x real-time speeds on a 3090 (what I have) even with the larger models, and they don’t need that much VRAM. Whisper large running on my 3090 ATM uses less than 4GB of VRAM.
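For anyone unfamiliar with the "10~20x real-time" phrasing, the real-time factor is just audio duration divided by processing time:

```python
# Real-time factor (RTF) as quoted above: seconds of audio processed
# per second of wall-clock time. RTF > 1 means faster than real time.

def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    return audio_seconds / processing_seconds

# A 60 s call transcribed in 4 s -> 15x real time,
# comfortably within the 10~20x range mentioned for a 3090.
print(realtime_factor(60, 4))  # 15.0
```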

For TTS I only did some minor experiments, but it’s also doable to achieve 10x real-time for generation (not sure how doable it would be for streaming audio tho, I haven’t tried that at all).

For an LLM you can totally do generation of a small model at blazing speeds (over 100tok/s for 7b models), or reasonable speeds for medium sized ones (30+tok/s for 30b), really depends on what OP is planning and how much free vram they’ll have for the other models.

This website gives some really nice insights on performance:

For inference at this small scale? Not really, PCIe bandwidth won’t be that relevant.


I have considered building a dedicated rig just for adjusting the LLM, but currently that would be cost-prohibitive.

Hoping to get things started with the equipment I currently own and go from there.


I summarized everything so it would be easier to understand, but you’re right. There would be several pieces that would have to work together to get this to work.

At this point it’s just an idea that I have in my head, but I’d still like to try creating something like this. Regarding the LLM training/guided learning, I think it could be sped up if it listened to my calls and I explained what I did and why.

BTW, I’ll try to answer most questions in one post so you guys don’t have to go back and forth to read everything. First time on a website that uses such a system.

So for the hardware, I’ve made up my mind. I’ll go with a motherboard that supports ECC RAM (in case I need it in the future). When it comes to the GPU, I’ll wait and see how the latest ones perform and then make my decision.

@TryTwiceMedia & @igormp thank you both for the helpful advice. I’ll post an update when the LLM is getting trained (will probably be months from now tho)

Might even create a dedicated article on the topic, so people can add their thoughts to it.
Anyhow that will be a project for another day.

:wave:


To be honest, you could just rent some cloud instances for a little bit and train your own model in there, should be way cheaper than a new GPU just for that, and then you can use the resulting model for inference with your existing hardware.
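As a back-of-envelope illustration of why renting can win for a one-off fine-tune. The hours and $/hour below are made-up example numbers, not quoted prices:

```python
# Toy comparison: renting cloud GPU hours for a one-off fine-tune
# vs buying a card outright. All figures are illustrative assumptions.

def rental_cost(hours: float, dollars_per_hour: float) -> float:
    return hours * dollars_per_hour

# e.g. ~48 hours on a rented high-end instance at an assumed $2/h
print(rental_cost(48, 2.0))  # 96.0 -- far below a new high-end GPU
```

Since inference is far lighter than training, the resulting model can then run on the hardware you already own.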

As stated above: FOOKIN SEND IT!!!
I wanna see this thing run. Holler when you have problems.