Hardware recommendations for large local LLM

TL;DR: My model is currently about 60 GB and uses a context window of 1 million tokens.

I’m curious what kind of hardware I should upgrade to. I’d like something that’s also a bit future-proofed, since the model will only get more demanding as I continue to tinker with it.

I was thinking of either a Mac Studio with 512 GB of RAM or a Ryzen AI Max+ 395 machine with 128 GB, but I’m open to other suggestions or recommendations.

Thanks in advance!

Full context:

So my use case is a bit more extreme than most people’s.

I write fan fiction as a hobby. I’ve written six fan fiction books in my life, each around 100–200k words, and I’ve built a whole fictional universe for my characters. It’s something I really enjoy, but I actually hate the writing part of it. That’s why I’ve never published anything for money and why I write under a pen name: I’ve never been proud of my books.

Making fictional outlines is super fun for me, but creative writing is my weak point and, frankly, just unenjoyable.

I’ve been training an AI model (run locally via Ollama) on my previous works and all my outlines. I want to use this model to help me refine my prior works, improve the writing, and turn my unwritten outlines into full novels.

I know there’s paid software out there that does this, but having used some of it, I felt the output was no better than my own meager skills. I want to produce something I would actually be proud to put my name on.

I did test my model and was actually very happy with the result. It’s not perfect, but it’s much better than the paid models online. The catch: it took about 4 weeks to produce a single response, which consisted of one chapter, or about 1,500 tokens.

I’d like to reduce that response time to hours, if possible.

My model is currently about 60 GB and uses a context window of 1 million tokens.

My rig has 64 GB of RAM and a GTX 1080 Ti with 11 GB of VRAM. I also point the Windows page file at an old 4 TB mechanical HDD; otherwise Ollama complains that I don’t have enough memory.

I’m curious what kind of hardware I should upgrade to.

I was thinking of either a Mac Studio with 512 GB of RAM or a Ryzen AI Max+ 395 machine with 128 GB, but I’m open to other suggestions or recommendations.

Not directly relevant to your question, but you may want to pay attention to the long-context performance of the model you use. (I assume it’s some quantization of MiniMax or Llama 4, given the 1M context.)

Context Bench is an interesting source of such benchmarks, as is Fiction.liveBench, linked therein.

By those standards, most open-weight models don’t hold up well past 32K, and all but the absolute SOTA fall apart past 100K. Beyond that, using a 1M context and waiting 4 weeks for 1.5K tokens of output… I can only say I’m impressed by your patience. :sweat_smile:
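The 1M context is also expensive in its own right, separate from the 60 GB of weights, because the KV cache grows linearly with context length. All the architecture numbers below are made-up assumptions for illustration (not the specs of your actual model), but they show why a huge context can need more memory than the weights themselves:

```python
# Rough KV-cache sizing sketch. Every architecture number here is a
# hypothetical assumption for illustration, not a real model's spec.

def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    # Factor of 2 covers keys AND values; one cache entry per layer per token.
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical 60 GB-class model: 64 layers, 8 KV heads (GQA),
# 128-dim heads, fp16 cache, filled to 1M tokens:
gib = kv_cache_bytes(64, 8, 128, 1_000_000) / 2**30
print(f"{gib:.0f} GiB")  # roughly 244 GiB on these assumptions
```

On those (assumed) numbers, the cache alone dwarfs the weights, which is part of why long-context local inference is so slow and memory-hungry, and why a smaller working context changes the hardware math dramatically.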

You may wish to look into RAG setups rather than trying to fit everything into a single context, especially for 100K+ words of worldbuilding background, of which maybe 5% is directly relevant at any one time.
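To make the RAG idea concrete, here's a toy sketch: split the worldbuilding notes into chunks, score each against the current writing prompt, and prepend only the top matches instead of the whole universe bible. Real setups use an embedding model and a vector store; this stdlib-only version uses bag-of-words cosine similarity (and invented sample notes) just to show the shape:

```python
# Toy RAG retrieval sketch. The notes and query are invented examples;
# a real setup would use embeddings + a vector store instead of
# bag-of-words similarity.
import math
from collections import Counter

notes = [
    "The kingdom of Veyra is ruled by a council of mages.",
    "Captain Ilsa commands the airship fleet from the port city.",
    "The old war between Veyra and the coastal clans ended in a truce.",
]

def vectorize(text):
    # Word-count vector (crude stand-in for an embedding).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank note chunks by similarity to the prompt; keep the top k.
    qv = vectorize(query)
    return sorted(docs, key=lambda d: cosine(qv, vectorize(d)), reverse=True)[:k]

# Only the retrieved snippet(s) get prepended to the generation prompt:
print(retrieve("Write a scene about the airship captain", notes))
```

The payoff is that the model only ever sees a few thousand tokens of genuinely relevant background per chapter, so you stay inside the context range where open-weight models still perform well, and generation gets far cheaper.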

As for the setup itself, maybe look into the kinds of rigs people use to run models like DeepSeek-R1. Most of them are costly, but probably not prohibitively so if a 512 GB Mac Studio is within the realm of possibility.