Yeah, as much as I love the hardware/software challenge of running big LLMs locally, I don’t actually use them for very many things.
I haven’t tried the paid “deep research” APIs, but I did set up a local self-hosted sandbox to try some open source alternatives (e.g. dzhng/deep-research + mendableai/firecrawl) and was not impressed with the results.
I’ve found that while “agentic workflows” are interesting and hold promise, in practice chaining together lots and lots of mediocre results rarely yields a good one.
Specific to your task of translating/mapping English/German episodes, my guess is that it took a fair amount of “context” that you had to copy-paste in. Most LLMs I’ve played with don’t do well with questions that span many areas of a long context. They are trained mostly to pick out a “needle in a haystack”, a single small reference within a lot of text, not to track more complex relationships…
A few things I have had success with running larger open-weight models like DeepSeek-V3-0324 (and now experimenting with Qwen3-235B-A22B):
- Quickly translating technical discussions on GitHub to/from Chinese/English and some other languages. It is good enough at translating, with enough contextual tech knowledge to get the ideas across. It still takes a lot of copy-pasting, though (see the API sketch after this list for one way to script that part).
- Summarizing longer text into a few short key bullet points. Not perfect, but handy given how much content is out there and never enough time to read it all (including my long azz response lmao).
- Generating throwaway Python code for common tasks like making graphs from arbitrarily formatted data files (see the plotting sketch after this list). It is pretty good at parsing text files and making simple charts, but I do have to double-check and fix up a few things, or know enough to suggest the approach.
- Refactoring code for formatting, comments, static type checking, and other suggestions. You can “vibe code” up some quick prototypes to test your ideas, but I definitely treat it as throwaway code most of the time.
- Entertainment, like co-writing silly stories with friends by taking turns prompting the bot.
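For the translation use case, here is a minimal sketch of how I mean scripting it against a local model, assuming you have an OpenAI-compatible server (e.g. llama.cpp’s llama-server) already running. The port, model name, and prompt wording are all my assumptions, not anything canonical:

```python
# Minimal sketch: send a GitHub comment to a local model for translation.
# Assumes an OpenAI-compatible server (e.g. llama.cpp's llama-server) is
# running on localhost:8080 -- the port and model name are placeholders
# for whatever your own setup uses.
import requests

def translate(text: str, target: str = "English") -> str:
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "local-model",  # placeholder; some servers ignore this
            "messages": [
                {"role": "system",
                 "content": f"Translate the user's text to {target}. "
                            "Keep code identifiers and technical terms as-is."},
                {"role": "user", "content": text},
            ],
            "temperature": 0.2,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(translate("这个 PR 修复了 KV cache 的内存泄漏问题。"))
```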
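And this is the kind of throwaway plotting script I ask the model to generate. The file name and the two-numeric-column layout are just assumptions for illustration; the whole point is that you regenerate or tweak it per data file:

```python
# Throwaway example: parse a whitespace-delimited text file with two
# numeric columns and plot them. "data.txt" and the column layout are
# hypothetical -- adapt per file.
import matplotlib.pyplot as plt

xs, ys = [], []
with open("data.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        a, b = line.split()[:2]
        xs.append(float(a))
        ys.append(float(b))

plt.plot(xs, ys, marker="o")
plt.xlabel("x")
plt.ylabel("y")
plt.title("data.txt")
plt.savefig("plot.png", dpi=150)
```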
If you can keep your context under 8k tokens (roughly 5,000 English words), many of the bigger models you can run at home have a few handy uses, in my opinion.
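If you want a quick sanity check before pasting, here is a back-of-the-envelope sketch. The 1.6 tokens-per-word ratio just restates the 8k-tokens ≈ 5,000-English-words rule of thumb above; a real count would need the model’s own tokenizer:

```python
# Rough check that a prompt fits under ~8k tokens. The 1.6 tokens/word
# ratio is a heuristic (8000 tokens / 5000 English words), not exact;
# actual counts depend on the model's tokenizer.
def rough_token_estimate(text: str, tokens_per_word: float = 1.6) -> int:
    return int(len(text.split()) * tokens_per_word)

prompt = open("prompt.txt").read()  # hypothetical input file
est = rough_token_estimate(prompt)
print(f"~{est} tokens", "(probably fits)" if est < 8000 else "(trim it down)")
```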
But totally, as others have said here: it is unfortunate that LLMs (and AI in general) are over-prescribed, with simpler existing solutions getting skipped over because of all the hype.