I build applications with the Frappe Framework (an open-source Python framework for building web apps).
I've used ChatGPT/Copilot to help me build some things. It has its use cases, but it doesn't seem to be much more useful than spitting out some basic boilerplate-level code.
It clearly has some understanding of the Frappe Framework, but it obviously doesn't have a complete grasp of it. For instance, it regularly gives me code that would break things.
I'm a complete AI amateur, so forgive me if this is a stupid question, but would it be feasible to train a local AI model so that it's intimately familiar with the Frappe codebase (or any codebase, for that matter) and can more usefully help me build stuff?
Yes, the technique you're describing is fine-tuning, where you take a pretrained model and feed it new example inputs/outputs.
This is a great hands-on guide that lines up with your use case.
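Just to give a rough feel for what fine-tuning looks like in code, here is a minimal sketch using the Hugging Face transformers stack. The base model ID, the `frappe_examples.jsonl` file, and the hyperparameters are placeholders I made up, not a Frappe-specific recipe, and in practice you'd probably layer LoRA/PEFT on top to keep memory manageable:

```python
# Minimal supervised fine-tuning sketch (Hugging Face transformers).
# "codellama/CodeLlama-7b-hf" and "frappe_examples.jsonl" are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "codellama/CodeLlama-7b-hf"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Each line of the JSONL file is assumed to hold a {"text": ...} example,
# e.g. a question/answer pair about Frappe rendered as one string.
dataset = load_dataset("json", data_files="frappe_examples.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="frappe-ft",
                           num_train_epochs=1,
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```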
Alternatively, if Frappe already has detailed documentation and all you need is a chatbot to retrieve and explain it, then you might consider the retrieval-augmented generation (RAG) technique.
This guide is from the Hugging Face documentation.
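To make the RAG idea concrete, here is a minimal sketch using sentence-transformers for embeddings and simple semantic search for retrieval. The documentation snippets are just examples, and `ask_llm()` is a placeholder for whatever local model you end up calling:

```python
# Minimal RAG sketch: embed documentation snippets, retrieve the closest
# ones for a question, and prepend them to the prompt sent to an LLM.
from sentence_transformers import SentenceTransformer, util

docs = [
    "frappe.get_doc(doctype, name) loads a document from the database.",
    "frappe.db.get_value(doctype, filters, fieldname) reads a single field.",
    "Hooks in hooks.py let an app extend or override framework behaviour.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(docs, convert_to_tensor=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    q_emb = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, doc_embeddings, top_k=top_k)[0]
    return [docs[hit["corpus_id"]] for hit in hits]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Use only this context to answer:\n{context}\n\nQuestion: {question}"

# ask_llm() stands in for whatever local model you run, e.g. via llama.cpp.
# print(ask_llm(build_prompt("How do I read one field from a DocType?")))
```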
Also, there’s another thread with a similar use case
Thanks for the input. Mainly, what I'd like is for the LLM to:
Make sure none of the code I've written (or the code it writes for me) will conflict with the code in Frappe.
Write code as efficiently as possible within the framework. For example, I don't want to write a whole new method if Frappe already has something that will do the job.
If that's the case, you might want to go for the RAG approach. That's what Wendell was referring to in the news. Basically, you feed the relevant parts of the codebase to an LLM with a large context window and have it write a response back.
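The crudest version of that, with no vector database at all, is just gathering the source files that look relevant and concatenating them into one big prompt for a large-context model. A rough sketch, where `ask_llm()` is again a placeholder for whatever local model you run:

```python
# Naive "stuff the code into the context window" sketch.
# ask_llm() is a placeholder for your local large-context model.
from pathlib import Path

def gather_sources(repo_root: str, keyword: str, max_chars: int = 60_000) -> str:
    """Collect Python files mentioning `keyword`, up to a rough size budget."""
    chunks, used = [], 0
    for path in Path(repo_root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        if keyword in text and used + len(text) <= max_chars:
            chunks.append(f"# file: {path}\n{text}")
            used += len(text)
    return "\n\n".join(chunks)

context = gather_sources("frappe", "get_doc")
prompt = (f"{context}\n\n"
          "Given the framework code above, is there an existing helper "
          "that already does this, or do I need to write my own method?")
# answer = ask_llm(prompt)
```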
This reference design from NVIDIA is suitable for local deployment and RAG usage. The example uses Llama 2.
I haven't tried the RAG component yet, as I've been trying to get the Code Llama 34B model working. It's putting up a fight!
Depending on your GPU setup, this example might not be useful, though: to get good results at a decent token rate you need medium-to-large parameter models, which take a lot of VRAM and fast GPUs. To get the 34B-parameter Code Llama or the 13B-parameter chat model running, I'm using twin RTX A6000s with NVLink for a total of 96 GB of VRAM, and most of that memory is used. I'm getting about 45 tokens/s, which is OK.
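For reference, loading a model of that size across multiple GPUs with the transformers library looks roughly like this. A sketch only: the model ID and prompt are just examples, and `device_map="auto"` simply shards the fp16 weights (around 68 GB for 34B parameters) across whatever VRAM it finds:

```python
# Rough sketch of loading Code Llama 34B across available GPUs.
# Adjust dtype/quantisation to fit your VRAM; this assumes ~68 GB free.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-34b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 weights: ~2 bytes per parameter
    device_map="auto",           # shard layers across both A6000s automatically
)

inputs = tokenizer("Write a Frappe whitelisted method that returns 'pong'.",
                   return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```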
There appear to be a few open-source projects optimising these models for more modest GPU setups. I see a lot of chat about TheBloke's models for running locally, but I haven't tried them myself: TheBloke (Tom Jobbins).
This is the first LLM I have tried running locally, so I'm still a learner. Compared to Claude for generating code and technical information (and diagrams of code!!!), so far it has been very crude.
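In case it helps anyone else reading, running one of those quantised builds locally is roughly this with llama-cpp-python. The GGUF filename is a placeholder for whichever quantisation you download, and speed/quality depend heavily on the quantisation level you pick:

```python
# Sketch of running a quantised GGUF build locally with llama-cpp-python.
# The model filename is a placeholder for whichever file you download.
from llama_cpp import Llama

llm = Llama(
    model_path="codellama-34b-instruct.Q4_K_M.gguf",  # placeholder file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload every layer to the GPU if it fits
)

result = llm(
    "Write a short Frappe server script that logs 'hello'.",
    max_tokens=256,
    temperature=0.2,
)
print(result["choices"][0]["text"])
```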