I am curious how I can better use AI to code against existing codebases. I regularly develop plugins for WordPress and for an open-source web app framework called Frappe Framework.
Copilot and ChatGPT certainly have some training on both of these, but both routinely give me code completions that are flat-out wrong, incoherent, or at best suboptimal.
Is there a way I can better use AI models to code with existing codebases? Specifically, I think I need a way to tell the AI to home in on the parts of the codebase that are relevant to what I am working on (without having to go into the existing code and feed it in manually).
The technique called Retrieval-Augmented Generation (RAG) splits authoritative documents into chunks, indexes them, and at query time retrieves the most relevant chunks and feeds them to the LLM with a prompt like 'using these documents/code as the most important references, answer the following question with examples from these documents/code…'
If you're running locally, it's a good way to reduce confabulation/hallucination/made-up nonsense, but there's a fair amount of computation needed to index source documents for retrieval, and your model may not have the context capacity to deal with a big codebase.
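To make the retrieval step concrete, here's a minimal sketch. Everything in it is illustrative: the chunk strings are made up, and a toy bag-of-words similarity stands in for the neural embedding model a real RAG pipeline would use, so the whole thing runs with no dependencies.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": lowercase word counts. A real pipeline would
    # call an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    # Rank code/document chunks by similarity to the query, keep top k.
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:k]

# Hypothetical chunks pulled from the two codebases mentioned above.
chunks = [
    "def register_hook(name): ...  # Frappe hook registration",
    "add_action('init', 'my_plugin_init')  # WordPress action hook",
    "frappe.get_doc('DocType', name)  # Frappe database access",
]

top = retrieve("how do I register a WordPress action hook", chunks, k=1)
prompt = ("Using these documents/code as the most important references, "
          "answer the following question:\n" + "\n".join(top))
```

The point is that only the retrieved chunks go into the prompt, so the model sees the relevant slice of the codebase instead of all of it.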
Thanks for the input. I am still very much an AI novice. I have been using various models daily for coding, but I know basically nothing about how it works…
Using RAG, should I assume it remembers all of the prompts for the entire conversation, regardless of size? For instance, suppose in one prompt I feed it enough code to basically fill its entire context window, and then in the next prompt I feed it the same amount of some other code.
I would expect it to drop some of the tokens: either the oldest items in the context window are evicted, or some of the newer tokens go unused. You may need to ask it for a compressed summary of the current state, or break your problem down so it can move in small increments, to work around the size of the context window.
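The oldest-first eviction described above can be sketched like this. It's a simplified model, assuming a chat history kept as a list of strings; word count stands in for a real tokenizer, and the budget number is arbitrary.

```python
def trim_context(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    # Drop the oldest messages until the conversation fits the budget.
    # count_tokens is a crude stand-in; real systems use the model's tokenizer.
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # evict oldest first
    return kept

# Hypothetical conversation: raw source, then a compressed summary of it.
history = [
    "here is module A's full source code ...",   # oldest, first to go
    "summary: module A handles authentication",  # compressed state survives
    "here is module B's full source code ...",
]

window = trim_context(history, max_tokens=13)
```

This is why the compressed-summary trick helps: once the raw source ages out of the window, the short summary is what's left to remind the model of the earlier state.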