I do a lot of backend and frontend web development, mainly on open source projects like WordPress.
I’ve tried Windsurf and Cursor with Sonnet 3.5 and GPT-4o.
I find it incredibly frustrating for anything other than basic boilerplate. Incredibly often, I spell out what I want as specifically as I can, and it still makes massive unwanted changes to my code. It’s almost as if it’s saying, “I don’t know why this is here (even though the code is perfectly functional), so I’ll just remove it.” Oftentimes, these are breaking changes.
Hell, I’m more productive just manually passing in the relevant pieces of code to ChatGPT’s free 4o-Mini model, because at least that way it doesn’t just take a fat shit on code that it has no business touching in the first place. Of course, I can review the changes before it makes them, but that ends up taking more time than if I just made the changes I wanted myself to begin with.
What am I doing wrong? How do I keep it from cluster-fucking my code? Am I just a paid-beta-testing pay-pig and this crap isn’t ready to be actually used?
Couldn’t agree more. The way I view it, those models are really good at frequently-written code, but so far they struggle with less commonly occurring programming patterns.
I mostly work on compilers and at the runtime level. I went back to a pure Neovim setup with linters and templating, but without any LLM suggestions.
LLMs just generate output that is similar to what’s in their training data. They cannot reason about, and have no “understanding” of, the specific use case. There’s no intelligence behind them. They do not understand the difference between fact and fiction. They are just very good at generating text (or images or whatever) that “look plausible” (as long as you don’t take a close look)!
So they can be used by someone who’s already a subject matter expert to maybe get new ideas about how to do things or to double-check that one didn’t forget something basic, but they won’t be able to write real code for you! They simply don’t “understand” what code is!
Not sure that’s fully fair to modern LLMs, which have started to lean more heavily into reasoning, such as OpenAI’s o1 and DeepSeek’s R1. They can reason within bounds.
But what is the goal?
You need to be able to clearly define your goal, be it through tests or execution traces. That may not always be available.
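To make that concrete, “goal through tests” can be as small as a handful of asserts. Here’s a minimal sketch in C, where the helper (rstrip_slashes) and the cases are made up purely for illustration:

#include <assert.h>
#include <string.h>

/* Made-up helper purely for illustration: strip trailing '/' from a path. */
static void rstrip_slashes(char *s) {
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == '/')
        s[--n] = '\0';
}

int main(void) {
    char a[] = "/var/www/";
    char b[] = "///";
    char c[] = "no-slash";

    rstrip_slashes(a);
    rstrip_slashes(b);
    rstrip_slashes(c);

    /* The asserts, not the prose prompt, pin down what "done" means. */
    assert(strcmp(a, "/var/www") == 0);
    assert(strcmp(b, "") == 0);
    assert(strcmp(c, "no-slash") == 0);
    return 0;
}

If a generated patch breaks an assert, you find out immediately instead of during a line-by-line review.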
Debugging large amounts of AI-generated code takes too long, if you ask me; at that point you might as well write the code yourself.
So far it’s been my experience that I can write trivial code faster and more reliably on my own than I can correct and debug AI attempts at implementing common things. If it’s something that requires real effort, then AI’s useless due to its lack of understanding.
For most of the coding I do, really the most useful investment in an AI feature would be for it to recognize whether the codebase’s style convention is
for (int i = 0; i < bound; i++)
or
for (int i = 0; i < bound; ++i)
so it can get the increment part of autocompletes right. Doesn’t look to me like the hype’s going to start connecting with practical day-to-day realities like this any time soon, though.
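For the record, the level of smarts I’m asking for is roughly the crude sketch below: count which increment form a codebase already uses and default to it. The substring patterns and reading source from stdin are simplifications for illustration, not a real tool.

#include <stdio.h>
#include <string.h>

/* Crude heuristic: count prefix vs. postfix increments and report the
 * dominant style. "; ++" roughly matches "for (...; ...; ++i)",
 * "++)" roughly matches "i++)". */
static int count_substr(const char *haystack, const char *needle) {
    int n = 0;
    const char *p = haystack;
    while ((p = strstr(p, needle)) != NULL) {
        n++;
        p += strlen(needle);
    }
    return n;
}

int main(void) {
    char buf[1 << 16];
    size_t len = fread(buf, 1, sizeof buf - 1, stdin); /* source code on stdin */
    buf[len] = '\0';

    int prefix  = count_substr(buf, "; ++");
    int postfix = count_substr(buf, "++)");
    printf("prefix: %d, postfix: %d -> suggest %s\n",
           prefix, postfix, prefix > postfix ? "++i" : "i++");
    return 0;
}

Run it as e.g. ./style < main.c; the point is just how little “intelligence” the practical need actually requires.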
As another example, one of the codebases I work on does some geometry. Since AI doesn’t understand x and y axes, it arbitrarily interchanges fooX and fooY variables, meaning that if it comes up with a correct line of code, it’s only by the accident of happening to get the right x and y variables and parentheses in the right locations. So pretty much every line has to be carefully proofread, fixed, and reinspected to make sure it’s actually right. If it’s a complex formula, it’s easy for bugs to be introduced.
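To show the kind of formula I mean, here’s a hypothetical 2D rotation (names made up). The swapped x/y variant compiles and looks just as plausible, which is exactly why every line needs proofreading:

#include <math.h>
#include <stdio.h>

typedef struct { double x, y; } Point;

/* Rotate a point counterclockwise by theta radians. */
static Point rotate(Point p, double theta) {
    Point r;
    r.x = p.x * cos(theta) - p.y * sin(theta);
    r.y = p.x * sin(theta) + p.y * cos(theta);
    /* A swapped version an LLM might emit, equally "plausible looking":
     *   r.x = p.y * cos(theta) - p.x * sin(theta);
     *   r.y = p.y * sin(theta) + p.x * cos(theta);
     * It type-checks exactly the same; only proofreading catches it. */
    return r;
}

int main(void) {
    double quarter_turn = 1.5707963267948966; /* pi / 2 */
    Point p = { 1.0, 0.0 };
    Point r = rotate(p, quarter_turn);
    printf("(%.3f, %.3f)\n", r.x, r.y); /* expected roughly (0.000, 1.000) */
    return 0;
}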
I rarely go to an LLM unless I get stumped and it’s something hard. Then I always get back garbage: complex, non-working code, or replies that just say “Implement the RTSP server here.”
They are only good for solved problems. We get a Copilot license at work, and it’s okay for small snippets. But any large generation it tries to predict always looks fucking awful. It’s difficult to read, and I’m always shocked if and when it actually works.
I haven’t done any machine learning since college (in undergrad we used the Green Book; when I was in grad school, all the undergrads had the Blue Book. Kinda funny how you can tell when a CS major went to school by the color of the A.I. book).
But if you took A.I. once in your life and are familiar with the basics of backpropagation, 3Blue1Brown has a great video series on how LLMs work:
Both of these are really good. If you don’t know how neural networks work, he has some older videos that get into the basics of that.
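If anyone wants a toy refresher before watching: a single weight fit by the chain rule and gradient descent is the core mechanism those videos build up from. The data and learning rate below are made up for illustration.

#include <stdio.h>

/* Toy sketch of backpropagation: one weight, squared loss,
 * chain rule, gradient descent. */
int main(void) {
    double xs[] = { 1.0, 2.0, 3.0, 4.0 };
    double ys[] = { 2.0, 4.0, 6.0, 8.0 };   /* target relationship: y = 2x */
    double w = 0.0, lr = 0.01;

    for (int epoch = 0; epoch < 200; ++epoch) {
        double grad = 0.0;
        for (int i = 0; i < 4; ++i) {
            double pred = w * xs[i];
            /* d(loss)/dw = d(loss)/d(pred) * d(pred)/dw  (chain rule) */
            grad += 2.0 * (pred - ys[i]) * xs[i];
        }
        w -= lr * grad / 4.0;                /* gradient descent step */
    }
    printf("learned w = %.4f (target 2.0)\n", w);
    return 0;
}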