The quality of an LLM's output depends heavily on how many guardrails you have set up to keep it on track, and on heuristics that point it in the right direction (type checking and running tests after every change, for example).
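By guardrails I mean even something as simple as a check the agent has to run after every edit, so failures get fed straight back to it. A minimal sketch, assuming a Python project with mypy and pytest (swap in whatever type checker and test runner your stack uses):

```python
"""Run 'guardrail' checks after every LLM edit and surface any failure."""
import subprocess
import sys

# Assumed tooling: mypy for type checking, pytest for tests.
CHECKS = [
    ["mypy", "."],           # type-check the whole project
    ["pytest", "-q", "-x"],  # run the tests quietly, stop at the first failure
]

def run_guardrails() -> int:
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Hand this failure back to the LLM instead of accepting the change.
            print(f"guardrail failed: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0

if __name__ == "__main__":
    sys.exit(run_guardrails())
```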
What is the health of your enterprise codebase? If it's anything like the ones I've experienced (a legacy mess), then it's absolutely understandable that an LLM's output is subpar when taking on larger tasks.
It also depends on the model and plan you're on. There is a significant increase in quality when comparing Cursor's default model on a free plan vs Opus 4.5 on a Claude Max plan.
I think a good exercise is to prohibit yourself from writing any code manually and force yourself to work LLM-only. It might sound silly, but it will develop that skill set.
Try Claude Code in thinking mode with some superpowers: https://github.com/obra/superpowers
I routinely make an implementation plan with Claude and then step away for 15 minutes while it spins. The results aren't perfect, but fixing that remaining 10% is better than writing 100% of it myself.
The code is quite easy to follow, to be honest. We have documented a lot of it and segmented functionality into libraries that follow an app/feature/models pattern. Almost every service we have has unit tests explicitly describing what its public API does, or is supposed to do, across several scenarios; we never test implementation details.
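To illustrate the kind of test I mean (the service and function names here are hypothetical, not our actual code), the tests pin down observable behaviour across scenarios and never touch internals:

```python
import pytest

# Hypothetical stand-in for a service's public API; in the real codebase this
# would live in its own library under the app/feature/models structure.
def calculate_invoice_total(line_items: list[dict]) -> int:
    return sum(item["price"] * item["qty"] for item in line_items)

# The tests describe what the public API is supposed to return in several
# scenarios; they never assert on internal helpers, so the implementation
# is free to change.
@pytest.mark.parametrize(
    "line_items, expected_total",
    [
        ([], 0),                                                     # empty invoice
        ([{"price": 100, "qty": 2}], 200),                           # single item
        ([{"price": 100, "qty": 1}, {"price": 50, "qty": 3}], 250),  # mixed items
    ],
)
def test_calculate_invoice_total(line_items, expected_total):
    assert calculate_invoice_total(line_items) == expected_total
```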
Handing it to new people of course raises questions, but most of them (juniors) could just follow the code given an entry point for the task, from the backend through to the frontend.
I use the premium models available through GitHub Copilot.
> I routinely make an implementation plan with Claude and then step away for 15 minutes while it spins. The results aren't perfect, but fixing that remaining 10% is better than writing 100% of it myself.
To be honest, I have only done this twice, and the amount of code that needed fixing, plus the mental overhead of finding the open bugs, was much more work than just guiding the LLM at every step. But that was a couple of months ago.
Besides my other response, it could also be that I'm just not smart enough for it.