Just want to clarify, this is not my Substack, I’m just sharing this because I found it insightful.
The author describes himself as a “fractional CTO” (no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI. He decided to experience it for himself. From the author (emphasis mine):
I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.
I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.
I find Claude Sonnet 4.5 to be good up to 800 lines at a chunk. If you structure your project into 800-ish line chunks with well-defined interfaces, you can get 8 to 10 chunks working cooperatively pretty easily. Beyond about 2000 lines in a chunk, if it’s not well defined, yeah - the hallucinations start to become seriously problematic.
The new Opus 4.5 may have a higher complexity limit; I haven’t really worked with it enough to characterize it… I do find Opus 4.5 to be much slower than Sonnet 4.5 for similar problems.
Okay, but if it’s writing 800 lines at once, it’s making design choices. Which is all well and good for a one-off, but it will make those choices, make them a different way each time, and it will name everything in a very generic or very eccentric way.
The AI can’t remember how it did it, or how it does things. You can do a lot to work around that, even with stuff that hasn’t entered commercial products yet, like vectorized data stores that catalog key details and remind the LLM of them when appropriate.
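A toy sketch of that “vectorized data store” idea: catalog key project decisions, then retrieve the most relevant ones to prepend to the next prompt. Real setups use embedding models; here a bag-of-words vector stands in so the example runs with no dependencies, and the stored notes are invented.

```python
# Sketch: a tiny "memory" of project decisions, retrieved by similarity.
from collections import Counter
from math import sqrt


def vectorize(text: str) -> Counter:
    """Crude stand-in for an embedding: word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class DecisionStore:
    def __init__(self) -> None:
        self.notes: list[str] = []

    def add(self, note: str) -> None:
        self.notes.append(note)

    def remind(self, task: str, k: int = 2) -> list[str]:
        """Return the k cataloged decisions most relevant to the task,
        to be injected into the LLM's context before it writes code."""
        q = vectorize(task)
        ranked = sorted(self.notes,
                        key=lambda n: cosine(q, vectorize(n)),
                        reverse=True)
        return ranked[:k]
```

So before asking the model to touch the parser, you would call `remind("add error handling to the parser")` and paste the returned decisions into the prompt.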
2000 lines is nothing. My main project is well over a million lines, and the original author and I have to meet up to discuss how things flow through the system before changing it to meet the latest needs.
But we can and do change it to meet the needs of the customer, with high stakes, because we wrote it. These days we use AI for grunt work, and we have junior devs who do smaller tweaks.
If an AI is writing code a thousand lines at a time, no one knows how it works. The AI sure as hell doesn’t. If it’s 200 lines at a time, maybe we don’t know the details, but the decisions and the flow were decided by a person who understands the full picture.
That’s a bit of the power of the process: variety. If the implementation isn’t ideal, it can produce another one. In theory, it can produce ten different designs for any given solution then select the “best” one by whatever criteria you choose. If you’ve got the patience to spell it all out.
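The “produce ten designs, select the best” loop can be sketched as a small harness. Here `generate_design` is a deterministic stand-in for an LLM call, and the templates and scoring criterion are invented purely for illustration; in real use you would vary the prompt or seed per call and score by whatever criteria you care about.

```python
# Sketch: generate n candidate designs, keep the highest-scoring one.
from typing import Callable


def pick_best(generate: Callable[[int], str],
              score: Callable[[str], float],
              n: int = 10) -> str:
    """Generate n candidates and return the one the criterion ranks best."""
    candidates = [generate(i) for i in range(n)]
    return max(candidates, key=score)


# Stand-in generator: a real one would prompt the model n separate times.
TEMPLATES = ["queue then index", "cache in front of index",
             "index only", "queue with cache"]


def generate_design(seed: int) -> str:
    return TEMPLATES[seed % len(TEMPLATES)]


# Example criterion: prefer designs that mention a cache, then prefer shorter.
def score(design: str) -> float:
    return ("cache" in design) - 0.01 * len(design)


best = pick_best(generate_design, score, n=4)
print(best)  # "queue with cache"
```

The “patience to spell it all out” lives in the `score` function: the harness is trivial, but encoding your actual selection criteria is the hard part.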
Neither can the vast majority of people after several years go by. That’s what the documentation is for.
Yep. It’s also a huge chunk of example to work from and build on. If your designs are highly granular (in a good way), most modules could fit under 2000 lines.
That should be a point of embarrassment, not pride. My sympathies if your business really is that complicated. You might ask an LLM to start chipping away at refactoring your code to collect similar functions together to reduce duplication.
Sure. If you look at bigger businesses, they are always striving to get rid of “indispensable duos” like you two. They’d rather pay 6 run-of-the-mill hire-more-any-day-of-the-week developers than two indispensables. And that’s why a large number of management types who don’t really know how it works in the trenches are falling all over themselves trying to be the first to fly a team that “does it all with AI, better than the next guys.” We’re a long way from that being realistic. AI is a tool: you can use it for grunt work, you can use it for top-level design, and everything in between. What you can’t do is give it 25 words or less of instruction and expect to get back anything of significant complexity. That 2000 line limit becomes 1 million lines of code when every four lines of the root module describes another module.
Far from it. Compared with code I get to review out of India, or Indiana, 2000 lines of AI code is just as readable as any 2000 lines I get out of my colleagues. Those colleagues also make the same annoying deviations from instructions that AI does; the biggest difference is that AI gets its wrong answer back to me within 5-10 minutes. Indiana? We’ve been correcting and re-correcting the same architectural implementation for the past 6 months.

They had a full example in C++, which they are going to “translate to Rust” for us. It took me about 6 weeks total to develop the system from scratch, so I figured that with a full example like they have, they should be well on their way in 2 weeks. They were nowhere near that in 2 weeks, so I did a Rust translation for them over the next two weeks and showed them. O.K., we see that, but we have been tasked to change this aspect of the interface to something undefined, so we’re going to do an implementation with that undefined interface… So within another 2 weeks I refined my Rust implementation into a highly polished example, ready for any undefined interface you throw at it, while Indiana continued to hack away at three projects simultaneously, getting nowhere equally fast on all 3.

It has been 7 months now. I’m still reviewing Indiana’s code and reminding them, like I did the AI, of all the things I have told them six times over the past 7 months that they keep drifting off from.