Just want to clarify, this is not my Substack, I’m just sharing this because I found it insightful.
The author describes himself as a “fractional CTO” (no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI. He decided to experience it for himself. From the author (emphasis mine):
I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.
I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.
I find Claude Sonnet 4.5 to be good up to 800 lines at a chunk. If you structure your project into 800-ish line chunks with well-defined interfaces, you can get 8 to 10 chunks working cooperatively pretty easily. Beyond about 2000 lines in a chunk, if it’s not well defined, yeah - the hallucinations start to become seriously problematic.
The new Opus 4.5 may have a higher complexity limit; I haven’t really worked with it enough to characterize it… I do find Opus 4.5 to be much slower than Sonnet 4.5 for similar problems.
Okay, but if it’s writing 800 lines at once, it’s making design choices. Which is all well and good for a one-off, but it will make those choices, make them a different way each time, and it will name everything in a very generic or very eccentric way.
The AI can’t remember how it did it, or how it does things. You can do a lot to work around that, even with stuff that hasn’t entered commercial products yet, like vectorized data stores that catalog key details and remind the LLM of them when appropriate.
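A toy sketch of that “vectorized data store” idea: catalog key project decisions, then retrieve the most relevant ones to prepend to the next prompt. Real setups use embedding models; here a bag-of-words vector stands in so the example runs with no dependencies, and the stored notes are invented.

```python
# Sketch: a tiny "memory" of project decisions, retrieved by similarity.
from collections import Counter
from math import sqrt


def vectorize(text: str) -> Counter:
    """Crude stand-in for an embedding: word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class DecisionStore:
    def __init__(self) -> None:
        self.notes: list[str] = []

    def add(self, note: str) -> None:
        self.notes.append(note)

    def remind(self, task: str, k: int = 2) -> list[str]:
        """Return the k cataloged decisions most relevant to the task,
        to be injected into the LLM's context before it writes code."""
        q = vectorize(task)
        ranked = sorted(self.notes,
                        key=lambda n: cosine(q, vectorize(n)),
                        reverse=True)
        return ranked[:k]
```

So before asking the model to touch the parser, you would call `remind("add error handling to the parser")` and paste the returned decisions into the prompt.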
2000 lines is nothing. My main project is well over a million lines, and the original author and I have to meet up to discuss how things flow through the system before changing it to meet the latest needs.
But we can and do change it to meet the needs of the customer, with high stakes, because we wrote it. These days we use AI for grunt work, and we have junior devs who do smaller tweaks.
If an AI is writing code a thousand lines at a time, no one knows how it works. The AI sure as hell doesn’t. If it’s 200 lines at a time, maybe we don’t know the details, but the decisions and the flow were decided by a person who understands the full picture.
That’s a bit of the power of the process: variety. If the implementation isn’t ideal, it can produce another one. In theory, it can produce ten different designs for any given solution then select the “best” one by whatever criteria you choose. If you’ve got the patience to spell it all out.
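The “produce ten designs, select the best” loop can be sketched as a small harness. Here `generate_design` is a deterministic stand-in for an LLM call, and the templates and scoring criterion are invented purely for illustration; in real use you would vary the prompt or seed per call and score by whatever criteria you care about.

```python
# Sketch: generate n candidate designs, keep the highest-scoring one.
from typing import Callable


def pick_best(generate: Callable[[int], str],
              score: Callable[[str], float],
              n: int = 10) -> str:
    """Generate n candidates and return the one the criterion ranks best."""
    candidates = [generate(i) for i in range(n)]
    return max(candidates, key=score)


# Stand-in generator: a real one would prompt the model n separate times.
TEMPLATES = ["queue then index", "cache in front of index",
             "index only", "queue with cache"]


def generate_design(seed: int) -> str:
    return TEMPLATES[seed % len(TEMPLATES)]


# Example criterion: prefer designs that mention a cache, then prefer shorter.
def score(design: str) -> float:
    return ("cache" in design) - 0.01 * len(design)


best = pick_best(generate_design, score, n=4)
print(best)  # "queue with cache"
```

The “patience to spell it all out” lives in the `score` function: the harness is trivial, but encoding your actual selection criteria is the hard part.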
Neither can the vast majority of people after several years go by. That’s what the documentation is for.
Yep. It’s also a huge chunk of example to work from and build on. If your designs are highly granular (in a good way), most modules could fit under 2000 lines.
That should be a point of embarrassment, not pride. My sympathies if your business really is that complicated. You might ask an LLM to start chipping away at refactoring your code to collect similar functions together to reduce duplication.
Sure. If you look at bigger businesses, they are always striving to get rid of “indispensable duos” like you two. They’d rather pay 6 run-of-the-mill hire-more-any-day-of-the-week developers than two indispensables. And that’s why a large number of management types who don’t really know how it works in the trenches are falling all over themselves trying to be the first to fly a team that “does it all with AI, better than the next guys.” We’re a long way from that being realistic. AI is a tool: you can use it for grunt work, you can use it for top-level design, and everything in between. What you can’t do is give it 25 words or less of instruction and expect to get back anything of significant complexity. That 2000 line limit becomes 1 million lines of code when every four lines of the root module describes another module.
Far from it. Compared with code I get to review out of India, or Indiana, 2000 lines of AI code is just as readable as any 2000 lines I get out of my colleagues. Those colleagues also make the same annoying deviations from instructions that AI does; the biggest difference is that AI gets its wrong answer back to me within 5-10 minutes. Indiana? We’ve been correcting and re-correcting the same architectural implementation for the past 6 months.

They had a full example in C++, which they are going to “translate to Rust” for us. It took me about 6 weeks total to develop the system from scratch, so I figured that with a full example like they have, they should be well on their way in 2 weeks. They were nowhere near that in 2 weeks, so I did a Rust translation for them over the next two weeks and showed them. O.K., we see that, but we have been tasked to change this aspect of the interface to something undefined, so we’re going to do an implementation with that undefined interface… So within another 2 weeks I refined my Rust implementation into a highly polished example, ready for any undefined interface you throw at it, while Indiana continued to hack away at three projects simultaneously, getting nowhere equally fast on all 3.

It has been 7 months now. I’m still reviewing Indiana’s code and reminding them, like I did the AI, of all the things I have told them six times over the past 7 months that they keep drifting off from.