

Are you trying to make a point that agents can’t use MCP based off of a picture of a tweet you saw or something?


Your real problem is that, even with all rights reserved (full copyright protection), the law doesn’t stop someone from running a statistical analysis on your work.


Again, read and understand the limitations of the study. The portion I quoted you is alone enough to show that you’re leaning way too heavily on conclusions they don’t even claim to provide evidence for.


Do you think that like nobody has access to AI or something? That these guys are the ultimate authorities on AI usage? I won’t claim to be, but I am a 15 YOE dev working with AI right now and I’ve found the quality is a lot better with better rules and context.
And, ultimately, I don’t really care if you believe me or not. I’m not here to sell you anything. Don’t use the tools, doesn’t matter to me. Anybody else who does use them, give my advice a try and see if it helps you.


More to the point, that is exactly what the people in this study were doing.
They don’t really go into a lot of detail about what they were doing. But they have a table on the limitations of the study that would indicate it is not.
“We do not provide evidence that: There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting. Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup.”
Back to this:
“even if it did it’s not any easier or cheaper than teaching humans to do it.”
In my experience, the kinds of information that an AI needs to do its job effectively has a significant overlap with the info humans need when just starting on a project. The biggest problem for onboarding is typically poor or outdated internal documentation. Fix that for your humans and you have it for your LLMs at no extra cost. Use an LLM to convert your docs into rules files and to keep them up to date.
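To make that last part less abstract: the conversion step can be a tiny script. Here’s a rough sketch, assuming the OpenAI Python client; the model name, file paths, and the Cursor-style .mdc rules location are placeholders for whatever your own tooling actually uses:

```python
# Rough sketch: distill an internal doc into a rules file for a coding agent.
# Assumes the OpenAI Python SDK; model name and paths are placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

doc = Path("docs/architecture.md").read_text()
prompt = (
    "Condense the following internal documentation into a concise rules file "
    "for a coding agent: architectural constraints, approved dependencies, "
    "and style conventions only. Keep it under 100 lines.\n\n" + doc
)

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

# Write it wherever your tooling reads rules from (Cursor project rules shown here).
out = Path(".cursor/rules/architecture.mdc")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(resp.choices[0].message.content)
```

Re-run it whenever the source doc changes (or from CI) and the rules stay current at basically no extra cost.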


This lines up with my experience as well and what you’ve described is very close to how I work with LLM agents. The people bragging about 10x are either blowing smoke or producing garbage. I mean, I guess in some limited contexts I might get 10x out of taking a few seconds to write a prompt vs a couple of minutes of manual hunting and typing. But on the whole, software engineering is about so much more than just coding and those things have become no less important these days.
But the people acting like the tech is a useless glorified Markov generator are also out of their mind. There are some real gains to be had by properly using the tech. Especially once you’ve laid the groundwork by properly documenting things like your architecture and dependencies for LLM consumption. I’m not saying this to try to sell anybody on it but I really, truly, can’t imagine that we’re ever going back to the before times. Maybe there’s a bubble burst like the dotcom bubble but, like the internet, agentic coding is here to stay.


This is not really true.
The way you teach an LLM, outside of training your own, is with rules files and MCP tools. Record your architectural constraints, favored dependencies, and style guide information in your rules files and the output you get is going to be vastly improved. Give the agent access to more information with MCP tools and it will make more informed decisions. Update them whenever you run into issues and the vast majority of your repeated problems will be resolved.
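For the MCP side, a minimal tool server is only a few lines. This is a sketch using the MCP Python SDK’s FastMCP helper; the server name, tool, and docs path are made up for illustration:

```python
# Minimal MCP server sketch (MCP Python SDK, FastMCP helper).
# The server name, tool, and docs path are illustrative only.
from pathlib import Path
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-docs")

@mcp.tool()
def get_architecture_notes(component: str) -> str:
    """Return the recorded architecture notes for a named component."""
    notes = Path(f"docs/architecture/{component}.md")
    return notes.read_text() if notes.exists() else f"No notes found for {component}."

if __name__ == "__main__":
    mcp.run()  # defaults to stdio, which agent hosts can launch directly
```

Point your agent at that and it can look the constraints up itself instead of guessing.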


It’s worth noting that good IDE integrated agents also have access to these deterministic tools. In my experience, they use them quite often. Even for minor parts of their tasks that I would typically just type out.
The generalized learning is usually just the first step. Coding LLMs typically go through more rounds of specialized training afterwards to tune and focus them towards solving those types of problems. Then there’s RAG, MCP, and simulated reasoning, which are technically not training methods but do further improve the relevance of the outputs. There’s a lot of ongoing work in this space still. We haven’t even seen a standard settle yet.
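To make the RAG piece concrete: retrieval is just “find the most relevant snippets and put them in the prompt”. Toy sketch below; real systems use learned embeddings and a vector store, the bag-of-words overlap here is only to show the shape of it:

```python
# Toy RAG retrieval: pick the doc snippet most similar to the query and
# prepend it to the prompt. Real systems use learned embeddings, not word counts.
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

snippets = [
    "Payments go through the billing service; never call the gateway directly.",
    "All public APIs are versioned under /v2 and documented in OpenAPI.",
    "Background jobs use the queue worker; cron is deprecated.",
]

query = "How should I call the payment gateway?"
best = max(snippets, key=lambda s: cosine(vectorize(query), vectorize(s)))
print(f"Context:\n{best}\n\nTask: {query}")
```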


Maybe Fallacy is a better word than Paradox? Take a look at any AI-related thread and it’s filled to the brim with people lamenting the coming collapse of software development jobs. You might believe that this is obvious but to many, many people it’s anything but.


I can read your code, learn from it, and create my own code with the knowledge gained from your code without violating an OSS license. So can an LLM.
Not even just an OSS license. No license is any stronger than the copyright law that backs it. And you are allowed to learn from or statistically analyze even fully copyrighted work.
Copyright is just a lot more permissive than I think many people realize. And there’s a lot of good that comes from that. It’s enabled things like API emulation and reverse engineering, and being able to leave our programming jobs to go work somewhere else without getting sued.


Yeah I don’t think we should be pushing to have LLMs generate code unsupervised. It’s an unrealistic standard. It’s not even a standard most companies would entrust their most capable programmers with. Everything needs to be reviewed.
But just because it’s not working alone doesn’t mean it’s useless. I wrote like 5 lines of code this week by hand. But I committed thousands of lines. And I reviewed and tweaked and tended to every one of them. That’s how it should be.


I’m not sure I get your analogy. To me this is more like two people got into a bath and one went “Ooh, that’s a bit too warm” while the other screamed “REEEEEEE HOOOOOT”. The degree is the same. The response is not.


Kinda funny, the juxtaposition between the programmers’ reaction to this and the “techies’” reaction on the crosspost.
Maybe we’re still early yet, so I’ll record the difference right now for posterity: the programming post is generally critical of the article and has several suggestions on how to improve the quality of agent-assisted code.
The technology post is pretty much just “REEEEEEEEEEE AI BAD”.


You’re not going to find me advocating for letting the code go into production without review.
Still, that’s a different class of problem than the LLM hallucinating a fake API. That’s a largely outdated criticism of the tools we have today.


I’ve thought about this many times, and I’m just not seeing a path for juniors. Given this new perspective, I’m interested to hear if you can envision something different than I can. I’m honestly looking for alternate views here, I’ve got nothing.
I think it’ll just mean that they start their careers involved in higher level concerns. It’s not like this is the first time that’s happened. Programming (even just prior to the release of LLM agents) was completely different from programming 30 years ago. Programmers have been automating junior jobs away for decades and the industry has only grown. Because the fact of the matter is that cheaper software, at least so far, has just created more demand for it. Maybe it’ll be saturated one day. But I don’t think today’s that day.


Agents can now run compilation and testing on their own, so the hallucination problem is largely irrelevant. An LLM that hallucinates an API quickly finds out that it fails to work and is forced to retrieve the real API and fix the errors. So it really doesn’t matter anymore. The code you wind up with will ultimately work.
The only real question you need to answer yourself is whether or not the tests it generates are appropriate. Then maybe spend some time refactoring for clarity and extensibility.
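The loop itself is nothing exotic. Sketch of the shape of it below; propose_fix is a hypothetical hook that asks the agent for a patch given the latest failure output, and pytest is just an example verification command:

```python
# Sketch of the verify-and-retry loop agents run internally.
# `propose_fix` is a hypothetical hook that asks the agent for a patch given
# the latest failure output; the test command is just an example.
import subprocess
from typing import Callable

def verify_loop(propose_fix: Callable[[str], None], max_rounds: int = 5) -> bool:
    failure_output = ""
    for _ in range(max_rounds):
        propose_fix(failure_output)  # agent edits the working tree
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True  # tests pass; a hallucinated API would have failed here
        failure_output = result.stdout + result.stderr
    return False  # give up and hand the task back to the human
```

The cap on rounds is the part that matters: if it can’t converge, you want the task back in your hands rather than burning tokens.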


There are bad coders and then there are bad coders. I was a teaching assistant through grad school and in the industry I’ve interviewed the gamut of juniors.
There are tons of new grads who can’t code their way out of a paper bag. Then there’s a whole spectrum up to and including people who are as good at the mechanics of programming as most seniors.
The former is absolutely going to have a hard time. But if you’re beyond that you should have the skills necessary to critically evaluate an agent’s output. And any more time that they get to instead become involved in the higher level discussions going on around them is a win in my book.


What? I’ve already written the design documentation and done all the creative and architectural parts that I consider most rewarding. All that’s left for coding is answering questions like “what exactly does the API I need to use look like?” and writing a bunch of error handling if statements. That’s toil.


I find it best to get the agent into a loop where it can self-verify. Give it a clear set of constraints and requirements, give it the context it needs to understand the space, give it a way to verify that it’s completed its task successfully, and let it go off. Agents may stumble around a bit but as long as you’ve made the task manageable it’ll self correct and get there.