You can read a PR and tell if it actually accomplishes what it says it does, right?
Mostly I can't, not if there are subtle issues. Certainly I can look and see if any bugs jump out at me, or if any areas look suspicious, but understanding a piece of code I didn't write deeply enough to execute it in my head usually takes longer than writing it myself.
What I can do is read a set of clearly-written functional or end-to-end tests, and see if they look like they should exercise the code written in the PR, and whether the assertions they make are the ones I'd expect, and whether there are any obvious cases that are missing. And, of course, I can look at CI and see whether said tests have passed.
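For concreteness, here is a minimal sketch of the kind of test I mean, written as pytest-style end-to-end checks. The `/orders` endpoint, the payload fields, and the `client` fixture are all hypothetical, invented purely for illustration:

```python
# Hypothetical end-to-end tests; the endpoint, fields, and the `client`
# HTTP test fixture are invented for illustration. A reviewer can check
# that these exercise the code path the PR claims to add, and that the
# assertions match the promised behaviour.

def test_create_order_returns_confirmation(client):
    response = client.post("/orders", json={"sku": "ABC-123", "quantity": 2})

    assert response.status_code == 201
    body = response.json()
    assert body["status"] == "confirmed"
    assert body["quantity"] == 2


def test_create_order_rejects_zero_quantity(client):
    # The "obvious missing case" check: is invalid input handled?
    response = client.post("/orders", json={"sku": "ABC-123", "quantity": 0})

    assert response.status_code == 422
```

If tests like these are in the PR and green in CI, the review burden drops from "simulate the implementation in my head" to "do these assertions describe the behaviour I want?"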
The "attack dog" metaphor is definitely sticky but personally I've found that you can build the fence directly INTO the task definition, so the dog runs free but in a "directed" and "constrained" path.
When the context is ambiguous and the objective (the "why") isn't clear enough, it doesn't matter how smart the model is: problems will arise and technical debt is certain. If you invest in complete, crystal-clear specs, your chances that the plans come out right increase significantly (the so-called Context Engineering).
I ran this as an experiment: I invested about 80% of my time in building specs, and code generation then became practically automatic. As a result, Claude produced 7 modules (46 endpoints) in 4.5 hours (including testing), and the code was virtually bug-free and production-ready. The spec was so complete, with all the hard decisions already made, that Claude didn't have to guess, leaving it more time to do the right things instead of wasting time trying to decide a course (a task where it's often wrong).
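To give a flavour of what "all hard decisions already made" can look like, here is a purely hypothetical spec fragment for a single endpoint, rendered as code-level contracts. The names, statuses, and rules are invented for illustration and are not taken from the project described above:

```python
# Hypothetical spec fragment for one endpoint, written so the model has
# nothing left to guess. Every name and rule here is invented.
from dataclasses import dataclass
from enum import Enum


class RefundPolicy(str, Enum):
    # Decision already made in the spec: partial refunds are not supported.
    FULL_ONLY = "full_only"


@dataclass(frozen=True)
class CreateRefundRequest:
    order_id: str                 # must reference an order in state "delivered"
    reason: str                   # required free text, max 500 characters
    policy: RefundPolicy = RefundPolicy.FULL_ONLY


# POST /refunds
#   201 -> refund created; body echoes order_id plus a generated refund_id
#   409 -> order not in "delivered" state (decided: fail fast, no queuing)
#   422 -> validation error (missing reason, malformed order_id)
```

When every status code, edge case, and naming decision is pinned down like this, the model's job reduces to translation rather than design.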
Of course you lose the vibe-coding exploration, but that experience moves upstream, into the process of spec-building. The hard thinking still happens, just earlier, and with a sizable prize: predictability and peace of mind from "day two" onward.
I'd be interested to see a write-up of your experience doing this. My own experience with spec-driven development hasn't had so much success. I've found that the models tend to have trouble sticking to the spec.
With great pleasure! The experience was so revealing that it led me to codify the process and prepare the Stream Coding Manifesto (available on GitHub here: Stream Coding), just launched last month, btw.
I've also created the corresponding Claude Skill to make it immediately actionable, also downloadable via GitHub.
Attack Dogs
I mentioned previously that coding agents kind of suck for lots of people. As of January 2026, coding agents lack the long-horizon skills needed to produce effective codebases independently.
However, it's clear to anyone who has used modern coding models - Claude Opus 4.5, GPT 5.2-Codex, hell even GLM 4.7 (open source) - that they are smart, knowledgeable, agentic, and tenacious in a way that is almost uncanny.
Setting Claude Code on a problem with "--dangerously-skip-permissions" feels like letting an attack dog off the leash. It sprints straight at the problem and attacks it with the terrible certainty of something that has never known hesitation, all the violence of its training distilled into pure forward motion.
Which is fine as long as there isn't a fence in the way.
Rather than expecting the attack dog to catch the perp, cuff him, bring him in, and file the relevant papers independently - we can repurpose its knowledge, power, and tenacity as an extension of our own will. The interface of Saying What You Want combines with the utility of the model to present a new view on a codebase.
Codebase Interfaces
The most common interface to a codebase is a text editor. VSCode, Notepad++, IDEA, Vim, etc. You select the file you want to read and it presents a window of text which you can scroll and edit by interacting with your keyboard and mouse to add/remove characters. Maybe it has some functions like find/replace, find symbol references, rename symbol, git integration, DB querying, test runner, build automation, etc.
Text editors are pretty good. The majority of all code ever produced prior to 2025 went through a text editor. Code generation exists, but it's really more of an amplifier for text editor-produced code. Visual programming interfaces exist, but no one likes them because they suck (okay some people like them, sorry Scratch).
Text editors give you one view of the code. A very low-level, raw view of the code. Like reading "SELECT * FROM table" output. You can read the functions, classes, variables, etc. and produce a model at a higher level of abstraction (see Object Oriented Programming, Domain Driven Design, etc.). Then, you make changes at that higher level of abstraction, and translate them back down to key presses in your text editor.
Coding agents can give you a view of a codebase that is already on that higher level of abstraction. You can say:
And get back an accurate diagram of the data flow structure of the system. Then you can say:
And get back a correct answer. Then:
And the plan will be wrong. Probably. Sometimes you get lucky and it's right. But that's okay, you're a skilled engineer who's been tapping keys on a keyboard to manually add/remove individual characters from codebases for years. You can read a PR and tell if it actually accomplishes what it says it does, right? There's definitely a new skill to be learned here, which senior engineers with experience reviewing junior PRs already have a head start on. Worst case scenario, you break the plan down into small parts and go over them individually with the agent.
But hey, don't despair, in a couple of years the models will probably have improved enough to get the plans right first try, too.
Operating at a higher level of abstraction like this has a number of benefits. Subjectively, I find that:
To finish, a few (edited for ease of understanding) prompts from my recent history to give some concrete ideas on how this can be used: