You can read a PR and tell if it actually accomplishes what it says it does, right?
Mostly I can't, not if there are subtle issues. Certainly I can look and see if any bugs jump out at me, or if any areas look suspicious, but understanding a piece of code I didn't write deeply enough to execute it in my head usually takes longer than writing it myself.
What I can do is read a set of clearly-written functional or end-to-end tests, and see if they look like they should exercise the code written in the PR, and whether the assertions they make are the ones I'd expect, and whether there are any obvious cases that are missing. And, of course, I can look at CI and see whether said tests have passed.
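For concreteness, here is a minimal sketch of the kind of test I mean, written as pytest-style end-to-end checks. The `/orders` endpoint, the payload fields, and the `client` fixture are all hypothetical, invented purely for illustration:

```python
# Hypothetical end-to-end tests; the endpoint, fields, and the `client`
# HTTP test fixture are invented for illustration. A reviewer can check
# that these exercise the code path the PR claims to add, and that the
# assertions match the promised behaviour.

def test_create_order_returns_confirmation(client):
    response = client.post("/orders", json={"sku": "ABC-123", "quantity": 2})

    assert response.status_code == 201
    body = response.json()
    assert body["status"] == "confirmed"
    assert body["quantity"] == 2


def test_create_order_rejects_zero_quantity(client):
    # The "obvious missing case" check: is invalid input handled?
    response = client.post("/orders", json={"sku": "ABC-123", "quantity": 0})

    assert response.status_code == 422
```

If tests like these are in the PR and green in CI, the review burden drops from "simulate the implementation in my head" to "do these assertions describe the behaviour I want?"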
The "attack dog" metaphor is definitely sticky but personally I've found that you can build the fence directly INTO the task definition, so the dog runs free but in a "directed" and "constrained" path.
When the context is ambiguous and the objective (the "why") isn't clear enough, it doesn't matter how smart the model is: problems will arise and technical debt is certain. If you invest in complete, crystal-clear specs, your chances that the plans come out right increase significantly (the so-called Context Engineering).
I ran this as an experiment: I invested about 80% of my time in building specs, and code generation then became practically automatic. As a result, Claude produced 7 modules (46 endpoints) in 4.5 hours (including testing), and the code was virtually bug-free and production-ready. The spec was so complete, with all the hard decisions already made, that Claude didn't have to guess, leaving it more time to do the right things instead of wasting time trying to decide a course (a task where it's often wrong).
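To give a flavour of what "all hard decisions already made" can look like, here is a purely hypothetical spec fragment for a single endpoint, rendered as code-level contracts. The names, statuses, and rules are invented for illustration and are not taken from the project described above:

```python
# Hypothetical spec fragment for one endpoint, written so the model has
# nothing left to guess. Every name and rule here is invented.
from dataclasses import dataclass
from enum import Enum


class RefundPolicy(str, Enum):
    # Decision already made in the spec: partial refunds are not supported.
    FULL_ONLY = "full_only"


@dataclass(frozen=True)
class CreateRefundRequest:
    order_id: str                 # must reference an order in state "delivered"
    reason: str                   # required free text, max 500 characters
    policy: RefundPolicy = RefundPolicy.FULL_ONLY


# POST /refunds
#   201 -> refund created; body echoes order_id plus a generated refund_id
#   409 -> order not in "delivered" state (decided: fail fast, no queuing)
#   422 -> validation error (missing reason, malformed order_id)
```

When every status code, edge case, and naming decision is pinned down like this, the model's job reduces to translation rather than design.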
Of course you lose the vibe-coding exploration, but that experience moves upstream, into the process of spec-building. The hard thinking still happens, just earlier, and with a sizable prize: predictability and peace of mind from "day two" onward.
I'd be interested to see a write-up of your experience doing this. My own experience with spec-driven development hasn't had so much success. I've found that the models tend to have trouble sticking to the spec.
With great pleasure! The experience was so revealing that it led me to codify the process and prepare the Stream Coding Manifesto (available on GitHub here: Stream Coding), just launched last month, btw.
I've also created the corresponding Claude Skill to make it immediately actionable, also downloadable via GitHub.
Attack Dogs
I mentioned previously that coding agents kind of suck for lots of people. As of January 2026, coding agents lack the long-horizon skills needed to produce effective codebases independently.
However, it's clear to anyone who has used modern coding models - Claude Opus 4.5, GPT 5.2-Codex, hell even GLM 4.7 (open source) - that they are smart, knowledgeable, agentic, and tenacious in a way that is almost uncanny.
Setting Claude Code on a problem with "--dangerously-skip-permissions" feels like letting an attack dog off the leash. It sprints straight at the problem and attacks it with the terrible certainty of something that has never known hesitation, all the violence of its training distilled into pure forward motion.
Which is fine as long as there isn't a fence in the way.
Rather than expecting the attack dog to catch the perp, cuff him, bring him in, and file the relevant papers independently - we can repurpose its knowledge, power, and tenacity as an extension of our own will. The interface of Saying What You Want combines with the utility of the model to present a new view on a codebase.
Codebase Interfaces
The most common interface to a codebase is a text editor. VSCode, Notepad++, IDEA, Vim, etc. You select the file you want to read and it presents a window of text which you can scroll and edit by interacting with your keyboard and mouse to add/remove characters. Maybe it has some functions like find/replace, find symbol references, rename symbol, git integration, DB querying, test runner, build automation, etc.
Text editors are pretty good. The majority of all code ever produced prior to 2025 went through a text editor. Code generation exists, but it's really more of an amplifier for text editor-produced code. Visual programming interfaces exist, but no one likes them because they suck (okay some people like them, sorry Scratch).
Text editors give you one view of the code. A very low-level, raw view of the code. Like reading "SELECT * FROM table" output. You can read the functions, classes, variables, etc. and produce a model at a higher level of abstraction (see Object Oriented Programming, Domain Driven Design, etc.). Then, you make changes at that higher level of abstraction, and translate them back down to key presses in your text editor.
Coding agents can give you a view of a codebase that is already on that higher level of abstraction. You can say:
And get back an accurate diagram of the data flow structure of the system. Then you can say:
And get back a correct answer. Then:
And the plan will be wrong. Probably. Sometimes you get lucky and it's right. But that's okay, you're a skilled engineer who's been tapping keys on a keyboard to manually add/remove individual characters from codebases for years. You can read a PR and tell if it actually accomplishes what it says it does, right? There's definitely a new skill to be learned here, which senior engineers with experience reviewing junior PRs already have a head start on. Worst case scenario, you break the plan down into small parts and go over them individually with the agent.
But hey, don't despair, in a couple of years the models will probably have improved enough to get the plans right first try, too.
Operating at a higher level of abstraction like this has a number of benefits. Subjectively, I find that:
To finish, a few (edited for ease of understanding) prompts from my recent history to give some concrete ideas on how this can be used: