
Everyone told me to write loops.
The loop was the part that didn't matter.

KC · Jun 24, 2026 · 11 min read

I took the most viral advice in AI engineering at its word, gave up a weekend to rebuild my production stack around it, and the bill waiting on Monday morning taught me what the slogan left out: the loop was never the part that mattered.

In early June I did something I almost never do, which is take viral advice literally. Peter Steinberger had posted six words — you should be writing loops — and watched them gather two and a half million views in a week, and five days before that Boris Cherny, the engineer who actually built Claude Code, had told a room that he doesn't prompt Claude anymore, that his job now is to write the loops that prompt Claude for him. I run a production agent stack every day, the phrase was everywhere, and I decided to stop arguing with the timeline and just do what it said. I spent a weekend rebuilding the way my agents run so that loops, not me, were the things issuing the prompts.

I want to tell you what that weekend taught me, because the lesson is the whole point of this essay and I'd rather say it plainly than make you dig for it: the loop is the part that doesn't matter. Writing it took an afternoon. Everything that determined whether the result was worth running — whether it shipped real work or quietly set fire to the weekend's budget — lived in the part the slogan never mentions. So my honest verdict on the loudest advice in AI engineering this year is that it isn't wrong, exactly, it's just pointed at the wrong thing. "Write loops" is true the way "just exercise" is true: it names the right room and then leaves you stranded at the door. What follows is what I found on the other side of that door, and the advice I'd give instead.

The loop is ten lines, and it's been solved since 2022

The first thing that weekend deflated was the mystique, because once you actually write a loop you see there's nothing to it. A loop is a small program that prompts your agent, reads what comes back, asks whether the work is finished, and prompts again if it isn't — and you can write the skeleton in the time it takes to pour a coffee.

def loop(goal):
    state = None
    while True:
        result = agent.run(goal, state)   # the model does the work
        state = result.state
        if done(state):                    # <-- everything that matters lives here
            return state

Look at that and try to find the hard part, because I couldn't. It isn't the while, which has behaved the same way since long before any of us were prompting anything, and it isn't agent.run, which is just somebody else's API call. The entire question of whether this little program does something useful or quietly bankrupts you collapses down into one function, done(), and the tweet that earned two and a half million views does not so much as gesture at it. That should have been my first clue. The control flow has been a solved problem since Yao and his coauthors published the ReAct paper back in 2022, laying out the reason-act-observe-repeat cycle that every agent loop since has been a minor variation on. AutoGPT wrapped a goal in this exact loop in 2023 and became a punchline for spinning forever while accomplishing nothing, and the reason is worth sitting with, because it had a perfectly good while and a completely worthless done(). Everything that came afterward has been the same one-liner getting its done() slowly less wrong: Geoffrey Huntley's ralph in 2025, a bash loop that re-pipes a prompt file over and over, and then /goal and /loop shipping inside Codex and Claude Code in the spring of 2026. None of that progress came from the ten percent nobody struggles with, and the loud version of the advice still stops at the while.

[Figure: what "write loops" gives you vs. what actually ships value. The advice everyone repeats, the easy 10%: the loop (~10 lines, unchanged since ReAct, 2022). What actually decides ship vs. burn, the hard 90%: verification (prove done), halting (cap iterations/$), skills (tested, named ops), recovery (durable git state). The slogan sells the blue box; the value is entirely in the green ones, and the green ones are specific to your repo, your tests, your money, so nobody can hand them to you.]

The done() function is the entire essay

The move that separates a demo that wows a conference from a system you'd trust to run while you sleep turned out to be a single question, and I'll put it the way I eventually had to put it to myself: what does done actually look like, and can the loop prove it's true without me sitting there to check?

When I got that question wrong, my loop didn't fail loudly, which would at least have been honest — it failed with total confidence. The model would write some code, and then the very same model would be asked whether the code looked done, and naturally it said yes, because it had just written it, and the loop would ship a green checkmark stretched neatly over a broken build. It had graded its own homework and given itself an A. That is not autonomy in any sense worth the word; it is a confident intern with nobody reviewing the work and a copy of your production keys. When I got the question right, the loop was forced to answer to something it could not talk its way past — and a good done(), I learned, is almost never another model. It is your test suite, your linter, your type checker, a diff that either applies cleanly or doesn't. It is ground truth, the kind the agent cannot flatter into submission.

def done(state):
    # BAD: the model judges its own work
    # return llm.ask("is this done?")            # that's a vibe, not a verdict

    # GOOD: graded against something it cannot fake
    tests = run("pytest -q")
    types = run("mypy .")
    return tests.passed and types.clean

Small as that difference looks on the page, it is the whole gap between an agent that genuinely helps and one that hands you a weekend of cleanup, and it is, not by accident, the exact part of the conversation that stays quiet. A good done() is bound to your repository, your tests, and your own appetite for things going sideways, which means nobody can tweet it to you. The slogan travels so well precisely because it is hollow, and the real value refuses to travel precisely because it is yours.
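For a concrete picture of a done() that answers to exit codes rather than to a model, here is a minimal sketch. It is an illustration under assumptions, not the stack described in this essay: the names check_passes, verified_done, and REPO_CHECKS are hypothetical, and it assumes pytest and mypy are installed in the repo's environment. It relies only on the fact that both tools exit with status 0 when they find nothing wrong.

```python
import subprocess
import sys
from typing import Sequence


def check_passes(cmd: Sequence[str], timeout_s: float = 600.0) -> bool:
    """Run one check command and report whether it exited with status 0."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing tool or a hung check counts as "not done", never as a pass.
        return False
    return proc.returncode == 0


def verified_done(checks: Sequence[Sequence[str]]) -> bool:
    """True only if there is at least one check and every check exits 0."""
    # An empty list proves nothing, so it must not count as done.
    return bool(checks) and all(check_passes(cmd) for cmd in checks)


# Hypothetical check list for a Python repo; swap in your own commands.
REPO_CHECKS = [
    [sys.executable, "-m", "pytest", "-q"],
    [sys.executable, "-m", "mypy", "."],
]

if __name__ == "__main__":
    print("done" if verified_done(REPO_CHECKS) else "not done")
```

Treating an empty check list, a missing tool, or a timeout as "not done" is a deliberate choice in this sketch: anything the gate cannot prove should fail closed.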

[Figure: same loop, two outcomes. No verification: prompt agent, then "looks done?" self-judged by the same model that did the work. No ground truth; it grades its own homework, converges confidently on the wrong answer, and ships a green checkmark over a broken build. Real verification: prompt agent, then run tests + lint (ground truth); pass? ship. fail? raise. Graded against something the model can't fake, it fails honestly and ships truthfully: a red test is a real signal, not a vibe, and the green checkmark means the build is actually green.]

If you doubt me, follow the money

If the argument from experience doesn't move you, then watch where the dollars went, because money is rarely sentimental about hype. In 2024 the model was the expensive line on the invoice; by 2026 the model is nearly free and the loop is the expensive thing, to the point where Uber capped its own engineers at fifteen hundred dollars per person, per tool, per month after burning through its annual AI budget in four months flat. A working engineer summed the whole situation up in one reply that did more honest work than the viral thread above it, when he said that every AI agent he shipped that year was a for-loop, an LLM call, and a try/catch around the JSON parsing, and that the only thing agentic about any of it was the Anthropic bill it ran up. My own weekend bill said the same thing in smaller numbers.

That bill is what a missing done() looks like once you translate it into dollars, because a loop with no halting logic doesn't stop being wrong over time, it just stops being affordable. This is why every serious write-up from this year, wherever it happens to start, quietly arrives at the same three guard rails the slogan never bothers to mention — a ceiling on how many times the loop may run, a way to notice when no real progress is being made, and a hard limit on how much money the thing is allowed to spend before it gives up and asks for a human.

def loop(goal, max_iter=50, budget_usd=15.0):
    state, spent = load_git_state(), 0.0      # durable, survives a crash
    for i in range(max_iter):                 # 1. it has to halt
        r = agent.run(goal, state); spent += r.cost
        state = save_git_state(update(state, r))
        if spent > budget_usd: raise BudgetBlown(spent)   # 2. it can't bankrupt you
        if no_progress(state): raise Stuck(i)             # 3. it can't spin forever
        if done(state): return state          # the gate that actually matters
    raise MaxIterations(max_iter)

Read that honestly and you'll notice it is almost entirely guard rails and done(). The loop that everyone is busy selling — the for i in range line itself — is one line out of the whole function, and it is the only line nobody ever had to think hard about.
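The no_progress check in the loop above is left undefined, so here is one way it could be sketched: fingerprint the state each turn and flag the run once it keeps landing on states it has already seen. Everything in this block is an assumption for illustration, including the names fingerprint and NoProgressDetector, the window and max_repeats defaults, and the premise that state can be reduced to a JSON-serializable dict (for example, the current diff plus the list of failing tests).

```python
import hashlib
import json
from collections import deque


def fingerprint(state: dict) -> str:
    """Stable SHA-256 hash of a JSON-serializable state snapshot."""
    blob = json.dumps(state, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class NoProgressDetector:
    """Flags a run whose recent turns keep revisiting already-seen states."""

    def __init__(self, window: int = 5, max_repeats: int = 2):
        self.recent = deque(maxlen=window)  # fingerprints of the last few turns
        self.max_repeats = max_repeats
        self.repeats = 0                    # consecutive turns that revisited a state

    def __call__(self, state: dict) -> bool:
        fp = fingerprint(state)
        # Count consecutive revisits; any genuinely new state resets the count.
        self.repeats = self.repeats + 1 if fp in self.recent else 0
        self.recent.append(fp)
        return self.repeats >= self.max_repeats


# Usage inside the loop, mirroring the original call site:
#   no_progress = NoProgressDetector()
#   ...
#   if no_progress(state): raise Stuck(i)
```

Comparing against a short window rather than only the previous turn also catches a loop that oscillates between two states, which a plain "same as last time" check would miss.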

[Figure: where the money goes once the model is free, 2024 vs 2026. 2024: model tokens were the bill (bar segments: model inference, loop mgmt). 2026: the model is near-free and the LOOP is the bill (Uber capped engineers at $1,500/mo/tool); bar segments: model, then iterations, re-runs, validator calls, and runaway spins, the part the slogan ignores.]

What I'd tell you instead

So here, after the weekend and the bill that ended it, is the version of the advice I'm willing to put my name on. Yes, stop being the thing inside the loop typing prompts all day; that much of it is genuinely right and I won't pretend otherwise. But the work that replaces you was never the writing of the loop — it is building the thing the loop has to answer to. You can confirm this just by watching the people who do it well rather than the people who tweet about it: Dan Kornas's roborev sits in the background reviewing every commit and feeding what it finds back into the agent while the context is still warm, and Steve Yegge's "Gas Town" runs twenty or thirty Claude instances under a single Mayor agent with patrol loops continuously auditing the work. In both, the impressive part is never the loop. It is the verification layer wearing the loop as a coat.

The honest version of the advice is longer than six words and far less fun to retweet, which is, I'd argue, exactly the evidence that it's the true one:

Don't write loops — write the thing that knows when to stop, because everything that matters lives inside done(), and done() is yours alone to build.

What I actually keep from the weekend

If you skip the rest and take only the receipts, take these. Each one cost me something over the weekend to learn, and each one is the thing I'd check first before I trusted a loop to run unattended.

  1. A loop without a non-model done() is a liability, not an agent. If the same model that did the work decides whether the work is finished, you've built a thing that grades its own homework. Wire done() to your test suite, your linter, your type checker, an exit code — something with ground truth the model cannot flatter. The single highest-leverage line in the whole system is the one the slogan never mentions.
  2. Set three hard stops before you set anything else: max iterations, no-progress detection, and a dollar ceiling. A loop with no halting logic doesn't stop being wrong, it stops being affordable — that's what Uber's $1,500-per-month cap looks like translated onto a single laptop. Cap iterations so it halts, hash the state across turns so a no-op spin trips Stuck, and kill the run the moment it crosses your budget. These aren't polish; they're the contract that makes autonomy safe to leave alone.
  3. Persist state to git every iteration, not at the end. The failure that hurt most wasn't a bad diff — it was a crash at hour six that threw away five good hours because state lived in memory. Checkpoint after every turn so a restart resumes instead of restarting. Durable-by-default is the difference between a long run and a long, repeated run.
  4. Invest in named, tested skills, not better prompts. A loop calling run_tests(), open_pr_with_context(), classify_build_failure() compounds: each skill gets sharper, and every loop that calls it gets better for free. A loop wrapping a bare while True around an open-ended prompt drifts, burns context, and dies around turn thirty. The skill is the asset; the loop is just the thing that calls it.
  5. Measure the loop, not the model. Track iterations-per-task, dollars-per-task, and the gap between what the loop reported done and what was actually green. When that gap is non-zero, your done() is lying and the dashboard is the last place you'll notice. Watch the loop's own metrics or the bill will be your monitoring.
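The three numbers in the last item can be computed from a simple per-run log. The sketch below is one hypothetical shape for that log, not a description of any particular tool: RunRecord, summarize, and the field names are invented for illustration, and actually_green is assumed to come from an independent check (CI, for instance) run after the loop has already reported its verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    task: str
    iterations: int
    cost_usd: float
    reported_done: bool   # what the loop claimed when it stopped
    actually_green: bool  # what an independent check found afterwards


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Average iterations and dollars per task, plus the share of false 'done' claims."""
    if not runs:
        raise ValueError("no runs to summarize")
    n = len(runs)
    claimed = [r for r in runs if r.reported_done]
    false_done = [r for r in claimed if not r.actually_green]
    return {
        "iterations_per_task": sum(r.iterations for r in runs) / n,
        "dollars_per_task": sum(r.cost_usd for r in runs) / n,
        # Fraction of "done" claims that were not green when re-checked.
        "false_done_rate": len(false_done) / len(claimed) if claimed else 0.0,
    }
```

A false_done_rate above zero is the gap the last item describes: the loop said done and the independent check disagreed.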

I went looking for a shortcut and found a craft instead, and I've made my peace with that. The loop is the on-ramp, and it's a generous one — it's a slash command, you'll have it working before lunch, and you should. But the verification layer underneath it is the rest of your career, and no phrase short enough to go viral is ever going to hand it to you. That's not a complaint about the trend; it's the reason the trend is worth your time at all. Spend your weekend the way I did, by all means — just spend it on the ninety percent that went conveniently missing somewhere on the road to two and a half million views.
