BlogTechnologyJune 25, 2026

The Hard Realities of Loop Engineering

Failure modes, token economics, and Souso-specific guardrails: what nobody tells you about unattended AI coding loops

Nicolás Torres

Loops are early. The economics and verification challenges are real. Every conversation about Loop Engineering tends to focus on the power, and it is powerful. But the people doing this work are also the ones who've learned the hard lessons. This post collects them, with mitigations specific to Souso, where we just built our first loop.

Haven't run the example yet? Post 4 has the clone-and-run walkthrough.

Top loop failure modes and their mitigations

Token Economics: The Hidden Cost

PR Babysitter (from the patterns post): polls open PRs every few minutes. Cheap when nothing is wrong; expensive on a busy branch because every run may spawn sub-agents.

CI Sweeper: only runs meaningful work when CI is red, but each incident can chain implementer + verifier + full test suite. That adds up fast at a 15-minute cadence.

Token costs vary wildly depending on design:

Loop	Cadence	Runs/day	Rough daily tokens
Daily Triage on Souso (L1)	1d	1	~50k
PR Babysitter (active `develop`)	5m	288	Very high
CI Sweeper (full implementer + verifier)	15m	96	~5M

A 5-minute loop that spawns implementer + verifier on every run will burn through a limited plan before breakfast. Triage should be cheap; sub-agents spawn only when state says actionable. An empty watchlist should exit in under 5,000 tokens.

Souso cost controls (encoded in loop-budget.md):

500k tokens/day cap → pause schedule on exceed
L1 only until one week of accurate triage
Log estimated tokens per run in loop-run-log.md
Active hours 08:00–20:00 Europe/Amsterdam

How to read severity labels (S1, S2, S3)

These are not the same as readiness levels (L1, L2, L3 from the patterns post). L = how much autonomy you gave the loop. S = how bad things get if a failure mode is left alone.

Label	Meaning	Example
S1	Early warning. Wasteful or annoying, but the loop is still basically under control.	Token bill creeping up; noisy Slack
S2	Active problem. The loop is misleading you, stuck, or burning resources. Fix this week.	Infinite fix retries; stale `STATE.md`
S3	Serious harm. Wrong code merged, denylisted paths touched, or production behaviour at risk. Stop the loop and roll back.	Auto-refactor on owned pricing code

An arrow like S1 → S2 means: starts mild if you catch it early; becomes S2 if you ignore it.

Failure Mode Catalog

This catalog adapts the failure modes and anti-patterns in Cobus Greyling's loop-engineering repo (operating docs, MIT). Mitigations below are Souso-specific.

Infinite Fix Loop (S2)

Symptom: Same PR or CI job gets 5+ fix attempts; never converges.
Souso mitigation: Hard cap of 3 attempts in loop-budget.md. Separate verifier model. For matcher eval failures, escalate. Don't let the loop rewrite src/lib/pricing/*.

State Rot (S1 → S2)

Symptom: STATE.md references merged PRs or closed issues. Loop acts on ghosts.
Souso mitigation: loop-triage skill requires pruning closed items every run. Validate issue numbers against gh issue view before acting.

Verifier Theatre (S2)

Symptom: Verifier "approves" but pnpm quality fails in CI.
Souso mitigation: Verifier must run pnpm test at minimum; for L3, full pnpm quality. Souso's pre-push hook already runs format + lint + typecheck + build + tests + matcher eval. The verifier should mirror that, not eyeball diffs.

Notification Fatigue (S1 → S2)

Symptom: Slack pings every 5 minutes; team mutes the bot.
Souso mitigation: L1 is silent (STATE.md only). Slack posts only on escalation to #loop-escalations, not every triage run.

Token Burn (S1)

Symptom: Bill spikes.
Souso mitigation: Daily Triage at 1d cadence. PR Babysitter deferred until L1 proven. loop-run-log.md catches creep early.

Over-Reach: Wrong Scope (S2 → S3)

Symptom: Loop refactors unrelated modules or touches denylisted paths.
Souso mitigation: loop-budget.md denylist: src/lib/pricing/*, week generation, migrations, auth, Mollie tipping. ship-flow-and-ownership skill: raise owned-flow bugs with a failing test, don't silently rewrite internals.

Comprehension Debt Spiral (S2)

Symptom: Velocity up; no one can explain recent changes.
Souso mitigation: Eval gates (pnpm eval, replan-agent eval, memory-classifier eval) exist to lock AI behaviour. A loop that ships code nobody understands bypasses those gates. Mandatory human review on non-trivial PRs. Weekly read of loop-run-log.md.

Cognitive Surrender (S2, Cultural)

Symptom: "The loop handles it." No opinions on correctness.
Souso mitigation: Success metric = time saved with quality bar held. pnpm quality + branch protection on develop and main are human gates the loop cannot override.

Parallel Collision (S2)

Symptom: Two sub-agents edit same files; merge conflicts.
Souso mitigation: isolation: worktree for all L2+ code edits. Queue ready-for-agent issues in STATE.md: one active worktree per issue.

Escalation Failure (S2)

Symptom: Loop stuck retrying; human never notified.
Souso mitigation: "Waiting on Human" section in STATE.md. Alert if item sits >24h. Slack connector on third failed attempt.

Anti-Patterns (Design Time)

Same agent implements and verifies: confirmation bias wins
No attempt cap: infinite fix loops on flaky CI
Vague triage output: paragraphs the loop can't parse; Souso requires one-liners
L3 before L1 quality: auto-fix on day one; measure triage accuracy first
Shared state without schema: three loops appending to unstructured STATE.md
MCP with write-everything scope: one bad triage decision, maximum blast radius
No kill switch: document pause in LOOP.md and loop-budget.md
Fixing flakes with code: masks infra problems; classify → quarantine → escalate
Auto-merge without allowlist: security bugs pass weak verifiers
No run log: can't debug "why did it touch pricing Tuesday?"

Safety & Guardrails on Souso

Path Denylist

Never auto-edit without human approval:

src/lib/pricing/* (matcher, ownership-sensitive)
Week generation / replan internals
.env, secrets, credentials
drizzle/migrations/*
auth/, Mollie tipping

Auto-Merge Policy

Default: no auto-merge. If you must at L3, allow only: typos in comments/docs, lint auto-fix in test files. Never: behaviour changes, dependency bumps, denylist paths.

MCP Least Privilege

Connector	Read	Write
GitHub	issues, PRs, checks	comment, label (not merge)
Linear	team issues	comment, status (not delete)
Slack	channel history	`#loop-escalations` only
D1 / prod DB	n/a	never from loops

Human Gates (Always Required on Souso)

Security, auth, payments, PII
Changes to owned AI flows
Dependency upgrades (Workers compat risk)
Changes touching >10 files
Third failed attempt on same item
Promotion develop → main (existing process unchanged)

The Most Important Lesson

"The same loop design can accelerate someone who stays the engineer, or let someone abdicate judgment entirely."

Addy Osmani

We built a Daily Triage loop on Souso because the issue queue is real and the stakes are real: a deployed app with lots of active users, strict TDD, and a repo that already moved fast (350+ PRs merged in a Megathon weekend). The loop surfaces work. It does not replace judgment, code review, or pnpm quality.

Build the loop like someone who intends to stay the engineer. Not just the person who presses go.

Sources: Addy Osmani's Loop Engineering. Failure modes, anti-patterns, and safety framing from Cobus Greyling's loop-engineering repo (MIT). Cobus Greyling's Loop Engineering essay.