Built by /blog-post-GM — a Claude Code skill we evolved with our own Evolution engine to write every post in the Godmode voice.
Post-Mortem ⏱️ 5 min read

Kill Your Agents: Why One-Shot v3 Stopped Keeping Them Alive

TL;DR

🚨 The problem: Persistent agent teams created zombie processes, wasted resources on tiny tasks, and refused to shut down
🔬 The experiment: We tested three agent strategies across three versions of One-Shot Scripts
The fix: v3 spawns agents per-phase based on what the task actually needs — zero agents for small work, many for large
SPAWN · USE · KILL · CONTEXT STAYS LEAN

💥 The Zombie Agent Problem

We built One-Shot Scripts to be the most thorough execution protocol for Claude Code. Eight phases, self-assessment loops, scoring gates — the works.

But we had a problem with how agents (sub-processes that work in parallel) were coordinated. Three versions, three strategies, three different kinds of failure.

Think of it like hiring contractors: v1 was calling two at a time and making the rest wait outside. v2 was hiring a full crew before knowing the job size. v3 just calls who you need, when you need them.

🔧 v1.7: Manual Waves (The Traffic Cop)

The original approach was hand-rolled concurrency. Launch agents in waves of exactly two, wait for both to finish, then launch the next pair.

🏗 Phase needs 5 agents

🚦 Wave 1: Agent A + Agent B → wait

🚦 Wave 2: Agent C + Agent D → wait

🚦 Wave 3: Agent E → wait

✅ Phase complete (all 5 ran, 2 at a time)

This worked, but it was a workaround. The two-agent limit was there because launching too many at once crashed terminals. The orchestrator had to manually track waves, synthesize between them, and enforce the limit in every phase.

The real cost: all the coordination logic was hand-rolled in prose, spread across 8 different phase scripts. Every phase repeated the same wave instructions.
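For intuition, the wave pattern can be sketched in a few lines. This is an illustrative asyncio model, not the actual orchestrator: `run_agent` is a stand-in for launching a real sub-agent, and the key property to notice is that a slow agent in one wave blocks every agent in the waves behind it.

```python
import asyncio

async def run_agent(name: str) -> str:
    # Stand-in for launching a real sub-agent; here it just sleeps briefly.
    await asyncio.sleep(0.01)
    return f"{name}: done"

async def run_in_waves(agents: list[str], wave_size: int = 2) -> list[str]:
    # v1.7-style coordination: launch a fixed-size wave, wait for the
    # WHOLE wave to finish, then launch the next one. Nothing overlaps
    # across wave boundaries, even when capacity is free.
    results: list[str] = []
    for i in range(0, len(agents), wave_size):
        wave = agents[i : i + wave_size]
        results.extend(await asyncio.gather(*(run_agent(a) for a in wave)))
    return results

print(asyncio.run(run_in_waves(["A", "B", "C", "D", "E"])))
```

Five agents, waves of two: three full round-trips where a single gather over all five would have taken one.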

🏗 v2.0: Persistent Teams (The Full Crew)

Claude Code shipped TeamCreate — a proper coordination system with shared task boards, inter-agent messaging, and lifecycle management. We rewrote One-Shot to use it.

The idea: spawn five role-based agents at startup (researcher, builder, tester, hardener, verifier), keep them alive across all phases, and assign tasks through a shared board.

🔍 Researcher — read-only scanning: docs lookup, codebase exploration, pattern detection.

🏗 Builder — file creation and modification; owns implementation.

🧪 Tester — test writing, test running, adversarial input testing.

🛡 Hardener — security, performance, concurrency, resilience.

✅ Verifier — final verification checks.

It looked great on paper. In practice, it fell apart on the first test run.

What went wrong

We asked it to build a simple calculator. One file. The builder agent immediately claimed all four tasks — its own build task, the tester's tests, the hardener's review, and the verifier's checks. It did everything itself.

The tester sat idle the entire time. When we tried to shut it down, it refused. Three shutdown requests, still sending "I'm available!" notifications. A literal zombie agent.

What we expected

Five specialists coordinating through a shared task board, each doing their part across phases.

What happened

One agent did everything. Four agents sat idle. One refused to die. The task board added overhead for zero benefit.

The fundamental issue: for small tasks, there's no parallelism to exploit. Build → test → harden → verify is inherently sequential when there's only one file. Spawning a team of five for sequential work is like hiring five plumbers to fix one tap.

🎯 v3.0: Dynamic Spawning (Call Who You Need)

The insight was simple. You can't decide team size before you understand the task. And you don't understand the task until after recon (Phase 1b).

v3 flips the model completely:

📋 Phase 1b: Recon completes — now you know the scope

🤔 Phase 2: "4 independent files to build" → spawn 4 builder agents

✅ Agents return results → synthesize → agents are gone

🤔 Phase 3: "Tests can split into 3 categories" → spawn 3 test agents

✅ Agents return results → synthesize → agents are gone

🤔 Phase 4: "Single-concern project" → zero agents, do it yourself

The rule is simple: if the work can split into 2+ independent streams, spawn agents. If it can't, do it yourself. Decide per-phase, not per-project.
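The rule above is mechanical enough to sketch directly. This is a simplified model of the per-phase decision, not the skill's actual code; the "streams" input stands for whatever independent units recon identified (files, test categories, scan targets):

```python
def plan_phase(streams: list[str]) -> dict:
    # v3 rule (sketch): 2+ independent work streams -> one agent per
    # stream; otherwise the orchestrator does the work itself and no
    # agents are spawned at all.
    if len(streams) >= 2:
        return {"mode": "spawn", "agents": len(streams)}
    return {"mode": "solo", "agents": 0}

print(plan_phase(["cli.py", "parser.py", "store.py", "api.py"]))
# -> {'mode': 'spawn', 'agents': 4}
print(plan_phase(["calculator.py"]))
# -> {'mode': 'solo', 'agents': 0}
```

The decision is re-run at every phase boundary, so a project can spawn four builders in Phase 2 and zero agents in Phase 4 without any configuration changing.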

📊 The Three Versions Compared

| Dimension | v1.7 Waves | v2.0 Teams | v3.0 Dynamic |
|---|---|---|---|
| Agent lifecycle | Spawned per-phase, manual waves of 2 | Persistent across all phases | Spawned per-phase, die on completion |
| Small tasks | Still spawns waves (overkill) | Full team of 5 (massive overkill) | Zero agents — orchestrator does it |
| Large tasks | Artificially capped at 2 concurrent | 5 agents, good parallelism | As many agents as the work needs |
| Coordination | Hand-rolled wave logic in every phase | Shared task board + messaging | Orchestrator synthesizes between phases |
| Shutdown | Agents die naturally | Manual shutdown (zombie risk) | Agents die naturally |
| Overhead | Medium (wave tracking) | High (team setup, task board, messaging) | Low (just spawn and wait) |
SAME 6 TASKS · THREE STRATEGIES · DYNAMIC FINISHES FIRST

How It Scales

The beauty of dynamic spawning is that the same protocol handles a one-file script and a fifty-file codebase. The skill file doesn't change — only the spawning decisions do.

Here's what that looks like in practice:

📝

Trivial Task

Fix a typo, add one function. Zero agents. Orchestrator does everything directly in every phase.

📦

Small Task

1-3 files. Maybe 1-2 agents in the build phase for independent files. Most phases run solo.

🏗

Medium Task

4-10 files. 2-4 agents in build, 3-5 in testing, 2-4 in hardening. Each phase spawns what it needs.

🏭

Large Task

10+ files. Multiple builders, parallel scanners, dedicated test agents by category. Maximum parallelism.
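The tiers above can be summarized as a sizing heuristic. The thresholds here are this sketch's assumption — in practice the orchestrator decides after recon, not from a fixed formula:

```python
def spawn_plan(files: int) -> str:
    # Illustrative mapping of task size to build-phase strategy,
    # following the tiers described above. Thresholds are assumed
    # for illustration, not hard rules in the skill file.
    if files <= 1:
        return "trivial: 0 agents, orchestrator only"
    if files <= 3:
        return "small: 1-2 agents for independent files"
    if files <= 10:
        return "medium: 2-4 build agents"
    return "large: maximum parallelism"

print(spawn_plan(1))   # trivial tier
print(spawn_plan(7))   # medium tier
```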

DYNAMIC WINS EVERY SIZE · SCALES PAST THE MEMORY CEILING

Think of it like a restaurant kitchen: A food truck doesn't hire a pastry chef, a saucier, and a sommelier. A Michelin restaurant does. The menu determines the crew — not the other way around.

WORKER ARRIVES · DONE · LEAVES · THE SET STAYS LEAN

💡 The Deeper Lesson

Each version taught us something:

- v1.7: hand-rolled coordination works, but the wave logic gets duplicated across every phase and the two-agent cap is arbitrary.
- v2.0: a pre-allocated team can't match the task; roles sit idle, one agent grabs everything, and shutdown becomes a fight.
- v3.0: sizing the team per phase, after recon, fits both tiny and huge tasks.

The meta-lesson: don't decide how many workers you need before you see the job. Read the codebase first (Phase 1b), then decide. Every time we tried to pre-allocate agents, we got it wrong.

🛠 Implementation Details

Each phase script now has a PARALLEL WORK section that starts with an assessment:

**PARALLEL WORK:** Assess the build. Single file? Do it yourself.
Multiple independent files/components? Spawn agents:
- Each agent owns a distinct file with no write conflicts
- Give each agent a complete, self-contained prompt
- After all agents return, check for cross-file inconsistencies

The "complete, self-contained prompt" part is critical. Unlike persistent teammates who accumulate context, dynamic agents start fresh. Every prompt must include the plan, interfaces, file paths, and constraints. No shortcuts.

Do

Give each agent everything it needs in the prompt. File paths, the plan, interfaces, dependencies. It has zero context from prior phases.

Don't

Assume agents remember anything. They don't. A vague prompt like "write tests for the calculator" will miss edge cases because it hasn't seen the code.
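The Do side can be made concrete with a small prompt-builder sketch. The field names and wording here are illustrative, not the skill's actual template; the point is that every piece of context an agent needs is packed into the one prompt it will ever see:

```python
def agent_prompt(file_path: str, plan: str, interfaces: str, constraints: str) -> str:
    # Dynamic agents start with zero context from prior phases, so the
    # prompt must carry the full picture: ownership, plan, interfaces,
    # and constraints. Field names are this sketch's assumption.
    return (
        f"You own exactly one file: {file_path}. Do not touch other files.\n"
        f"Plan:\n{plan}\n"
        f"Interfaces you must match:\n{interfaces}\n"
        f"Constraints:\n{constraints}\n"
        "Return a summary of what you built when done."
    )

prompt = agent_prompt(
    "src/calculator.py",
    "Implement add/sub/mul/div with input validation.",
    "def calc(op: str, a: float, b: float) -> float",
    "Pure stdlib; raise ValueError on divide-by-zero.",
)
print(prompt)
```

Compare that to "write tests for the calculator": the fresh agent has never seen the calculator, so without the interface and constraints in the prompt it can only guess.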

🔮 What's Next

v3 is running in production now. The protocol is the same eight phases, same scoring gates, same quality contract. Only the agent strategy changed.

We're watching for two things: does dynamic spawning produce the same quality scores as persistent teams? And does the overhead reduction translate to faster execution on small tasks? Early signs point to yes on both.

Get One-Shot Scripts v3

The execution protocol that scales from one file to an entire codebase. Dynamic agents, zero zombie risk.

Get Access Read the v1 story