Launch April 22, 2026 ⏱️ 6 min read

Introducing /one-shot-orchestra — Our Most Powerful Skill Yet

TL;DR

🚀 What it is: A Claude skill that acts as a conductor — spawning parallel Claude processes, each with its own fresh 1M-token memory
🎼 How it works: Conductor writes briefs → launches worker terminals → reads back result.json → synthesises delivery
✨ Why it matters: Big builds stop hitting the memory ceiling. Each worker starts empty, gets exactly what it needs, and reports back

EVOLUTION CLEANUP · FIVE PHASES · ONE CANONICAL SKILL

🚀 The Launch

Today we're shipping /one-shot-orchestra — the biggest architectural jump we've made on the max-effort build skill. It's live inside Godmode now.

Every prior version (/one-shot, /one-shot-beta, /one-shot-scripts) was a single Claude instance running a long protocol in one working memory. Orchestra is different: it's a conductor pattern. One Claude orchestrates, many Claudes build.

Analogy: /one-shot-scripts was one chef cooking every dish in sequence at a single workbench. /one-shot-orchestra is a head chef running a kitchen of line cooks — each at their own station, each given a written order, plating back to the pass when done.

🔎 Why We Built It

Claude's working memory is large but finite. When the main instance has loaded the skill, read the project, made a plan, spawned sub-agents, and started building — memory fills up. On truly big builds, quality drops before the work is done.

The obvious fix is parallelism. But the in-session Agent tool still runs inside the conductor's memory. Sub-agents help, but they don't buy you more memory — they share the same ceiling.

Orchestra breaks the ceiling by launching genuinely separate Claude Code processes in their own terminal windows. Each one boots with a fresh ~1M-token context. None of them inherit the conductor's loaded state.

⚙️ How It Works Under the Hood

Orchestra's execution loop is a file-based protocol. No sockets, no message bus — just directories and JSON files that both sides can read.

👤 User invokes /one-shot-orchestra build X
↓
🎼 Conductor plans — decomposes the task, writes one brief per worker
↓
🖥️ orchestra-spawn.sh launches a mintty terminal per brief
↓
🔧 Each terminal runs claude fresh, loads its brief, works in isolation
↓
📄 Worker writes work/result-<Name>.json — status, artefacts, notes
↓
👻 orchestra-wait-and-kill.sh polls for result, then force-closes the terminal
↓
🎯 Conductor synthesises — reads all results, verifies, delivers

FILE-BASED PROTOCOL · live

Every worker gets a shared run directory: a sandbox containing briefs/, work/, media/, a live chat.md, and a rolling telemetry.jsonl. You can watch the whole orchestra cooperate in real time by tailing those files.
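
As a rough illustration of that layout, here is a minimal Python sketch that creates a run directory and writes one brief per worker. The function name `create_run_dir`, the `run` folder name, and the `brief-<Name>.md` filename pattern are assumptions made for this example; only the briefs/, work/, media/, chat.md and telemetry.jsonl names come from the description above.

```python
import tempfile
from pathlib import Path


def create_run_dir(root: Path, briefs: dict[str, str]) -> Path:
    """Create the shared run directory and write one brief per worker."""
    run_dir = root / "run"  # folder name is an assumption
    for sub in ("briefs", "work", "media"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    # Empty files that workers and the conductor append to during the run.
    (run_dir / "chat.md").touch()
    (run_dir / "telemetry.jsonl").touch()
    for worker, text in briefs.items():
        # The brief filename pattern is hypothetical, chosen for illustration.
        brief_path = run_dir / "briefs" / f"brief-{worker}.md"
        brief_path.write_text(text, encoding="utf-8")
    return run_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run = create_run_dir(Path(tmp), {"Builder-A": "Draft a one-sentence pitch."})
        print(sorted(p.name for p in run.iterdir()))
```

Because everything is a plain file, any tool that can follow a file (a markdown viewer, a log tailer) can observe the run without extra plumbing.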

🧠 Fresh Memory Is the Point

Every token a Claude instance reads costs working memory. By the time a skill has loaded, the project has been scanned, and the plan has been drafted, a meaningful chunk is already spent on knowing what to do instead of doing it.

A fresh-spawn worker starts empty. It gets a one-page brief describing only what it needs. The rest of its memory is free for the actual work — reading files, running tests, building output.

🧱

Before (single process)

Plan + execution share one 1M-token memory. Large plans eat into the space available for the build itself.

🎼

After (orchestra)

Conductor holds the plan. Each worker holds only its slice. N workers × 1M tokens of useful capacity.

🎭 Two Modes, One Conductor

Not every task wants a fresh spawn. Some work benefits from a collaborator who's already read the plan. Orchestra supports both modes and has a written rule for when to pick which.

Fresh-spawn (mintty)

Blind judges, heavy independent file reads, parallel builders that shouldn't share plans. Full 1M memory per worker.

In-session Agent

Critic passes, quick scans, teamwork that needs the conductor's context. Shares memory, starts instantly, no terminal overhead.

The decision table lives at the top of orchestra-protocol.md. Every phase script points to it — so when the conductor hits "need a reviewer here," it knows exactly which mode fits.
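
The actual decision table in orchestra-protocol.md is not reproduced in this post, so the following is only a sketch of how the two mode descriptions above could be encoded as a rule. The `Task` flags and the `choose_mode` function are hypothetical names, and the precedence between rules is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FRESH_SPAWN = "fresh-spawn"
    IN_SESSION = "in-session"


@dataclass(frozen=True)
class Task:
    # Hypothetical flags, not fields taken from orchestra-protocol.md.
    must_be_blind: bool = False            # e.g. a judge that must not see the plan
    needs_conductor_context: bool = False  # e.g. a critic pass over the current plan
    heavy_file_reads: bool = False         # large independent reads that want a full memory


def choose_mode(task: Task) -> Mode:
    # Isolation wins first: a blind judge cannot share the conductor's memory.
    if task.must_be_blind:
        return Mode.FRESH_SPAWN
    # Work that depends on what the conductor already knows stays in-session.
    if task.needs_conductor_context:
        return Mode.IN_SESSION
    # Heavy independent reads get their own fresh memory.
    if task.heavy_file_reads:
        return Mode.FRESH_SPAWN
    # Quick scans default to the cheaper in-session agent.
    return Mode.IN_SESSION
```

Under these assumptions, `choose_mode(Task(must_be_blind=True))` picks the fresh-spawn mode, while a task with no flags set falls through to the in-session agent.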

🔁 The Worker Contract

Every worker, regardless of mode, honours the same contract. When it's done, it writes one JSON file. That's it.

{
  "status": "success",
  "worker": "Builder-A",
  "artifact_paths": ["work/variant-A.txt"],
  "notes": "Drafted a single-sentence pitch, no markdown.",
  "token_usage": { "total": 48213 }
}

Key by key, for work/result-Builder-A.json:

status: polled by orchestra-wait-and-kill.sh in a loop. The instant this field appears, the conductor terminates the worker terminal.
worker: the brief↔result match key. The conductor uses it to find which brief just completed and route output to the right downstream phase.
artifact_paths: file paths the conductor reads next. Verifier and Polisher phases pick up exactly these files; nothing else is opened.
notes: free-form summary spliced into chat.md so the human watching the live feed sees what the worker did without reading every artefact.
token_usage: written to telemetry.jsonl for the audit trail. Used by /godmode-evolution to grade the efficiency dimension across runs.

The conductor waits on that file, reads it, kills the worker's terminal, and moves on. No ambiguity about "is it done yet?" — the file either exists or it doesn't.
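
The real polling is done by orchestra-wait-and-kill.sh, a shell script not shown in this post. As a hedged Python analogue of the same idea, the sketch below polls for a result file until a deadline. The function name `wait_for_result` and its parameters are assumptions, and so is the choice to require a parseable JSON object with a "status" key rather than mere file existence; that choice is there to avoid reading a file caught mid-write.

```python
import json
import time
from pathlib import Path
from typing import Optional


def wait_for_result(path: Path, timeout_s: float, interval_s: float = 0.5) -> Optional[dict]:
    """Poll until `path` holds a JSON object with a "status" key, or return None at the deadline."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
            if isinstance(data, dict) and "status" in data:
                return data
        except (FileNotFoundError, json.JSONDecodeError):
            pass  # not written yet, or caught mid-write; try again
        if time.monotonic() >= deadline:
            return None  # the caller would kill the worker and flag telemetry here
        time.sleep(interval_s)
```

A `None` return corresponds to the timeout case: the worker never produced a usable result before the deadline.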

🧪 Smoke-Tested On Arrival

Orchestra ships with a test/TEST-BENCH.md containing three real smoke tests. We ran them before flipping the skill to live.

Smoke	What it proves
1. Single-worker spawn-and-kill	Fresh spawn works, `result.json` gets read, terminal gets killed, telemetry logged.
2. Two builders + judge	Parallel spawn + shared workspace + result-JSON hand-off (judge reads both builders' outputs).
3. Timeout & kill	A worker that intentionally hangs gets killed at the deadline with telemetry flagged.

🧬 Audited By Evolution

Every new skill we ship gets graded by /godmode-evolution — the meta-skill whose entire job is making other skills better. Orchestra's scorecard:

Dimension	Score
Contradiction removal	0.95
Decision clarity	0.92
Doc coherence	0.90
Delegation hygiene	0.88
Smoke test coverage	0.85
Backwards compat	0.90
File size discipline	0.90
Mutation scope	0.93
Process quality	0.92
Polish	0.88
Composite	0.903 / 1.00

/godmode-evolution · scorecard · published verbatim

Contradiction removal (0.95): are skill rules internally consistent? Counted contradictions before vs after: 19 → 1.
Decision clarity (0.92): can a worker pick a path without re-reading? Counted vague phrases ("maybe", "consider") in the protocol.
Doc coherence (0.90): do the docs cover the actual surface? Counted undocumented commands and dead links in orchestra-protocol.md.
Delegation hygiene (0.88, published anyway): are agent calls scoped and briefed? Solid at the protocol level, but a couple of phase scripts still under-brief their workers. Logged for the next loop.
Smoke test coverage (0.85, published anyway): does TEST-BENCH.md exercise the protocol? Golden path is covered; timeout/backoff edge cases aren't yet. Lowest dim, shipped anyway.
Backwards compat (0.90): do old skill commands still work? Counted breaking changes in command surface vs the prior /one-shot-scripts release.
File size discipline (0.90): are skill files under the size budget? Counted any phase script > 800 lines (the soft cap).
Mutation scope (0.93): did edits touch only what they needed? Counted collateral edits per commit; tight scoping correlates with lower regression rates.
Process quality (0.92): did the conductor follow protocol on its own build? Audited from telemetry: every spawn, every kill, every result-read.
Polish (0.88): filenames, headings, prose. Counted typos and dangling cross-refs across the skill surface.
Composite (0.903): weighted mean across all 10 dims. Threshold for "ship" is 0.85; this run cleared at 0.903 with the two weakest dims printed in plain sight.
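
The post does not publish the weights behind the composite. As a quick arithmetic check, the sketch below assumes equal weights for all ten dimensions, which happens to reproduce the published 0.903. The `composite` function and `SHIP_THRESHOLD` constant are names invented for this example.

```python
# Scores copied from the published scorecard.
SCORES = {
    "Contradiction removal": 0.95,
    "Decision clarity": 0.92,
    "Doc coherence": 0.90,
    "Delegation hygiene": 0.88,
    "Smoke test coverage": 0.85,
    "Backwards compat": 0.90,
    "File size discipline": 0.90,
    "Mutation scope": 0.93,
    "Process quality": 0.92,
    "Polish": 0.88,
}

SHIP_THRESHOLD = 0.85  # "ship" threshold stated in the post


def composite(scores, weights=None):
    """Weighted mean of the scores; equal weights are an assumption, not a published fact."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


if __name__ == "__main__":
    value = composite(SCORES)
    print(round(value, 3), value >= SHIP_THRESHOLD)  # prints: 0.903 True
```

The ten scores sum to 9.03, so the equal-weight mean is 0.903. Any other weighting would need to be supplied explicitly through the `weights` argument.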

The two scores marked "published anyway" are the honest ones. Smoke tests were written and executed on the golden path, but not on every timeout/backoff edge case. Delegation hygiene is solid at the protocol level, but we'd still like deeper integration across a couple of phase scripts. Both are logged for the next loop.

Why publish the weak dims: if a rubric only grades what shipped, it learns to ship less and say more. Evolution grades the gaps too — and we print them. It's the difference between a launch and a pitch deck.

📦 What You Get When You Use It

Bigger builds without memory drops — fresh-spawn workers let Orchestra attack work that would have stalled a single-process skill.
Live visibility — chat.md updates as workers sign off. You can watch the build happen in any markdown viewer.
Two-mode flexibility — fresh-spawn when isolation matters, in-session agent when speed does.
Deterministic hand-off: every worker writes result.json. No guessing about "is it done": the file either exists or it doesn't.
Auditable — telemetry for every spawn, every kill, every result, in one jsonl you can replay.
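
To make the "replay" idea concrete, here is a minimal sketch that reads a telemetry.jsonl file and counts records by event type. The record schema is not documented in this post, so the "event" field name and the `replay` function are assumptions for illustration only.

```python
import json
from collections import Counter
from pathlib import Path


def replay(telemetry_path: Path) -> Counter:
    """Count telemetry records by event type from a JSON Lines file."""
    counts: Counter = Counter()
    with telemetry_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # "event" is a hypothetical field name; adjust to the real schema.
            counts[record.get("event", "unknown")] += 1
    return counts
```

Since each line is an independent JSON object, a run can be audited incrementally, even while workers are still appending to the file.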

Run Orchestra On Your Next Big Build

Ships inside Godmode. Works with Claude Code on Windows, macOS, and Linux.

