Launch ⏱️ 6 min read

Introducing /one-shot-orchestra — Our Most Powerful Skill Yet

TL;DR

🚀 What it is: A Claude skill that acts as a conductor — spawning parallel Claude processes, each with its own fresh 1M-token memory
🎼 How it works: Conductor writes briefs → launches worker terminals → reads back each worker's result JSON → synthesises delivery
Why it matters: Big builds stop hitting the memory ceiling. Each worker starts empty, gets exactly what it needs, and reports back
[Figure: Evolution cleanup in five phases, taking a cloud of v0.5 and v0.6 variants down to one canonical skill: 1. Audit (tag KEEP/DROP), 2. Prune (drop dead variants), 3. Merge (splice the wins), 4. Freeze (lock canonical), 5. Ship (v0.7 released).]

🚀 The Launch

Today we're shipping /one-shot-orchestra — the biggest architectural jump we've made on the max-effort build skill. It's live inside Godmode now.

Every prior version (/one-shot, /one-shot-beta, /one-shot-scripts) was a single Claude instance running a long protocol in one working memory. Orchestra is different: it's a conductor pattern. One Claude orchestrates, many Claudes build.

Analogy: /one-shot-scripts was one chef cooking every dish in sequence at a single workbench. /one-shot-orchestra is a head chef running a kitchen of line cooks — each at their own station, each given a written order, plating back to the pass when done.

🔎 Why We Built It

Claude's working memory is large but finite. When the main instance has loaded the skill, read the project, made a plan, spawned sub-agents, and started building — memory fills up. On truly big builds, quality drops before the work is done.

The obvious fix is parallelism. But the in-session Agent tool still runs inside the conductor's memory. Sub-agents help, but they don't buy you more memory — they share the same ceiling.

Orchestra breaks the ceiling by launching genuinely separate Claude Code processes in their own terminal windows. Each one boots with a fresh ~1M-token context. None of them inherit the conductor's loaded state.

⚙️ How It Works Under the Hood

Orchestra's execution loop is a file-based protocol. No sockets, no message bus — just directories and JSON files that both sides can read.

👤 User invokes /one-shot-orchestra build X

🎼 Conductor plans — decomposes the task, writes one brief per worker

🖥️ orchestra-spawn.sh launches a mintty terminal per brief

🔧 Each terminal runs claude fresh, loads its brief, works in isolation

📄 Worker writes work/result-<Name>.json — status, artefacts, notes

👻 orchestra-wait-and-kill.sh polls for result, then force-closes the terminal

🎯 Conductor synthesises — reads all results, verifies, delivers
[Figure: A lean conductor (1M context) coordinating four workers over the file-based protocol: Builder-A, Judge-B, Polish-C, and Verify-D, each with a fresh 1M context.]

Every worker gets a shared run directory: a sandbox containing briefs/, work/, media/, a live chat.md, and a rolling telemetry.jsonl. You can watch the whole orchestra cooperate in real time by tailing those files.
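The loop above can be sketched end to end in a few lines of Python. This is a minimal illustration, not the shipped scripts: plain Python subprocesses stand in for the mintty terminals running claude, and the brief filenames (briefs/<Name>.md), the telemetry event shape, and the helper names make_run_dir and run_orchestra are all assumptions made for the example.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in worker: a separate Python process instead of a real claude session.
# It reads its brief, writes one artefact, then writes its result file.
WORKER_CODE = r"""
import json, sys
from pathlib import Path
run_dir, name = Path(sys.argv[1]), sys.argv[2]
brief = (run_dir / "briefs" / f"{name}.md").read_text(encoding="utf-8")
artifact = run_dir / "work" / f"variant-{name}.txt"
artifact.write_text(f"output for: {brief.strip()}\n", encoding="utf-8")
result = {
    "status": "success",
    "worker": name,
    "artifact_paths": [f"work/variant-{name}.txt"],
    "notes": "Stand-in worker finished.",
}
(run_dir / "work" / f"result-{name}.json").write_text(
    json.dumps(result), encoding="utf-8"
)
"""


def make_run_dir(root: Path) -> Path:
    """Create the shared run directory layout described in the article."""
    for sub in ("briefs", "work", "media"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    (root / "chat.md").touch()
    (root / "telemetry.jsonl").touch()
    return root


def run_orchestra(root: Path, briefs: dict) -> dict:
    """Write one brief per worker, run workers in parallel, collect results."""
    run_dir = make_run_dir(root)
    for name, text in briefs.items():
        (run_dir / "briefs" / f"{name}.md").write_text(text, encoding="utf-8")

    # Launch every worker before waiting on any of them.
    procs = {
        name: subprocess.Popen(
            [sys.executable, "-c", WORKER_CODE, str(run_dir), name]
        )
        for name in briefs
    }

    results = {}
    for name, proc in procs.items():
        proc.wait(timeout=60)
        result_path = run_dir / "work" / f"result-{name}.json"
        results[name] = json.loads(result_path.read_text(encoding="utf-8"))
        # Hypothetical telemetry record; the real schema is not published.
        with (run_dir / "telemetry.jsonl").open("a", encoding="utf-8") as log:
            log.write(json.dumps({"event": "result_read", "worker": name}) + "\n")
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = run_orchestra(
            Path(tmp), {"Builder-A": "Draft pitch A", "Builder-B": "Draft pitch B"}
        )
        for name, result in out.items():
            print(name, result["status"], result["artifact_paths"])
```

Because everything is a file, watching a run is just a matter of tailing chat.md and telemetry.jsonl in the run directory while the workers are busy.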

🧠 Fresh Memory Is the Point

Every token a Claude instance reads costs working memory. By the time a skill has loaded, the project has been scanned, and the plan has been drafted, a meaningful chunk is already spent on knowing what to do instead of doing it.

A fresh-spawn worker starts empty. It gets a one-page brief describing only what it needs. The rest of its memory is free for the actual work — reading files, running tests, building output.

🧱 Before (single process)

Plan + execution share one 1M-token memory. Large plans eat into the space available for the build itself.

🎼 After (orchestra)

Conductor holds the plan. Each worker holds only its slice. N workers means roughly N × 1M tokens of context, most of it free for the build itself.
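To make the before/after arithmetic concrete, here is a rough back-of-the-envelope sketch. The 1M figure comes from the article; the overhead numbers (200k tokens of skill, project scan, and plan for the single process, 2k tokens for a one-page brief) and the worker count are purely hypothetical values chosen for illustration, not measurements.

```python
CONTEXT_TOKENS = 1_000_000  # approximate per-instance context, per the article


def free_tokens(overhead_tokens: int, context_tokens: int = CONTEXT_TOKENS) -> int:
    """Tokens left for actual work after fixed overhead is loaded."""
    return max(context_tokens - overhead_tokens, 0)


# Hypothetical overheads, for illustration only.
single_process_free = free_tokens(overhead_tokens=200_000)  # skill + scan + plan
per_worker_free = free_tokens(overhead_tokens=2_000)        # one-page brief
workers = 4

print("single process:", single_process_free)
print("orchestra:", per_worker_free * workers)
```

The exact numbers will vary from build to build; the point is only that per-worker overhead is the brief, not the whole plan.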

🎭 Two Modes, One Conductor

Not every task wants a fresh spawn. Some work benefits from a collaborator who's already read the plan. Orchestra supports both modes and has a written rule for when to pick which.

Fresh-spawn (mintty)

Blind judges, heavy independent file reads, parallel builders that shouldn't share plans. Full 1M memory per worker.

In-session Agent

Critic passes, quick scans, teamwork that needs the conductor's context. Shares memory, starts instantly, no terminal overhead.

The decision table lives at the top of orchestra-protocol.md. Every phase script points to it — so when the conductor hits "need a reviewer here," it knows exactly which mode fits.
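The actual decision table in orchestra-protocol.md is not reproduced here, so the following is only a guess at its shape based on the two mode descriptions above. The Task fields, the choose_mode name, and the ordering of the rules are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FRESH_SPAWN = "fresh-spawn"
    IN_SESSION = "in-session"


@dataclass
class Task:
    # Hypothetical task traits, named after the criteria in the article.
    must_be_blind: bool = False            # e.g. a judge that must not see the plan
    heavy_independent_reads: bool = False  # lots of file reading in isolation
    needs_conductor_context: bool = False  # e.g. a critic pass on the current plan


def choose_mode(task: Task) -> Mode:
    """Pick a worker mode from task traits (assumed rules, not the real table)."""
    if task.must_be_blind and task.needs_conductor_context:
        raise ValueError("a blind worker cannot also need the conductor's context")
    if task.must_be_blind or task.heavy_independent_reads:
        return Mode.FRESH_SPAWN
    # Quick scans and context-dependent work stay in-session: no terminal overhead.
    return Mode.IN_SESSION


print(choose_mode(Task(must_be_blind=True)).value)
print(choose_mode(Task(needs_conductor_context=True)).value)
```

Encoding the rule as a function rather than prose is one way to keep every phase script pointing at a single source of truth, which is the job the article assigns to the decision table.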

🔁 The Worker Contract

Every worker, regardless of mode, honours the same contract. When it's done, it writes one JSON file. That's it.

{
  "status": "success",
  "worker": "Builder-A",
  "artifact_paths": ["work/variant-A.txt"],
  "notes": "Drafted a single-sentence pitch, no markdown.",
  "token_usage": { "total": 48213 }
}
What each key in work/result-Builder-A.json is for:

status: polled by orchestra-wait-and-kill.sh in a loop. The instant this field appears, the conductor terminates the worker terminal.
worker: the brief↔result match key. Conductor uses it to find which brief just completed and route output to the right downstream phase.
artifact_paths: file paths the conductor reads next. Verifier and Polisher phases pick up exactly these files; nothing else is opened.
notes: free-form summary spliced into chat.md so the human watching the live feed sees what the worker did without reading every artefact.
token_usage: written to telemetry.jsonl for the audit trail. Used by /godmode-evolution to grade efficiency dim across runs.

The conductor waits on that file, reads it, kills the worker's terminal, and moves on. No ambiguity about "is it done yet?" — the file either exists or it doesn't.
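The wait-then-read step might look roughly like this in Python. It is a sketch of the idea rather than a port of orchestra-wait-and-kill.sh: the function names, the poll interval, and the choice to treat all five keys from the example as required are assumptions, and the terminal-kill step is left out.

```python
import json
import time
from pathlib import Path
from typing import Optional

# Keys taken from the example result file; treating all as required is an assumption.
REQUIRED_KEYS = {"status", "worker", "artifact_paths", "notes", "token_usage"}


def wait_for_result(path: Path, timeout_s: float, poll_s: float = 0.5) -> Optional[dict]:
    """Poll until the result file exists and parses; return None at the deadline."""
    deadline = time.monotonic() + timeout_s
    while True:
        if path.exists():
            try:
                return json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                pass  # file may still be mid-write; try again on the next poll
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)


def check_contract(result: dict, expected_worker: str) -> list:
    """Return a list of contract problems; an empty list means the result is usable."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if result.get("worker") != expected_worker:
        problems.append(
            f"worker is {result.get('worker')!r}, expected {expected_worker!r}"
        )
    if not isinstance(result.get("artifact_paths"), list):
        problems.append("artifact_paths must be a list")
    return problems
```

A caller would treat a None return as the timeout case and flag it in telemetry. One design note: if workers write the JSON to a temporary name and rename it into place, the "file either exists or it doesn't" guarantee also holds for partially written files, and the retry on parse errors becomes a safety net rather than a necessity.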

🧪 Smoke-Tested On Arrival

Orchestra ships with a test/TEST-BENCH.md containing three real smoke tests. We ran them before flipping the skill to live.

Smoke | What it proves
1. Single-worker spawn-and-kill | Fresh spawn works, result.json gets read, terminal gets killed, telemetry logged.
2. Two builders + judge | Parallel spawn + shared workspace + result-JSON hand-off (judge reads both builders' outputs).
3. Timeout & kill | A worker that intentionally hangs gets killed at the deadline with telemetry flagged.

🧬 Audited By Evolution

Every new skill we ship gets graded by /godmode-evolution — the meta-skill whose entire job is making other skills better. Orchestra's scorecard:

Dimension | Score
Contradiction removal | 0.95
Decision clarity | 0.92
Doc coherence | 0.90
Delegation hygiene | 0.88
Smoke test coverage | 0.85
Backwards compat | 0.90
File size discipline | 0.90
Mutation scope | 0.93
Process quality | 0.92
Polish | 0.88
Composite | 0.903 / 1.00
What each dimension measures (/godmode-evolution scorecard, published verbatim):

Contradiction removal: are skill rules internally consistent? Counted contradictions before vs after. 19 → 1.
Decision clarity: can a worker pick a path without re-reading? Counted vague phrases ("maybe", "consider") in the protocol.
Doc coherence: do the docs cover the actual surface? Counted undocumented commands and dead links in orchestra-protocol.md.
Delegation hygiene (published anyway): are agent calls scoped + briefed? Solid at the protocol level, but a couple of phase scripts still under-brief their workers. Logged for the next loop.
Smoke test coverage (published anyway): does TEST-BENCH.md exercise the protocol? Golden path is covered; timeout/backoff edge cases aren't yet. Lowest dim, shipped anyway.
Backwards compat: do old skill commands still work? Counted breaking changes in command surface vs prior /one-shot-scripts release.
File size discipline: are skill files under the size budget? Counted any phase script > 800 lines (the soft cap).
Mutation scope: did edits touch only what they needed? Counted collateral edits per commit; tight scoping correlates with lower regression rates.
Process quality: did the conductor follow protocol on its own build? Audited from telemetry: every spawn, every kill, every result-read.
Polish: filenames, headings, prose. Counted typos and dangling cross-refs across the skill surface.
Composite: weighted mean across all 10 dims. Threshold for "ship" is 0.85; this run cleared at 0.903 with the two weakest dims printed in plain sight.
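The composite can be checked by hand. The scorecard calls it a weighted mean but does not publish the weights; the sketch below assumes equal weights, which happens to reproduce the published 0.903. That match is an observation, not confirmation of how /godmode-evolution actually weights its dimensions, and the function name is made up for the example.

```python
# Scores copied from the published scorecard.
SCORES = {
    "Contradiction removal": 0.95,
    "Decision clarity": 0.92,
    "Doc coherence": 0.90,
    "Delegation hygiene": 0.88,
    "Smoke test coverage": 0.85,
    "Backwards compat": 0.90,
    "File size discipline": 0.90,
    "Mutation scope": 0.93,
    "Process quality": 0.92,
    "Polish": 0.88,
}
SHIP_THRESHOLD = 0.85  # "ship" threshold stated in the scorecard


def composite(scores: dict, weights: dict = None) -> float:
    """Weighted mean of dimension scores; equal weights if none are given (assumed)."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


value = composite(SCORES)
print(round(value, 3), "ship" if value >= SHIP_THRESHOLD else "hold")
```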

The two lowest scores are the honest ones. Smoke tests were written and executed on the golden path, but they don't yet cover every timeout/backoff edge case. Delegation hygiene is solid at the protocol level, but a couple of phase scripts still need deeper integration. Both are logged for the next loop.

Why publish the weak dims: if a rubric only grades what shipped, it learns to ship less and say more. Evolution grades the gaps too — and we print them. It's the difference between a launch and a pitch deck.

📦 What You Get When You Use It

Run Orchestra On Your Next Big Build

Ships inside Godmode. Works with Claude Code on Windows, macOS, and Linux.
