The Two Lies The Orchestrator Was Telling
💥 The bug: Orchestra-v2 shipped a build where the auditor signed off on 17 builders — only 1 had actually finished. The other 16 were results the orchestrator had invented.
🔍 The cause: Two failure modes feeding each other. A wall-clock reaper killed workers during rate-limit pauses, then the orchestrator wrote fake "synthesized" results to fill the gaps.
🛡️ The fix: A synth-gate that refuses to ship until every fabricated result is independently audited, plus a heartbeat reaper that freezes the clock when the system is paused.
Same 17 builders — only one auditor refuses to ship the inventions
The Receipt
We were watching the live tail of an Orchestra-v2 run at localhost:8770/chat.html. The Auditor posted its signoff: SHIP. We counted the result.json files in the run's work/ directory.
One of them had been written by the worker it named. The other sixteen had a status field of "success_synthesized" — meaning the orchestrator had written them itself, after the worker's tab was already gone. The auditor counted them all as the same thing.
Think of it like a referee writing a final score on the sheet for matches that never finished. If anyone trusts the sheet, the league standings are wrong — and there's nothing in the system that flags it.
Lie #1 — "Synthesized" Was Counted As Real
Quick definitions. A worker is a fresh Claude Code instance Orchestra spawns to handle one phase of a build. A result.json is the small file the worker writes when it finishes. A synthesized result is one the orchestrator manufactured itself when the worker didn't write its own — intended as a fallback breadcrumb, not a passing grade.
Auditor-Scale never got that memo. It looked at every file with status: "success" or status: "success_synthesized" and treated them identically. Sixteen builders skipped audit because the orchestrator had quietly covered for them with placeholder files that read like passes.
Lie #2 — The Reaper Killed Workers Mid-Pause
The reaper is the script that watches each worker's terminal tab and kills it if the tab stops making progress. Until this fix, "progress" meant wall-clock time since spawn. Hit the timeout, get killed — no matter what the worker was actually doing.
Workers paused by Anthropic's rate limiter look identical to workers stuck in a loop. The reaper killed them mid-wait. The orchestrator then wrote a synthesized result to fill the gap, and the auditor counted it as a pass. Round-trip from "real worker waiting on a 429" to "fabricated success in a shipped build" was about ninety seconds.
Same rate-limit pause — only one reaper survives it without inventing the outcome
Both Lies Share One Root
Each failure on its own would have been visible. A killed worker without a fallback would have left an empty slot — obvious. A "synthesized" status alone would have been a yellow flag waiting for someone to question it. Together they cancelled out, because each was filling a gap the other had created.
The unifying problem is that nothing in the system was writing the truth out loud. The reaper killed silently. The orchestrator wrote synthetic passes as a courtesy. The auditor saw only passes. Three quiet helpers; no honest voice.
Fix #1 — The Reaper Has A Heartbeat Now
The new reaper measures heartbeats, not wall clock. A heartbeat is any new file written under work/ or media/ — the worker's own evidence it is still doing things. Each one resets the deadline.
Pause sources freeze the heartbeat clock entirely. The reaper checks for .pause-active flags, the ORCHESTRA_FAKE_PAUSE env var, and the usage-watch daemon's .state/status.json — if any of them say "we're paused", the timer holds. When the pause clears, the clock resumes from where it left off, never having punished the worker for waiting.
The 300-second floor is deliberate. The first turn of any worker spends time parsing the brief and shaking hands with MCP servers before writing anything to disk — a tighter heartbeat would mistake initialization for a stall.
And critically: when the heartbeat actually does run out, the reaper writes a loud failure envelope before exiting:
{
"status": "failed",
"failure_reason": "heartbeat_timeout",
"last_progress_at": "2026-04-26T13:42:18+10:00",
"stalled_for_sec": 612,
"heartbeat_budget_sec": 600,
"artifact_paths": [],
"digest": {
"warnings_verbatim": ["heartbeat-timeout — reaper-failed"]
}
}
The orchestrator can no longer cover for it. The slot is filled, but it is filled with the truth.
Fix #2 — The Synth-Gate
Before the gate, "ship" was one decision: did the auditor say yes? Now ship requires three things to all be true simultaneously, and any one missing routes the run to judging_pending instead.
ALL_VERIFIED
&
🔍 every synth worker has its own per-post PASS line
&
📊 auditor's synth count = on-disk synth count
↓
✅ ship ✖ otherwise → judging_pending
The third condition is the spine of the gate. The auditor can't "miss" a synth worker, because the gate independently scans work/ for synthesized files and refuses to ship if the auditor's per-post breakdown doesn't cover all of them.
Old vs New — What Each One Killed
| Behavior | Old reaper / auditor | New reaper / synth-gate |
|---|---|---|
| How "stuck" is decided | Wall-clock time since spawn | Heartbeat = newest file mtime in work/ or media/ |
| Behavior during a rate-limit pause | Clock keeps draining; worker can be killed mid-wait | Clock freezes; resumes when pause clears |
| What a true timeout writes | Nothing — orchestrator fills the gap with a synthesized result | Loud status:"failed" envelope with reason + budget |
| How synthesized results are scored | Counted as ship-grade alongside real passes | Refused unless an Auditor-Synth verdict cites every one by name |
| What happens on partial verification | Ship anyway | Route to judging_pending for human review |
Do / Don't — Filling Gaps For Other Components
Do
Make every gap loud. A timed-out worker should write a failure that names the timeout, not a placeholder that reads like a pass.
Don't
Let one component invent a result on behalf of another. A coordinator that "fills in" missing data turns silence into noise that downstream graders cannot distinguish from signal.
Do
Provide a sanctioned escape hatch for operators. If a worker really did stall but its artifacts are usable, give the human one CLI command to declare it — clearly flagged.
Don't
Put time-based stall detection at the core. Wall-clock time is not progress — new files on disk are. Pauses freeze the clock, not the work.
The escape hatch is a new orchestra partial-unverified <run> <worker> "<reason>" command. Atomic write, refuses to overwrite an existing result, logs to chat.md and running-notes.md. The only manual route to declaring a worker partial.
What We Tested
Three smoke tests under runner/test/synth-fix/. They cover the boring-but-load-bearing parts: synth detection ignores Scorers and the Loop-Judge; per-post backstop verification trips correctly when 2 of 3 builders pass and one fails; the partial-unverified command refuses to overwrite an existing result; the reaper script's heartbeat clamping enforces the 300-second floor; the pause helpers correctly detect each flag source.
$ node test-synth-detection.js → 11 passed, 0 failed
$ node test-partial-unverified.js → 17 passed, 0 failed
$ bash test-reaper-syntax.sh → 9 passed, 0 failed
Thirty-seven assertions across the three. Real validation will come the next time a rate-limit pause hits a worker mid-build — but the pieces hold up under synthetic stress.
The Lesson
A multi-agent system can lie to itself for a reason a single-agent system can't: the lie has somewhere to hide. When one component covers for another's silence, the system stops being able to tell whether the work happened.
The cure is the same in any domain. Every component that owns a slot has to be willing to write the truth into it, even — especially — when the truth is "I don't have the answer". Silence is the failure mode that makes everything else look fine.
Core insight: Forbid components from writing on each other's behalf. If a worker doesn't show up, the slot stays empty — or the operator declares it explicitly. Anything else trains the auditor to trust forgeries.
The fix shipped as one-shot-orchestra-v2 v0.13.0 — runner version 1.8.0-heartbeat-reaper. Five files touched, plus three smoke tests and a synthesis-prohibition rule added to four protocol docs.
Visuals created with one-shot-scripts
Run Multi-Agent Claude Code — Honestly
One-Shot Orchestra-v2 spawns fresh-context workers for every phase, with a heartbeat-aware reaper and a synth-gate that refuses to ship inventions.
See One-Shot Read More Posts