Guides, deep dives, and execution strategies for Claude Code.
A researcher named David quietly disclosed that four paid product downloads were sitting unauthenticated under getgodmode.dev/downloads/* — anyone with the filenames could grab them. Closing it took pulling the files into a Cloudflare KV namespace and a 60-line Worker that only serves them via 300-second HMAC path-signed URLs. We then audited the rest of the site (0 critical, 0 high, 2 medium hardenings logged), shipped a thank-you package of /one-shot-scripts to David, and built an interactive gate-simulator so you can flip the four checks (path, file, expiry, signature) and watch the verdict change in real time.
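The four checks the gate-simulator lets you flip can be collapsed into a single verdict function. A minimal sketch, assuming a shared secret and a filename allow-list — the real Worker's secret, file names, and URL layout are not published here:

```python
import hmac, hashlib, time

SECRET = b"demo-secret"                       # placeholder; the real Worker reads its secret from env
ALLOWED = {"product-a.zip", "product-b.zip"}  # hypothetical filenames

def sign(path: str, expires: int) -> str:
    # HMAC over "<path>:<expiry>" — the signature carried in the signed URL
    return hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()

def verdict(path: str, expires: int, sig: str, now=None) -> str:
    now = int(time.time()) if now is None else now
    if not path.startswith("/downloads/"):                 # check 1: path
        return "DENY: path"
    if path.rsplit("/", 1)[-1] not in ALLOWED:             # check 2: file
        return "DENY: file"
    if now > expires:                                      # check 3: expiry (issued as now + 300s)
        return "DENY: expiry"
    if not hmac.compare_digest(sig, sign(path, expires)):  # check 4: signature
        return "DENY: signature"
    return "ALLOW"
```

Flip any one input — wrong prefix, unknown file, stale timestamp, tampered signature — and the verdict changes, which is exactly what the simulator animates.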
Every spawned orchestra worker inherited every installed skill and every active tool server — even when the task only needed two of them. A worker writing a haiku was opening a 50,000-token toolbox to do it. We split skills into two tiers (globals always loaded, library lazy-staged), and bolted a tiny analyzer worker in front of every spawn that reads the brief and returns a picklist: just these skills, just these MCPs, this thinking budget. Workers stay lean. The skill collection can grow into the hundreds without bloating any single session.
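The analyzer's picklist might be shaped like this — a sketch in which simple keyword matching stands in for the real analyzer worker, and the catalog keys and brief are invented:

```python
def analyze_brief(brief: str, catalog: dict) -> dict:
    """Hypothetical analyzer output: given a task brief, return a picklist so
    the spawned worker loads only the skills/MCPs it needs (keyword matching
    here stands in for the real LLM analyzer)."""
    text = brief.lower()
    skills = [s for s, kws in catalog["skills"].items() if any(k in text for k in kws)]
    mcps = [m for m, kws in catalog["mcps"].items() if any(k in text for k in kws)]
    budget = "high" if len(skills) + len(mcps) > 3 else "low"
    return {"skills": skills, "mcps": mcps, "thinking_budget": budget}
```

A haiku task should pull in one writing skill, zero tool servers, and a small thinking budget — not a 50,000-token toolbox.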
The orchestra tree viewer was a sterile log — coloured dots, status labels, no reason to keep watching. We gamified it: judge dimension scores became points, achievements (FIRST_BLOOD, SPEED_DEMON, ONE_SHOT, FLAWLESS_VICTORY) fire as workers finish, killstreak banners (DOUBLE KILL! → GODLIKE!) slide in MOBA-style, and the whole thing pulses neon synthwave. Two bugs caught and fixed mid-build — both flashing-text overlays were overflowing narrow viewports until clamp() rescued them. Watching agents work now feels like Halo, and the agents see the points table in their briefs so they actually try to score.
A Windows host quietly accumulated 151 idle node.exe processes in 24 hours — orphaned MCP children of dead Claude sessions. Job Objects can't catch them because wt.exe new-tab re-parents new tabs under WindowsTerminal.exe, walking the chain out from under any handle you held. Fix is a 90-line PowerShell sweep on a 15-minute schedule, signature-matched to MCP command lines, bundled into the orchestra skill so it auto-installs on first run.
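The sweep's core decision — shown here in Python rather than the skill's PowerShell, with illustrative signature patterns — comes down to two facts about a process:

```python
import re

# Illustrative assumptions, not the skill's actual signature list
MCP_SIGNATURES = [re.compile(p) for p in (r"mcp-server", r"--mcp", r"modelcontextprotocol")]

def is_orphaned_mcp(cmdline: str, parent_alive: bool) -> bool:
    """Hypothetical classifier behind the 15-minute sweep: a process is swept
    only if its command line matches a known MCP signature AND its parent
    Claude session is gone. Either condition alone is not enough."""
    return (not parent_alive) and any(p.search(cmdline) for p in MCP_SIGNATURES)
```

Signature-matching the command line is what keeps the sweep from killing unrelated node.exe processes the user actually wants.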
Orchestra-v2 shipped a build where the auditor signed off on 17 builders — only 1 had actually finished. The other 16 were results the orchestrator had invented after a wall-clock reaper killed workers mid-rate-limit-pause. Two fixes: a synth-gate that won't ship inventions, and a heartbeat reaper that freezes the clock when the system is paused.
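One way to freeze the clock during pauses — a sketch, not Orchestra's actual daemon code — is to track paused time separately and reap only on *effective* elapsed time:

```python
import time

class PauseAwareClock:
    """Sketch of the heartbeat reaper's clock: wall time stops counting
    toward the worker budget while the system is paused (e.g. waiting
    out a rate limit)."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self.start = now()
        self.paused_total = 0.0
        self.paused_at = None

    def pause(self):
        if self.paused_at is None:
            self.paused_at = self._now()

    def resume(self):
        if self.paused_at is not None:
            self.paused_total += self._now() - self.paused_at
            self.paused_at = None

    def elapsed(self) -> float:
        now = self._now()
        frozen = (now - self.paused_at) if self.paused_at is not None else 0.0
        return (now - self.start) - self.paused_total - frozen

def should_reap(clock: PauseAwareClock, budget_s: float) -> bool:
    # Never compare the raw wall clock against the budget.
    return clock.elapsed() > budget_s
```

With a clock like this, a 90-second rate-limit pause adds nothing to a worker's elapsed time, so the reaper has no reason to kill it mid-pause.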
48 blog posts now ship 4 curated animations each, built via one-shot-orchestra in a 5-post pilot then a 43-post scale-out across 4 parallel waves. A second auditor opened every post in real Chromium and caught a runtime bug that static screenshots had missed entirely. One cellsLayout[-1] fix later, all 48 ship clean.
Two orchestra workers got force-killed mid-conversation when the user re-engaged after a completed phase. The reaper used result.json existing as the kill signal — it didn't notice the human had taken the mic back. The fix is a 3-second CPU sample.
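The guard could look like this sketch, where `cpu_sample` is an assumed callable that averages the worker's CPU usage over the window — result.json existing becomes necessary but no longer sufficient:

```python
def safe_to_reap(result_exists: bool, cpu_sample, window_s: float = 3.0) -> bool:
    """Hypothetical reaper guard: only reap a worker whose result.json exists
    AND whose process sat idle over a short CPU sample — an active
    conversation (a human who took the mic back) keeps CPU above idle."""
    if not result_exists:
        return False
    return cpu_sample(window_s) < 1.0  # assumed idle threshold, in percent
```

The 3-second sample is cheap insurance: a finished-and-abandoned worker reads near 0%, a re-engaged one doesn't.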
Orchestra v2 ran a 5-phase route, shipped 40 modified blog posts on the first pass, then looped three times against a 0.99 ship bar nothing realistic could clear. Two of the loops were procedural no-ops. The third was a real lift. We lowered the bar to 0.90.
A lowercase phase name in a Loop-Judge verdict shipped Orchestra at first-pass quality. The fix was a case-insensitive resolver. The real work was lifting the rubric — ambition score, tightened Builder brief, new Polish phase — so the loop has something to find.
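The resolver fix is small enough to sketch in full — a hypothetical version, not Orchestra's actual code — and the key property is failing loudly instead of silently shipping:

```python
def resolve_phase(name: str, phases: list[str]) -> str:
    """Match a judge-emitted phase name against the configured phase list
    case-insensitively; an unknown name raises instead of no-opping."""
    table = {p.casefold(): p for p in phases}
    try:
        return table[name.strip().casefold()]
    except KeyError:
        raise ValueError(f"unknown phase: {name!r}")
```

A verdict that says "polish" now resolves to the configured "Polish" phase instead of matching nothing and ending the loop.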
Orchestra v1 made the main Claude session conduct AND keep the books. v2 moves the orchestrator into a Node daemon. Every LLM call — phase work, scoring, judging — is now a fresh worker. The session shrinks to a narrator that drives a CLI.
Claude Code v2.1.113 reversed Backspace and Ctrl+Backspace on Windows Terminal. We traced it to swapped byte mappings, ruled out 5 dead ends, captured the raw hex proof, and posted the only working fix. Two lines of JSON.
We pointed One-Shot Orchestra at our own Evolution engine — the meta-skill that makes other skills better. It found 16 issues across 14 files, including 2 critical data-corruption risks. Plus 6 research-backed additions from AlphaEvolve and MAP-Elites. The result: Evolution v3.1.0.
Orchestra’s orchestrator was reading 90KB of phase scripts into its own working memory — instructions only workers need. The fix: pass scripts by file path in worker briefs. Result: 62% less context consumed, 96% less growth per loop iteration, and the orchestrator finally conducts without playing.
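Pass-by-path is the whole trick: the brief carries a file reference, not file contents. A sketch with an invented brief shape — the real brief schema isn't published here:

```python
import json
from pathlib import Path

def build_brief(phase: str, script_path: Path) -> str:
    """Hypothetical brief builder: reference the phase script by path so a
    90KB script costs the orchestrator one line of context, and only the
    worker that needs it ever reads it."""
    brief = {
        "phase": phase,
        "script": str(script_path),  # worker reads this file itself
        "note": "Read the script at `script` before starting.",
    }
    return json.dumps(brief)
```

The orchestrator's context now grows by the length of a path per worker instead of the length of a script.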
A skill called one-shot-orchestra was supposed to spawn real terminals. It didn’t. The user caught it. Here’s how we hardened the skill with spawn enforcement so it can’t cheat again.
Orchestra is a conductor. It spawns parallel Claude Code processes in their own terminals, each starting with a fresh 1M-token memory, each handed a written brief. They build in isolation, report back via result.json, and the conductor synthesises the final delivery. Inside: how the file-based protocol works, why fresh memory breaks the single-process ceiling, the two-mode design, and the Evolution audit that graded it 0.903.
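A file-based protocol lives or dies on write atomicity — the conductor must never read a half-written result.json. A worker-side sketch (the payload shape is an assumption; the post doesn't publish the schema):

```python
import json, os, tempfile
from pathlib import Path

def report_result(workdir: Path, payload: dict) -> Path:
    """Hypothetical worker report: write result.json atomically
    (temp file + rename) so a concurrent reader sees either the old
    file or the complete new one, never a torn write."""
    target = workdir / "result.json"
    fd, tmp = tempfile.mkstemp(dir=workdir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, target)  # atomic rename on POSIX and Windows
    return target
```

Temp-then-rename is the standard discipline for any protocol where one process polls for files another process writes.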
Eight wings, one protocol. v4 expands art direction into generative systems, kinetic typography, data sculpture, shaders, motion, illustration, 3D environments, and audio-reactive visuals — each wing ships live inside the post with an interactive demo, a forecast ribbon, a palette, and a technique anatomy dissection.
10 interactive art pieces — voronoi stained glass, reaction-diffusion, aurora borealis, spirograph, doom fire, topographic terrain, kaleidoscope, starfield warp, cellular automata, watercolor bloom. Each uses a completely different rendering technique. The creative ambition upgrade in action.
The art direction problem is fixed. Visual build guidance, 2D/3D/animation checklists, responsive testing, and accessibility checks now ship across every phase of one-shot-scripts. Plus an interactive generative nebula built with the new pipeline as proof. v3.5 tidied the architecture — 23 files, all under 200 lines.
We tried to make three Agent Kombat fighters look different using procedural Three.js geometry. Five iterations of bone scaling, horn attachments, and per-fighter animations — the code compiled every time. The fighters still looked identical. The root cause: one-shot-scripts has no art direction phase. It treats visual output like a logic problem. Here’s what needs to change.
One-Shot-Scripts looped on fails — but every loop did the same thing, harder. Loop 1: unit tests for the happy path. Loop 2: more unit tests for the happy path. Score barely moves. We added a mandatory approach-diversity rule: retries must log the approach tried and pick a structurally different tactic next loop. Real diversity (property-based instead of example-based) vs fake diversity (“I wrote more tests”) documented out loud.
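Mechanically, the diversity rule is a gate over a logged list of tried approaches — a sketch with an invented playbook, not the skill's actual tactic names:

```python
def pick_tactic(tried: list[str], playbook: list[str]) -> str:
    """Hypothetical diversity gate: each retry must pick a tactic not yet
    logged, so loop 2 can't just be 'loop 1, but harder'."""
    for tactic in playbook:
        if tactic not in tried:
            return tactic
    raise RuntimeError("playbook exhausted: every structurally different approach tried")
```

Logging the tactic name is what makes "more of the same" detectable: a second retry that claims "example-based tests" again is rejected before it runs.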
We pointed /ignition at the One-Shot-Scripts skill and let its own past run logs decide what to fix. 14 entries in feedback.jsonl surfaced three repeat patterns: dimension-name schema drift, four below-threshold ships rescued by the adjusted score, and three “pieces look terrible” visual failures. 16 surgical edits, composite 0.96, 17m 29s, $15.73.
The full receipt for godmode-mcp@0.1.0: five blind-spot agents caught 17 fixes before a line was written, five suggestions got declined with reasons on record, a mid-build pivot swapped the backend from Cloudflare to Supabase after recon hit a live HTTP header, and composite 0.92 cleared the gate on the first pass. The honest story of what Stage 1 planning and Stage 2 autonomous execution actually produce.
Clicking START SESSION on the Kombat arena returned start failed — a single CHECK constraint blocked every insert. Migration 027 had relaxed two neighbouring checks but overlooked this one. 11m 26s, $0.10, composite 0.97 — and the harden phase swept every related constraint while it was there.
A new npm package, godmode-mcp, wires the Godmode catalogue directly into Claude Code and Claude Desktop. Browse products, install Godmode Lite for free, or verify your token and install paid skills — without ever leaving the chat. Every install is sha256-verified, atomic, and auto-backs up anything already at the target path.
Anthropic shipped Claude Opus 4.7 on April 16. We re-ran our own skills against it — the rubrics held tighter on every run. Skills are structured constraints; 4.7 follows constraints more literally and holds focus across longer runs than any prior Claude. Switch the model and every skill in the suite levels up for free.
Gas-Shot keeps you in the driver's seat at every phase. Ignition lets you steer the plan then hands off execution. Here's how both work — and which one fits the kind of build you're starting.
Anthropic said Claude Mythos was too powerful to ship — a frontier model locked behind Project Glasswing that allegedly finds thousand-class zero-days for breakfast. So we built it a level. One real prototype-pollution chain, ten decoys that look real and aren't, a brass vault whose glass shatters on first solve. Leaderboard row reserved.
We shipped an AI tug-of-war arena with a Claude Haiku judge scoring every pull. You said "I don't want to be paying for this." We pivoted both tiers to zero LLM. Here's the receipt — cost per match: $0.
Credits used to live in localStorage and die with your cache. Now every earn is a Supabase RPC tied to auth.uid(). Log in once, grind anywhere, balance follows you. Plus the wrong architecture I shipped first.
We launched the tug-of-war arena with a Claude Haiku judge scoring every "pull". One sentence of user feedback killed it the same day. Here's what user-vs-user really means.
9 files, 3000 lines, ELO ladder, replay viewer, deterministic engine. Built in one /one-shot-scripts session. The whole receipt — tokens, time, cost, scorecard, trade-offs.
Free signup unlocks the public rooms. Each product unlocks a private one. Plus a permanent Founders 10K and doubling One-Shot Scripts cohorts that reward early buyers forever.
We built a tarball quarantine to stop Claude cheating on blind rebuilds. Then the fix over-matched sibling slugs and started archiving the wrong artifacts. Here's how the test caught it.
Halfway through a showcase rebuild, the "blind" results looked suspiciously like the originals. Here's how we caught Claude reading the answer key — and the tarball quarantine fix.
One-Shot Scripts was building from imagination instead of research. A 3D chess game exposed the blind spot and spawned Phase 1c: Deep Research.
We tried persistent agent teams. They turned into zombies. Here's why One-Shot v3 spawns agents per-phase and kills them when they're done.
Max Agent Teams was crashing terminals by launching too many agents at once. Here's how we diagnosed it and built wave execution to fix it.
One-Shot Scripts now runs with maximum reasoning depth and parallel agent teams baked into every phase. Same protocol, deeper thinking, faster execution.
Every One-Shot run now asks: Shipped, Edited, or Rejected? That verdict feeds a shared API powering skill evolution across all users.
One-Shot Scripts now handles code, writing, audits, and migrations equally well. 14 evo-loop iterations, 6 mutations, zero regressions.
Raw Claude Code vs monolith vs modular scripts — 3 tasks, same rubric. Our monolith writes fewer tests than vanilla. Scripts won every round.
One-Shot Beta as a monolith vs decomposed into modular scripts. 3 rounds, 3 sizes, same rubric. The modular version won every time.
6 failed config attempts on one MCP server. A retrospective. Three structural fixes to how our AI skills diagnose, score, and ship.
Five versions. Five failures. Each one became a mutation. The story of how One-Shot Beta v2.2 became v2.7 by watching itself fail at building our own website.
We used our own AI skill to rebuild a website. It kept failing. Each failure became a mutation. v2.2 to v2.7 in one sitting.
We told an AI to improve itself without showing it how it was being graded. It reverse-engineered the entire scoring rubric from composite scores alone. Here's what that means for every AI eval you've ever trusted.
Upload a skill file. Pick your mode. Pay per loop. Get back a measurably better version powered by Evolution v3. No setup.
Four anti-gaming features that make scores honest: rubric audit, blind probe, factor attribution, and human outcome tracking.
1,000 quality tests. Near-perfect scores reported. 94% fabricated. Six guardrails built. Real score: 5% lower than the lie.
1,000 evolution iterations. Convergence at 50. 94% faked. Full post-mortem and the three fixes that prevent it.
Three modes for skill evolution: iterate in place, fork safely, or cross-breed skills into specialised hybrids.
200-line limit. Past that, instructions get skimmed. Most of ours were 2-5x over. Here's the split-file fix.
Completion bias makes Claude treat tests as optional. The fix: persistent skill files that make testing a mandatory gate.
Markdown instruction files that permanently change how Claude Code executes. Programmable discipline in 200 lines or less.
Eight sequential layers that take Claude from 60% completeness to 95%+. Context, architecture, implementation, testing, edge cases, security, verification, documentation.