Built by /blog-post-GM — a Claude Code skill we evolved with our own Evolution engine to write every post in the Godmode voice.
Experiment ⏱️ 5 min read

We Evolved One-Shot 5 Times in One Session. Here's What Broke.

🔬 LIVE EXPERIMENT: This post was written after hours of work in the same AI conversation — with depleted context. A companion post covers the same topic in a fresh session. Compare them to see what context fatigue does to AI output.

TL;DR

🛠️ What we did: Used One-Shot to overhaul 10 blogs, restructure a homepage into 3 pages, and create a pricing page
💥 What kept breaking: Duplicate navs, missing footers, broken mobile, no headings, buttons going nowhere
🧬 What we built: 5 mutations (v2.2 → v2.7) — each one a direct fix for a specific failure
💡 The insight: The biggest threat to AI quality isn't bad code — it's depleted context

We spent a full session using One-Shot Beta to rebuild getgodmode.dev. Overhauled every blog post. Split the homepage into three pages. Created a dedicated pricing page. Fixed dozens of UI bugs.

The skill kept failing. Not on the code — on everything around the code. And each failure taught us something the skill didn't know yet.

💥 The Five Failures

| Version | What Broke | What We Added |
| --- | --- | --- |
| v2.3 | Agents created files correctly → bulk sed corrupted them → duplicate nav links on 2 pages | Cross-file verification after bulk operations |
| v2.4 | Stale HUD sidebar, missing page headings, mobile nav overflow — invisible in source code review | Phase 7: Visual Audit (fetch live pages, check rendering) |
| v2.5 | "Access" button went to a non-existent anchor. No pricing page existed. Nobody asked "should this page exist?" | Phase 0: Think Like the User + mandatory mobile walkthrough |
| v2.6 | 22 pages had no footer. Hamburger UX was broken. Steps were being skipped under context pressure. | Pre-delivery gate checklist (12 items, must print before every delivery) |
| v2.7 | Quality degraded as the conversation got longer. More tasks = more skipped steps = more bugs. | Context Gate: hard block if context is too low. Refuse to run. |

🔍 The Pattern

Every failure followed the same shape: the skill had rules for writing code, but no rules for checking the result of the code.

⚙️ Code is correct in source

👁️ Nobody looks at the rendered page

📱 Nobody checks mobile

🚶 Nobody walks the site as a visitor

💩 User finds the bugs

Think of it like a restaurant: The chef follows the recipe perfectly. The food is cooked right. But nobody checks if the plate looks good, if the portion is right, if the table is set, or if the menu even lists the dish. The kitchen is flawless. The dining room is a mess.

🏗️ What Each Mutation Actually Does

v2.3 — Cross-File Verification

When you launch 3 agents to create 3 pages, then run a bulk find-and-replace across all files, the agent-created files get edited twice: once correctly by the agents, then again by a bulk operation that was written without them in mind. The agents did their job. The bulk operation did its job. The combination broke everything.

The fix: after ALL changes are complete, re-read every file that was touched by both an agent and a bulk operation. Check for duplicates.
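A minimal sketch of what that verification pass could look like. This is our illustration, not the skill's actual implementation; the duplicate-nav check stands in for whatever corruption a bulk edit might introduce.

```python
import re
from pathlib import Path
from collections import Counter

def duplicate_nav_links(html: str) -> list[str]:
    """Return hrefs that appear more than once inside <nav> blocks."""
    hrefs = []
    for nav in re.findall(r"<nav\b.*?</nav>", html, flags=re.S | re.I):
        hrefs += re.findall(r'href="([^"]+)"', nav)
    return [href for href, count in Counter(hrefs).items() if count > 1]

def verify_after_bulk_edit(paths: list[Path]) -> dict[str, list[str]]:
    """Re-read every file touched by both an agent and a bulk operation,
    and report any that now contain duplicated nav links."""
    report = {}
    for path in paths:
        dups = duplicate_nav_links(path.read_text(encoding="utf-8"))
        if dups:
            report[str(path)] = dups
    return report
```

The key design point is the timing: the check runs after all changes are complete, against the files as they exist on disk, not against what each writer believed it produced.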

v2.4 — Visual Audit

Source code review can't tell you that a page has no heading. The HTML is valid. The CSS is correct. But when you open it in a browser, there's no title — just a warning box floating in space.

The fix: Phase 7 fetches every modified page after deployment and checks 6 things — headings, nav, orphaned references, responsive design, brand consistency, and first-time visitor comprehension.
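A hedged sketch of a Phase 7-style audit. The four checks below are a subset of the six, chosen because they are mechanically testable; the real phase also covers judgment calls like brand consistency and first-time comprehension.

```python
import re
import urllib.request

def audit_html(html: str) -> list[str]:
    """Return rendering problems that source-level review tends to miss."""
    checks = [
        (r"<h1\b", "no <h1> page heading"),
        (r"<nav\b", "no <nav> element"),
        (r"<footer\b", "no <footer>"),
        (r'name="viewport"', "no responsive viewport meta tag"),
    ]
    return [message for pattern, message in checks
            if not re.search(pattern, html, re.I)]

def audit_page(url: str) -> list[str]:
    """Fetch the live, deployed page (not the source tree) and audit it."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return audit_html(resp.read().decode("utf-8", errors="replace"))
```

Fetching the deployed page matters: a valid template with a broken include produces valid source and a headless page, and only the rendered output shows it.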

v2.5 — Think Like the User

The "Access" button in the nav pointed to /#pricing. But the pricing section had been moved to its own page. The button went to an anchor that no longer existed. Nobody asked: "where does this button actually go?"

The fix: Phase 0 walks the entire site as a first-time visitor before building anything, and again after. "Is there a page missing? Does every button go somewhere useful? What would confuse someone?"
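The "does every button go somewhere useful?" question can be partly mechanized. This sketch (our illustration, assuming pages are available as path → HTML) would have caught the /#pricing failure: the link resolves to a page that exists but an anchor that doesn't.

```python
import re

def dead_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """pages maps a site path to its HTML. Returns (page, href) pairs whose
    internal link points to a missing page or a missing anchor id."""
    ids = {path: set(re.findall(r'id="([^"]+)"', html))
           for path, html in pages.items()}
    dead = []
    for path, html in pages.items():
        for target, frag in re.findall(r'href="(/[^"#]*)(#[^"]*)?"', html):
            if target not in pages:
                dead.append((path, target + frag))       # page doesn't exist
            elif frag and frag[1:] not in ids[target]:
                dead.append((path, target + frag))       # anchor doesn't exist
    return dead
```

The mechanical pass only answers "does every link resolve?"; the Phase 0 walkthrough still has to answer the harder questions, like whether a page should exist at all.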

v2.6 — Pre-Delivery Checklist

The skill had rules for footers, mobile, nav consistency. The rules existed. They were being skipped. Not intentionally — the context was so long that the instructions from the beginning of the conversation were effectively invisible.

Think of it like a pilot's preflight checklist: Experienced pilots don't skip the checklist because they know how to fly. They use it BECAUSE they know that memory under pressure is unreliable. The checklist doesn't teach — it forces verification.

The fix: a 12-item gate that must be printed and answered before every delivery. Not prose instructions that get skimmed — yes/no checkboxes that force engagement.
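In code form, the gate could be as simple as the sketch below. The items are paraphrased from this post's failures, not the skill's actual 12-item list; the point is the mechanism, where every item is printed and answered, and a single "no" blocks delivery.

```python
# Paraphrased gate items -- the real skill's 12-item list differs.
GATE = [
    "Every page has exactly one nav?",
    "Every page has a footer?",
    "Mobile nav opens and closes?",
    "Every button resolves to a real target?",
    "Headings render on every page?",
    "No orphaned anchor references?",
]

def run_gate(answers: dict[str, bool]) -> bool:
    """Print each item with its answer; allow delivery only if all pass.
    An unanswered item counts as a failure, never as a pass."""
    ok = True
    for item in GATE:
        passed = answers.get(item, False)
        print(f"[{'x' if passed else ' '}] {item}")
        ok = ok and passed
    return ok
```

Defaulting missing answers to False is the whole trick: skipping an item under context pressure fails the gate instead of silently passing it.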

v2.7 — Context Gate

This was the hardest one. After fixing the same category of bug three times, we asked: why does the quality keep dropping? The answer wasn't the skill. It was the conversation.

After hours of work, the skill's instructions are thousands of tokens away from the active work. The AI starts cutting corners — not maliciously, but because the relevant instructions have been pushed out of focus by the volume of prior work.

The fix: before doing any work, assess available context. If it's too low, hard block. Don't offer a lighter version. Don't try anyway. Save a memory entry, give the user an exact copy-paste prompt for a fresh chat, and refuse to proceed.
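A sketch of the gate's shape, assuming the caller can estimate remaining context tokens. The threshold and message wording here are invented for illustration; what matters is the behavior: no fallback, no lighter version, just a block plus a hand-off prompt.

```python
MIN_CONTEXT_TOKENS = 40_000  # assumed floor -- the skill's real number may differ

def context_gate(remaining_tokens: int, task: str) -> str:
    """Refuse to run when context is too low; hand back a fresh-chat prompt."""
    if remaining_tokens >= MIN_CONTEXT_TOKENS:
        return "PROCEED"
    return (
        "BLOCKED: context too low to run safely.\n"
        "Start a fresh chat and paste:\n"
        f"  Use One-Shot to: {task}"
    )
```

Note what the blocked branch does not do: it never degrades gracefully, because a degraded run is exactly the failure mode the gate exists to prevent.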

🔬 This Post Is the Proof

You're reading the degraded version right now. This post was written at the end of a session that overhauled 10 blog posts, created 3 new pages, fixed 30+ UI bugs, and evolved the skill through 5 versions.

A companion post covers the same topic — written in a fresh session with full context. Same skill. Same template. Same topic.

Compare them. The differences are the experiment.

😫 This Post (Degraded)

Written after 15+ tasks in the same conversation. Context heavily used. The skill warned against it. We did it anyway — to prove the point.

💪 Companion Post (Fresh)

Same topic, same skill, same template. Fresh conversation. Full context window. The v2.7 Context Gate would have sent you here.

💡 What This Means for AI-Assisted Work

If you're using AI for multi-step projects, context management is as important as prompt engineering. A perfect skill running on depleted context produces worse output than a mediocre skill on fresh context.

The thesis: The system you use to verify quality is itself a system that needs verification. And when that system is an AI with a finite context window, the biggest threat to quality isn't bad code — it's the conversation being too long.

📊 The Evolution Timeline

📦 v2.2 — Starting point. 7 phases. No visual audit.

🔄 v2.3 — +Cross-file verification

👁️ v2.4 — +Phase 7: Visual Audit (8 phases)

🚶 v2.5 — +Phase 0: Think Like the User + mobile mandatory

📋 v2.6 — +Pre-delivery gate checklist (12 items)

🛑 v2.7 — +Context Gate: refuse to run if context depleted

Five mutations. Each one a scar from a real failure. The skill at v2.7 is measurably better than v2.2 — not because we planned it, but because we used it hard enough to break it.

One-Shot Beta v2.7 ships with all five mutations.

Context Gate. Visual Audit. Think Like the User. Pre-delivery checklist. Cross-file verification. Every lesson from this session, baked in.

Get One-Shot · Read the fresh version