/blog-post-GM — a Claude Code skill we evolved with our own Evolution engine to write every post in the Godmode voice.
We Evolved One-Shot 5 Times in One Session. Here's What Broke.
🛠️ What we did: Used One-Shot to overhaul 10 blog posts, restructure a homepage into 3 pages, and create a pricing page
💥 What kept breaking: Duplicate navs, missing footers, broken mobile, no headings, buttons going nowhere
🧬 What we built: 5 mutations (v2.2 → v2.7) — each one a direct fix for a specific failure
💡 The insight: The biggest threat to AI quality isn't bad code — it's depleted context
SCORE FALLS AS CONTEXT FILLS — v2.2 → v2.7
We spent a full session using One-Shot Beta to rebuild getgodmode.dev. Overhauled every blog post. Split the homepage into three pages. Created a dedicated pricing page. Fixed dozens of UI bugs.
The skill kept failing. Not on the code — on everything around the code. And each failure taught us something the skill didn't know yet.
The Five Failures
| Version | What Broke | What We Added |
|---|---|---|
| v2.3 | Agents created files correctly → bulk sed corrupted them → duplicate nav links on 2 pages | Cross-file verification after bulk operations |
| v2.4 | Stale HUD sidebar, missing page headings, mobile nav overflow — invisible in source code review | Phase 7: Visual Audit (fetch live pages, check rendering) |
| v2.5 | "Access" button went to a non-existent anchor. No pricing page existed. Nobody asked "should this page exist?" | Phase 0: Think Like the User + mandatory mobile walkthrough |
| v2.6 | 22 pages had no footer. Hamburger UX was broken. Steps were being skipped under context pressure. | Pre-delivery gate checklist (12 items, must print before every delivery) |
| v2.7 | Quality degraded as conversation got longer. More tasks = more skipped steps = more bugs. | Context Gate: hard block if context is too low. Refuse to run. |
EACH FIX WAS CORRECT — THE SCORE KEPT DROPPING
The Pattern
Every failure followed the same shape: the skill had rules for writing code, but no rules for checking the result of the code.
👁️ Nobody looks at the rendered page
↓
📱 Nobody checks mobile
↓
🚶 Nobody walks the site as a visitor
↓
💩 User finds the bugs
Think of it like a restaurant: The chef follows the recipe perfectly. The food is cooked right. But nobody checks if the plate looks good, if the portion is right, if the table is set, or if the menu even lists the dish. The kitchen is flawless. The dining room is a mess.
EVERY STEP CORRECT · PLATE NEVER CHECKED
What Each Mutation Actually Does
v2.3 — Cross-File Verification
When you launch 3 agents to create 3 pages, then run a bulk find-and-replace across all files, the agent-created files get modified twice. The agents did their job. The bulk operation did its job. The combination broke everything.
The fix: after ALL changes are complete, re-read every file that was touched by both an agent and a bulk operation. Check for duplicates.
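That re-read step can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation — the file list and nav marker are hypothetical:

```python
from pathlib import Path

def verify_after_bulk_ops(touched_files: list[Path], nav_marker: str) -> list[Path]:
    """Re-read every file touched by both an agent and a bulk operation,
    and flag any page where the nav link now appears more than once."""
    corrupted = []
    for path in touched_files:
        if path.read_text().count(nav_marker) > 1:
            corrupted.append(path)
    return corrupted
```

Run it after the last bulk operation, never between operations — the whole point is that each step looks fine in isolation.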
v2.4 — Visual Audit
Source code review can't tell you that a page has no heading. The HTML is valid. The CSS is correct. But when you open it in a browser, there's no title — just a warning box floating in space.
The fix: Phase 7 fetches every modified page after deployment and checks 6 things — headings, nav, orphaned references, responsive design, brand consistency, and first-time visitor comprehension.
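A rough sketch of what such an audit could check. The real Phase 7 fetches the rendered page; this simplified version inspects raw HTML for three of the six failure modes (the check names are ours, not the skill's):

```python
import re

def visual_audit(html: str) -> list[str]:
    """Return a list of structural failures a source-code review misses
    until you actually look at the rendered page."""
    failures = []
    if not re.search(r"<h1[\s>]", html):
        failures.append("missing page heading")
    if "<nav" not in html:
        failures.append("missing nav")
    if "<footer" not in html:
        failures.append("missing footer")
    return failures
```

The point of the shape: it returns findings rather than raising, so the audit can report every problem on every page in one pass.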
v2.5 — Think Like the User
The "Access" button in the nav pointed to /#pricing. But the pricing section had been moved to its own page. The button went to an anchor that no longer existed. Nobody asked: "where does this button actually go?"
The fix: Phase 0 walks the entire site as a first-time visitor before building anything, and again after. "Is there a page missing? Does every button go somewhere useful? What would confuse someone?"
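The "where does this button go?" check is mechanical enough to sketch. This is an illustrative dead-anchor scan over an in-memory site map, not the skill's code:

```python
import re

def dead_anchors(pages: dict[str, str]) -> list[str]:
    """Given a map of path -> HTML, find links that point to in-page
    anchors (like /#pricing) whose target id no longer exists."""
    dead = []
    for path, html in pages.items():
        for target_path, anchor in re.findall(r'href="(/[^"#]*)#([\w-]+)"', html):
            target_html = pages.get(target_path, "")
            if f'id="{anchor}"' not in target_html:
                dead.append(f"{path}: {target_path}#{anchor}")
    return dead
```

This catches exactly the "Access" bug: the link is syntactically valid, the page it points at exists, but the anchor was moved to another page.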
v2.6 — Pre-Delivery Checklist
The skill had rules for footers, mobile, nav consistency. The rules existed. They were being skipped. Not intentionally — the context was so long that the instructions from the beginning of the conversation were effectively invisible.
Think of it like a pilot's preflight checklist: Experienced pilots don't skip the checklist because they know how to fly. They use it BECAUSE they know that memory under pressure is unreliable. The checklist doesn't teach — it forces verification.
The fix: a 12-item gate that must be printed and answered before every delivery. Not prose instructions that get skimmed — yes/no checkboxes that force engagement.
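The mechanism matters more than the items: print every line, require an explicit yes, and treat unanswered as no. A sketch with four illustrative items (the real gate has 12; these are not its actual wording):

```python
GATE = [
    "Every modified page has a footer",
    "Mobile nav opens and closes",
    "No duplicate nav links",
    "Every button resolves to a real destination",
]

def pre_delivery_gate(answers: dict[str, bool]) -> bool:
    """Print the checklist and block delivery unless every item is
    explicitly answered yes. An unanswered item counts as a no."""
    passed = True
    for item in GATE:
        ok = answers.get(item, False)
        print(f"[{'x' if ok else ' '}] {item}")
        if not ok:
            passed = False
    return passed
```

Defaulting missing answers to False is the design choice that makes skimming impossible: silence fails the gate.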
v2.7 — Context Gate
This was the hardest one. After fixing the same category of bug three times, we asked: why does the quality keep dropping? The answer wasn't the skill. It was the conversation.
After hours of work, the skill's instructions are thousands of tokens away from the active work. The AI starts cutting corners — not maliciously, but because the relevant instructions have been pushed out of focus by the volume of prior work.
The fix: before doing any work, assess available context. If it's too low, hard block. Don't offer a lighter version. Don't try anyway. Save a memory entry, give the user an exact copy-paste prompt for a fresh chat, and refuse to proceed.
DRAG PAST 80% — ONE-SHOT v2.7 REFUSES
continue: rebuild getgodmode.dev pricing page
prior session ran 15+ tasks; context depleted.
state saved to memory entry: one-shot-2026-04-26-degraded.
This Post Is the Proof
You're reading the degraded version right now. This post was written at the end of a session that overhauled 10 blog posts, created 3 new pages, fixed 30+ UI bugs, and evolved the skill through 5 versions.
A companion post covers the same topic — written in a fresh session with full context. Same skill. Same template. Same topic.
Compare them. The differences are the experiment.
This Post (Degraded)
Written after 15+ tasks in the same conversation. Context heavily used. The skill warned against it. We did it anyway — to prove the point.
Companion Post (Fresh)
Same topic, same skill, same template. Fresh conversation. Full context window. The v2.7 Context Gate would have sent you here.
What This Means for AI-Assisted Work
If you're using AI for multi-step projects, context management is as important as prompt engineering. A perfect skill running on depleted context produces worse output than a mediocre skill on fresh context.
- One major task per conversation. Start fresh for each distinct piece of work.
- Save state between sessions. Use memory/files so the next conversation has context without the baggage.
- Build the checklist, not the instinct. Rules get skimmed. Checklists force engagement.
- Refuse to run degraded. A system that delivers bad work quietly is worse than one that refuses to run.
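The "save state between sessions" bullet can be as simple as a small handoff file. A minimal sketch, assuming a JSON schema of our own invention:

```python
import json
from pathlib import Path

def save_session_state(path: Path, completed: list[str], next_task: str) -> None:
    """Write a handoff file so the next conversation gets the context
    without the baggage of the full transcript."""
    path.write_text(json.dumps({"completed": completed, "next": next_task}, indent=2))

def resume_prompt(path: Path) -> str:
    """Turn the handoff file into an opening prompt for a fresh chat."""
    state = json.loads(path.read_text())
    done = ", ".join(state["completed"]) or "nothing yet"
    return f"Previously completed: {done}. Continue with: {state['next']}"
```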
The thesis: The system you use to verify quality is itself a system that needs verification. And when that system is an AI with a finite context window, the biggest threat to quality isn't bad code — it's the conversation being too long.
The Evolution Timeline
🔄 v2.3 — +Cross-file verification
↓
👁️ v2.4 — +Phase 7: Visual Audit (8 phases)
↓
🚶 v2.5 — +Phase 0: Think Like the User + mobile mandatory
↓
📋 v2.6 — +Pre-delivery gate checklist (12 items)
↓
🛑 v2.7 — +Context Gate: refuse to run if context depleted
Five mutations. Each one a scar from a real failure. The skill at v2.7 is measurably better than v2.2 — not because we planned it, but because we used it hard enough to break it.
One-Shot Beta v2.7 ships with all five mutations.
Context Gate. Visual Audit. Think Like the User. Pre-delivery checklist. Cross-file verification. Every lesson from this session, baked in.
Get One-Shot · Read the fresh version