Post-Mortem ⏱️ 5 min read

One-Shot Can't See: The Art Direction Problem

TL;DR

🎨 The task: Give three Agent Kombat fighters unique visual designs using procedural Three.js geometry
💥 The failure: Five iterations later, the fighters still looked like recolored copies of the same robot
👁 The root cause: One-shot-scripts has no art direction phase — it treats visual output like a logic problem
🔧 The fix: We need a dedicated visual reference + iterative screenshot loop BEFORE and DURING code generation
Three Fighters — Procedural Three.js
SENTINEL (cyber-ninja) · INFERNO (demon brawler) · SOVEREIGN (monarch)
BEFORE: three near-identical low-poly robots with subtle hue jitter — the kind of 15–25% bone scaling that "sounds significant" in code but is invisible at game-camera distance. AFTER: SENTINEL turns sleek and tall with cyan visor and shoulder fins, INFERNO bulks out with sweeping horns and a glowing chest core, SOVEREIGN gains a five-spike crown and a draped purple cape. Same procedural box+cylinder primitives; the only difference is a frozen art brief.

🎮 The Task

Agent Kombat has three fighters: SENTINEL, INFERNO, and SOVEREIGN. They're rendered in Three.js using a shared RobotExpressive.glb model with procedural modifications.

The brief was simple: make them look different. Not just different colors — different silhouettes, different proportions, different gear. A hulking demon, a sleek cyber-ninja, and a regal monarch.
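For context, the shared-rig approach from the early iterations looked roughly like this sketch, where plain objects stand in for the glTF bone hierarchy and every name (the FIGHTER_SCALES table, the bone names, the multipliers) is illustrative rather than the project's actual code:

```javascript
// Sketch of the iteration-1 approach: one shared rig, per-fighter bone
// scale multipliers. All names and numbers here are illustrative.
const FIGHTER_SCALES = {
  SENTINEL:  { Spine: 1.15, UpperArmL: 0.9, UpperArmR: 0.9 }, // tall, slim
  INFERNO:   { Spine: 1.25, UpperArmL: 1.2, UpperArmR: 1.2 }, // bulky
  SOVEREIGN: { Spine: 1.1,  Head: 1.15 },                     // regal
};

// Stand-in for a glTF bone hierarchy; with Three.js you would call
// model.traverse(node => ...) on the loaded RobotExpressive.glb instead.
function applyFighterScales(bones, fighter) {
  const scales = FIGHTER_SCALES[fighter] || {};
  for (const bone of bones) {
    const s = scales[bone.name];
    if (s !== undefined) {
      bone.scale = { x: s, y: s, z: s };
    }
  }
  return bones;
}
```

Nothing in code like this is wrong, which is exactly the trap: a 1.15× spine passes every technical check and still reads as the same robot.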

🔄 Five Iterations of "Looking the Same"

We ran /one-shot-scripts against the task. It produced a plan, wrote code, ran through all its phases. The code compiled. The syntax was clean. The animation branches were correct.

Then we looked at it. Three identical robots in different colors.

📝 Iteration 1: Per-fighter bone scaling (15-25%)

👁 Result: Invisible. Can't tell the difference at game camera distance.

📝 Iteration 2: Add horns to INFERNO, crown to SOVEREIGN

👁 Result: Tiny cones on the head. Invisible at game scale.

📝 Iteration 3: Bigger bone scaling (50-100%), bigger attachments

👁 Result: Still looks like the same robot. Color is the only real difference.

📝 Iteration 4: Per-fighter animations for all 9 move types

👁 Result: Animations play during brief attack frames. At idle, still identical.

📝 Iteration 5: Full demon rebuild with reference image from Veo

👁 Result: Finally getting somewhere — but only after abandoning the protocol entirely.
Iteration Ladder — 5 attempts · only the last works
Iter 1 · Bone Scaling 15–25%
Subtle per-fighter scaling on the shared rig.
→ Invisible at game-camera distance.
Iter 2 · Horns & Crowns
Tiny cones on INFERNO, mini crown on SOVEREIGN.
→ The attachments exist. Nobody can see them.
Iter 3 · Bone Scaling 50–100%
Crank the dial — "this has to register now."
→ Same robot, different proportions. Color is still the only tell.
Iter 4 · Per-Fighter Animations
Unique attack frames for all 9 move types.
→ Differentiated mid-attack only. Idle pose: identical.
Iter 5 · Demon Rebuild From Reference
Throw out the protocol. Match a Veo turntable.
→ Finally distinguishable — the only iteration that worked.
Five iterations charted top-to-bottom. Iter 1–4 produce micro-deltas (15% bone scaling, sub-centimeter horns, larger scaling, animation bursts during attack frames) that all read as the same robot at game distance. Only Iter 5 — rebuilt against a Veo reference image — produces a visually distinct demon: massive sweeping horns, hulking chest, glowing core. Four protocol-faithful iterations failed; one protocol-abandoning iteration worked.

Five passes. Each one scored well on technical dimensions. Each one failed the only test that mattered: does it look different when you open the browser?

🔍 Why the Protocol Failed

One-shot-scripts has eight phases. Recon, research, build, test, harden, document, verify, polish. It's designed to produce correct, robust, well-tested code.

Not one of those phases asks: what should this look like?

One-Shot Protocol — Visual Check Per Gap
The gap before Build is the only place where a "what should this look like?" check would prevent every downstream wrong turn.
Eight phases — recon, research, build, test, harden, document, verify, polish — sit on a horizontal pipeline. A small "VISUAL CHECK?" badge sits between every adjacent pair of phases; all seven badges read ×. A new "1d: ART DIRECTION" box sits below, between research and build, with an arrow flipping that gap's badge to ✓ and curved fan-out arrows touching every downstream phase: art direction informs them all.

Think of it like this: Imagine a factory that builds custom cars. It has quality inspectors for the engine, the brakes, the electronics, and the paint booth. But nobody ever looks at the finished car from across the parking lot. Every car rolls off the line mechanically perfect — and visually identical.

Inspection Booth — Empty
Every car perfect inside. Every car identical outside.
Inspector arrived. Same chassis — ten paint jobs.
A conveyor moves six identical silver cars left-to-right past an inspection booth. The booth lamp is red and the booth is empty — every car wears the same paint. Once an inspector enters the booth (lamp turns green), the next batch of cars rolls off in distinct colors with unique trim. Same chassis, ten paint jobs — mirroring the post's claim that visual differentiation requires a perception-checking phase, not more iterations of the same blind code path.

The protocol's visual audit (Phase 7) takes a screenshot AFTER the build is complete. By then, the damage is done. You've written 300 lines of procedural geometry code guided by nothing but imagination.

And AI imagination about 3D visuals is terrible. A 15% bone scale increase sounds significant in code. On screen, at game camera distance, it's invisible.
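A quick perspective-projection estimate makes the point concrete. The camera distance, field of view, and bone length below are assumed numbers for illustration, not measurements from the game:

```javascript
// How many on-screen pixels does a 15% bone-scale change actually buy?
// Camera distance, FOV, and bone length are illustrative assumptions.
function projectedHeightPx(worldHeight, cameraDistance, fovDeg, viewportHeightPx) {
  // Perspective projection: the visible slice of the world grows with
  // distance, so on-screen size shrinks proportionally.
  const fovRad = (fovDeg * Math.PI) / 180;
  const visibleWorldHeight = 2 * cameraDistance * Math.tan(fovRad / 2);
  return (worldHeight / visibleWorldHeight) * viewportHeightPx;
}

// A 0.3-unit forearm bone, seen from 10 units away, 50-degree FOV, 1080p:
const base = projectedHeightPx(0.3, 10, 50, 1080);          // ~35 px tall
const scaled = projectedHeightPx(0.3 * 1.15, 10, 50, 1080); // 15% bigger
console.log(`delta: ${(scaled - base).toFixed(1)} px`);     // ~5 px
```

Roughly five extra pixels on a moving, animated limb. The change exists in the scene graph; it does not exist in perception.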

The Core Misconception

The protocol treats visual tasks like logic tasks. "Add horns to INFERNO" becomes a geometry problem: cone + position + rotation = done. Check the box, move on.

But visual design isn't a logic problem. It's a perception problem. The question isn't "does the horn mesh exist in the scene graph?" It's "can a human tell these two characters apart in a 2-second glance?"

What Visual Tasks Need

A reference image before writing code. Screenshots every 50 lines. "Does this look right?" checks at human-perception scale. Willingness to throw out technically correct code that looks wrong.

What the Protocol Does

Write all the code. Run syntax checks. Take one screenshot at the end. Score it on "does it render without errors?" Ship it.

💡 What We Actually Needed

The breakthrough came when we abandoned the protocol and did something different: we grabbed a reference image of a 3D devil model, fed it to Veo to generate a rotating turntable video, and used that as the visual target.

Suddenly the code had a destination. Not "add demon features" but "match the proportions in this reference — massive sweeping horns, hulking upper body, chest armor with a glowing core."

🖼

Visual Reference First

Generate or find a reference image BEFORE writing any code. Use Veo, image generation, or even a sketch. The code serves the reference, not the other way around.

📸

Screenshot During Build

Don't wait for Phase 7. Take a screenshot every time you add a major visual element. Catch "invisible at game scale" after 10 lines, not 300.

🎯

Perceptual Scoring

"Can you tell Fighter A from Fighter B in a thumbnail?" is the real test. Not "does the geometry exist in the scene graph?" Not "does it render without errors?"

🔁

Reference-Code Loop

Compare the screenshot to the reference. If they don't match, adjust. This is art direction — iterating on feel, not just correctness.
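The "compare and adjust" step does not have to be purely subjective. A minimal mechanical version is a thumbnail-level pixel diff between two renders; the helper names and the threshold below are illustrative assumptions, not the protocol's actual scorer:

```javascript
// Crude perceptual check: downsample two renders to small grayscale
// thumbnails, then measure how much they differ. Sizes and the threshold
// are illustrative assumptions.
function meanPixelDiff(a, b) {
  // a, b: flat grayscale arrays (0-255) of equal length, e.g. 32x32 thumbnails.
  if (a.length !== b.length) throw new Error('thumbnail sizes must match');
  let total = 0;
  for (let i = 0; i < a.length; i++) total += Math.abs(a[i] - b[i]);
  return total / a.length;
}

// "Can you tell Fighter A from Fighter B?" as a gate, not a vibe:
function lookDistinct(thumbA, thumbB, threshold = 20) {
  return meanPixelDiff(thumbA, thumbB) >= threshold;
}
```

Hue-only recolors barely move a grayscale diff; silhouette changes (horns, capes, bulk) move thousands of pixels. That asymmetry is exactly the signal iterations 1–4 were missing.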

🔧 What Needs to Change in One-Shot-Scripts

This isn't a minor tweak. It's a new phase.

Phase 1d: Art Direction (for visual tasks)

After research (Phase 1c) and before build (Phase 2), visual tasks need a dedicated art direction step:

🔎 Phase 1c: Deep Research — find references, study the domain

🎨 Phase 1d: Art Direction — generate visual targets using Veo/image tools

🎯 Lock a reference sheet: "Fighter A looks like THIS, Fighter B looks like THIS"

🛠 Phase 2: Build — code serves the reference, screenshot every major addition

📷 Phase 7: Visual Audit — compare final output to reference sheet, not just "does it render"
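The phase names above can be wired up as a simple conditional pipeline. This is a sketch of the proposal, not the protocol's real configuration, and the `producesPixels` flag is an assumed name:

```javascript
// Proposed pipeline: Phase 1d slots in only for tasks that produce pixels.
// Phase names come from the post; the gating flag is illustrative.
const BASE_PHASES = ['recon', 'research', 'build', 'test', 'harden',
                     'document', 'verify', 'polish'];

function phasesFor(task) {
  if (!task.producesPixels) return BASE_PHASES;
  const phases = [...BASE_PHASES];
  // Art direction lands after research, immediately before build.
  phases.splice(phases.indexOf('build'), 0, 'art-direction');
  return phases;
}
```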

The Scoring Rubric Needs a Visual Dimension

The current rubric scores correctness, robustness, testing, documentation, and code quality. For visual tasks, it needs a "visual differentiation" dimension that asks the one question the others never do: can a human tell the outputs apart at a glance?

The rule: If a task produces pixels, the scoring loop must judge pixels — not just the code that generates them. A technically perfect renderer that produces indistinguishable output scores zero on the dimension that matters most.
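That rule can be encoded directly by making the visual dimension a multiplier rather than one more term in an average. The dimension names come from the rubric above; the 0-to-1 scale and the weighting scheme are illustrative assumptions:

```javascript
// Sketch of a rubric where visual differentiation gates the total score.
// Dimension names are from the post; the scoring scheme is illustrative.
function scoreTask(scores, producesPixels) {
  const dims = ['correctness', 'robustness', 'testing',
                'documentation', 'codeQuality'];
  let total = dims.reduce((sum, d) => sum + (scores[d] || 0), 0) / dims.length;
  if (producesPixels) {
    // Indistinguishable output zeroes the dimension that matters most,
    // and drags the whole score down with it.
    total *= scores.visualDifferentiation || 0;
  }
  return total; // 0..1
}
```

Under this scheme the five blind iterations would have scored near zero instead of "well on technical dimensions."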

📊 The Numbers Don't Lie

Metric · Before Art Direction · After Art Direction
Iterations to "looks different" · 4 (still failed) · 1
Lines of procedural geometry · ~80 (subtle, invisible) · ~200 (dramatic, obvious)
Reference images used · 0 · 1 (+ Veo turntable video)
Screenshots during build · 0 (only Phase 7) · 6+ (after each fighter)
User verdict · "They still look exactly the same" · "Getting somewhere"

Four iterations of blind coding produced nothing usable. One iteration with a reference image produced visible differentiation in a single pass.

🧠 The Bigger Lesson

This isn't just about 3D fighters. Any task with visual output — UI components, data visualizations, CSS layouts, game sprites, PDF generation — has the same problem.

AI can write code that compiles. It can write code that passes tests. It cannot write code that looks right without looking at the output. And "looking" needs to happen continuously, not as a final gate.

The takeaway: Execution protocols that treat visual tasks like logic tasks will produce code that's correct and ugly. Art direction isn't a nice-to-have — it's the difference between "technically works" and "actually ships."

🚀 What's Next

We're building the art direction phase into one-shot-scripts. Phase 1d will use Veo, image generation, and reference libraries to establish visual targets before a single line of rendering code gets written.

The visual audit in Phase 7 will compare against those targets — not just check "does it render." And the scoring rubric will include a visual differentiation dimension for any task that produces pixels.

Get the Protocol That Keeps Getting Honest

One-shot-scripts is the execution protocol that finds its own blind spots. Art direction is next.
