One-Shot Can't See: The Art Direction Problem
🎨 The task: Give three Agent Kombat fighters unique visual designs using procedural Three.js geometry
💥 The failure: Five iterations later, the fighters still looked like recolored copies of the same robot
👁 The root cause: One-shot-scripts has no art direction phase — it treats visual output like a logic problem
🔧 The fix: We need a dedicated visual reference + iterative screenshot loop BEFORE and DURING code generation
The Task
Agent Kombat has three fighters: SENTINEL, INFERNO, and SOVEREIGN. They're rendered in Three.js using a shared RobotExpressive.glb model with procedural modifications.
The brief was simple: make them look different. Not just different colors — different silhouettes, different proportions, different gear. A hulking demon, a sleek cyber-ninja, and a regal monarch.
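For context, the shared-model setup looks roughly like this. A minimal sketch: the file path and spacing are illustrative, but `SkeletonUtils.clone` is what Three.js actually requires for duplicating skinned meshes:

```js
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';
import { clone as cloneSkinned } from 'three/examples/jsm/utils/SkeletonUtils.js';

const scene = new THREE.Scene();
const loader = new GLTFLoader();

loader.load('models/RobotExpressive.glb', (gltf) => {
  // All three fighters start life as the same rig. Every visible
  // difference has to be added procedurally afterward.
  ['SENTINEL', 'INFERNO', 'SOVEREIGN'].forEach((name, i) => {
    // Plain .clone() would leave all copies bound to one skeleton;
    // SkeletonUtils.clone duplicates the bone hierarchy correctly.
    const fighter = cloneSkinned(gltf.scene);
    fighter.name = name;
    fighter.position.x = (i - 1) * 2; // illustrative spacing
    scene.add(fighter);
  });
});
```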
Five Iterations of "Looking the Same"
We ran /one-shot-scripts against the task. It produced a plan, wrote code, ran through all its phases. The code compiled. The syntax was clean. The animation branches were correct.
Then we looked at it. Three identical robots in different colors.
↓
👁 Result: Invisible. Can't tell the difference at game camera distance.
↓
📝 Iteration 2: Add horns to INFERNO, crown to SOVEREIGN
↓
👁 Result: Tiny cones on the head. Invisible at game scale.
↓
📝 Iteration 3: Bigger bone scaling (50-100%), bigger attachments
↓
👁 Result: Still looks like the same robot. Color is the only real difference.
↓
📝 Iteration 4: Per-fighter animations for all 9 move types
↓
👁 Result: Animations play during brief attack frames. At idle, still identical.
↓
📝 Iteration 5: Full demon rebuild with reference image from Veo
↓
👁 Result: Finally getting somewhere — but only after abandoning the protocol entirely.
Five passes. Each one scored well on technical dimensions. Each one failed the only test that mattered: does it look different when you open the browser?
Why the Protocol Failed
One-shot-scripts has eight phases. Recon, research, build, test, harden, document, verify, polish. It's designed to produce correct, robust, well-tested code.
Not one of those phases asks: what should this look like?
Think of it like this: a factory builds custom cars. It has quality inspectors for the engine, the brakes, the electronics, and the paint booth. But nobody ever looks at the finished car from across the parking lot. Every car rolls off the line mechanically perfect and visually identical.
The protocol's visual audit (Phase 7) takes a screenshot AFTER the build is complete. By then, the damage is done. You've written 300 lines of procedural geometry code guided by nothing but imagination.
And AI imagination about 3D visuals is terrible. A 15% bone scale increase sounds significant in code. On screen, at game camera distance, it's invisible.
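To make that concrete, here is a sketch. Bone names are assumptions about the RobotExpressive rig, and `inferno` stands in for the INFERNO clone from the setup sketch above:

```js
// Sounds significant in a diff:
const head = inferno.getObjectByName('Head');
head.scale.multiplyScalar(1.15); // +15%: invisible at game camera distance

// What silhouette-level change actually takes:
const spine = inferno.getObjectByName('Spine');
spine.scale.set(1.6, 1.35, 1.6); // numbers that feel absurd in code review
```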
The Core Misconception
The protocol treats visual tasks like logic tasks. "Add horns to INFERNO" becomes a geometry problem: cone + position + rotation = done. Check the box, move on.
But visual design isn't a logic problem. It's a perception problem. The question isn't "does the horn mesh exist in the scene graph?" It's "can a human tell these two characters apart in a 2-second glance?"
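Here is roughly what the check-the-box version looks like. A hypothetical sketch, with the bone name and dimensions assumed; every structural assertion about it passes:

```js
import * as THREE from 'three';

// Horn mesh exists in the scene graph: check. Perceptible: no.
function addHorns(fighter) {
  const head = fighter.getObjectByName('Head'); // assumed bone name
  const geo = new THREE.ConeGeometry(0.03, 0.08, 8); // "a cone"
  const mat = new THREE.MeshStandardMaterial({ color: 0x8b0000 });
  for (const side of [-1, 1]) {
    const horn = new THREE.Mesh(geo, mat);
    horn.position.set(side * 0.05, 0.1, 0); // "+ position"
    horn.rotation.z = side * -0.4;          // "+ rotation = done"
    head.add(horn);
  }
}
```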
What Visual Tasks Need
A reference image before writing code. Screenshots every 50 lines. "Does this look right?" checks at human-perception scale. Willingness to throw out technically-correct code that looks wrong.
What the Protocol Does
Write all the code. Run syntax checks. Take one screenshot at the end. Score it on "does it render without errors?" Ship it.
What We Actually Needed
The breakthrough came when we abandoned the protocol and did something different: we grabbed a reference image of a 3D devil model, fed it to Veo to generate a rotating turntable video, and used that as the visual target.
Suddenly the code had a destination. Not "add demon features" but "match the proportions in this reference — massive sweeping horns, hulking upper body, chest armor with a glowing core."
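The reference-driven version of the same fighter looks very different in code. This is an illustrative sketch rather than the project's actual geometry, but the values are the kind of numbers you only commit to when you're matching a picture:

```js
import * as THREE from 'three';

function applyDemonReference(fighter) {
  // Massive sweeping horns, sized against the turntable silhouette.
  const head = fighter.getObjectByName('Head'); // assumed bone name
  const hornGeo = new THREE.ConeGeometry(0.12, 0.9, 12);
  const hornMat = new THREE.MeshStandardMaterial({ color: 0x1a0505, roughness: 0.4 });
  for (const side of [-1, 1]) {
    const horn = new THREE.Mesh(hornGeo, hornMat);
    horn.position.set(side * 0.18, 0.15, -0.05);
    horn.rotation.set(-0.9, 0, side * 0.6); // swept back and out
    head.add(horn);
  }

  // Hulking upper body: big enough to change the silhouette.
  const spine = fighter.getObjectByName('Spine'); // assumed bone name
  spine.scale.set(1.5, 1.3, 1.5);

  // Chest armor with a glowing core, per the reference.
  const core = new THREE.Mesh(
    new THREE.SphereGeometry(0.12, 16, 16),
    new THREE.MeshStandardMaterial({
      color: 0x220000,
      emissive: 0xff3300, // reads as "glowing" even without extra lights
      emissiveIntensity: 2.0,
    })
  );
  core.position.set(0, 0.2, 0.18);
  spine.add(core);
}
```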
Visual Reference First
Generate or find a reference image BEFORE writing any code. Use Veo, image generation, or even a sketch. The code serves the reference, not the other way around.
Screenshot During Build
Don't wait for Phase 7. Take a screenshot every time you add a major visual element. Catch "invisible at game scale" after 10 lines, not 300.
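The helper for this is tiny. One gotcha worth flagging: a WebGL canvas reads back blank unless you render immediately before capturing (or create the renderer with `preserveDrawingBuffer: true`). A minimal sketch:

```js
// Call after every major visual addition, not once at the end.
function snapshot(renderer, scene, camera, label) {
  renderer.render(scene, camera); // render right before reading the canvas
  const dataUrl = renderer.domElement.toDataURL('image/png');
  console.log(`[art-check] ${label}`);
  return dataUrl; // open it, save it, or feed it to the diff step below
}

// During build:
// addHorns(inferno);
// snapshot(renderer, scene, camera, 'inferno-horns');
```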
Perceptual Scoring
"Can you tell Fighter A from Fighter B in a thumbnail?" is the real test. Not "does the geometry exist in the scene graph?" Not "does it render without errors?"
Reference-Code Loop
Compare the screenshot to the reference. If they don't match, adjust. This is art direction — iterating on feel, not just correctness.
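One way to mechanize both checks, the thumbnail test and the reference comparison, is a pixel diff at perception scale. A sketch using the `pixelmatch` library; the thresholds are guesses to tune, not gospel:

```js
import pixelmatch from 'pixelmatch';

// Downscale two images to thumbnail size and measure how much they differ.
// Works for fighter-vs-fighter (can you tell them apart?) and for
// screenshot-vs-reference (does it match the approved target?).
function thumbnailDiff(imgA, imgB, w = 96, h = 96) {
  const pixels = (img) => {
    const c = document.createElement('canvas');
    c.width = w;
    c.height = h;
    const g = c.getContext('2d');
    g.drawImage(img, 0, 0, w, h); // downscale to "2-second glance" scale
    return g.getImageData(0, 0, w, h).data;
  };
  const changed = pixelmatch(pixels(imgA), pixels(imgB), null, w, h, { threshold: 0.1 });
  return changed / (w * h); // fraction of thumbnail pixels that differ
}

// Hypothetical gate: two fighters whose thumbnails differ on under ~15%
// of pixels read as the same character, whatever the scene graph says.
```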
What Needs to Change in One-Shot-Scripts
This isn't a minor tweak. It's a new phase.
Phase 1d: Art Direction (for visual tasks)
After research (Phase 1c) and before build (Phase 2), visual tasks need a dedicated art direction step. Its output is a frozen reference sheet that informs Build, Test, Verify, and Polish, not just the gap immediately after research:
↓
🎨 Phase 1d: Art Direction — generate visual targets using Veo/image tools
↓
🎯 Lock a reference sheet: "Fighter A looks like THIS, Fighter B looks like THIS"
↓
🛠 Phase 2: Build — code serves the reference, screenshot every major addition
↓
📷 Phase 7: Visual Audit — compare final output to reference sheet, not just "does it render"
The Scoring Rubric Needs a Visual Dimension
The current rubric scores correctness, robustness, testing, documentation, and code quality. For visual tasks, it needs a "visual differentiation" dimension that asks:
- Can the user distinguish each element at intended viewing distance?
- Does the output match the approved reference?
- Would a screenshot pass a "3-second glance" test?
The rule: If a task produces pixels, the scoring loop must judge pixels — not just the code that generates them. A technically perfect renderer that produces indistinguishable output scores zero on the dimension that matters most.
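One shape that rule could take in the rubric. The names and thresholds here are ours, not the protocol's current schema:

```js
const visualDifferentiation = {
  appliesWhen: (task) => task.producesPixels,
  score: ({ fighterThumbnailDiffs, referenceMatch }) => {
    // Indistinguishable output zeroes the dimension outright,
    // no matter how clean the generating code is.
    if (Math.min(...fighterThumbnailDiffs) < 0.15) return 0;
    return referenceMatch; // 0-1 similarity to the approved reference sheet
  },
};
```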
The Numbers Don't Lie
| Metric | Before Art Direction | After Art Direction |
|---|---|---|
| Iterations to "looks different" | 4 (still failed) | 1 |
| Lines of procedural geometry | ~80 (subtle, invisible) | ~200 (dramatic, obvious) |
| Reference images used | 0 | 1 (+ Veo turntable video) |
| Screenshots during build | 0 (only Phase 7) | 6+ (after each fighter) |
| User verdict | "They still look exactly the same" | "Getting somewhere" |
Four iterations of blind coding produced nothing usable. One pass with a reference image produced visible differentiation.
The Bigger Lesson
This isn't just about 3D fighters. Any task with visual output — UI components, data visualizations, CSS layouts, game sprites, PDF generation — has the same problem.
AI can write code that compiles. It can write code that passes tests. It cannot write code that looks right without looking at the output. And "looking" needs to happen continuously, not as a final gate.
The takeaway: Execution protocols that treat visual tasks like logic tasks will produce code that's correct and ugly. Art direction isn't a nice-to-have — it's the difference between "technically works" and "actually ships."
What's Next
We're building the art direction phase into one-shot-scripts. Phase 1d will use Veo, image generation, and reference libraries to establish visual targets before a single line of rendering code gets written.
The visual audit in Phase 7 will compare against those targets — not just check "does it render." And the scoring rubric will include a visual differentiation dimension for any task that produces pixels.
Get the Protocol That Keeps Getting Honest
One-shot-scripts is the execution protocol that finds its own blind spots. Art direction is next.