Summary of "3 Hours of Making an AI Film Start to Finish (Watch Me Fail)"

Summary of technological concepts, workflow, and features (AI filmmaking)

Goal + constraints

The team is producing a fully AI-generated feature film (~80 minutes).
The deadline is extremely tight: the premiere is under a month away, leaving ~14 days to finish.
They begin with 10 million credits.
15 people work in parallel; scenes 21–23 are split across workers.

End-to-end workflow shown (Day 4: “start to finish”)

Script upload → shot list generation
- Scenes 21 and 23 are handled by the speaker.
- Scene 22 is handled by coworkers.
Asset generation
- Locations, character sheets, props.
Video generation from shot prompts
- Uses Cedus/Cance 2.0.
- Subtitles reference “Cance/Cedus 2.0” and “canvas”.
Iteration loops
- Failures require prompt edits.
- Batching explores variations until something matches the shot plan.

“Canvas” + consistency controls via style prefix

To keep outputs consistent across many generators and collaborators, the team maintains a shared style prefix enforcing:

Lighting
Color
Composition
Audio

Example constraints used:

Lighting: natural light only
Audio: no music / environmental SFX only
- Reason: Cance 2.0 outputs only one audio track, and it’s hard to separate music later.
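
A shared prefix of this kind can be kept as one piece of data and prepended to every shot prompt, so each collaborator sends identical constraints. The following is a minimal sketch, not the team's actual prefix: the `STYLE_PREFIX` name, the `build_prompt` helper, and the sample shot text are assumptions, and only the lighting and audio rules are taken from the summary.

```python
# Hypothetical shared style prefix. Only the lighting and audio rules
# come from the summary; the names and structure are illustrative.
STYLE_PREFIX = {
    "Lighting": "natural light only",
    "Audio": "no music, environmental SFX only",
    # Color and composition rules would be added here; their wording
    # is not given in the summary, so they are left out.
}


def build_prompt(shot_prompt: str, prefix: dict[str, str] = STYLE_PREFIX) -> str:
    """Prepend every style rule, in a fixed order, to a shot prompt."""
    rules = " ".join(f"{key}: {value}." for key, value in prefix.items())
    return f"{rules} {shot_prompt.strip()}"


# Example shot text is made up for illustration.
print(build_prompt("Wide shot of the hallway at dawn."))
```

Keeping the prefix in one place means a rule change (for example, tightening the audio wording) reaches every collaborator's prompts at once.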

Using Claude with custom skills and context retention

The speaker uses Claude with a custom skill (e.g., a “shot list builder”) that embeds director knowledge.

They then use Claude to:

Split scenes into 15-second shots
Generate tailored prompts for the selected video model
Output structured shot lists including:
- Descriptions + prompts
- Failure modes
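
The shape of such a shot list can be sketched as a small data structure. This is an illustration only: the `Shot` fields mirror the items listed above, the “21A”-style labels follow the naming mentioned later in this summary, and the even time-based split in `split_scene` is an assumption (in the video the split is done by Claude from the script, presumably along story beats rather than a stopwatch).

```python
import math
from dataclasses import dataclass, field
from string import ascii_uppercase

MAX_SHOT_SECONDS = 15  # shot length limit mentioned in the summary


@dataclass
class Shot:
    shot_id: str                      # e.g. "21A"
    duration_s: float
    description: str = ""
    prompt: str = ""
    failure_modes: list[str] = field(default_factory=list)


def split_scene(scene_number: int, scene_seconds: float) -> list[Shot]:
    """Divide a scene into shots of at most MAX_SHOT_SECONDS each."""
    count = math.ceil(scene_seconds / MAX_SHOT_SECONDS)
    if count > len(ascii_uppercase):
        raise ValueError("this sketch only labels up to 26 shots per scene")
    shots = []
    remaining = scene_seconds
    for letter in ascii_uppercase[:count]:
        length = min(MAX_SHOT_SECONDS, remaining)
        shots.append(Shot(shot_id=f"{scene_number}{letter}", duration_s=length))
        remaining -= length
    return shots


# A hypothetical 40-second scene becomes three shots.
print([(s.shot_id, s.duration_s) for s in split_scene(21, 40)])
```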

Claude is also used for:

Prompt editing
Reformatting into better layouts (e.g., two-column shot list formatting)

Team collaboration via “collab” projects

The team creates shared projects in Collab:

Scenes are managed in a shared workspace.
Projects can be shared via link/email with permissions:
- approval required / viewer / collaborator / message settings

Key advantage:

Sharing keeps inputs + prompts + tool provenance together, so assets can be re-tweaked without rediscovering generation parameters.

Location generation strategy (spatial awareness issues)

Location prompts are written with extremely detailed spatial descriptions:

Layout
Windows/doors
Wall art
Hallway geometry

They also note a key limitation:

Location image generators can warp spatial layouts, e.g.:
- indoor rooms appearing outdoors
- incorrect “fitting” of images

Fallback workflow suggestion:

Use something like “Soul Cinema” to batch many location candidates when exact director references aren’t available.

Iterative image generation and prompt “debugging”

They run small batches (often 4 images) per prompt; the more detailed the prompt, the more consistent the outputs within a batch.
Common failure fixes:
- Plasticky textures
  - Mitigated with keywords like atmospheric haze, film grain, and lighting/haze phrasing.
- Layout regressions
  - Examples: hallway disappears, posters vanish
  - Requires prompt edits and re-batching
They sometimes switch tools:
- Nano Pro for image generation
- GPT Image 2.0 for edits/variations
They track iteration counts (e.g., 44 iterations to refine the “final room layout” and gather alternate angles).

Two-view camera/reference approach for videos

To improve camera navigation and continuity:

Generate multiple still views
- e.g., front wide + reverse/hallway view
Combine them into a single reference asset:
- top panel / bottom panel
Switch to 16:9 aspect ratio for easier framing.

Props workflow: Polaroid photo wall + sticky note

Props are “locked” after several rounds:

Photo wall strategy
- Build the wall in Photoshop from many generated Polaroids.
- Avoid generating one huge photo wall image because:
  - faces drift across frames.
- Generate each Polaroid separately, then stitch, to preserve identity consistency.
Sticky note
- Generate a close-up sticky note (e.g., “food in the fridge”).
Character-aging strategy
- Generate teen and adult versions to populate different memories on the wall.

Shot generation with Cedus/Cance 2.0: batching + verification

Claude outputs shot prompts like “21A”, “21B”, etc.

The speaker:

Verifies tag order
- character first, location second, then prop/photo wall, etc.
Runs video generations in batches (e.g., 8 at a time)
Skims outputs to decide:
- keep batching
- or fix prompts based on systematic issues
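
The tag-order check is mechanical enough to automate. The sketch below assumes each reference tag is represented as a `(kind, name)` pair; that representation, the `TAG_ORDER` constant, and the sample names are hypothetical, since the summary does not show the tool's actual tag syntax.

```python
# Required order from the summary: character, then location, then props.
TAG_ORDER = ["character", "location", "prop"]


def tags_in_order(tags: list[tuple[str, str]]) -> bool:
    """Return True if tag kinds never go backwards relative to TAG_ORDER.

    Raises ValueError for a kind that is not listed in TAG_ORDER.
    """
    ranks = [TAG_ORDER.index(kind) for kind, _name in tags]
    return ranks == sorted(ranks)


good = [("character", "lead"), ("location", "bedroom"), ("prop", "photo wall")]
bad = [("location", "bedroom"), ("character", "lead")]
print(tags_in_order(good), tags_in_order(bad))
```

Running a check like this before each batch catches ordering mistakes before any credits are spent on generation.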

Practical rule:

The speaker doesn’t watch every video end-to-end, only enough to judge composition and continuity, and fixes prompts early if errors appear.
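
One way to make the “keep batching or fix the prompt” decision explicit is to count how often each issue shows up in a skimmed batch. This is a sketch under stated assumptions: the `triage` function, the issue labels as free-text strings, and the 50% threshold are all illustrative choices, not something the speaker specifies.

```python
from collections import Counter


def triage(batch_issues: list[list[str]], threshold: float = 0.5) -> tuple[str, list[str]]:
    """Decide whether to keep batching or fix the prompt.

    batch_issues holds one list of issue labels per skimmed clip
    (an empty list means the clip looked fine). An issue counts as
    systematic when it appears in at least `threshold` of the clips.
    """
    if not batch_issues:
        return "keep batching", []
    # set() so an issue is counted at most once per clip
    counts = Counter(issue for issues in batch_issues for issue in set(issues))
    total = len(batch_issues)
    systematic = sorted(i for i, n in counts.items() if n / total >= threshold)
    if systematic:
        return "fix prompt", systematic
    return "keep batching", []


# Hypothetical batch of 8 clips: 5 share the same problem.
clips = [["unwanted camera cut"]] * 5 + [["too many doors"], [], []]
print(triage(clips))
```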

Major failure-mode handling encountered

Spatial/camera issues
- wrong camera angle
- too many doors
- wrong hallway reference
- wrong camera placement
- camera too aggressive
Motion continuity issues
- unwanted camera cuts
- subshot segmentation inside a single intended continuous shot
FPS mismatches
- Detect and reject outputs that aren’t at the required 24 fps
“Frame double/cut” hack (last resort)
- If pacing/FPS is wrong, manually removing frames in the editor can help.
- Caution: it may cut audio too, and frame loss can cause jolts.
Audio discipline
- Prefer handling music in the edit rather than in generation.
- Generation sometimes adds music even when instructed not to.
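
The 24 fps check can be scripted rather than eyeballed. The sketch below assumes `ffprobe` (part of FFmpeg) is installed and on the PATH; the summary does not say how the team actually detects frame-rate mismatches, so the helper names and the exact-match rule are illustrative.

```python
import subprocess
from fractions import Fraction

REQUIRED_FPS = Fraction(24)  # delivery frame rate named in the summary


def parse_frame_rate(rate: str) -> Fraction:
    """Parse an ffprobe-style rate string such as '24/1' or '30000/1001'."""
    return Fraction(rate.strip())


def probe_frame_rate(path: str) -> Fraction:
    """Ask ffprobe for the first video stream's frame rate."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return parse_frame_rate(result.stdout)


def is_acceptable(rate: Fraction, required: Fraction = REQUIRED_FPS) -> bool:
    """Accept only an exact match with the required frame rate."""
    return rate == required


# Parsing examples only; probe_frame_rate needs a real file and ffprobe.
for sample in ("24/1", "30/1", "24000/1001"):
    print(sample, is_acceptable(parse_frame_rate(sample)))
```

Note that the exact-match rule also rejects 24000/1001 (about 23.976 fps); whether that rate is acceptable for this project is not stated in the summary.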

Post-production notes (from the workflow narrative)

After video generation, the plan includes:

Color grading
Adding missing SFX
Combining with voice from original footage
- aligned with a traditional-style post workflow

Also recommended:

Edit as you go
- Use a timeline with temp music to catch shot assembly/prompt failures early.

VFX cost comparison analysis (traditional pipeline vs AI)

They interview Patrick Kalin, an Emmy-nominated VES award-winning VFX artist.
Kalin is associated with films such as:
- Avatar, Dune, Blade Runner 2049, Deadpool 2
Kalin praises the AI film’s illusion/engagement and estimates:
- A traditional CGI/VFX equivalent could cost ~$15–20 million
- (traditional principal photography + extensive VFX/practical effects)

The team’s takeaway:

Their approach is much cheaper than traditional estimates, though still expensive in credits.

Progress + production metrics after 4 days

Totals across 4 days:

4,441,352 credits spent
48,336 images/videos generated
~$260,000 over 4 days (as stated)
Only 8 of ~800 generated assets made the final cut during shooting

Conclusion:

10 million credits may be insufficient, especially for scenes that require heavy iteration.
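
The arithmetic behind this conclusion can be worked through from the stated figures. The sketch assumes the burn rate of the first 4 days stays constant and that the ~14 days are counted from the start of production (so 10 remain); both are assumptions, and counting 14 days from day 4 would only widen the gap.

```python
TOTAL_CREDITS = 10_000_000
SPENT_CREDITS = 4_441_352
DAYS_ELAPSED = 4
TOTAL_DAYS = 14  # assumption: the ~14 days include the 4 already elapsed

daily_burn = SPENT_CREDITS / DAYS_ELAPSED        # credits per day so far
remaining = TOTAL_CREDITS - SPENT_CREDITS        # credits left in the budget
days_of_runway = remaining / daily_burn          # days until credits run out
projected_total = daily_burn * TOTAL_DAYS        # credits needed at this pace
shortfall = projected_total - TOTAL_CREDITS      # credits missing at this pace
keep_rate = 8 / 800                              # final-cut assets per generated asset

print(f"daily burn:      {daily_burn:,.0f} credits/day")
print(f"remaining:       {remaining:,} credits")
print(f"runway:          {days_of_runway:.1f} more days")
print(f"projected total: {projected_total:,.0f} credits")
print(f"shortfall:       {shortfall:,.0f} credits")
print(f"keep rate:       {keep_rate:.0%}")
```

At the observed pace the remaining credits last roughly 5 more days, well short of the schedule, which is consistent with the speaker's concern.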

Key review / guide / tutorial takeaways (explicitly taught or emphasized)

Use a shared “style prefix” to keep lighting, color, composition, and audio consistent across a multi-person team.
Enforce output constraints (environmental SFX only, no music, no subtitles); the audio rule exists because the video model typically outputs a single audio track.
Use Claude custom skills to convert scripts into shot lists and to preserve director intent.
Use “collab” shared projects to preserve prompts, inputs, and tool history for faster iteration.
Generate locations with detailed textual layouts; batch-image selection helps when exact references are missing.
Prefer many small prop generations (Polaroids) over one giant wall to prevent face drift; stitch in Photoshop.
Batch intelligently (often 4 or 8) and fix prompts early when systemic failures appear.
Detect and reject bad outputs, especially wrong fps; manual frame removal is a last resort and may affect audio.
Edit as you go in a timeline using temp music to match cinematic pacing.
Expect failures and treat them as learning loops to improve prompts over time.

Main speakers / sources

Adil: primary narrator/speaker; runs scenes 21 and 23 and teaches workflow.
Patrick Kalin: Emmy-nominated VES award-winning VFX artist; provides cost/pipeline comparison.
Claude: AI assistant used for custom skills, shot-list generation, and prompt refinement.
Cedus/Cance 2.0: AI video generation system referenced for shot prompting and outputs.