Summary of "I Made an AI Influencer and Built Her a Full Commercial (Every Prompt)"
High-level summary
- This video is a step-by-step tutorial showing how the creator built an AI “influencer” and a full cinematic commercial using one text-to-video model (Kling 3.0) inside the Higgsfield platform.
- The host reads every prompt used, explains what each line does, and breaks down every shot of an ad made for his company (Growth School).
- Emphasis is on prompt design (how to describe characters, camera, lighting, delivery, and sound) and on practical Kling 3.0 + Higgsfield features that make cinematic AI video possible.
Key product / technology features demonstrated
Kling 3.0 (AI video model)
- Generates up to ~15 seconds of video per generation.
- Native audio: dialogue, sound effects, and music are produced in the same generation (no separate dubbing).
- Character consistency across shots when using a reference / “elements” workflow.
- Multi-shot storyboarding: direct multiple camera cuts (up to six shots) in one prompt sequence with per-shot durations and audio.
- Multilanguage audio & lip sync (Hindi, Japanese, Spanish, Chinese, Korean, etc. — just write dialogue in the target language).
- Camera & cinematography-aware controls (camera moves, depth of field, speed ramps, slow motion, etc.).
Higgsfield platform
- Elements library to upload/tag reference images for a consistent character (3–5 reference photos: front, side, smiling, neutral).
- Per-shot prompt boxes for multi-shot storyboarding with start/end times and individual soundscapes.
- Runs Kling 3.0 as the generator; the annual plan offers a high generation allowance (the host describes it as “unlimited”).
Concrete prompting rules, structures and vocabulary (actionable guide)
General prompt structure — include these components (in roughly this order):
- Character description (wardrobe, age, prop, location).
- Lighting line — e.g., “Warm afternoon light through window.” One sentence about lighting changes the mood.
- Body action + quoted dialogue — words in quotes are spoken by the generated character (native audio & lip sync).
- Delivery note — two-word pattern: mood + energy (e.g., “casual, curious”; “reflective, slightly tired”).
- Camera direction — handheld, medium close-up, shallow depth of field, dolly tracking, crane down, low angle, Dutch angle, etc.
- SFX: colon then list of sound elements (e.g., “SFX: traffic, distant chatter, footsteps on pavement”).
- Optional: tag elements reference (e.g., “Arya reference image from elements”) at the start to force consistent face.
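To make the structure concrete, here is an illustrative single-shot prompt assembled from the components above (composed for this summary, not quoted from the video; the character tag and exact wording are hypothetical):

```
Arya reference image from elements. Woman in her late 20s, denim jacket, coffee cup in hand, seated by the window of a small cafe.
Warm afternoon light through window.
She leans forward and says, "So how did you actually get your first customers?"
Delivery: casual, curious.
Handheld medium close-up, shallow depth of field.
SFX: espresso machine hiss, low cafe chatter, soft jazz.
```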
Important prompt rules and vocabulary
- Dialogue rule: put spoken lines inside quotes — the model will generate native voice and lip movement.
- SFX rule: use “SFX:” and list specific ambient sounds. Per-shot soundscapes allow different audio worlds across cuts.
- Character consistency: upload labeled reference images to the Elements library and tag the element in the prompt. If you skip tagging, the model will generate a random face.
- Camera vocabulary (compact list used directly in prompts): handheld, dolly, crane, arc, lateral, crash push. These words control camera behavior/motion.
- Multi-shot storyboarding:
  - Up to 6 shots per prompt sequence; maximum total ~15 seconds.
  - Minimum ~2 seconds per shot.
  - Each shot gets its own prompt, camera direction, and SFX; use a logical shot progression (wide → medium → close).
- Action choreography: list physical moves in order and use speed modifiers (e.g., “slow motion as the kick connects; speed resumes on the follow-through”) to create speed ramps.
- B-roll & macro: describe lens choice and framing (e.g., “extreme close-up, macro lens”) and add matching SFX for realism.
- Editing control: control where cuts happen by ending quoted dialogue at the desired cut point — the model generates the quoted audio exactly.
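Putting these rules together, a three-shot sequence following a wide → medium → close progression might be written like this (an illustrative sketch composed for this summary, not a prompt read out in the video; timings assume the ~15-second cap):

```
Shot 1 (0–5s): Wide shot, empty classroom at dusk. Slow dolly forward. SFX: distant school bell, chairs scraping.
Shot 2 (5–10s): Medium shot, a student packs a tennis bag and says, "One more season. That's all I need." Delivery: quiet, determined. SFX: zipper, footsteps on wood.
Shot 3 (10–15s): Close-up on hands lacing shoes, then cut to a roaring stadium. Crane down, slow motion. SFX: crowd roar swelling, a single breath.
```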
> Dialogue inside quotes = generated native voice + lip sync. Use “SFX:” to define ambient audio for each shot.
Examples / tutorial items included in the video
- Full readout and line-by-line explanation of five interview-style prompts (examples include cafe conversation, street walking, office/Hindi prompt, tennis-court shoelace shot, classroom → stadium cut).
- Character-consistency walkthrough: how to create and tag an “element” (upload 3–5 images, label, tag in prompts).
- Camera vocabulary guide and examples (handheld, dolly tracking, crane down, low angle, Dutch angle, arc, lateral, crash push).
- Multi-shot storyboarding demo: up to six cuts with per-shot timings, per-shot audio worlds, and shot progression examples.
- Action/fight scene choreography prompt with speed ramp and per-movement SFX mapping.
- Multi-language prompting demonstration (Hindi example — write dialogue in the language and the model generates matching audio and lips).
- B-roll & macro technique (extreme close-up bee on flower with SFX).
- End-to-end ad build (Growth School commercial) showing eight different scenes and a four-shot cinematic car sequence told purely via SFX and shots (no dialogue).
- Practical tips: negative prompts for controlling unwanted features; iterative refinement (some shots took 2–3 tries); typical project time (the ad took roughly an hour from first prompt to finished ad).
Workflow & practical constraints
- One prompt generation can include video + synced audio + SFX + music — often replacing separate voice actors and post sound design.
- Character consistency requires tagging the saved element per generation. Without that tag, consistency is lost.
- Multi-shot limits: max 6 shots, min 2 seconds per shot, max total ~15 seconds.
- Some shots may require multiple generations (iterations of a few minutes each).
- Host provides copy-paste-ready prompts in a WhatsApp community and links them in the video description.
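The numeric limits above are easy to sanity-check before submitting a storyboard. A minimal Python sketch (not an official platform API; just the constraints as stated in the video):

```python
# Validate a multi-shot storyboard against the limits quoted in the video:
# up to 6 shots, at least ~2 seconds per shot, at most ~15 seconds total.

MAX_SHOTS = 6
MIN_SHOT_SECONDS = 2.0
MAX_TOTAL_SECONDS = 15.0

def validate_storyboard(shot_durations):
    """Return a list of constraint violations (empty list means valid)."""
    problems = []
    if len(shot_durations) > MAX_SHOTS:
        problems.append(f"too many shots: {len(shot_durations)} > {MAX_SHOTS}")
    for i, d in enumerate(shot_durations, start=1):
        if d < MIN_SHOT_SECONDS:
            problems.append(f"shot {i} too short: {d}s < {MIN_SHOT_SECONDS}s")
    if sum(shot_durations) > MAX_TOTAL_SECONDS:
        problems.append(f"total {sum(shot_durations)}s exceeds {MAX_TOTAL_SECONDS}s")
    return problems

# A wide → medium → close progression that fits the limits:
print(validate_storyboard([5, 5, 4]))   # → []
# One shot too short AND the total over-length → two violations:
print(validate_storyboard([1.5, 14]))
```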
Analysis / takeaways
- Kling 3.0 + Higgsfield translate higher-level cinematic direction (lighting, camera vocabulary, choreography, delivery shorthand, and SFX mapping) into usable, consistent short video outputs.
- The primary creative control now sits in precise prompt engineering: mood + delivery notes, body actions, camera vocabulary, and sound cues. Small specific details (lighting, props, micro-actions) drastically change perceived production value.
- The toolchain enables rapid prototyping across formats: talking-head interviews, multilingual spots, action choreography, B-roll macro shots, and multi-shot cinematic sequences — all from text and a few reference images.
Where to get the prompts / assets
- All prompts for the five interview prompts and the eight ad shots are available in the host’s WhatsApp community and linked in the video description.
- The entire ad was generated on Higgsfield with Kling 3.0.
Speakers / sources
- Video host / creator: an entrepreneur who has built startups and raised funding (the on-screen subtitle reads “Seoia”). He is the primary speaker and the source of all demonstrations and prompts.
- Technologies/platforms referenced: Kling 3.0 (AI video model) and Higgsfield (platform and Elements library).
- The ad shown in the video is for the host’s company, Growth School.
Category
Technology