Summary of "That Isn't Me - How to Recognize Deepfakes and AI Generated Videos"

Technological concepts & what the video demonstrates

The video argues that long-form or highly dynamic deepfakes are still somewhat easier to spot, but simple deepfakes—like head replacement or a stationary person at a desk—have become very convincing over roughly the last 5 years.
Fully AI-generated videos can also be convincing, but they still show detectable “tells.” They may also require more complex assembly to sound natural.

Core pipeline (refined version of an older method):

Select a lookalike actor/body shape close to the target.
- The creator notes that when viewed on a TV, anomalies (e.g., beard/hat edges) are easier to see, while on a phone, they may look more plausible.
Train a DeepFaceLab model using ~7,000 recent images of the target face.
Lip-sync remains difficult, especially getting mouth movement to match the audio accurately.
- To improve results, the creator:
  - keeps clips short (5–7 seconds)
  - uses fast edits and alternate angles/close-ups
  - splices multiple segments to achieve convincing continuity

Practical claim: this process can be far easier than before—they estimate ~100x easier than 5 years prior.

The creator used OpenArt.ai to test across models quickly because their first choice (Sora 2) wouldn’t generate content “with me in it.”
They explicitly say they are not recommending OpenArt.ai broadly (they mention it has received “deserved hate”), but they used it for the experiment.

Newer models can be more convincing (e.g., Google Veo 3), but may have stricter safety/content guidelines.
They used a mix of models, including Veo 3, plus a “sprinkling of Wan 2.5 and Kling 2.1.”
Safety/guardrail issue: prompts may be unexpectedly flagged as not safe for work (NSFW).
- Workaround: use Claude to rewrite prompts into more “AI-friendly” versions that pass the guardrails.
Cost constraint: every video generation draws on a token budget. They report discarding ~5 clips for every usable one.
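To make the cost constraint concrete, here is a rough back-of-the-envelope sketch. It assumes "discarding ~5 clips per usable one" means about 6 generations per kept clip, and it uses a made-up price of 100 credits per generation. The function names and the price are illustrative and do not come from the video or from any platform's pricing.

```python
def generations_needed(usable_clips: int, discarded_per_usable: int = 5) -> int:
    """Expected generations to end up with `usable_clips` keepers.

    Assumes each kept clip costs one successful generation plus
    `discarded_per_usable` rejected ones (about 5, per the summary).
    """
    return usable_clips * (discarded_per_usable + 1)


def expected_spend(usable_clips: int, cost_per_generation: int,
                   discarded_per_usable: int = 5) -> int:
    """Expected budget in the platform's units (credits or tokens)."""
    return generations_needed(usable_clips, discarded_per_usable) * cost_per_generation


if __name__ == "__main__":
    # Hypothetical: 10 usable clips at 100 credits per generation.
    print(generations_needed(10))                         # 60
    print(expected_spend(10, cost_per_generation=100))    # 6000
```

Under these assumptions, roughly five sixths of the budget goes to clips that are thrown away.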

They suggest DIY pipelines are possible with ComfyUI and open-source models, but:
- results are less convincing
- performance is rough on consumer hardware
They add that progress is fast and may improve by the time viewers watch.

Unlike deepfakes, generated actors don’t naturally speak, so the workflow requires:
- separate audio generation
- alignment of visuals to audio
They tried lip-sync services, but they “fell apart” outside simple talking-head shots.
They used Fish Audio to generate multiple audio options and chose the closest match.

Losses from scams are estimated at over $1 trillion in 2024 (a figure attributed to Bitdefender).
A “scariest part” claim: many scams still rely on old-school text-to-voice phone calls, but AI could make them sound far more realistic—e.g., highly convincing messages designed to trick a worried family member.

Scammers can spend more time and money than creators on production, making malicious content harder to detect.

The creator emphasizes that producing a convincing clip may require only start and end keyframes. Those keyframes can be scraped from existing videos or Facebook photos, lowering the barrier.

The video highlights detection methods rooted in 3D lighting/geometry intuition, such as:
- shadow behavior
- vanishing points / perspective convergence
Claim: AI often renders 2D approximations of a 3D scene without fully respecting physics:
- shadows may not align with the implied light source
- perspective lines may fail to converge to a correct vanishing point

These checks are not always easy to do quickly.
Some artifacts (e.g., extra/missing limbs) can be obvious, but sophistication is improving.
They suggest these weaknesses may be fixed soon (in months rather than years, depending on how much effort model developers put in).

Be skeptical, especially of social media.
Don’t rely on suspicious content for trustworthy news.
Dig deeper:
- look for visual inconsistencies (e.g., weird shimmer, lighting mismatch)
- check the discussion/community context
Don’t respond to urgent requests for money/info or click sketchy links.
Safer tactic: call back using the number you already have to confirm identity.

Bitdefender, the video's sponsor, is described as a global leader in cybersecurity.
It claims over 17 years of AI/machine-learning threat detection experience (since 2008).
It also notes that AI-driven scams could be detectable through the patterns they repeat.

People, sources, and tools mentioned:

Linus (video host and creator)
Professor Hany Farid (source of detection strategies; referenced TED talk)
Nicholas Plouffe (appears as a person quoted in the deepfake test portion)
Emily (mentioned as “editing supervisor” inside the project)
Bitdefender (sponsor; referenced as data source for scam statistics and security product provider)
OpenArt.ai / DeepFaceLab / Sora 2 / ComfyUI / Fish Audio / Claude (tools mentioned as sources/platforms for generating the media)