Summary of "Better than ElevenLabs, FREE & Unlimited... (Not Clickbait) 🤯"
Overview
- Demonstrates a free, unlimited, open-source text-to-speech (TTS) system that can:
  - Generate high-quality speech from text.
  - Clone voices from just a few seconds of audio.
  - Design custom voices by describing the desired sound.
- Shows two usage modes:
  - Web interface — quick testing with some limits.
  - Local installation — removes limits and gives full control.
Note: Some names in the video come from auto-generated subtitles and may differ slightly from the official project pages; likely corrections are noted in parentheses where they apply.
Key technologies and products shown
- Qwen3 TTS — open-source model used for TTS, voice cloning, and custom voice design (the auto-subtitles render it "Quen 3TS"; name may be approximate).
- Pinokio — installer/model manager used to download and install models and handle dependencies in a two-click workflow (heard as "Pinocchio" in the subtitles).
- Higgsfield (Higgsfield Cinema Studio) — sponsored tool demoed for converting images into cinematic video clips to pair with the generated audio (heard as "Higsfield"; name may be approximate).
Main features and capabilities
- Unlimited generation: Local install removes web limits and subscription walls.
- Voice cloning:
  - Upload or record a short audio clip.
  - Transcribe the exact reference text.
  - Set the target text and language.
  - Use the recommended 1.7B base model to generate a faithful clone in seconds.
- Custom voice design:
  - Create a voice from a detailed text description (tone, style, delivery) without any voice sample.
  - Export the designed voice and reuse it via the cloning workflow.
- Preset voices: Pretrained voices are available to use directly if you don’t want to clone.
- Models & sizing: Repeated recommendation to use the 1.7B model for best balance of cloning and customization.
- Save/load cloned voices: Save a cloned voice (including reference text and model) to avoid recloning.
- Generating long audio (two approaches):
  - Chunking — paste the text in parts, generate each segment, then stitch them together in an editor.
  - Bootstrap cloning — generate a short sample with the desired voice/style, then use that sample as the clone reference to produce much longer audio.
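The chunking approach above can be sketched with a simple sentence-aware splitter. The 300-character limit here is a hypothetical stand-in for whatever per-generation cap the web UI enforces; the point is to break only at sentence boundaries so each generated segment starts and ends on a natural pause:

```python
import re

def chunk_script(text: str, max_chars: int = 300) -> list[str]:
    """Split a long script into chunks of at most max_chars,
    breaking only at sentence boundaries (., !, ?)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the cap: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

script = "First line of narration. " * 30
parts = chunk_script(script)
```

Each element of `parts` is then pasted into the TTS app as one generation, and the resulting clips are stitched in order.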
- Style control: A “style” field controls delivery; good style prompts are key to natural, consistent results.
- Workflow tips: Exact reference transcription is crucial; keep the “latest model” selected and follow other recommended settings.
Video pairing workflow (image → cinematic video)
- Use an image-to-video pipeline (Higgsfield Cinema Studio demo):
  - Create images and animate them into 10-second cinematic clips.
  - Use start/end frames to create controlled transitions.
  - Generate multiple versions and choose the best one.
- Goal: produce consistent, filmic visuals that match high-quality voiceovers.
Use cases suggested
- YouTube voiceovers (backup when the creator is unavailable).
- Character voices for story-driven videos.
- Fast ad production with multiple voice variations and languages.
- Freelance/agency ad creation for clients.
Tutorial / guide (step-by-step)
Local install
- Install Pinokio (choose your OS and continue through the installer).
- Open the community/models page, search for the Qwen3 TTS app, and follow the download buttons.
- Wait for the model to install, then apply the recommended settings.
Voice cloning
- In the voice clone tab, select base = 1.7B and download it.
- Upload or record a reference sample.
- Transcribe the reference text exactly (use the built-in transcribe if available).
- Input the target text and select the language; ensure the latest model is selected.
- Generate and optionally save the cloned voice for reuse.
Custom voice design
- Download the voice design model (recommended: 1.7B).
- Describe the voice in detail in the design field (tone, cadence, delivery, etc.).
- Download the designed voice and import it into the voice clone workflow to generate speech.
Long-audio generation methods
- Chunk the text into parts and generate segments, then stitch externally.
- Or re-clone using a short sample to bootstrap a longer, consistent voice output.
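If the generated segments are plain WAV files with matching format, stitching them needs no dedicated editor. A minimal stdlib-only sketch (the file paths are hypothetical; substitute the segments your TTS run produced):

```python
import wave

def stitch_wavs(segment_paths: list[str], out_path: str) -> None:
    """Concatenate WAV segments that share the same channel count,
    sample width, and sample rate into a single output file."""
    with wave.open(segment_paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # nframes is patched automatically on close
        for path in segment_paths:
            with wave.open(path, "rb") as seg:
                # Refuse to mix formats: channels/width/rate must match.
                assert seg.getparams()[:3] == params[:3], f"format mismatch: {path}"
                out.writeframes(seg.readframes(seg.getnframes()))
```

Usage would be e.g. `stitch_wavs(["seg01.wav", "seg02.wav"], "voiceover.wav")`. For MP3 or mixed formats, a tool like ffmpeg or pydub is the safer route.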
Video pipeline
- Generate an image.
- Animate it and set camera/lens/focus or use the recommended settings.
- Create 10-second cinematic clips with the audio.
- Produce multiple versions and use start/end frames for controlled transitions.
Practical tips and best practices
- Always transcribe the reference audio exactly when cloning.
- Use model 1.7B for best cloning and custom-voice quality.
- Save cloned voices to reuse them instantly.
- Break long scripts into chunks or bootstrap with a short reference audio to avoid web/UI errors.
- Use detailed style prompts to control delivery and maintain consistency across segments.
- Pair quality voiceovers with consistent, cinematic visuals for best viewer retention.
Extras and resources
- The creator offers a free PDF containing direct links and the exact prompts/settings used in the video (available via a Discord bot; instructions shown in the video).
- The presenter teases more beta features coming soon and offers to create a part two if viewers request it.
Main speaker and sources
- Presenter / channel: Malva AI (referenced in the narration).
- TTS model: Qwen3 TTS (open source; the auto-subtitles render it "Quen 3TS").
- Model/installer manager: Pinokio (heard as "Pinocchio").
- Video/visual tool (sponsor): Higgsfield / Higgsfield Cinema Studio (heard as "Higsfield").
Names above reflect the auto-generated subtitles and may differ slightly from the original video or tool pages.