Summary of "Better than ElevenLabs, FREE & Unlimited... (Not Clickbait) 🤯"
Overview
- Demonstrates a free, unlimited, open-source text-to-speech (TTS) system that can:
  - Generate high-quality speech from text.
  - Clone voices from just a few seconds of audio.
  - Design custom voices by describing the desired sound.
- Shows two usage modes:
  - Web interface — quick testing with some limits.
  - Local installation — removes limits and gives full control.
Note: Some names in the video come from auto-generated subtitles and may differ slightly from the official project pages; likely corrections are noted in parentheses where they apply.
Key technologies and products shown
- Qwen3 TTS — open-source model used for TTS, voice cloning, and custom voice design (the auto-subtitles render it "Quen 3TS"; name may be approximate).
- Pinokio — installer/model manager used to download and install models and handle dependencies in a two-click workflow (heard as "Pinocchio" in the subtitles).
- Higgsfield (Higgsfield Cinema Studio) — sponsored tool demoed for converting images into cinematic video clips to pair with the generated audio (heard as "Higsfield"; name may be approximate).
Main features and capabilities
- Unlimited generation: Local install removes web limits and subscription walls.
- Voice cloning:
  - Upload or record a short audio clip.
  - Transcribe the exact reference text.
  - Set the target text and language.
  - Use the recommended 1.7B base model to generate a faithful clone in seconds.
- Custom voice design:
  - Create a voice from a detailed text description (tone, style, delivery) without any voice sample.
  - Export the designed voice and reuse it via the cloning workflow.
- Preset voices: Pretrained voices are available to use directly if you don’t want to clone.
- Models & sizing: Repeated recommendation to use the 1.7B model for best balance of cloning and customization.
- Save/load cloned voices: Save a cloned voice (including reference text and model) to avoid recloning.
- Generating long audio (two approaches):
  - Chunking — paste the text in parts, generate each segment, then stitch them together in an editor.
  - Bootstrap cloning — generate a short sample with the desired voice/style, then use that sample as the clone reference to produce much longer audio.
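The chunking approach above can be sketched with a simple sentence-aware splitter. The 300-character limit here is a hypothetical stand-in for whatever per-generation cap the web UI enforces; the point is to break only at sentence boundaries so each generated segment starts and ends on a natural pause:

```python
import re

def chunk_script(text: str, max_chars: int = 300) -> list[str]:
    """Split a long script into chunks of at most max_chars,
    breaking only at sentence boundaries (., !, ?)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the cap: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

script = "First line of narration. " * 30
parts = chunk_script(script)
```

Each element of `parts` is then pasted into the TTS app as one generation, and the resulting clips are stitched in order.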
- Style control: A “style” field controls delivery; good style prompts are key to natural, consistent results.
- Workflow tips: Exact reference transcription is crucial; keep the “latest model” selected and follow other recommended settings.
Video pairing workflow (image → cinematic video)
- Use an image-to-video pipeline (Higgsfield Cinema Studio demo):
  - Create images and animate them into 10-second cinematic clips.
  - Use start/end frames to create controlled transitions.
  - Generate multiple versions and choose the best one.
- Goal: produce consistent, filmic visuals that match high-quality voiceovers.
Use cases suggested
- YouTube voiceovers (backup when the creator is unavailable).
- Character voices for story-driven videos.
- Fast ad production with multiple voice variations and languages.
- Freelance/agency ad creation for clients.
Tutorial / guide (step-by-step)
Local install
- Install Pinokio (choose your OS and continue through the installer).
- Open the community/models page, search for the Qwen3 TTS app, and follow the download buttons.
- Wait for the model to install, then apply the recommended settings.
Voice cloning
- In the voice clone tab, select base = 1.7B and download it.
- Upload or record a reference sample.
- Transcribe the reference text exactly (use the built-in transcribe if available).
- Input the target text and select the language; ensure the latest model is selected.
- Generate and optionally save the cloned voice for reuse.
Custom voice design
- Download the voice design model (recommended: 1.7B).
- Describe the voice in detail in the design field (tone, cadence, delivery, etc.).
- Download the designed voice and import it into the voice clone workflow to generate speech.
Long-audio generation methods
- Chunk the text into parts and generate segments, then stitch externally.
- Or re-clone using a short sample to bootstrap a longer, consistent voice output.
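If the generated segments are plain WAV files with matching format, stitching them needs no dedicated editor. A minimal stdlib-only sketch (the file paths are hypothetical; substitute the segments your TTS run produced):

```python
import wave

def stitch_wavs(segment_paths: list[str], out_path: str) -> None:
    """Concatenate WAV segments that share the same channel count,
    sample width, and sample rate into a single output file."""
    with wave.open(segment_paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # nframes is patched automatically on close
        for path in segment_paths:
            with wave.open(path, "rb") as seg:
                # Refuse to mix formats: channels/width/rate must match.
                assert seg.getparams()[:3] == params[:3], f"format mismatch: {path}"
                out.writeframes(seg.readframes(seg.getnframes()))
```

Usage would be e.g. `stitch_wavs(["seg01.wav", "seg02.wav"], "voiceover.wav")`. For MP3 or mixed formats, a tool like ffmpeg or pydub is the safer route.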
Video pipeline
- Generate an image.
- Animate it and set camera/lens/focus or use the recommended settings.
- Create 10-second cinematic clips with the audio.
- Produce multiple versions and use start/end frames for controlled transitions.
Practical tips and best practices
- Always transcribe the reference audio exactly when cloning.
- Use model 1.7B for best cloning and custom-voice quality.
- Save cloned voices to reuse them instantly.
- Break long scripts into chunks or bootstrap with a short reference audio to avoid web/UI errors.
- Use detailed style prompts to control delivery and maintain consistency across segments.
- Pair quality voiceovers with consistent, cinematic visuals for best viewer retention.
Extras and resources
- The creator offers a free PDF containing direct links and the exact prompts/settings used in the video (available via a Discord bot; instructions shown in the video).
- The presenter teases more beta features coming soon and offers to create a part two if viewers request it.
Main speaker and sources
- Presenter / channel: Malva AI (referenced in the narration).
- TTS model: Qwen3 TTS (open source; the auto-subtitles render it "Quen 3TS").
- Model/installer manager: Pinokio (heard as "Pinocchio").
- Video/visual tool (sponsor): Higgsfield / Higgsfield Cinema Studio (heard as "Higsfield").
Names above reflect the auto-generated subtitles and may differ slightly from the original video or tool pages.