Key Takeaway
ElevenLabs has launched Dubbing v2, an AI dubbing system designed to preserve a speaker’s voice, tone, rhythm, pacing, and delivery across more than 90 languages. The release positions ElevenLabs more directly against AI video translation platforms, professional localization providers, and traditional dubbing studios.
The Story
ElevenLabs Dubbing v2 is built to make audio and video localization faster, cheaper, and more natural than traditional dubbing workflows. Instead of using the conventional transcription-translation-synthesis chain, the system models the original voice performance directly to carry over vocal identity, pitch, tone, timing, emotion, and interpretation.
The product arrives as ElevenLabs expands beyond voice generation into a broader AI audio platform. It launched two days after Music v2, reinforcing the company’s strategy around voice, music, sound effects, dubbing, and production tools.
The Facts
- ElevenLabs says Dubbing v2 supports more than 90 languages and accents.
- The system can work with audio, video, and text inputs.
- Dubbing v2 is a foundational model upgrade that shifts from transcript-based generation to performance-conditioned synthesis.
- Automatic voice cloning creates a model of the original speaker without manual setup.
- The output is designed to preserve identity, pitch, tone, rhythm, pacing, delivery, energy, and emotion across target languages.
- Its sync-aware translation system adapts phrasing for spoken delivery while aligning starts, stops, and pacing with the original content.
- Directly modeling the source performance can reduce errors introduced by intermediate transcription steps.
- Users can refine delivery, adjust timing, and add music or sound effects inside ElevenLabs Studio.
- Dubbing v2 is available now through ElevenCreative and ElevenProductions.
- For the first 7 days of launch, ElevenLabs is offering 1 minute of free use on the Free plan, 15 minutes on the Starter plan, and 30 minutes on Creator+ plans.
- ElevenLabs has launched a Creator Dubbing Partner Program offering eligible creators discounted access to Dubbing v2.
- API access is coming soon and is expected to roll out first to select enterprise customers.
- ElevenProductions combines Dubbing v2 with professional localization services, including human translators, expert voice casting, and professional audio mixing.
- Model-specific terms restrict use in feature films, TV programs, scripted streaming productions, VOD platforms, theatrical releases, and other professional commercial entertainment unless ElevenLabs gives written enterprise authorization.
- The restriction does not apply to user-generated content platforms such as YouTube, TikTok, Instagram, and similar creator platforms.
What Is New
The main technical change is end-to-end, performance-conditioned dubbing. Earlier AI dubbing systems often depended heavily on transcripts, translation, and synthesized speech as separate steps. Dubbing v2 treats the original audio performance as the key input, allowing the system to preserve more of the speaker’s rhythm, emotion, hesitation, emphasis, pacing, energy, and breathing patterns.
Dubbing v2 also adapts translations for spoken delivery rather than direct word replacement. Different languages require different phrasing, sentence structure, and timing, so ElevenLabs uses sync-aware translation to keep localized speech aligned with the original content.
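The timing constraint described above can be illustrated with a minimal sketch. It is not ElevenLabs' implementation, and every name in it is hypothetical: `Segment`, `estimated_duration`, and `pick_phrasing` are invented for illustration, and the speaking rate of 2.5 words per second and the 15% tolerance are assumed values. Given several candidate phrasings of a translated line, it keeps the one whose estimated spoken length best matches the original segment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Segment:
    """A source speech segment, with start and end times in seconds."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def estimated_duration(text: str, words_per_second: float) -> float:
    """Rough spoken length of `text` at an assumed, fixed speaking rate."""
    return len(text.split()) / words_per_second


def pick_phrasing(
    segment: Segment,
    candidates: Sequence[str],
    words_per_second: float = 2.5,  # assumed rate, not a published figure
    tolerance: float = 0.15,        # assumed allowed drift (fraction of duration)
) -> Optional[str]:
    """Return the candidate closest in estimated length to the source segment,
    or None if even the closest one drifts by more than `tolerance`."""
    if segment.duration <= 0:
        raise ValueError("segment must have a positive duration")
    if not candidates:
        return None

    def drift(text: str) -> float:
        gap = abs(estimated_duration(text, words_per_second) - segment.duration)
        return gap / segment.duration

    best = min(candidates, key=drift)
    return best if drift(best) <= tolerance else None


if __name__ == "__main__":
    source = Segment(start=10.0, end=12.0)  # a two-second line of dialogue
    options = [
        "Thank you all so much for joining us here today",
        "Thanks for joining us today",
    ]
    print(pick_phrasing(source, options))
```

A production system would presumably measure duration from synthesized audio rather than from word counts, and rates differ by language and speaker; the sketch only shows why a literal translation can be rejected in favor of a shorter or longer phrasing that fits the original timing.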
The Competitive Market
ElevenLabs is competing in a crowded, fast-moving AI localization market.
HeyGen focuses on AI video translation, voice cloning, subtitles, and lip-sync across more than 175 languages and dialects. RWS acquired Papercup’s AI dubbing technology in 2025 to strengthen enterprise video localization with human expertise. Deepdub targets media, entertainment, live dubbing, and emotionally adaptive multilingual voice workflows.
ElevenLabs is competing from a voice-first position. Its advantage is not only translation coverage, but the quality and identity of the generated voice. Lip-sync remains important, especially for video creators and marketing teams, but ElevenLabs is making voice realism the center of the product.
The shift also puts pressure on traditional dubbing providers that rely on manual casting, recording, post-production, and segmented localization workflows. AI dubbing does not remove the need for review in high-value media, but it changes the cost and speed assumptions behind multilingual distribution.
Why Enterprises Will Watch This Closely
Traditional dubbing for professional media can involve translators, adapters, voice casting, studio recording, mixing, quality control, and delivery for each language. ElevenLabs says professional dubbing can cost hundreds of dollars per minute, especially when large production pipelines require multiple vendors and specialized audio teams.
The Platform Strategy
Dubbing v2 fits into ElevenLabs’ broader platform push. The company now offers tools across text-to-speech, speech-to-text, voice cloning, voice agents, sound effects, music generation, and dubbing through its creator, production, and API products.
Music v2, released two days before Dubbing v2, added stronger control over vocals, instrumentation, arrangement, multilingual lyrics, inpainting, full-song structure, and genre changes inside a track. ElevenLabs also cut Music API pricing by up to 50% and ElevenCreative self-serve pricing by up to 40%.
The direction is clear: ElevenLabs wants to become an AI audio infrastructure company, not only a voice generator. Dubbing v2 extends that thesis from voice replication into cross-language performance transfer.
What to Watch Next
The next signal will be rights management. The model-specific restrictions show that ElevenLabs is separating creator use from professional commercial entertainment use. That distinction could become important as studios, voice actors, and localization vendors negotiate how AI dubbing should be licensed and reviewed.
Why This Matters
AI dubbing could make global content distribution faster and more affordable, especially for creators, educators, companies, and media teams that cannot support traditional localization budgets. For end users, it could mean more videos, courses, podcasts, and entertainment available in their own language, while raising new questions about voice rights, consent, quality control, and the future of localization work.
This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.