Gemini 3.5 Live Translate delivers fluid natural speech translation

DeepMind

June 10, 2026

2 MIN

Original source

deepmind.google — read the full announcement →

Google and DeepMind launch Gemini 3.5 Live Translate

DeepMind just dropped Gemini 3.5 Live Translate into Google AI Studio, Google Translate, and Google Meet. The headline feature is near real-time, natural speech translation — meaning it preserves tone, intonation, and even pauses rather than spitting out flat, robotic audio. Under the hood, it's built on the same foundation as the Gemini 3.5 LLM, but optimized for low-latency streaming. DeepMind claims end-to-end latency under 500 milliseconds. That's fast enough to feel conversational. The output is so fluid you'd think a human interpreter had whispered in your ear — if that human had perfect recall and no accent.

Why this leap matters for voice translation

Until now, voice translation has been a compromise. Google Translate's old system sounded like a monotone telemarketer. Microsoft's solutions added a second of delay that broke the rhythm of conversation. Startup players like DeepL dabbled but never cracked natural prosody. The breakthrough here is that Gemini 3.5 Live Translate doesn't just translate words; it learns to map speech melody from source to target language. The model was trained on tens of thousands of hours of bilingual speech with aligned pitch and rhythm, a kind of dataset nobody else has published. End-to-end neural speech translation has long been the holy grail. DeepMind just shipped it.

What this means for users and Google's ecosystem

For the average user, the impact is straightforward: fewer awkward translation pauses in Google Meet, smoother conversations on the Translate app, and an AI Studio API that lets developers embed natural voice translation into any product. Imagine a customer support bot that doesn't sound like a fax machine. Or a real-time interpreter for a 50-person global all-hands. The cost reduction could be real: if you were paying human interpreters $100/hour, API pricing is expected to come in far lower. That said, the real winner here is Google's ecosystem lock-in. Once teams get used to fluid multilingual meetings in Meet, they're not leaving. It's a feature that turns a product into a habit.
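
The cost argument is simple arithmetic, and a rough sketch makes it concrete. The $100/hour interpreter rate comes from the paragraph above; the per-minute API price is a hypothetical placeholder, since no pricing figure is given in the announcement, and the function names are made up for illustration.

```python
def interpreter_cost(hours: float, hourly_rate_usd: float = 100.0) -> float:
    """Cost of a human interpreter at the article's example rate."""
    return hours * hourly_rate_usd


def api_cost(hours: float, price_per_minute_usd: float) -> float:
    """Cost of translated audio billed per minute (the price is an assumption)."""
    return hours * 60 * price_per_minute_usd


if __name__ == "__main__":
    # Placeholder value for illustration only, not a published price.
    HYPOTHETICAL_PRICE_PER_MINUTE = 0.05
    meeting_hours = 2.0
    human = interpreter_cost(meeting_hours)
    api = api_cost(meeting_hours, HYPOTHETICAL_PRICE_PER_MINUTE)
    print(f"human: ${human:.2f}, API: ${api:.2f}, ratio: {human / api:.0f}x")
```

Swap in the real per-minute price once it is announced; the comparison only holds if billing turns out to be time-based, which is itself an assumption here.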

The unknowns: languages, latency, and privacy

None of this is perfect. DeepMind hasn't published the full language list — early reports suggest maybe 30 languages, with heavy European skew. Code-switching (mixing languages mid-sentence) likely breaks the system. Background noise and heavy accents? Unknown. Also critical: the audio passes through Google's servers. Enterprise customers with strict data sovereignty rules may balk. And while 500ms latency is impressive, it's still not real-time — try having a heated debate where every reply lags by half a second and you'll feel the friction. The big question: can this scale to 100+ languages without quality degradation? That's the barrier to true global adoption.

Frequently Asked Questions

How is Gemini 3.5 Live Translate different from existing features?

Previous voice translation systems used a cascade of speech recognition, text translation, and speech synthesis — each step added delay and flattened expressiveness. Gemini 3.5 Live Translate is an end-to-end neural model that processes audio directly, preserving natural prosody and reducing latency to under half a second.
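
To see why a cascade tends to be slower, here is a minimal sketch that sums per-stage delays. The stage names follow the description above, but the millisecond figures are made-up placeholders, not measurements of any real system. Only the 500 ms budget comes from DeepMind's claim.

```python
# Illustrative only: per-stage delays are hypothetical placeholders,
# not measured values for any real system.
CASCADE_STAGES_MS = {
    "speech_recognition": 300,
    "text_translation": 200,
    "speech_synthesis": 250,
}

# The only figure taken from the article: the claimed end-to-end latency ceiling.
CLAIMED_BUDGET_MS = 500


def cascade_latency_ms(stages: dict[str, int]) -> int:
    """Total delay when each stage waits for the previous one to finish."""
    return sum(stages.values())


def over_budget_ms(total_ms: int, budget_ms: int = CLAIMED_BUDGET_MS) -> int:
    """How far a pipeline exceeds the budget (0 if it fits)."""
    return max(0, total_ms - budget_ms)


if __name__ == "__main__":
    total = cascade_latency_ms(CASCADE_STAGES_MS)
    print(f"cascade total: {total} ms, over budget by {over_budget_ms(total)} ms")
```

The point is structural rather than numerical: in a strictly sequential cascade the delays add, so every stage has to be fast for the whole pipeline to stay under a conversational budget.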

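What languages are supported at launch?

DeepMind hasn't published the full language list. Early reports suggest around 30 languages, skewed toward European ones.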

Is this feature free?

For consumer apps like Google Translate and Google Meet, it's free with your Google account — though Meet's real-time translation may be limited to premium tiers. Developers using Google AI Studio will pay per API call, with pricing expected to undercut human interpretation significantly.

Does it work offline or require continuous internet?

The current implementation is cloud-based: all audio processing happens on Google's servers. No offline mode has been announced. For privacy-sensitive use cases, that's a limitation. On-device processing would require smaller models and is likely a future priority.

Can I integrate Gemini 3.5 Live Translate into my own app?

Yes, through Google AI Studio's API. Developers can stream audio in and receive natural-sounding translated speech. The API supports multiple output formats and custom voice styles. Expect documentation and sample code to drop alongside the public release.
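
Until the documentation lands, the client-side half of "streaming audio in" can still be sketched. The example below only splits raw audio into small fixed-duration chunks, which is a common preparation step for any streaming audio service. It does not call the Gemini API; the audio format (16 kHz, 16-bit mono PCM), the 100 ms chunk size, and the function name `pcm_chunks` are all assumptions for illustration.

```python
from typing import Iterator


def pcm_chunks(pcm: bytes, sample_rate_hz: int = 16_000,
               bytes_per_sample: int = 2, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration chunks of mono PCM audio; the last may be shorter."""
    chunk_bytes = sample_rate_hz * bytes_per_sample * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    # One second of silence stands in for microphone input.
    one_second_of_silence = bytes(16_000 * 2)
    chunks = list(pcm_chunks(one_second_of_silence))
    print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

Smaller chunks cut the wait before the first audio reaches the server but increase the number of messages sent; the right trade-off depends on whatever limits the real API ends up documenting.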
