The AI Voice Newsletter
Mistral’s Free Voxtral TTS Tops ElevenLabs

🔊 Soundcheck
Mistral’s Free Voxtral TTS Tops ElevenLabs
Gemini 3.1 Flash Live: Google's most lifelike AI voice yet
Speechify Brings Voice AI to Windows
Vonage Launches AI Voice in ServiceNow
Read time: 4 minutes
🔥 Hot Mic
Big moves, deep dives, and standout stories.
Mistral AI has unveiled Voxtral TTS, an open-weight text-to-speech model aimed at developers and enterprises. It offers multilingual voice synthesis, and the model weights are free to download, which lowers cost and deployment barriers. In human preference tests, listeners chose Voxtral over ElevenLabs Flash v2.5 68.4% of the time, a strong indicator of its sound quality. Designed for real-time applications, Voxtral supports nine languages and can be deployed on-premise for privacy and cost control.
Key Points:
Open‑weight TTS model freely available for self‑hosting or enterprise use
Beat ElevenLabs Flash v2.5 in preference tests 68.4% of the time
Supports nine languages with zero‑shot voice cloning
Optimized for low latency and streaming in real‑time systems
Takeaway: Voxtral TTS signals a major shift in voice AI. The model is open, high-quality, low-cost, multilingual, and self-hostable, and it challenges closed proprietary platforms head-on.
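To put the 68.4% figure in perspective, a win rate can be restated as odds. The snippet below is a small illustrative sketch, not part of Mistral's evaluation: the function name `preference_odds` is hypothetical, and it assumes every comparison ended in a win for one side or the other (no ties), which the story does not confirm.

```python
def preference_odds(win_rate: float) -> float:
    """Convert a head-to-head win rate (0..1) into odds in favor.

    Assumes no ties: every comparison is a win for one side.
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return win_rate / (1.0 - win_rate)


voxtral_win_rate = 0.684  # win rate reported in the story
odds = preference_odds(voxtral_win_rate)
print(f"{odds:.2f} to 1")  # prints "2.16 to 1"
```

Under that assumption, listeners preferred Voxtral a little more than twice as often as they preferred ElevenLabs Flash v2.5.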
Google introduces Gemini 3.1 Flash Live, delivering faster, more natural, and emotion-aware real‑time voice interactions.
Google rolled out Gemini 3.1 Flash Live on March 26, 2026, calling it its most advanced real-time voice model to date. It is designed for smoother, more natural dialogue, can sense pitch and emotion, and performs reliably even in noisy environments. Developers get configurable thinking levels that let them balance speed against response quality. At the high thinking setting, it scores 95.9% on Big Bench Audio, just behind the top performer, while the minimal setting lowers the score to about 70.5% but cuts response time to roughly 0.96 seconds. The model now powers live modes in the Gemini app and Search Live, pushing voice UX closer to human-like conversation.
Key Points:
Launched March 26, 2026, as Google's top real-time voice model.
Achieves 95.9% on Big Bench Audio at the high thinking level.
Minimal thinking yields quicker responses (about 0.96 seconds) at lower quality (about 70.5%).
Recognizes pitch and emotion, and works well in noisy conditions.
Takeaway: Gemini 3.1 Flash Live narrows the gap between AI and human speech, offering configurable voice agents that feel natural, responsive, and emotionally aware.
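For developers weighing the two thinking levels, the trade-off can be framed as picking the lowest level that still clears a quality floor. The sketch below is purely illustrative and does not call any Google API: `ThinkingProfile`, `PROFILES`, and `pick_level` are hypothetical names, the numbers are the ones reported in this story, and the latency for the high setting is left empty because the story gives no figure for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThinkingProfile:
    level: str
    score_pct: float            # Big Bench Audio score reported in the story
    latency_s: Optional[float]  # None where the story gives no figure


# Ordered from least to most thinking. Values come from the story above;
# the high-thinking latency is not reported, so it stays None.
PROFILES = [
    ThinkingProfile("minimal", 70.5, 0.96),
    ThinkingProfile("high", 95.9, None),
]


def pick_level(min_score_pct: float) -> str:
    """Return the lowest thinking level whose reported score meets the floor."""
    for profile in PROFILES:
        if profile.score_pct >= min_score_pct:
            return profile.level
    raise ValueError(f"no reported level reaches {min_score_pct}%")


print(pick_level(60.0))  # a latency-sensitive app with a modest quality bar
print(pick_level(90.0))  # an app that needs the higher benchmark score
```

The first call selects the minimal level and the second selects high. A real deployment would measure latency and quality on its own traffic rather than rely on two published data points.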
Speechify’s new Windows app delivers real-time text-to-speech and voice typing with a choice of on-device or cloud AI.
Speechify has launched a Windows application that brings real-time text-to-speech and voice typing to more than a billion Windows users. Users can choose between on-device and cloud processing and switch between them instantly, with privacy preserved when processing happens locally. The app uses the Windows ML stack and ONNX Runtime to run AI models across x64, Arm64, GPU, and NPU hardware. It is built natively for Windows with WinUI 3 and deep system APIs for tight integration. The app is available now in the Microsoft Store, a notable step in Speechify’s growth on enterprise Windows devices.
Key Points:
Over 1B Windows users now supported
Real‑time text‑to‑speech and voice typing
Choice of on‑device or cloud AI with instant switching
Native Windows integration via WinUI 3 and ONNX Runtime
Takeaway: Speechify’s Windows launch delivers privacy-focused voice AI built natively for the platform, balancing performance and flexibility for professional users.
Vonage has introduced a native integration with ServiceNow Voice, powered by the ServiceNow AI Platform. This addition embeds real-time voice and AI capabilities into ServiceNow’s Customer Service Management and IT Service Management workflows, so agents can streamline tasks and handle cases without toggling between tools.
With this integration, calls can trigger incident categorization, launch subflows, and update issue resolutions instantly. It also feeds structured voice data straight into ServiceNow records, giving generative AI tools like Now Assist richer context. Agents stay in a single interface, with no switching screens or duplicating effort.
Key Points:
Native integration embeds AI voice into ServiceNow CSM and ITSM workflows.
Calls automatically trigger incident categorization and Flow Designer actions.
Real-time transcription and case updates happen without leaving ServiceNow.
Structured voice data enhances Now Assist’s generative AI context.
Takeaway: Vonage’s integration makes voice part of the agent workflow instead of a detour. Embedding AI-powered voice directly into ServiceNow boosts productivity and gives automation richer context, all within a single agent experience.
🎙️ Mic Drop
What else is making noise in voice AI.
Mistral’s new Voxtral TTS provides enterprise-grade, multilingual text-to-speech with open weights for commercial use. (theaiinsider.tech)
Mistral’s TTS model enables businesses to build scalable voice agents, intensifying competition with ElevenLabs and Deepgram. (techcrunch.com)
Voxtral TTS supports low-latency, high-quality voice synthesis, enabling fast deployment for multilingual voice AI projects. (marktechpost.com)
Mistral’s entry into voice AI accelerates competition with ElevenLabs, underlining how rapidly the sector is evolving and attracting top talent. (sifted.eu)
Reveal enhances enterprise applications by embedding conversational AI analytics, driving actionable voice-driven insights for businesses. (martechcube.com)
Simply Speak™ introduces a real-time conversational AI layer for enterprise data access, emphasizing security and structured outputs. (nationaltoday.com)
Ship.Cars integrates AI voice agents, streamlining automotive logistics with automated, scalable voice-driven workflows. (autoremarketing.com)
Numerator’s Nexa brings conversational analytics for real-time consumer insights, automating data queries via natural language. (finance.yahoo.com)
Report highlights growing enterprise investments in voice AI agents, presenting an updated 2026 leaderboard for platform performance. (aijourn.com)
AI voice tech enables fan-created content for Pinkfong’s 'REDREX,' illustrating content IP strategies with voice AI. (licenseglobal.com)
Novel use case: AI voices automate thousands of calls for real-world data collection and market research. (zmescience.com)
Zapier leverages voice-AI platform Ezra for automated, interactive candidate screening, reducing hiring friction for HR teams. (benefitnews.com)
RES0N8 secures €5m to build adaptive speech-to-text models tailored for diverse European languages and regional markets. (slator.com)
Gupshup demonstrates practical retail use of WhatsApp and voice AI for customer engagement and commerce. (tipranks.com)