Google’s Gemini 2.5 Flash Native Audio Upgrade Boosts Voice AI Interactions

🔊 Soundcheck

  • Gemini gets a natural-sounding voice upgrade

  • PolyAI nets $86M Series D for voice AI

  • ElevenLabs scores $100M, doubles valuation to drive voice AI and deepfake protection

Read time: 4 minutes

🔥 Hot Mic

Big moves, deep dives, and standout stories.

Google’s Gemini 2.5 Flash Native Audio delivers more natural, accurate, and context‑aware voice AI interactions.

Google has rolled out a major upgrade to its Gemini 2.5 Flash Native Audio model, making voice interactions feel noticeably more human. It handles interruptions and multi-turn conversations better and can draw on tools like Search and Translate while responding in real time. Function-calling accuracy and instruction adherence have both improved (figures below), strengthening Gemini’s voice agents across platforms.

This refinement reflects Google's ambition to close the gap between synthetic and natural dialogue. Developers can leverage these enhancements via APIs in Google AI Studio, Vertex AI, and Gemini Live, while users experience more fluent voice-driven tools like Search Live and Translate.

Key Points:

  • 21% improvement in conversational fluidity and handling of interruptions

  • Function-calling accuracy increased to 71.5% on ComplexFuncBench Audio

  • Instruction adherence rose from 84% to 90%

  • Live speech‑to‑speech translation now preserves tone and pitch

Takeaway: Gemini 2.5 Flash Native Audio raises the bar for voice AI, enabling conversations that feel genuine, responsive, and capable, and blurring the line between human and machine interaction.

PolyAI raised $86M in Series D to scale its AI voice agents for enterprise customer service globally.

PolyAI, a UK‑based conversational AI startup, secured $86 million in a Series D round co‑led by Georgian, Hedosophia, and Khosla Ventures. The funding pushes its total raised past $200 million and includes support from NVentures, British Business Bank, Citi Ventures, and others. The company develops voice agents that handle complex customer interactions—payments, bookings, authentication—across 45 languages and serve over 100 enterprise clients worldwide. A Forrester‑commissioned study shows a 391% ROI for users, averaging $10.3 million in savings per client, while PolyAI’s agents now replace work equivalent to 1,000+ full‑time employees.

Key Points:

  • Series D funding of $86M co‑led by Georgian, Hedosophia, Khosla Ventures

  • Total funding now exceeds $200 million

  • Voice agents deployed across 45 languages for 100+ enterprises

  • Forrester study reports 391% ROI and average client savings of $10.3M

Takeaway: PolyAI’s $86M raise underscores surging demand for human-like voice automation in enterprise customer service. It also gives the company the means to build out its Agent Studio platform and expand globally at a time when scalable conversational agents are becoming mission-critical.

ElevenLabs raised $100M, hitting a $6.6B valuation, expanding voice AI and fortifying deepfake defenses.

ElevenLabs just closed a $100 million funding round, pushing its valuation to $6.6 billion—double what it was nine months ago. The round was led by Sequoia and ICONIQ, with a16z and others joining in. The capital will fuel growth of its high-fidelity voice synthesis platform used across gaming and customer support, while also advancing AI agent capabilities and integration of voice technology into more interactive experiences.

The startup is also ramping up its defenses against misuse, introducing tools like watermarking, AI detection, and device authentication to fight deepfakes. On the horizon, ElevenLabs plans to expand into music, combine audio with video, and evolve from a voice tool provider into a comprehensive conversational AI platform.

Key Points:

  • Raised a $100M round led by Sequoia and ICONIQ, with a16z participating

  • Valuation now at $6.6B, twice what it was nine months prior

  • Expanding voice AI platform used for gaming, support bots, and conversational agents

  • Deploying watermarking, deepfake detection, and device authentication for safety

Takeaway: With its new $100 million raise and $6.6 billion valuation, ElevenLabs is reinforcing both its leadership in voice AI and its commitment to combating deepfake misuse, signaling a shift from a voice tool provider toward a full conversational AI platform.

🎙️ Mic Drop

What else is making noise in voice AI.

Voice AI fraud incidents rose 475% last year, costing insurance businesses and consumers billions globally. (floydct.com)

AudioCodes, Atento roll out large-scale voice AI, modernizing enterprise IVR and supporting 500+ concurrent agents for healthcare. (investing.com)

Gradium’s $70M seed round signals strong interest in open-source, developer-focused, high-performance voice AI foundation models. (sifted.eu)

Notta’s Series B round supports scaling its speech recognition and NLP toolset, with potential tech and partnership opportunities for global builders. (techinasia.com)

SuperBryn raises funds to address monitoring, evaluation, and reliability issues in enterprise-scale voice AI deployments. (cxodigitalpulse.com)

Public and enterprise awareness grows as 32% of survey respondents experienced attempted AI voice scams within six months. (newsbywire.com)

Detailed guide on integrating, comparing, and deploying best-in-class TTS models in React Native for future-proof voice apps. (vocal.media)

Google outlines recent improvements in Gemini TTS for nuanced control and functionality, opening new creative/technical applications. (blog.google)

Voice AI platform Recho secures new capital to accelerate product development and expand presence in Asian markets. (thebridge.jp)

Industry analysis highlights growing competition and evolving technology landscape in global AI voice generator markets. (openpr.com)

A look at the evolution of conversational voice AI for business communications, from IVR to advanced, multi-platform agents. (technology.org)

Amazon withdraws poorly performing AI voice dubs, spotlighting challenges in quality and user acceptance of synthetic voice in entertainment. (benzinga.com)