Vapi Secures $50M Series B

🔊 Soundcheck

  • Vapi Secures $50M Series B

  • Voice AI that listens, reasons, translates, transcribes live

  • Voice AI that truly listens and responds in real time

  • Voice AI Giant ElevenLabs Scales Up

Read time: 4 minutes

🔥 Hot Mic

Big moves, deep dives, and standout stories.

Vapi raised $50M in a Series B led by Peak XV, reaching a valuation of about $500M after passing 1B AI voice calls.

Voice AI startup Vapi just closed a $50 million Series B round led by Peak XV, with participation from M12, Kleiner Perkins, and Bessemer Venture Partners, bringing total funding to $72 million. Vapi has passed 1 billion AI voice calls and supports both enterprises and developers. Amazon Ring, among other customers, now routes 100% of its inbound calls through Vapi’s platform, and Vapi’s developer community has grown beyond one million users.

Key Points:

  • Series B raised $50 million led by Peak XV

  • Total funding now stands at $72 million

  • Company valuation around $500 million post-round

  • Over 1 billion AI voice calls handled to date

Takeaway: Vapi’s new funding and its 1 billion call milestone underscore its transition into high-scale, enterprise-grade voice AI infrastructure, trusted by both developers and major brands like Amazon Ring.

OpenAI launched GPT‑Realtime‑2 plus Translate and Whisper models for live multimodal voice agents.

OpenAI just unveiled a trio of real‑time voice models designed to turn voice from a gimmick into full agentic experiences. GPT‑Realtime‑2 can listen, reason, act, and adapt its tone while a conversation unfolds. It handles interruptions, calls tools, and retains more context across long discussions. Complementing it, GPT‑Realtime‑Translate handles live translation across dozens of languages, and GPT‑Realtime‑Whisper delivers streaming transcription. The new Realtime API reduces latency and readies voice interfaces for production use, not just casual chat.

Beyond the tech, major companies such as Zillow, Priceline, and Deutsche Telekom are already using the new voice stack to build smarter assistants. OpenAI also expanded the context window to 128K tokens and added safety guardrails along with EU data residency support. Pricing is straightforward: GPT‑Realtime‑2 is billed per million audio tokens for input and output, while Translate and Whisper are billed by the minute. Developers can test the models in the Playground or Codex right away.

Key Points:

  • GPT‑Realtime‑2 brings GPT‑5‑class reasoning into live voice conversations

  • Realtime‑Translate supports 70+ input and 13 output languages in real time

  • Realtime‑Whisper enables streaming speech‑to‑text for live transcription

  • 128K token context window improves conversation continuity and reasoning

Takeaway: OpenAI is pushing voice AI beyond novelty—real‑time reasoning, translation, and transcription now happen in one seamless loop, turning voice into a powerful interface for dynamic AI agents that can think and act as you speak.

Inworld’s new TTS‑2 uses real‑time audio context and natural‑language steering for emotionally adaptive speech.

Inworld AI just rolled out Realtime TTS‑2, its flagship voice model built for conversation rather than narration. It listens to how you're talking (tone, pacing, emotion) and adapts its delivery on the fly, creating far more natural exchanges. Developers can steer the model in ordinary language, such as “speak tired but warm”, and even drop nonverbal cues like [laugh] or [sigh] directly into the text. It also keeps the same voice identity when switching among more than 100 languages mid‑utterance, all with low latency.

Key Points:

  • Adapts delivery using full audio context from prior turns

  • Inline natural‑language steering for tone, pacing, and emotion

  • Preserves single voice identity across 100+ languages

  • Low‑latency, expressive speech optimized for conversation

Takeaway: Realtime TTS‑2 marks a major shift in voice AI—prioritizing conversational empathy, directability, and multilingual continuity to deliver speech that feels alive and adaptive.

ElevenLabs has broken past the $500 million annual recurring revenue mark in the first four months of 2026, marking a dramatic revenue surge from its year-end performance. At the same time, the company quietly closed an additional tranche in its ongoing Series D round, reinforcing investor confidence and positioning it for rapid global expansion.

This latest close builds on a February raise of $500 million at an $11 billion valuation, underscoring the firm’s fast-growing market presence. With new partners—ranging from financial heavyweights to creative names—on board, ElevenLabs is doubling down on infrastructure, product innovation, and international scale.

Key Points:

  • ARR exceeded $500 million within first four months of 2026

  • Secured further funding in a third close of its Series D round

  • Valuation stands at $11 billion following February Series D

  • New investors include financial titans, enterprises, and creative figures

Takeaway: ElevenLabs’ revenue acceleration and continued Series D momentum highlight voice AI’s transformation from niche feature to foundational enterprise infrastructure—positioning the company as a fast-scaling category leader.

🎙️ Mic Drop

What else is making noise in voice AI.

Known raises $10M to launch a voice-first dating app, spotlighting niche voice AI experiences beyond enterprise. (globaldatinginsights.com)

Basata lands $21M Series A to automate healthcare referrals via AI voice agents, targeting a major bottleneck in clinical practice. (theaiinsider.tech)

Thinking Machines demos near-realtime voice and video models, targeting latency and user experience breakthroughs. (venturebeat.com)

Krisp debuts VIVA 2.0—predictive, multilingual voice AI infrastructure for production-level voice agents and IVR. (01net.it)

Mashvisor rolls out MashGPT, an AI analyst for natural-language real estate Q&A, signaling use-case verticalization. (prnewswire.com)

SoundHound highlights its multilingual agent platform amid Q1 M&A, despite reporting a financial loss. (slator.com)

Videa's 'Ambient Intelligence' expands voice AI from dental documentation to full practice analytics. (businesswire.com)

GMI Cloud and Inworld demonstrate collaborative building with advanced voice AI in live developer workshops. (tipranks.com)

Smallest.ai and Tenstorrent claim 4x lower TTS/voice AI costs through hardware acceleration. (lokmattimes.com)

JobsUPI raises $250K to scale multilingual AI-powered hiring, expanding voice tech's reach in recruitment. (peoplematters.in)

UK health regions prioritize ambient voice technology and cybersecurity in FY2026 funding plans. (hsj.co.uk)

Mahindra leverages ElevenLabs voice AI for scaled, personalized outreach during its premium SUV launch. (cxodigitalpulse.com)

Inworld's TTS-2 offers contextually driven, real-time speech generation in 100+ languages for live dialog. (testingcatalog.com)