ElevenLabs, the AI voice technology company, has unveiled Scribe v1, a speech-to-text model that supports 99 languages and posts a reported 96.7% accuracy for English transcription, the highest rate published for the task so far.

The new model, launched on February 26, 2025, outperforms leading competitors including Google’s Gemini 2.0 Flash, OpenAI’s Whisper v3, and Deepgram Nova-3 in converting speech to text, a significant milestone for automated transcription.

Scribe’s capabilities extend beyond simple transcription. According to Flavio Schneider, ElevenLabs’ lead researcher, the model demonstrates sophisticated audio comprehension, detecting non-verbal elements such as laughter, sound effects, and background noise while maintaining accuracy in challenging acoustic environments.

A standout feature of Scribe is its advanced speaker diarization capability, allowing it to distinguish between up to 32 different speakers in a single audio file. This functionality, combined with word-level timestamps and structured transcript output, makes it particularly valuable for complex multi-speaker recordings.
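To illustrate why word-level timestamps and speaker labels are useful together, here is a minimal sketch that merges consecutive words from the same speaker into readable turns. The input shape (a list of dictionaries with `text`, `start`, `end`, and `speaker` keys) and the function name `group_into_turns` are assumptions made for this example; they are not taken from ElevenLabs’ documentation and may not match Scribe’s actual response format.

```python
def group_into_turns(words):
    """Merge consecutive words by the same speaker into turns.

    `words` is a hypothetical list of dicts with keys
    "text", "start", "end" (seconds), and "speaker".
    This shape is assumed for illustration only.
    """
    turns = []
    for word in words:
        if turns and turns[-1]["speaker"] == word["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + word["text"]
            turns[-1]["end"] = word["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker": word["speaker"],
                "start": word["start"],
                "end": word["end"],
                "text": word["text"],
            })
    return turns


# Made-up sample data, not real model output.
sample_words = [
    {"text": "Good", "start": 0.0, "end": 0.3, "speaker": "speaker_0"},
    {"text": "morning.", "start": 0.3, "end": 0.8, "speaker": "speaker_0"},
    {"text": "Hello!", "start": 1.1, "end": 1.5, "speaker": "speaker_1"},
    {"text": "Shall", "start": 2.0, "end": 2.2, "speaker": "speaker_0"},
    {"text": "we", "start": 2.2, "end": 2.3, "speaker": "speaker_0"},
    {"text": "begin?", "start": 2.3, "end": 2.7, "speaker": "speaker_0"},
]

for turn in group_into_turns(sample_words):
    print(f'[{turn["start"]:.1f}-{turn["end"]:.1f}] {turn["speaker"]}: {turn["text"]}')
```

A single pass like this is enough for meeting notes or interview transcripts, where the reader cares about who said what and when rather than about individual word timings.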

The model’s impressive performance is evidenced by its benchmark results from FLEURS and Common Voice testing. It achieved remarkable accuracy rates across multiple languages, with Italian leading at 98.7% and English at 96.7%. This performance extends to previously underserved languages including Serbian, Cantonese, and Malayalam.

For enterprise users, Scribe offers immediate accessibility through ElevenLabs’ website and API, with competitive pricing at $0.40 per hour of input audio. The company is currently offering a 50% launch discount for the first six weeks, making it an attractive option for businesses requiring high-volume transcription services.
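For a rough sense of what those prices mean at volume, the arithmetic can be sketched as follows. The figures ($0.40 per hour and the 50% launch discount) come from the article; the function name `estimate_cost` is hypothetical, and the sketch assumes the discount applies uniformly to the per-hour rate, with no volume tiers, minimums, or taxes.

```python
from decimal import Decimal

LIST_PRICE_PER_HOUR = Decimal("0.40")  # USD per hour of input audio (from the article)
LAUNCH_DISCOUNT = Decimal("0.50")      # 50% launch discount (from the article)


def estimate_cost(audio_hours, discounted=False):
    """Estimate the transcription cost in USD for a number of audio hours.

    Assumes a flat per-hour rate; real billing may differ.
    """
    hours = Decimal(str(audio_hours))
    rate = LIST_PRICE_PER_HOUR
    if discounted:
        rate = rate * (1 - LAUNCH_DISCOUNT)
    # Round to whole cents.
    return (hours * rate).quantize(Decimal("0.01"))


print(estimate_cost(1000))                   # list price for 1,000 hours
print(estimate_cost(1000, discounted=True))  # same volume during the launch discount
```

Under these assumptions, 1,000 hours of audio costs $400.00 at list price and $200.00 during the launch period.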

Scribe’s release coincides with rival Hume AI’s launch of Octave, its text-to-speech model, highlighting the intensifying competition in AI audio technology. While Octave focuses on emotional voice generation, Scribe’s emphasis on precise transcription addresses a different market need.

Looking ahead, ElevenLabs has announced plans to introduce a low-latency version of Scribe, which will enable real-time transcription applications. This development could significantly expand the model’s utility across various industries, from live broadcasting to real-time communication tools.

For businesses and organizations, Scribe represents a powerful solution for automated documentation, meeting transcription, and content accessibility. Its multi-language capabilities and high accuracy rates make it particularly valuable for multinational operations and companies requiring precise transcription services.

The launch of Scribe sets new benchmarks for accuracy and functionality in AI-powered audio processing, and its transcription precision suits a wide range of industries, from healthcare to media. As tech giants continue to grapple with scalability and real-time processing, Scribe’s features promise to streamline workflows and improve productivity, and they pave the way for future developments in automated communication tools.

News Source: https://venturebeat.com/ai/elevenlabs-new-speech-to-text-model-scribe-is-here-with-highest-accuracy-rate-so-far-96-7-for-english/

Author Profile

Bukola Anifowose

Bukola is a writer who loves exploring technology and the power of storytelling. She combines creativity with data-driven insights to craft meaningful narratives. In her free time, she enjoys watching movies and appreciating great stories on screen.

ElevenLabs Launches Scribe: Breakthrough Speech-to-Text Model Achieves 96.7% Accuracy
