ElevenLabs, the AI voice technology company, has unveiled Scribe v1, a speech-to-text model that supports 99 languages and reaches a reported 96.7% accuracy on English transcription.
The new model, launched on February 26, 2025, is reported to outperform leading competitors including Google’s Gemini 2.0 Flash, OpenAI’s Whisper v3, and Deepgram Nova-3 at converting speech to text. The result marks a notable step forward for automated transcription technology.
Scribe’s capabilities extend beyond simple transcription. According to Flavio Schneider, ElevenLabs’ lead researcher, the model demonstrates sophisticated audio comprehension, detecting non-verbal elements such as laughter, sound effects, and background noise while maintaining accuracy in challenging acoustic environments.
A standout feature of Scribe is speaker diarization: it can distinguish up to 32 different speakers in a single audio file. Combined with word-level timestamps and structured transcript output, this makes it particularly valuable for complex multi-speaker recordings.
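To illustrate how word-level timestamps and speaker labels can be combined, here is a minimal Python sketch that merges consecutive words from the same speaker into turns. The `Word` record and its field names are hypothetical stand-ins for illustration, not Scribe’s actual response schema, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record; field names are assumptions, not Scribe's schema.
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    speaker: str  # diarization label


def group_into_turns(words):
    """Merge consecutive words from the same speaker into speaker turns."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = w.end
            turns[-1]["text"] += " " + w.text
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append(
                {"speaker": w.speaker, "start": w.start, "end": w.end, "text": w.text}
            )
    return turns


# Invented sample data for illustration only.
sample = [
    Word("Hello", 0.0, 0.4, "speaker_0"),
    Word("there", 0.4, 0.8, "speaker_0"),
    Word("Hi", 1.0, 1.2, "speaker_1"),
    Word("again", 1.5, 1.9, "speaker_0"),
]
turns = group_into_turns(sample)
for t in turns:
    print(f'{t["speaker"]} [{t["start"]:.1f}-{t["end"]:.1f}s]: {t["text"]}')
```

Grouping by turn rather than by speaker preserves the order of the conversation, which is usually what a meeting transcript needs.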
The model’s performance is backed by results on the FLEURS and Common Voice benchmarks, where Italian leads at 98.7% accuracy and English reaches 96.7%. The gains extend to previously underserved languages including Serbian, Cantonese, and Malayalam.
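The article does not say how these accuracy figures are calculated. One common convention in speech recognition is to report accuracy as 1 minus the word error rate (WER), and the Python sketch below assumes that convention purely for illustration. The function name and example sentences are invented, and real benchmark pipelines typically apply more text normalization than simple lowercasing.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word out of five.
wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps")
accuracy = 1.0 - wer
print(f"WER={wer:.2f}, accuracy={accuracy:.2%}")
```

Under this convention, a reported 96.7% accuracy would correspond to roughly 3.3 word errors per 100 reference words.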
For enterprise users, Scribe is available immediately through ElevenLabs’ website and API, priced at $0.40 per hour of input audio. The company is offering a 50% launch discount for the first six weeks, which makes it an attractive option for businesses that need high-volume transcription.
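As a rough budgeting aid, the pricing above can be turned into a small Python estimate. This sketch uses only the $0.40 per hour rate and 50% launch discount quoted in the article; the function name is hypothetical, and it assumes straightforward per-hour billing with no tiers, minimums, or rounding rules, which the article does not specify.

```python
from decimal import Decimal

PRICE_PER_HOUR = Decimal("0.40")   # USD per hour of input audio, as quoted
LAUNCH_DISCOUNT = Decimal("0.50")  # 50% off during the launch window


def estimate_cost(audio_hours, launch_discount=False):
    """Estimate transcription cost in USD, assuming simple per-hour billing."""
    cost = PRICE_PER_HOUR * Decimal(str(audio_hours))
    if launch_discount:
        cost *= Decimal("1") - LAUNCH_DISCOUNT
    return cost.quantize(Decimal("0.01"))


print(estimate_cost(1000))                        # 400.00
print(estimate_cost(1000, launch_discount=True))  # 200.00
```

`Decimal` is used instead of floats so that currency amounts round predictably to the cent.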
Scribe’s release coincides with rival Hume AI’s launch of Octave, its text-to-speech model, underscoring the intensifying competition in AI audio technology. While Octave focuses on emotional voice generation, Scribe’s emphasis on precise transcription addresses a different market need.
Looking ahead, ElevenLabs has announced plans to introduce a low-latency version of Scribe, which will enable real-time transcription applications. This development could significantly expand the model’s utility across various industries, from live broadcasting to real-time communication tools.
For businesses and organizations, Scribe offers a practical option for automated documentation, meeting transcription, and content accessibility. Its multilingual coverage and high reported accuracy make it particularly relevant for multinational operations.
The launch of Scribe marks a significant advancement in speech recognition technology, setting new benchmarks for accuracy and functionality in the rapidly evolving field of AI-powered audio processing.