Changes from previous version
Expands language coverage from 25 (MAI-Transcribe-1) to 43, adds contextual biasing, and improves robustness on noisy audio while keeping the same hourly price.
Release Summary
Speech-to-text model with a 4.9% average word error rate (WER) on FLEURS across 43 languages with automatic language detection, contextual biasing for domain terminology, and ~5.7x lower latency than cited competitors. Outperforms Scribe v2, Whisper-large-v3, GPT-4o-Transcribe, and Gemini 3.1 Flash on many language benchmarks. Priced at $0.36 per hour via Azure Speech / Foundry.
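To make the two headline numbers concrete, here is a minimal Python sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words) and how the $0.36 hourly rate translates into a cost estimate. The function names are hypothetical and are not part of any Azure Speech or Foundry API. The sketch assumes whitespace tokenization with no text normalization, which may differ from how the FLEURS scores were produced, and it assumes billing scales linearly with audio duration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Assumption: plain whitespace tokenization, no casing or punctuation
    normalization. Published benchmark scores may use different rules.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hourly rate taken from the release summary above.
PRICE_PER_HOUR_USD = 0.36


def estimated_cost_usd(audio_seconds: float) -> float:
    """Assumption: cost is proportional to audio duration, with no minimums."""
    return audio_seconds / 3600 * PRICE_PER_HOUR_USD


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
    print(f"Estimated cost for 10 hours: ${estimated_cost_usd(10 * 3600):.2f}")
```

Under this definition, a 4.9% average WER corresponds to roughly five word-level errors per 100 reference words, averaged over the evaluated languages.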
Timeline
June 2, 2026
MAI-Transcribe-1.5 announced
Microsoft AI releases its latest transcription model with leading FLEURS and Artificial Analysis accuracy scores.