Changes from previous version
Expands from English-only MAI-Voice-1 to 15 languages with multilingual voice cloning at the same per-character price.
Release Summary
Multilingual text-to-speech in 15 languages, instant voice matching from short reference clips, expressive emotion control, and stable long-form output for audiobooks, podcasts, and lectures. Built-in guardrails require that cloned voices be authorized and used with consent. Priced at $0.22 per 1M characters via MAI Playground and Azure Speech.
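As a rough illustration of the per-character pricing, the sketch below estimates the cost of a job from its character count. It assumes billing is strictly linear at $0.22 per 1M characters, with no minimum charge, tiering, or rounding rules; none of those details are stated here, and the function name and the sample character count are hypothetical.

```python
from decimal import Decimal

# Price quoted in the release summary: USD 0.22 per 1,000,000 characters.
PRICE_PER_MILLION_CHARS_USD = Decimal("0.22")


def estimate_cost_usd(char_count: int) -> Decimal:
    """Estimate synthesis cost, assuming strictly linear per-character billing.

    Hypothetical helper: actual invoices may apply minimums or rounding
    that are not described in the release summary.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return PRICE_PER_MILLION_CHARS_USD * Decimal(char_count) / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical example: a 500,000-character audiobook manuscript.
    print(f"Estimated cost: ${estimate_cost_usd(500_000):.2f}")
```

Under these assumptions, a 500,000-character manuscript would come to about $0.11. Decimal is used instead of float so that small per-character amounts do not pick up binary rounding error.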
Timeline
June 2, 2026
MAI-Voice-2 announced
Microsoft AI launches expressive, low-latency speech generation with multilingual voice adaptation.