March 26, 2026

Mistral AI launches Voxtral TTS, a 4B-parameter multilingual voice model

Mistral AI released Voxtral TTS on March 26, 2026, a 4B-parameter text-to-speech model supporting 9 languages. The model is available via API at $0.016 per 1,000 characters and as open weights on Hugging Face under a CC BY-NC 4.0 license.


What Voxtral TTS delivers

Voxtral TTS is a transformer-based, autoregressive, flow-matching model built on Ministral 3B. It consists of a 3.4B-parameter transformer decoder backbone, a 390M-parameter flow-matching acoustic transformer, and a 300M-parameter in-house neural audio codec operating at a 12.5 Hz frame rate.
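As a quick check on these figures, the three reported component sizes add up to roughly 4.09B parameters, consistent with the 4B headline number, and a 12.5 Hz codec frame rate means each frame covers 80 ms of audio. The sketch below only does that arithmetic. The constant and function names are illustrative, and the announcement does not say how many tokens or decoding steps each frame involves.

```python
# Component sizes as reported by Mistral AI (in parameters).
COMPONENTS = {
    "transformer decoder backbone": 3.4e9,
    "flow-matching acoustic transformer": 390e6,
    "neural audio codec": 300e6,
}

CODEC_FRAME_RATE_HZ = 12.5  # codec frames per second of audio (reported figure)


def total_parameters(components: dict[str, float]) -> float:
    """Sum the parameter counts of all model components."""
    return sum(components.values())


def frame_duration_ms(frame_rate_hz: float) -> float:
    """Audio duration covered by a single codec frame, in milliseconds."""
    return 1000.0 / frame_rate_hz


def frames_for_audio(seconds: float, frame_rate_hz: float = CODEC_FRAME_RATE_HZ) -> float:
    """Number of codec frames that span `seconds` of audio."""
    return seconds * frame_rate_hz


if __name__ == "__main__":
    print(f"total: {total_parameters(COMPONENTS) / 1e9:.2f}B parameters")  # total: 4.09B parameters
    print(f"one frame = {frame_duration_ms(CODEC_FRAME_RATE_HZ):.0f} ms")  # one frame = 80 ms
    print(f"3 s reference clip = {frames_for_audio(3.0):.1f} frames")  # 3 s reference clip = 37.5 frames
```

The 3-second case matches the minimum reference length Mistral AI cites for custom voices: such a clip spans only about 37 codec frames.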

The model supports English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. It adapts to a custom voice from as little as 3 seconds of reference audio, capturing accent, intonation, rhythm, and emotional tone. It also supports zero-shot cross-lingual voice adaptation: a French voice prompt with English text produces natural French-accented English.

Preset voices are available in American English, British English, and French dialects. Custom voice libraries can be connected through the API.

How Mistral AI frames performance

Mistral AI reports that, in human evaluations by native speakers, Voxtral TTS was rated more natural than ElevenLabs Flash v2.5 at a similar time-to-first-audio. The company also claims quality on par with ElevenLabs v3 and support for emotion steering.

The evaluation used two recognizable voices per language in a zero-shot custom-voice setting. Three annotators rated each comparison pair in side-by-side preference tests covering naturalness, accent adherence, and acoustic similarity to the original reference.
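Mistral AI has not published how the three annotators' votes were combined. The following is a hypothetical aggregation scheme, not the company's method: each pair's votes collapse to a single verdict by strict majority (otherwise the pair counts as a tie), and the verdicts are then turned into win and tie rates. The vote labels `A`, `B`, and `tie` are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable

# Hypothetical vote labels: "A" = system A preferred, "B" = system B preferred.
VALID_VOTES = {"A", "B", "tie"}


def majority_per_pair(pair_votes: list[list[str]]) -> list[str]:
    """Collapse each pair's annotator votes (e.g. three) into one verdict.

    A verdict needs a strict majority; otherwise the pair is scored as a tie.
    """
    verdicts = []
    for votes in pair_votes:
        if not votes:
            raise ValueError("every pair needs at least one vote")
        label, count = Counter(votes).most_common(1)[0]
        verdicts.append(label if count * 2 > len(votes) else "tie")
    return verdicts


def preference_rates(verdicts: Iterable[str]) -> dict[str, float]:
    """Share of verdicts won by A, won by B, or tied."""
    counts = Counter(verdicts)
    unknown = set(counts) - VALID_VOTES
    if unknown:
        raise ValueError(f"unexpected vote labels: {unknown}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to aggregate")
    return {label: counts[label] / total for label in ("A", "B", "tie")}
```

Requiring a strict majority means a three-way split (one vote each for A, B, and tie) is never counted as a win for either system. Other schemes, such as counting every individual vote, would produce different rates from the same annotations.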

Editorial note

VoiceRadar has not independently verified these claims. The benchmark methodology is Mistral AI's own, and independent third-party evaluations have not yet been published as of March 26, 2026. A detailed analysis of Voxtral TTS, including independent testing, will follow in a dedicated blog post.

Mistral AI announced the release on X: "Introducing Voxtral TTS, our new frontier open-weight model for natural, expressive, and ultra-fast text-to-speech."

Read the full announcement on X from @MistralAI

Pricing and availability

Voxtral TTS is available now via the Mistral API at $0.016 per 1,000 characters. At that rate, one million characters costs $16, which places the model well below the pricing of ElevenLabs and most other commercial TTS providers.
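The cost arithmetic can be sketched as a simple linear estimate. This assumes billing is strictly proportional to character count, with no minimums, rounding, or volume tiers; the announcement does not specify those details. It also assumes that Python's `len()`, which counts Unicode code points, matches how the API counts characters. That may not hold for every script, including Hindi or Arabic. The function names are illustrative.

```python
from decimal import Decimal

# Published Voxtral TTS API price in USD per 1,000 input characters.
PRICE_PER_1K_CHARS_USD = Decimal("0.016")


def tts_cost_usd(num_characters: int, price_per_1k: Decimal = PRICE_PER_1K_CHARS_USD) -> Decimal:
    """Linear cost estimate for synthesizing `num_characters` of input text."""
    if num_characters < 0:
        raise ValueError("character count must be non-negative")
    return Decimal(num_characters) / Decimal(1000) * price_per_1k


def tts_cost_for_text(text: str) -> Decimal:
    """Estimate the cost for a string, counting Unicode code points as characters."""
    return tts_cost_usd(len(text))
```

`Decimal` is used instead of `float` so that small per-character prices do not accumulate binary rounding error. Under these assumptions, `tts_cost_usd(1_000_000)` equals 16, matching the $16 per million characters figure.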

The model can be tested in Mistral Studio and Le Chat. Open weights, together with several reference voices, are published on Hugging Face under a CC BY-NC 4.0 license, which permits non-commercial use only.

Why it matters

This is Mistral AI’s first text-to-speech model and the first TTS release from a major European AI lab. The combination of open weights, aggressive pricing, and a compact 4B-parameter architecture targets enterprises that want to self-host or reduce their dependency on American TTS providers.

The model also completes Mistral AI’s audio pipeline: paired with Voxtral Transcribe (its speech-to-text model), enterprises can now build full speech-to-speech systems within a single vendor stack.
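To make the pipeline idea concrete, here is a hypothetical composition of the stages. The stage functions are placeholders that you would back with real API clients. Nothing here reflects the actual Mistral SDK. The intermediate text step (for example, a language-model reply or a translation) is an assumption about how such systems are typically built, not something the announcement describes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; placeholders, not Mistral SDK calls.
Transcriber = Callable[[bytes], str]  # speech-to-text, e.g. backed by Voxtral Transcribe
TextStep = Callable[[str], str]       # whatever happens to the text in between
Synthesizer = Callable[[str], bytes]  # text-to-speech, e.g. backed by Voxtral TTS


@dataclass
class SpeechToSpeechPipeline:
    """Chains speech-to-text, a text-processing step, and text-to-speech."""

    transcribe: Transcriber
    process: TextStep
    synthesize: Synthesizer

    def run(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)  # audio in -> transcript
        reply = self.process(text)        # transcript -> text to speak
        return self.synthesize(reply)     # text -> audio out
```

Keeping each stage as an injected callable lets the pipeline be tested with stubs and allows either stage to be swapped out, for example a self-hosted open-weights TTS in place of the hosted API.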

For the broader TTS market, Voxtral TTS adds a credible open-weights competitor at a price point that could pressure margins across the industry.

Source: Mistral AI announcement, March 26, 2026.

VoiceRadar’s take on Voxtral TTS

Watch our video analysis of Voxtral TTS, covering what the model delivers, how the pricing compares, and what it means for the TTS market.
