Microsoft launches 3 new AI models in direct shot at OpenAI and Google
Microsoft on Thursday launched three new foundational AI models it built entirely in-house — a state-of-the-art speech transcription system, a voice generation engine, and an upgraded image creator — marking the most concrete evidence yet that the $3 trillion software giant intends to compete directly with OpenAI, Google, and other frontier labs on model development, not just distribution.
The three models (MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2) are available immediately through Microsoft Foundry and a new MAI Playground. They span three of the most commercially valuable modalities in enterprise AI: converting speech to text, generating realistic human voice, and creating images. Together, they represent the opening salvo from Microsoft's superintelligence team, which Microsoft AI CEO Mustafa Suleyman formed just six months ago to pursue what he calls "AI self-sufficiency."
"I'm very excited that we've now got the first models out, which are the very best in the world for transcription," Suleyman told VentureBeat in an interview ahead of the public announcement. "Not only that, we're able to deliver the model with half the GPUs of the state-of-the-art competition."
The announcement lands at a precarious moment for Microsoft. The company's stock just closed its worst quarter since the 2008 financial crisis, as investors increasingly demand proof that hundreds of billions of dollars in AI infrastructure spending will translate into revenue. These models — priced aggressively and positioned to reduce Microsoft's own cost of goods sold — are Suleyman's first answer to that pressure.
Posted on: 4/4/2026 9:31:19 AM