The official name is MAI-Thinking-1. Microsoft’s AI Superintelligence team under Mustafa Suleyman announced it at Build 2026 on June 2, 2026, closing the gap between the “Project Polaris” reporting that circulated earlier in the week and what the model actually is. The official announcement confirms a sparse Mixture of Experts architecture with 35 billion active parameters and approximately 1 trillion total, a 256,000-token context window, and availability in private preview through Azure AI Foundry, Baseten, Fireworks AI, and OpenRouter.
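To put the sparse architecture in perspective, the reported parameter counts can be turned into an active share. The sketch below is plain arithmetic on the two figures from the announcement. It treats "approximately 1 trillion" as exactly 1e12, which is an assumption, and the function name is illustrative only.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Return the share of a sparse MoE model's parameters that are active."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


ACTIVE = 35e9  # active parameters, as reported in the announcement
TOTAL = 1e12   # "approximately 1 trillion"; treated as exact here (assumption)

fraction = active_fraction(ACTIVE, TOTAL)
print(f"{fraction:.1%} of parameters active")
```

Under those figures, roughly 3.5% of the model's parameters are active at a time, so the 1 trillion headline number overstates the compute involved in each forward pass.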
The data lineage angle is the story enterprises should read carefully. According to Baseten, one of the launch deployment partners, “MAI-Thinking-1 was trained from the ground up on curated, high-integrity data with zero distillation from third-party models.” Microsoft makes the same claim. Neither statement has been independently verified by a neutral party, but the framing is deliberate. “Zero distillation, commercially licensed” is language aimed directly at legal and procurement teams evaluating IP exposure in AI-generated outputs.
Benchmark scores are self-reported from Microsoft’s 109-page technical report. According to that report, MAI-Thinking-1 achieves 97.0% on AIME 2025 and 94.5% on AIME 2026. The catch: BenchLM.ai, an independent aggregator, currently shows Kimi K2.5 Reasoning at 96.1% as the AIME 2025 benchmark leader, not MAI-Thinking-1. That discrepancy hasn’t been resolved, and no Epoch AI evaluation is available yet. On SWE-Bench Pro, Microsoft’s report claims 52.8%. Note the benchmark name carefully: SWE-Bench Pro is distinct from the more widely cited SWE-bench Verified, where GPT 5.5 and Claude Opus 4.7 are scoring above 82%. The two sets of figures are not directly comparable.
Disputed Claim
In blind side-by-side evaluations commissioned by Microsoft and conducted by Surge, MAI-Thinking-1 was preferred over Claude Sonnet 4.6. Surge is a legitimate evaluation firm, but the evaluation was vendor-funded, not independently commissioned, which is a meaningful distinction when assessing the result.
Pricing hasn’t been disclosed. Azure AI Foundry access is available in private preview. Don’t expect cost clarity before the public launch.
What to Watch
What most coverage skips: the BenchLM.ai discrepancy matters more than it might appear. If MAI-Thinking-1 genuinely scores 97.0% on AIME 2025, it’s the highest publicly reported score on that benchmark. The fact that independent aggregator data still does not reflect that result, weeks after the announcement, is worth tracking before teams make workflow migration decisions based on benchmark leadership claims.
Wait for an Epoch AI evaluation before treating these benchmark scores as competitive facts. The data lineage positioning is worth taking seriously regardless of where the benchmarks land.