AI Models News: MAI-Thinking-1 Official: Microsoft's First In-House Reasoning Model, with Specs, Benchmarks, and Caveats

June 3, 2026 | 2 min read | Microsoft AI Blog | Partial | Strong

Tech Jacks Solutions AI News Coverage

Microsoft officially launched MAI-Thinking-1 at Build 2026 on June 2, confirming the model previously reported as "Project Polaris", with a 35-billion active-parameter MoE architecture, a 256K-token context window, and a data lineage claim designed to address enterprise IP concerns. Benchmark scores are self-reported and one independent aggregator hasn't confirmed the flagship AIME 2025 figure.


AIME 2025 (self-reported), 97.0%

Key Takeaways

Microsoft officially named MAI-Thinking-1 at Build 2026, confirming the "Project Polaris" reporting with a 35B active-parameter sparse MoE, 256K context window, and private preview in Azure AI Foundry
All benchmark scores (AIME 2025: 97.0%, AIME 2026: 94.5%, SWE-Bench Pro: 52.8%) are self-reported from Microsoft's technical report; BenchLM.ai aggregator data shows a different model leading AIME 2025, and the discrepancy is unresolved
The "zero distillation, commercially licensed training data" claim is T3-corroborated (Baseten) but not independently verified; it's targeting enterprise IP and procurement concerns directly
Epoch AI evaluation is pending and no independent benchmark confirmation is available yet; treat competitive claims with caution until third-party evaluation arrives

Model Release

MAI-Thinking-1

Organization: Microsoft AI Superintelligence

Type: LLM, Flagship

Parameters: 35B active / ~1T total (sparse MoE)

Benchmark: [SELF-REPORTED] AIME 2025: 97.0% | AIME 2026: 94.5% | SWE-Bench Pro: 52.8%

Availability: Private preview, Azure AI Foundry, Baseten, Fireworks AI, OpenRouter

Verification

Partial: Microsoft technical report (109 pages, self-reported benchmarks); Baseten T3 corroboration for zero-distillation claim; primary source URL dead per SVR. No independent benchmark evaluation available. BenchLM.ai aggregator data conflicts with AIME 2025 claim. Epoch AI evaluation pending.

The official name is MAI-Thinking-1. Microsoft’s AI Superintelligence team under Mustafa Suleyman announced it at Build 2026 on June 2, 2026, closing the gap between the “Project Polaris” reporting that circulated earlier in the week and what the model actually is. The official announcement confirms a sparse Mixture of Experts architecture with 35 billion active parameters and approximately 1 trillion total, a 256,000-token context window, and availability in private preview through Azure AI Foundry, Baseten, Fireworks AI, and OpenRouter.
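To put the reported parameter counts in perspective, here is a rough sketch of the share of parameters active per token. It takes the approximate 1 trillion total at face value; the function and constant names are illustrative and do not come from Microsoft's report.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Return the share of a sparse MoE model's parameters used per token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


# Figures as reported in the announcement; the total is approximate (~1T).
ACTIVE_PARAMS = 35e9
TOTAL_PARAMS = 1e12

fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active per token: {fraction:.1%}")  # roughly 3.5%
```

Because the total is only stated as approximate, the resulting percentage is an estimate, not a published specification.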

The data lineage angle is the story enterprises should read carefully. According to Baseten, which is one of the launch deployment partners, “MAI-Thinking-1 was trained from the ground up on curated, high-integrity data with zero distillation from third-party models.” Microsoft makes the same claim. Neither statement has been independently verified by a neutral party, but the framing is deliberate. “Zero distillation, commercially licensed” is language aimed directly at legal and procurement teams evaluating IP exposure in AI-generated outputs.

Benchmark scores are self-reported from Microsoft’s 109-page technical report. According to that report, MAI-Thinking-1 achieves 97.0% on AIME 2025 and 94.5% on AIME 2026. The catch: BenchLM.ai, an independent aggregator, currently shows Kimi K2.5 Reasoning at 96.1% as the AIME 2025 benchmark leader, not MAI-Thinking-1. That discrepancy hasn’t been resolved, and no Epoch AI evaluation is available yet. On SWE-Bench Pro, Microsoft’s report claims 52.8%. Note the benchmark name carefully: SWE-Bench Pro is distinct from the more widely cited SWE-bench Verified, where GPT 5.5 and Claude Opus 4.7 are scoring above 82%, so the two sets of figures are not directly comparable.

Disputed Claim

MAI-Thinking-1 achieves 97.0% on AIME 2025, ranking at the top of the benchmark

BenchLM.ai independent aggregator data shows Kimi K2.5 Reasoning at 96.1% as the current AIME 2025 leader; MAI-Thinking-1 is not yet reflected there. SWE-Bench Pro (52.8%) is a distinct evaluation from SWE-bench Verified (where leaders score 82%+); scores are not directly comparable.

Treat all benchmark figures as self-reported until Epoch AI or equivalent independent evaluation is published. Do not use these scores as the basis for competitive procurement decisions.
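One way a team might apply that guidance is to record each score with its verification status and filter on it before the score reaches a procurement comparison. The sketch below is a hypothetical illustration: the `BenchmarkClaim` type and `usable_for_procurement` helper are assumptions made for this example, not part of any Microsoft or Epoch AI tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkClaim:
    benchmark: str
    score_pct: float
    source: str
    independently_verified: bool


# Scores from Microsoft's technical report as quoted in this article.
claims = [
    BenchmarkClaim("AIME 2025", 97.0, "Microsoft technical report", False),
    BenchmarkClaim("AIME 2026", 94.5, "Microsoft technical report", False),
    BenchmarkClaim("SWE-Bench Pro", 52.8, "Microsoft technical report", False),
]


def usable_for_procurement(items):
    """Keep only claims confirmed by an independent evaluator."""
    return [c for c in items if c.independently_verified]


verified = usable_for_procurement(claims)
print(f"{len(verified)} of {len(claims)} claims independently verified")
```

With the current status of each claim, the filter returns nothing, which matches the guidance above: none of the three scores is ready to anchor a procurement decision.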

In blind side-by-side evaluations commissioned by Microsoft and conducted by Surge, MAI-Thinking-1 was preferred over Claude Sonnet 4.6. Surge is a legitimate evaluation firm. The evaluation was vendor-funded, not independently commissioned, a meaningful distinction when assessing the result.

Pricing hasn’t been disclosed. Azure AI Foundry access is available in private preview. Don’t expect cost clarity before the public launch.

What to Watch

Epoch AI independent evaluation of MAI-Thinking-1: Weeks to months post-launch

Azure AI Foundry public availability and pricing disclosure: Post-private-preview

BenchLM.ai AIME 2025 leaderboard update reflecting MAI-Thinking-1: Ongoing

What most coverage leaves out: the BenchLM.ai discrepancy matters more than it might appear. If MAI-Thinking-1 genuinely scores 97.0% on AIME 2025, it’s the highest publicly reported score on that benchmark. If independent aggregator data still does not reflect that result weeks after the announcement, that gap is worth tracking before teams make workflow migration decisions based on benchmark leadership claims.

Wait for Epoch AI evaluation before treating these benchmark scores as competitive facts. The data lineage positioning is worth taking seriously regardless of where the benchmarks land.