The number Microsoft led with was 35 billion. It’s not wrong. It’s incomplete.
Following Microsoft’s Build 2026 announcement on June 2, technical documentation from Microsoft AI clarifies the full picture: MAI-Thinking-1 is a sparse Mixture of Experts model with approximately 1 trillion total parameters, of which 35 billion are active at inference time. A companion model, MAI-Code-1-Flash, carries 5 billion active parameters and is built directly into GitHub Copilot and VS Code.
The MoE distinction matters immediately if you’re evaluating whether to use this model. Sparse MoE architectures activate only a fraction of total parameters per forward pass; that’s where the 35B active figure comes from. The performance ceiling can approach that of a much larger dense model, while per-token compute cost tracks the active count, not the total. But every expert still has to be stored somewhere, so the total parameter count shapes memory requirements, serving infrastructure, and whether the model fits your deployment environment. Community reports suggest approximately 16.5GB VRAM at 4K context; Microsoft hasn’t published official inference requirements. That’s the number to confirm before you commit.
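To make the active-versus-total distinction concrete, here is a back-of-the-envelope sketch of weight storage alone. The parameter counts are the approximate figures quoted above. The storage precisions are illustrative assumptions, not anything Microsoft has published, and the calculation ignores KV cache, activations, and runtime overhead.

```python
# Rough weight-storage arithmetic for a sparse MoE model.
# Parameter counts come from the article; the precisions below are
# assumptions chosen for illustration, not published deployment formats.

TOTAL_PARAMS = 1e12    # ~1 trillion total parameters
ACTIVE_PARAMS = 35e9   # 35 billion active per forward pass

# Assumed bytes per parameter at common storage precisions.
PRECISIONS = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def footprint_table() -> dict:
    """Map each precision to (total-weights GB, active-weights GB)."""
    return {
        name: (
            weight_memory_gb(TOTAL_PARAMS, nbytes),
            weight_memory_gb(ACTIVE_PARAMS, nbytes),
        )
        for name, nbytes in PRECISIONS.items()
    }


if __name__ == "__main__":
    for name, (total_gb, active_gb) in footprint_table().items():
        print(f"{name}: all experts {total_gb:,.1f} GB, active subset {active_gb:,.1f} GB")
```

Under these assumptions, the gap between the two columns is roughly 29x at every precision. That gap is why the active count describes compute while the total count describes what your serving infrastructure has to hold.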
On benchmarks: Microsoft’s technical report states MAI-Thinking-1 achieves 52.8% on SWE-Bench Pro. The company also reports the model outperforms Claude 3.5 Sonnet in internal blind evaluations and matches Claude Opus on SWE-Bench Pro. These are self-reported figures from Microsoft’s own evaluation; Epoch AI has not yet published an independent assessment. Until that evaluation lands, treat the comparisons to Anthropic’s models as vendor claims, not established rankings.
Disputed Claim
Microsoft states the model was trained entirely from scratch on commercially licensed data, with no distillation from third-party frontier models. The company highlights this as clean commercial IP, a relevant signal for enterprise procurement teams with legal exposure to model supply chain questions. It’s a vendor-stated claim with no independent verification yet, but it’s the kind of claim that would carry significant legal consequences if false.
MAI-Code-1-Flash integrates natively into GitHub Copilot and VS Code as a lightweight agentic model. Microsoft’s description positions it as purpose-built for engineering workflows, not a general-purpose model that happens to code.
The part nobody mentions in the announcement materials: MAI-Thinking-1 is currently available only to select early access partners via Microsoft Foundry. Pricing hasn’t been disclosed. For most enterprise teams, this isn’t a deployment decision yet. It’s an evaluation decision, and the question is whether to get into the early access queue.
Unanswered Questions
- What are the official VRAM and inference infrastructure requirements for MAI-Thinking-1?
- When will Epoch AI publish an independent evaluation?
- What pricing model will Microsoft use for MAI-Thinking-1 API access via Foundry?
What to watch
Epoch AI’s independent benchmark evaluation is the pivotal data point. When that publishes, the SWE-Bench Pro comparisons to Claude Opus become verifiable. Until then, the 52.8% figure is the only number you can anchor to, and it comes from the vendor.
TJS synthesis: Don’t migrate GitHub Copilot configurations based on the announcement benchmarks alone. The MoE architecture is genuinely interesting: active parameter efficiency at near-dense-model performance is the right engineering direction. But vendor-only benchmark comparisons against a named competitor are the least reliable number in any launch package. Wait for independent evaluation before adjusting your Copilot stack. If you’re in early access, prioritize running your own SWE-Bench-style evaluation on your actual codebase. That’s the benchmark that matters.
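If you do run your own evaluation, the task set will probably be small, so the raw pass rate deserves an error bar. The sketch below computes a Wilson score interval for a resolved-task rate. The function name and the task counts in the usage example are hypothetical, and a pass rate on your own tasks is not directly comparable to a SWE-Bench Pro score measured on a different task set.

```python
import math


def wilson_interval(resolved: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a pass rate (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")

    p = resolved / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical result: 50 of 100 in-house tasks resolved.
    low, high = wilson_interval(50, 100)
    print(f"resolved rate 50.0%, 95% interval {low:.1%} to {high:.1%}")
```

With only 100 tasks the interval spans roughly twenty percentage points, which is a useful reminder that small differences between models on a short internal task list may be noise.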