JetBrains Open-Sources Mellum2: 12B MoE Coding Model With 2.5B Active Parameters, and a Benchmark Split Worth Knowing

June 3, 2026 3 min read JetBrains Qualified Moderate

Tech Jacks Solutions AI News Coverage

JetBrains has released Mellum2 as an open-weight model under the Apache 2.0 license, a 12-billion-parameter Mixture-of-Experts architecture designed for agentic developer workflows. The honest headline: it's fast on coding tasks and trails a smaller dense model on math, and that tradeoff is the whole story.

jetbrains mellum2 open-source-ai mixture-of-experts coding-models agentic-ai llm-release apache-2

LiveCodeBench v6, 69.9% (vendor-reported)

Key Takeaways

Mellum2 activates only 2.5B of its 12B parameters per token via an MoE architecture (64 experts, 8 active), confirmed via JetBrains' arXiv technical paper
According to JetBrains' technical report, Mellum2 scores 69.9% on LiveCodeBench v6; the RLVR Thinking variant scores 58.4% on AIME, trailing Qwen3.5-4B (68.3%) on the same eval
JetBrains released Mellum2 under Apache 2.0 with native vLLM support; early community reports flag Ollama compatibility issues with the custom MoE architecture
All benchmark figures are vendor-reported via JetBrains' own technical report, independent evaluation is not yet available

Model Release

Mellum2

OrganizationJetBrains

TypeOpen Source LLM

Parameters12B total / 2.5B active per token (64 experts, 8 active)

Benchmark[SELF-REPORTED] LiveCodeBench v6: 69.9% | AIME (Thinking variant): 58.4%

AvailabilityOpen-weight, Apache 2.0, vLLM native, Transformers supported

Verification

Qualified JetBrains arXiv technical report (2605.31268); JetBrains blog (content not retrieved) All benchmark figures are vendor-reported. Hugging Face model card unavailable. No independent evaluation at time of publication.

Self-reported benchmarks. Read carefully.

JetBrains released Mellum2 as an open-weight model under the Apache 2.0 license on June 1, targeting developers who want a local inference option for coding, debugging, multi-step reasoning, tool use, and agentic workflows. The architecture is a 12-billion-parameter Mixture-of-Experts model. That number understates what actually runs per inference: Mellum2 activates only 2.5 billion parameters per token, routing each token to 8 of its 64 expert subnetworks. Less compute per forward pass, lower latency per token, that’s the MoE argument for developer tooling, where you’re generating dozens of code completions per session rather than one long essay.

The architecture is confirmed. JetBrains’ technical paper on arXiv corroborates the 12B/2.5B active, 64-expert/8-active structure, along with Grouped-Query Attention using 4 KV heads and Sliding Window Attention. JetBrains reports a 128K context window and approximately 10.6 trillion tokens in the training dataset, both figures from the vendor’s technical report, not independently verified.

AIME 2025+2026 Score (per JetBrains technical report)

Mellum2-Thinking (12B MoE)

58.4%

Qwen3.5-4B (dense)

68.3%

Disputed Claim

Mellum2 delivers up to 2x faster inference than dense models of comparable parameter count

Vendor claim, not confirmed in available arXiv abstract content; no independent speed benchmark available

Treat as unverified until an independent inference benchmark reproduces the figure on your target hardware

The benchmark profile is where this gets interesting. According to JetBrains’ technical report, Mellum2 scores 69.9% on LiveCodeBench v6. For a coding-specialized model, that’s the number that matters most to its target audience. JetBrains also reports a “Thinking” variant trained via RLVR, which scores 58.4% on AIME 2025 and 2026. The catch is that Qwen3.5-4B, a general-purpose dense model less than a third of Mellum2’s total parameter count, reportedly scores 68.3% on the same AIME evaluation, per the same technical report. A smaller dense model outperforming a larger MoE on mathematical reasoning. JetBrains disclosed this comparison themselves. That’s worth noting.

JetBrains claims up to 2x faster inference compared to dense models of similar parameter count. This figure appears in the vendor’s announcement but wasn’t confirmed in the arXiv abstract content available for review, treat it as a vendor claim until independent benchmarks emerge.

Deployment is functional but not frictionless. vLLM supports the model natively. Standard Transformers-based pipelines work but may carry overhead from the custom architecture. Early community reports suggest compatibility challenges with Ollama due to that custom MoE structure, if you’re running local inference on Ollama, verify compatibility before building around it. The Hugging Face model card would normally clarify this, but that source was unavailable at the time of this brief’s production.

What to Watch

Independent LiveCodeBench v6 reproduction4-6 weeks

Ollama compatibility resolution or official guidance from JetBrains2-3 weeks

Epoch AI model evaluation listing for Mellum26-8 weeks

What to watch: Independent benchmark evaluation of the LiveCodeBench 69.9% figure is the key trigger. JetBrains published this via their own technical report, which is appropriate and transparent, but the performance claim hasn’t cleared an independent reproduction. Epoch AI coverage or a community reproduction on a standardized harness would move this from vendor-qualified to confirmed. Watch for those in the weeks following release.

TJS synthesis: Don’t deploy this model for mathematical reasoning tasks where Qwen3.5-4B is already in your stack. For coding-specific agentic pipelines, code generation, editing, debugging, tool-use sequences, Mellum2’s MoE architecture makes a credible case on latency grounds, and Apache 2.0 removes the license friction. Wait for independent LiveCodeBench confirmation before replacing your current coding model. If vLLM is already your inference layer, this is worth a controlled evaluation this month.