Mistral AI Open-Sources Leanstral 1.5: Apache-Licensed Formal Verification Model Finds Real Bugs

July 4, 2026 3 min read Mistral AI Partial Moderate

Tech Jacks Solutions AI News Coverage

Mistral AI released Leanstral 1.5 on July 2, 2026, an open-weights, Apache 2.0-licensed model built for Lean 4, the proof assistant used in formal software verification. The 119B-parameter model is free during beta and, according to Mistral AI, has already uncovered previously unknown bugs in open-source code.


PutnamBench, 587 of 672 solved (vendor-reported)

Key Takeaways

Mistral AI released Leanstral 1.5 under Apache 2.0 on July 2, 2026, a 119B-parameter open-weights model built specifically for Lean 4 formal verification
Architecture is confirmed via the Hugging Face model card: MoE with 128 experts (4 active per token), 6.5B active parameters per token, and a 256k context window; the model is free during beta
According to Mistral AI, the model found 5 previously unknown bugs across 57 open-source repositories, a test on real code rather than only a benchmark
All performance scores (miniF2F 100%, PutnamBench 587/672, FATE-H 87%) are vendor-reported; no independent evaluation has been published yet

Model Release

Leanstral-1.5-119B-A6B

Organization: Mistral AI

Type: Open Source LLM

Parameters: 119B total / 6.5B active per token (MoE: 128 experts, 4 active)

Benchmark: [SELF-REPORTED] miniF2F: 100%, PutnamBench: 587/672, per Mistral AI; no independent evaluation

Availability: Free beta via Mistral Labs API, Apache 2.0 license

Formal verification just got a lot more accessible. Mistral AI released Leanstral 1.5 on July 2, 2026, an open-weights model under the Apache 2.0 license, purpose-built for Lean 4, the proof assistant capable of expressing complex mathematical objects and software specifications including properties of Rust programs. It’s part of the Mistral Small 4 family and available free during beta through Mistral’s Labs API.
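For readers who have not seen Lean 4, here is a minimal sketch of what a specification and its machine-checked proof look like. The function `double` and the theorem name are hypothetical and chosen only for illustration; they do not come from Mistral AI's materials. The sketch assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- A toy function whose behaviour we want to specify.
def double (n : Nat) : Nat := n + n

-- The specification, stated as a theorem: `double n` always equals `2 * n`.
-- Lean accepts the file only if the proof below actually checks.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double  -- expose the definition: goal becomes n + n = 2 * n
  omega          -- discharge the linear arithmetic goal
```

The point of a model like Leanstral 1.5 is to propose proofs of this shape for much larger statements. The Lean checker, not the model, remains the judge of whether a proof is valid.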

What the model actually is

The architecture is confirmed via the Hugging Face model card: a Mixture of Experts design with 128 experts, 4 of them active per token, totaling 119B parameters with 6.5B activated per token. The context window is 256,000 tokens. The model accepts multimodal input (text and images) and produces text output. The recommended temperature is 1.0, and Mistral advises against using the reasoning mode for simple prompts while suggesting it for complex ones. These architectural specifics are documented in the model card itself; they are not marketing claims.
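To put the sparsity figures in proportion, the short sketch below works out what share of the model is active for each token. It uses only the numbers quoted above; the variable and function names are illustrative.

```python
# Figures as quoted from the model card in the paragraph above.
TOTAL_PARAMS_B = 119.0   # total parameters, in billions
ACTIVE_PARAMS_B = 6.5    # parameters activated per token, in billions
NUM_EXPERTS = 128        # experts in the MoE design
ACTIVE_EXPERTS = 4       # experts active per token


def share(part: float, whole: float) -> float:
    """Return part / whole as a plain fraction."""
    return part / whole


active_param_share = share(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
active_expert_share = share(ACTIVE_EXPERTS, NUM_EXPERTS)

print(f"Share of parameters active per token: {active_param_share:.2%}")
print(f"Share of experts active per token:    {active_expert_share:.2%}")
```

The two shares are not expected to match. In MoE designs generally, components outside the expert layers (attention and embeddings, for example) run for every token, which would plausibly explain why the active-parameter share is higher than the active-expert share. The source does not give that breakdown for this model.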

Why it matters

Formal verification has stayed a specialist discipline for a simple reason: it requires proof engineers fluent in theorem-proving languages. Most engineering teams don’t have them. A capable open-weights model that speaks Lean 4, carries an Apache 2.0 license, and runs free during beta removes the highest barrier, access cost, from the evaluation decision. Teams that couldn’t justify a dedicated proof engineer can now run a trial without budget approval.

According to Mistral AI’s own testing, Leanstral 1.5 identified 5 previously unknown bugs across 57 open-source repositories. That’s a practitioner signal worth taking seriously, even though it comes from the vendor. Bug discovery in real codebases is a harder, messier test than any benchmark, and it’s the one that actually maps to what engineering teams need.

Disputed Claim

100% on miniF2F, 587/672 on PutnamBench, 87% on FATE-H, 34% on FATE-X

All benchmark scores are self-reported by Mistral AI. Secondary coverage traces to the same announcement. No Epoch AI or third-party evaluation is available. The FATE benchmark is Mistral's own framework, not an established third-party standard.

Treat performance claims as vendor-reported until independent evaluation is published. Run your own Lean 4 tests against actual specifications before drawing workflow conclusions.

The benchmark story, read carefully

These benchmarks are self-reported, so read them carefully. According to Mistral AI, the model scores 100% on the miniF2F formal mathematics benchmark and solves 587 of 672 problems on PutnamBench, a competition mathematics dataset maintained by researchers at UT Austin. Mistral AI also reports 87% on its own FATE-H benchmark and 34% on FATE-X. None of these results have been independently evaluated. The secondary coverage repeating these figures all traces back to Mistral’s announcement. No Epoch AI evaluation exists yet. The miniF2F saturation and PutnamBench results describe performance on structured mathematical proof tasks, which is directly relevant to the model’s use case, but “100% on miniF2F” means the benchmark has hit its ceiling, not that the model is infallible in production.
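Because the reported figures mix percentages with a raw count, it can help to put them on one scale. The sketch below does only that conversion, using the vendor-reported numbers quoted above; it adds no new data, and the names are illustrative.

```python
from fractions import Fraction

# Vendor-reported scores as quoted above, stored as exact fractions.
reported = {
    "miniF2F": Fraction(100, 100),
    "PutnamBench": Fraction(587, 672),
    "FATE-H": Fraction(87, 100),
    "FATE-X": Fraction(34, 100),
}

# Problems PutnamBench still counts as unsolved under the reported figure.
putnam_unsolved = 672 - 587


def as_percent(score: Fraction) -> float:
    """Convert a fraction to a percentage rounded to two decimals."""
    return round(float(score) * 100, 2)


for name, score in reported.items():
    print(f"{name}: {as_percent(score)}% (vendor-reported)")
print(f"PutnamBench problems not solved: {putnam_unsolved}")
```

Expressed this way, the PutnamBench result is roughly 87%, which leaves visible headroom, unlike the saturated miniF2F score.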

What to watch

Independent evaluation is the next gate. Until Epoch AI or an academic group reproduces the PutnamBench and FATE results, treat all performance claims as vendor-reported. The FATE benchmark is Mistral’s own framework, so it doesn’t have the same standing as PutnamBench, which has academic hosting at UT Austin. Watch for community testing via the Hugging Face model card, where developers will begin logging real-world results. If the bug-discovery rate holds up across broader repository testing, that’s the metric that will matter to engineering teams.

TJS synthesis

Don’t migrate production verification pipelines on vendor benchmarks alone. The Apache 2.0 license and free beta make the evaluation cost essentially zero, so run the model against your own Lean 4 specifications before drawing conclusions. The architecture is solid and the confirmed facts are genuinely interesting. What isn’t confirmed yet is whether the performance holds outside Mistral’s test conditions. Wait for independent benchmarks before committing to workflow changes.
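One way to structure such a trial is sketched below. This is a hypothetical harness, not Mistral AI's tooling: `propose_proof` stands in for whatever call you make to the model, and `check_proof` for whatever invokes the Lean checker. The `lean_check` helper assumes a `lean` executable on your PATH and candidate files with no project dependencies; projects built with Lake would need a different invocation.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


@dataclass
class Result:
    spec_name: str
    accepted: bool


def lean_check(candidate: str) -> bool:
    """Assumption: `lean <file>` exits non-zero when the file has errors."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Candidate.lean"
        path.write_text(candidate, encoding="utf-8")
        completed = subprocess.run(["lean", str(path)], capture_output=True)
        return completed.returncode == 0


def evaluate(
    specs: Iterable[Tuple[str, str]],
    propose_proof: Callable[[str], str],
    check_proof: Callable[[str], bool],
) -> List[Result]:
    """Ask for a proof of each (name, statement) pair and record acceptance."""
    results = []
    for name, statement in specs:
        candidate = propose_proof(statement)
        # Naive text guard: Lean only warns on `sorry`, so a placeholder
        # proof would otherwise count as a pass.
        accepted = "sorry" not in candidate and check_proof(candidate)
        results.append(Result(name, accepted))
    return results


def pass_rate(results: List[Result]) -> float:
    """Fraction of specifications with an accepted proof (0.0 if empty)."""
    return sum(r.accepted for r in results) / len(results) if results else 0.0
```

Passing the model call and the checker in as functions keeps the tally logic testable without network access or a Lean install, and lets you swap in `lean_check` or a project-specific checker once the rest is working.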

What to Watch

Epoch AI or academic independent evaluation of Leanstral 1.5 benchmarks: 4-8 weeks

Community benchmark results emerging on Hugging Face model card: 2-4 weeks

Bug-discovery replication by external teams across broader repository sets: 4-6 weeks

Sources: Hugging Face, Nyu, Mistral AI.
