Agentic AI News: Mistral Medium 3.5 Hits GA, Pricing, Platform Replacement, and What the Benchmarks Actually Tell You

3 min read · Sources: Hugging Face (Mistral model card); docs.mistral.ai · Verification: Partial/Weak
Mistral Medium 3.5 moved from preview to general availability on May 5, bringing API pricing and confirmed platform replacements that give enterprise teams the production-readiness data the May 3 preview couldn't provide. The model replaces Devstral 2 in Vibe and Mistral Medium 3.1 in Le Chat, a lineage shift with real evaluation implications for teams already running those tools.
128B dense model, 256k context, GA May 5, 2026
Key Takeaways
  • Mistral Medium 3.5 is GA as of May 5: a 128B dense model with a 256k context window, confirmed across multiple independent sources
  • The model replaces Devstral 2 in Vibe and both Mistral Medium 3.1 and Magistral in Le Chat; teams on either platform face a forced evaluation decision
  • Mistral reports 77.6% on SWE-Bench Verified, but independent sources have flagged the benchmark's reliability; treat the figure as directional, not definitive
  • API pricing has not been independently confirmed; verify against docs.mistral.ai before using it in procurement decisions

Two days after the preview, the decision window opened. Mistral Medium 3.5 reached general availability on May 5, and the story moved from “interesting architecture announcement” to “do we run this in production?” That shift matters because the GA release brought three things the preview didn’t: published API pricing (still awaiting independent confirmation), explicit platform replacement designations, and benchmark figures that need to be read carefully before anyone acts on them.

Start with the architecture, because it’s the firmest ground here. Mistral Medium 3.5 is a dense 128B model with a 256,000-token context window, confirmed across multiple independent sources including Hugging Face and NVIDIA’s API documentation. It is dense, not mixture-of-experts: a deliberate design choice that affects inference behavior at scale. The 256k context is real and verified. These aren’t soft claims.
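
To make the 256k figure concrete, here is a minimal pre-flight budget check. The chars-per-token ratio is a crude heuristic of our own, not Mistral’s tokenizer; swap in the official tokenizer before relying on the estimate.

```python
# Rough pre-flight check: will a long prompt fit a 256k-token context?
# NOTE: the 4-chars-per-token ratio is a crude heuristic, not Mistral's
# actual tokenizer; use the official tokenizer for real budgeting.

CONTEXT_WINDOW = 256_000
CHARS_PER_TOKEN = 4  # heuristic average for English prose

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_context(prompt: str, reserved_output_tokens: int = 8_000) -> bool:
    """Check the prompt fits while leaving headroom for the response."""
    return estimate_tokens(prompt) + reserved_output_tokens <= CONTEXT_WINDOW

print(fits_context("x" * 1_000_000))  # ~250k est. tokens + headroom -> False
```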

The platform replacements are equally confirmed. According to Mistral’s model card on Hugging Face, Medium 3.5 replaces Mistral Medium 3.1 and Magistral in Le Chat, and replaces Devstral 2 in the Vibe coding agent. If your team is running either of those, the GA release is a forced evaluation trigger, not an optional upgrade consideration.

Now the benchmarks. Mistral reports a score of 77.6% on SWE-Bench Verified, though the benchmark’s reliability for ranking frontier coding models has been publicly questioned, with independent commentary describing it as “effectively benchmaxxed.” That editorial context matters: the score tells you something, but not as much as a clean leaderboard position would suggest. Similarly, Mistral’s internal evaluation puts the model at 91.4 on the τ³-Telecom benchmark, a domain-specific vertical with no independent corroboration available in this coverage cycle. Treat both figures as directional, not definitive; independent evaluation is pending.

The configurable reasoning effort parameter, which lets users modulate compute depth between quick responses and extended agentic runs, is reported by DevOps.com as a Mistral implementation, though independent cross-references for the Mistral-specific version of the feature are limited. The concept exists and is operational in other frontier models; the Mistral implementation is reported, not yet independently confirmed.
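
For teams sketching integration code ahead of confirmation, the request shape would plausibly look like the snippet below. This is a hypothetical sketch: the `mistral-medium-3.5` model identifier and the `reasoning_effort` field are assumptions modeled on how other frontier APIs expose effort control, and the real names must come from docs.mistral.ai once published.

```python
import os
import requests

# Hypothetical sketch: the "reasoning_effort" field and the model ID are
# assumptions based on how other frontier APIs expose effort control.
# Confirm the real parameter name and accepted values on docs.mistral.ai.
API_URL = "https://api.mistral.ai/v1/chat/completions"
API_KEY = os.environ["MISTRAL_API_KEY"]

def ask(prompt: str, effort: str = "low") -> str:
    """Send one chat request at the given (assumed) effort level."""
    payload = {
        "model": "mistral-medium-3.5",  # placeholder model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,     # assumed field, e.g. low/medium/high
    }
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```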

One practical consideration the announcement doesn’t fully address: at production scale, the reasoning effort slider’s value depends entirely on how predictably the model’s inference latency responds to effort-level changes. A configurable parameter is only useful if the cost-latency curve is stable and documented. Teams evaluating this feature for agentic pipelines should test that curve before committing; the reported feature is real, but its production behavior at volume is uncharted in public benchmarks.
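
Measuring that curve needs nothing more elaborate than the sketch below: sweep the assumed effort levels over a fixed prompt set and record wall-clock latency per level. It reuses the hypothetical `ask()` helper from the previous snippet, so the same caveats apply.

```python
import statistics
import time

# Sweep the assumed effort levels over a fixed workload and record
# wall-clock latency; reuses the hypothetical ask() helper sketched above.
PROMPTS = ["Summarize RFC 2616 in one paragraph."] * 5  # fixed workload
EFFORT_LEVELS = ["low", "medium", "high"]               # assumed values

def measure_latency_curve() -> None:
    for effort in EFFORT_LEVELS:
        samples = []
        for prompt in PROMPTS:
            start = time.perf_counter()
            ask(prompt, effort=effort)  # hypothetical helper, defined above
            samples.append(time.perf_counter() - start)
        print(f"{effort:>6}: median={statistics.median(samples):.2f}s "
              f"max={max(samples):.2f}s")

measure_latency_curve()
```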

FLAG FOR OPERATOR: API pricing ($1.5/1M input tokens, $7.5/1M output tokens) is unconfirmed independently and must be verified against Mistral’s official API documentation before this brief is published. Do not publish with unverified pricing figures.
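
In the meantime, any cost modeling should treat the per-token figures as inputs rather than constants, so the estimate can be re-run the moment official numbers land on docs.mistral.ai. A minimal sketch (the workload numbers are illustrative only):

```python
# Keep unverified prices as parameters, not constants, so the model can be
# re-run the moment official figures are confirmed on docs.mistral.ai.
def monthly_cost(
    requests_per_day: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    usd_per_m_input: float,   # e.g. the reported-but-unverified 1.5
    usd_per_m_output: float,  # e.g. the reported-but-unverified 7.5
    days: int = 30,
) -> float:
    """Estimated monthly spend in USD for a single workload."""
    input_cost = requests_per_day * avg_input_tokens * usd_per_m_input / 1e6
    output_cost = requests_per_day * avg_output_tokens * usd_per_m_output / 1e6
    return (input_cost + output_cost) * days

# Illustrative only: 10k requests/day, 2k input / 500 output tokens each.
print(f"${monthly_cost(10_000, 2_000, 500, 1.5, 7.5):,.0f}/month")
```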

What to watch: Whether independent benchmark organizations evaluate Medium 3.5 against the SWE-Bench Verified leaderboard directly, and whether the τ³-Telecom score finds third-party validation. Teams replacing Devstral 2 should run side-by-side evaluations on their actual task distributions before migration, not on vendor-reported aggregate scores.
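
A side-by-side run does not need heavy infrastructure. The sketch below shows the shape of such a harness; the model identifiers, the canned `ask_model()` response, and the crude substring `score()` are all placeholders to be wired to the real API and to a task-appropriate check (tests passing, exact match, a rubric).

```python
import statistics

# Minimal side-by-side evaluation sketch. Everything here is a placeholder:
# wire ask_model() to the real API and score() to a task-specific check.
TASKS = [
    {"prompt": "Write a function that reverses a linked list.", "expect": "def"},
    {"prompt": "Fix the off-by-one in range(len(xs) - 1).", "expect": "range"},
]

def ask_model(model: str, prompt: str) -> str:
    # Placeholder: substitute a real API call (see the earlier sketch).
    return f"def solution(): ...  # from {model}"

def score(task: dict, answer: str) -> float:
    """Crude substring stand-in; replace with your real task check."""
    return 1.0 if task["expect"] in answer else 0.0

def evaluate(model: str) -> float:
    return statistics.mean(
        score(t, ask_model(model, t["prompt"])) for t in TASKS
    )

baseline = evaluate("devstral-2")           # model being retired
candidate = evaluate("mistral-medium-3.5")  # GA replacement, placeholder ID
print(f"baseline={baseline:.3f}  candidate={candidate:.3f}  "
      f"delta={candidate - baseline:+.3f}")
```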

The GA release of Mistral Medium 3.5 is meaningful for one specific reason: the platform replacements are confirmed and irreversible. Devstral 2 and Mistral Medium 3.1 are being retired from their respective contexts regardless of whether you evaluate the new model. That makes this a time-bounded decision for teams on those platforms; the benchmark debate is secondary to the operational reality.
