Gemini 3.5 Flash Costs 3x More Than Its Predecessor, and Outperforms the Model It Was Supposed to Be Below

May 21, 2026 2 min read Google Blog / Google Cloud Blog Partial

Tech Jacks Solutions AI News Coverage

Google's mid-tier Gemini 3.5 Flash launched at $1.50/M input and $9.00/M output tokens, three times the price of Gemini 3 Flash Preview, and with benchmark scores that exceed the prior flagship Gemini 3.1 Pro. That combination breaks the logic enterprise teams have used to route workloads across Google's model tiers.

gemini google-deepmind llm-mid-tier agentic-ai model-pricing terminal-bench google-io-2026 ai-models-news

Pricing increase, 3x vs. prior Flash tier

Key Takeaways

Gemini 3.5 Flash launched at $1.50/M input and $9.00/M output, confirmed at 3x the cost of Gemini 3 Flash Preview
Google reports 76.2% on Terminal-Bench 2.1, above prior flagship Gemini 3.1 Pro's 70.3%; independent Epoch AI evaluation is pending
The 4x speed claim couldn't be independently corroborated, exclude it from production planning until verified
Any team with an existing Flash-vs-Pro routing decision needs to revisit it; the price and performance tiers no longer map the same way

Model Release

Gemini 3.5 Flash

OrganizationGoogle DeepMind

TypeLLM — Mid-tier

ParametersNot disclosed

Benchmark[SELF-REPORTED] Terminal-Bench 2.1: 76.2% (partial third-party corroboration via Artificial Analysis; Epoch AI evaluation pending)

AvailabilityGoogle AI Studio, Vertex AI, GitHub Copilot

Pricing: Input tokens per 1M (confirmed)

Gemini 3.5 Flash

$1.50

Gemini 3 Flash Preview (prior)

$0.50 (approx. 3x less)

Gemini 3.1 Flash-Lite

Not disclosed in this package

Pricing confirmed. $1.50 per million input tokens, $9.00 per million output tokens. Google announced Gemini 3.5 Flash at I/O 2026 on May 19 alongside a price point that is three times what Gemini 3 Flash Preview carried. That’s not a minor revision to the Flash tier. That’s a repositioning.

The performance numbers complicate the picture further. According to Google’s benchmark data, corroborated by third-party evaluation indices including Artificial Analysis, Gemini 3.5 Flash scores 76.2% on Terminal-Bench 2.1, outperforming Gemini 3.1 Pro’s 70.3% on the same evaluation. Google’s reported evaluations place MCP Atlas tool-use performance at 83.6%, above Gemini 3.1 Pro’s reported 78.2%. Independent evaluation by Epoch AI is pending, treat these figures as vendor-reported with partial third-party corroboration, not confirmed benchmarks.

A mid-tier model that outscores the prior flagship raises an architectural question most teams haven’t had to ask before: what does “mid-tier” mean when performance tiers and price tiers no longer align?

Disputed Claim

Output speeds up to 4x faster than comparable frontier models

Could not be independently corroborated via cross-reference at publication time

Exclude from production performance estimates until independent evaluation confirms this figure

The catch is that the pricing increase lands unevenly depending on your use case. Teams running high-throughput agentic workflows, coding assistants, multi-step document processing, API orchestration, will feel a 3x token cost increase acutely. Teams that were already routing premium workloads to Gemini 3.1 Pro may find the switch economically neutral or even favorable given the benchmark improvement. The question isn’t whether Gemini 3.5 Flash is good. It’s whether it’s good enough at this price relative to what you were paying before.

Google claims output speeds are up to 4x faster than comparable frontier models, a figure that couldn’t be independently corroborated at publication time. Don’t build that number into your production estimates until independent evaluation confirms it. Context window is specified at 1,048,576 input tokens per Google’s documentation. The model is available via Google AI Studio and Vertex AI, with GitHub Copilot integration also announced.

The comparison most teams will run is against Gemini 3.1 Flash-Lite, which remains available at a lower price point. Per Google’s internal Finance Agent v2 benchmark, Gemini 3.5 Flash reportedly scores 57.9%, above GPT-5.5’s 51.8% and below Claude Opus 4.7’s 66.1% on the same evaluation, figures that haven’t been independently verified. The three-way comparison is useful context but treat it as Google’s own benchmark framing, not an independent assessment.

What to Watch

Epoch AI independent benchmark evaluation for Gemini 3.5 FlashWeeks to months post-launch

Gemini 3.1 Flash-Lite pricing update or repositioning in responseNext Google developer update cycle

Enterprise cost-routing announcements from teams currently on Gemini 3.1 Pro30-60 days post-GA

The part nobody mentions: the Flash tier was positioned as the cost-efficient alternative for teams that didn’t need flagship performance. That positioning is gone. Gemini 3.5 Flash now costs what Gemini 3.1 Pro used to justify, while offering performance that exceeds it. Every team with a current routing decision between Flash and Pro-tier models needs to revisit that architecture before the next billing cycle.

Wait for Epoch AI’s independent evaluation before migrating workloads based on the benchmark claims. The pricing is confirmed, use that for budget planning now. The performance advantage over GPT-5.5 on Finance Agent v2 is vendor-reported and requires independent verification before it influences procurement.

View Source

More Technology intelligence

View all Technology

Deep Dive Available The NDA Evaluation Problem: What GPT-5.6 Sol's METR Assessment Reveals About AI...