Google DeepMind released Gemini 3.1 Flash around April 29, 2026. The primary source URL is broken, so every specific below carries the qualification that Google DeepMind has stated it, not that it has been independently confirmed. Epoch AI evaluation is pending.
The headline figures from Google DeepMind: audio-to-audio latency under 150 milliseconds for Flash Live, a score of 84.2% on its Multimodal Live Eval benchmark, and a reported 15% reduction in Flash tier pricing. The model carries a 2M-token context window, consistent with prior Gemini architecture reporting. Each figure warrants its own hedge: the sub-150ms latency claim has not been reproduced by an independent lab, the 84.2% benchmark score awaits independent evaluation by Epoch AI, and the price cut is Google DeepMind's own report alongside this release.
One item that’s not new: Gemini Robotics-ER 1.6. This release package references the robotics model’s performance, with Google DeepMind’s internal evaluation citing a 22% improvement in Unstructured Environment Navigation over v1.5. Robotics-ER 1.6 was covered at deployment in April in the context of Boston Dynamics industrial operations. The navigation figure is new data but comes from a self-reported benchmark on a model that’s been in deployment for two weeks. Treat it as an update, not an announcement.
Why it matters: the real-time audio and live multimodal space is becoming a tier-one product category. Voice assistants, live translation, and AI-assisted call center infrastructure all depend on latency figures that earlier-generation models couldn't reliably hit. A sub-150ms audio-to-audio claim, if it holds under real-world load, is competitive with purpose-built real-time architectures. The 15% price cut tightens the cost calculus for developers choosing between Flash tier and alternatives. The combination of speed claim and price reduction, not either figure in isolation, is the competitive signal.
The unaddressed question is production-scale behavior. A latency figure measured under benchmark conditions at the single-request level doesn't tell a developer what latency looks like at 10,000 concurrent audio streams. Google DeepMind's disclosure doesn't address throughput at scale, which is the number that actually determines whether this model can anchor a production voice architecture. That's not a criticism; it's what's missing from the announcement, and what teams evaluating the model for real-time deployment should demand before committing.
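For teams running that evaluation themselves, the gap between a single-request figure and production behavior shows up in tail percentiles under concurrency. Below is a minimal load-test sketch, not a Gemini client: `fake_audio_roundtrip` is a hypothetical stand-in that a team would replace with its actual streaming call, and the p50/p95/p99 summary is the kind of number the announcement doesn't provide.

```python
import asyncio
import random
import statistics
import time

async def measure_stream_latency(session_fn, n_streams: int) -> list[float]:
    """Run n_streams concurrent round-trips; return per-call wall-clock latency in ms."""
    async def one_call() -> float:
        start = time.perf_counter()
        await session_fn()
        return (time.perf_counter() - start) * 1000.0
    return await asyncio.gather(*(one_call() for _ in range(n_streams)))

def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Tail percentiles: at scale, p95/p99 matter more than the mean."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Hypothetical stand-in for a real audio-to-audio call; swap in the actual client.
async def fake_audio_roundtrip() -> None:
    await asyncio.sleep(random.uniform(0.01, 0.03))

latencies = asyncio.run(measure_stream_latency(fake_audio_roundtrip, 200))
stats = summarize(latencies)
```

The point of the harness is the comparison it enables: run it at 10, 1,000, and 10,000 streams and watch whether p99 stays anywhere near the single-request figure. A vendor latency claim that survives only at low concurrency is not a production latency figure.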
Context: AI inference costs have been declining across the frontier lab tier, and a 15% Flash tier price reduction fits that trajectory. It also puts pressure on competing real-time audio offerings; the GPT-5.5 real-time API in particular occupies the same developer decision set. Developers choosing a real-time audio stack are now evaluating a price-reduced Gemini Flash against whatever competing latency claims exist, all of which are similarly self-reported pending independent evaluation.
What to watch: Epoch AI's evaluation timeline for Gemini 3.1 Flash. If independent benchmark results diverge significantly from Google DeepMind's reported figures, as has happened before with self-reported scores in the broader benchmark-credibility debate, that's a meaningful update for any team that has already selected Flash for a production real-time deployment.