Open-source agentic AI just got a lot more accessible. Google DeepMind released Gemma 4 on April 2, a model family that runs on hardware ranging from an Android phone to a cloud cluster, and that’s precisely the point. The models are available on Google DeepMind’s model page under an Apache 2.0 license, with day-one support across Hugging Face, Ollama, vLLM, llama.cpp, MLX, NVIDIA NIM, and Android Studio.
The family ships in four variants. Two phone- and edge-optimized models (2B and 4B parameters) handle on-device inference without a cloud dependency. A 26-billion parameter Mixture of Experts variant and a 31-billion parameter dense model serve more demanding tasks. Context windows scale from 128,000 tokens for smaller variants to 256,000 tokens for larger ones, according to SiliconAngle’s coverage and AI Tools Recap.
Google DeepMind describes the family as purpose-built for agentic workflows and advanced reasoning tasks. The 256K context window matters for agentic use cases specifically: agents running multi-step workflows, processing long documents, or maintaining extended tool-call histories need that headroom. A 128K window on the smaller edge models is meaningful too; most consumer-grade or mobile agent implementations don't come close to exhausting it.
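To make that headroom concrete, here is a back-of-envelope budget for how many agent steps fit in each window. Every token count below is an illustrative assumption, not a measured figure for any Gemma 4 variant; real per-step consumption depends on the agent framework and tool verbosity.

```python
# Back-of-envelope context budget for a tool-using agent.
# All token counts are illustrative assumptions, not Gemma 4 measurements.

def max_agent_steps(window_tokens: int,
                    system_prompt: int = 2_000,
                    tokens_per_step: int = 1_500) -> int:
    """Steps that fit if each tool call, result, and reasoning trace
    adds roughly `tokens_per_step` tokens of retained history."""
    return (window_tokens - system_prompt) // tokens_per_step

for window in (128_000, 256_000):
    print(f"{window:>7}-token window -> ~{max_agent_steps(window)} steps")
# 128,000-token window -> ~84 steps; 256,000 -> ~169 steps
```

Under these assumed numbers, even the 128K edge models leave room for dozens of tool calls per task, which is consistent with the claim that mobile agent implementations rarely exhaust the window.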
The benchmark figures Google DeepMind reports for the 31B model, 89.2% on AIME 2026 and 80.0% on LiveCodeBench v6, come from the company's internal evaluation and have not yet been independently verified. Epoch AI's assessment is pending. Treat these as directional signals from the vendor, not confirmed third-party scores.
Why does this matter for developers and architects? The economics of agent deployment shift when a capable-enough model runs locally. Cloud API costs for high-volume agentic applications add up fast, especially for agents that make dozens of tool calls per task. A model that delivers competitive reasoning performance on a developer's laptop or an edge server changes the build-vs-buy calculus for teams that have been accepting cloud dependency as a given. Apache 2.0 licensing removes the commercial friction that sometimes accompanies open-weight releases.
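A rough sketch of that calculus: the cloud side of the comparison scales linearly with task volume and tool-call count. Every number in this snippet is a hypothetical placeholder (per-token price, call counts, token sizes); substitute your provider's actual rates and your own hardware amortization before drawing conclusions.

```python
# Hypothetical build-vs-buy sketch for a high-volume agent workload.
# All rates below are placeholders, not quotes from any provider.

def cloud_cost_per_month(tasks_per_day: int,
                         tool_calls_per_task: int = 30,
                         tokens_per_call: int = 2_000,
                         price_per_million_tokens: float = 3.0) -> float:
    """Monthly API spend if each tool call round-trips ~tokens_per_call
    tokens at an assumed blended price per million tokens."""
    monthly_tokens = tasks_per_day * tool_calls_per_task * tokens_per_call * 30
    return monthly_tokens / 1_000_000 * price_per_million_tokens

# At 1,000 tasks/day under these assumptions:
print(f"Assumed cloud spend: ${cloud_cost_per_month(1_000):,.0f}/month")
# Assumed cloud spend: $5,400/month
```

Against a recurring bill that grows with volume, a one-time hardware cost for local inference amortizes quickly; that is the shift in calculus the open-weight, edge-capable release enables.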
The prior Gemma family established Google DeepMind’s presence in open-weight models but wasn’t explicitly designed around agentic use cases. Gemma 4 is a direct response to the competitive pressure from open-source alternatives optimized for extended task execution, a dynamic visible across multiple labs this week (see the GLM-5.1 brief for the parallel release from Z.ai).
Watch for Epoch AI's independent evaluation. Vendor benchmark scores on AIME and LiveCodeBench tell you what the vendor wants you to know; independent evaluation tells you where the model actually sits relative to peers. That data will be the deciding input for serious production deployment decisions. Also watch for community benchmark results on Hugging Face: because the weights are open, practitioners will be able to run their own evaluations quickly.
Google DeepMind has been methodical about Gemma: first establishing the open-model lineage, then iterating on size and capability, now explicitly targeting agentic workflows. The edge-deployment angle is not incidental. It’s a strategic bet that the next wave of agent deployment happens at the device level, in enterprise edge servers, developer laptops, and eventually phones, not purely in hyperscaler APIs. Whether that bet is right will depend on whether independent evaluations confirm that the on-device variants can handle the reasoning demands of production agent tasks.