Two models. One week. Different answers to the same question.
The question: what does a capable open-source agent actually need? Google DeepMind and Z.ai both released major open-weight models this week designed for agentic workloads, and both made choices that reveal what they think the answer is. Google DeepMind’s Gemma 4, released April 2 and covered in this week’s Gemma 4 brief, optimizes for breadth: multimodal capability, edge hardware compatibility, and a context window that scales up to 256,000 tokens. Z.ai’s GLM-5.1, announced today and detailed in the GLM-5.1 brief, optimizes for endurance: 754 billion parameters and a design specification of eight hours of continuous autonomous operation on a single task.
These aren’t competing entries in the same category. They’re two different bets about where the open-source agentic frontier goes next.
Gemma 4: Intelligence Per Parameter, Anywhere
Google DeepMind’s bet is that agent deployment is about to go wide. Gemma 4 ships in four variants: 2B and 4B for phones and edge devices, a 26B Mixture-of-Experts model, and a 31B dense model, all under Apache 2.0 licensing, according to SiliconAngle’s reporting on the release. The smallest models run on an Android phone. The largest offers a 256K token context window, per coverage from AI Tools Recap.
The design philosophy is about removing the cloud dependency. API-dependent agent pipelines carry real costs at volume: inference costs per call, latency on multi-step workflows, and exposure when sensitive data leaves the device. A model that delivers competitive reasoning performance on local hardware removes all three constraints. Day-one integration with Hugging Face, Ollama, vLLM, llama.cpp, MLX, NVIDIA NIM, and Android Studio signals that Google DeepMind built Gemma 4 to be deployable, not just downloadable.
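The cost argument above can be made concrete with a toy break-even calculation. All of the numbers below (hardware price, per-call API price, marginal energy cost) are illustrative assumptions, not vendor pricing; the point is only the shape of the tradeoff, linear API spend versus a fixed local outlay plus a small marginal cost.

```python
from math import ceil

# Toy break-even sketch: cloud API cost vs. amortized local hardware.
# Every number here is an illustrative assumption, not vendor pricing.

def api_cost(calls: int, price_per_call: float) -> float:
    """Total cloud spend: linear in call volume."""
    return calls * price_per_call

def local_cost(calls: int, hardware_price: float, energy_per_call: float) -> float:
    """Local spend: fixed hardware outlay plus a small marginal energy cost."""
    return hardware_price + calls * energy_per_call

def break_even_calls(hardware_price: float, price_per_call: float,
                     energy_per_call: float) -> int:
    """Call volume at which local deployment becomes cheaper than the API."""
    return ceil(hardware_price / (price_per_call - energy_per_call))

if __name__ == "__main__":
    # Assumed: $2,000 workstation, $0.01 per API call, $0.0005 energy per local call.
    n = break_even_calls(2000.0, 0.01, 0.0005)
    print(n)  # call volume past which local inference wins on cost alone
```

The sketch ignores latency and data-exposure considerations, which is exactly why the article treats those as separate constraints that local deployment also removes.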
Google DeepMind describes Gemma 4 as its most capable open model family to date; that is the company’s characterization, offered here as context rather than as a confirmed competitive ranking. The benchmark figures Google DeepMind reported (89.2% on AIME 2026 and 80.0% on LiveCodeBench v6 for the 31B model) come from the company’s own internal evaluation. Epoch AI’s independent assessment is pending. Treat them as directional, not definitive.
The use case that Gemma 4 unlocks: agents deployed at the edge, on enterprise endpoint devices, developer laptops, or eventually consumer hardware, that can run locally, cost less per inference, and handle multimodal inputs. This is the right architecture for agents that assist with ongoing tasks in environments where cloud round-trips are expensive or impractical.
GLM-5.1: Endurance Over Breadth
Z.ai’s bet runs in a different direction. GLM-5.1 is a 754-billion-parameter Mixture-of-Experts model, a scale that demands serious compute and takes edge deployment off the table. The design specification, per VentureBeat’s reporting, is eight hours of continuous autonomous operation on a single task, with capacity for up to 1,700 sequential steps. That’s not a benchmark. It’s a design target. But as design targets go, it’s unusually specific, and the specificity is informative.
Long-horizon task execution is one of the genuine hard problems in agentic AI. Most current agent frameworks struggle with tasks that require maintaining coherent state, making good decisions across hundreds of tool calls, and recovering gracefully from errors deep in an execution chain. A model explicitly architected for this problem, not retrofitted for it, is relevant even before independent evaluation confirms its actual performance.
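The problem shape described above, maintaining state across many sequential tool calls, respecting a step budget, and recovering from mid-chain failures, can be sketched in a few lines. This is a minimal illustration of the long-horizon execution problem, not GLM-5.1’s actual architecture; the 1,700-step default simply mirrors the reported design capacity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class AgentState:
    """Persistent state carried across the whole execution chain."""
    step: int = 0
    history: list = field(default_factory=list)

def run_agent(tools: list[Callable[[AgentState], Any]],
              max_steps: int = 1700, max_retries: int = 2) -> AgentState:
    """Execute tools sequentially: retry transient failures, record results,
    and stop once the step budget is exhausted."""
    state = AgentState()
    for tool in tools:
        if state.step >= max_steps:
            break  # budget exhausted: stop cleanly instead of running forever
        for attempt in range(max_retries + 1):
            try:
                state.history.append(tool(state))
                break
            except Exception:
                if attempt == max_retries:
                    state.history.append(None)  # give up on this step, keep going
        state.step += 1
    return state
```

Even this toy loop shows why the problem is hard: every design decision (retry count, what to record on failure, when to abandon the chain) compounds across hundreds of steps.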
Z.ai released GLM-5.1 under an MIT license, available on Hugging Face immediately. MIT is shorter and imposes fewer conditions than Apache 2.0, which adds an explicit patent grant and notice requirements; both are broadly permissive. The practical difference for most enterprise deployments is negligible, but legal teams evaluating open-source licensing for production AI should verify which applies to their specific use case.
One benchmark claim in Z.ai’s announcement is omitted from this analysis: comparisons to specific competitor model versions could not be confirmed as referring to existing products. That claim has no verifiable signal and is excluded here. The GLM-5.1 release stands on its own merits without it.
The Fork: Choosing a Philosophy
| Dimension | Gemma 4 | GLM-5.1 |
|---|---|---|
| Developer | Google DeepMind | Z.ai (Zhipu AI) |
| License | Apache 2.0 | MIT |
| Architecture | Multi-variant (2B–31B), Dense + MoE | 754B Mixture-of-Experts |
| Optimization target | Edge deployment, multimodal, hardware flexibility | Extended autonomous execution, long-horizon tasks |
| Context / Execution focus | Up to 256K tokens (context) | Up to 1,700 steps / 8 hours (execution) |
| Key use case | On-device agents, low-latency inference, mobile | Long-running autonomous task agents, complex workflows |
| Edge deployment | Yes, runs on phones and developer hardware | No, requires substantial compute |
| Independent evaluation | Pending (Epoch AI) | Pending (no evaluation data available) |
The choice isn’t which model is better. Neither has been independently evaluated. The choice is which design philosophy fits your agent architecture.
If you’re building agents that assist users in real time (responding to inputs, handling multimodal data, running on the user’s device), Gemma 4’s design approach is the relevant one. The smaller variants’ ability to run on-device without cloud infrastructure is a concrete architectural advantage.
If you’re building agents that execute complex, multi-step workflows autonomously over extended periods (research synthesis, code review pipelines, automated analysis tasks), GLM-5.1’s endurance-first design is worth evaluating. The 754B parameter scale and the explicit eight-hour design target signal a different set of architectural priorities.
Many real agent deployments need both: a fast, edge-capable model for real-time interaction and a larger, endurance-capable model for background processing. The interesting deployment pattern may not be “choose one” but “orchestrate both.”
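The “orchestrate both” pattern amounts to a routing decision. The sketch below is a hypothetical illustration under assumptions of my own: the backend names and the 50-step threshold are invented for the example, and a real router would also weigh cost, data sensitivity, and hardware availability.

```python
from dataclasses import dataclass

@dataclass
class Task:
    estimated_steps: int  # rough size of the workflow
    interactive: bool     # is a user waiting on the result?

def route(task: Task, step_threshold: int = 50) -> str:
    """Pick a backend: interactive or short tasks stay on-device;
    long multi-step background jobs go to the endurance model.
    Backend names and the threshold are illustrative assumptions."""
    if task.interactive or task.estimated_steps <= step_threshold:
        return "edge-model"       # e.g. a small on-device Gemma 4 variant
    return "endurance-model"      # e.g. GLM-5.1 on server-class hardware
```

The design choice worth noting: interactivity trumps size here, because a user-facing task cannot absorb the latency of a server round-trip even when it is long.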
What This Week Signals
Two major open-source agentic releases in a single week from two different countries isn’t a coincidence; it’s a pattern. Open-source model development has become a globally distributed competition. A Chinese AI startup and a US hyperscaler are now shipping comparable-category models with similar open licensing in the same week. The competitive frontier for open-source agentic AI isn’t concentrated in one geography or one organizational type.
For practitioners, the practical implication is straightforward. The open-source alternatives to closed API-dependent agent pipelines are improving quickly and are now purpose-built for agentic workloads in ways that earlier open-source releases were not. The cost and control advantages of local deployment are becoming achievable without significant capability sacrifice, though “becoming achievable” is different from “confirmed achievable,” and independent evaluation for both models is still pending.
Watch for Epoch AI’s assessments of both models. That’s the data that converts vendor-stated design targets into verified production guidance. Until those evaluations exist, both models are worth pulling and running on representative workloads, but neither should anchor a production architecture decision based solely on vendor claims.
The open-source agentic race is no longer a US-only story.