The weights are on Hugging Face. MIT license. Seven hundred forty-four billion total parameters, forty billion active per forward pass. That’s the GLM-5.2 release in its plainest form, and for teams tracking the open-weights frontier, it matters.
GLM-5.2 scores 51 on Artificial Analysis’ Intelligence Index v4.1, placing it first among open-weights models currently evaluated on that index. On SWE-bench Pro, it scores 62.1% per Artificial Analysis and vendor reporting, compared to Claude Opus 4.8’s 69.2% on the same benchmark. For math reasoning, Z.ai’s technical reporting puts the model at 99.2% on AIME 2026 and 91.2% on GPQA-Diamond. These figures come from Artificial Analysis and Z.ai’s own documentation; no independent Epoch AI evaluation has been published yet.
The model uses a mixture-of-experts (MoE) architecture, a design confirmed by the published arXiv paper on the GLM-5 family. Of its 744 billion total parameters, only 40 billion activate on any given forward pass, which is why MoE models at this scale can run on far less hardware than dense equivalents. The catch is that “far less” still means eight H100 GPUs at FP8 quantization, according to Z.ai’s documentation. At roughly $30,000 per GPU, that’s on the order of $240,000 in hardware if you’re buying. Most teams won’t touch local deployment.
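The parameter counts above lend themselves to a quick back-of-envelope check. The sketch below is illustrative only: it assumes FP8 stores one byte per parameter and BF16 two, and it counts raw weights without KV cache or activations. The function names are made up for this example. Note that every expert has to stay resident in memory even though only a fraction is active per pass, so the MoE saving shows up mainly in compute per token.

```python
# Back-of-envelope MoE arithmetic using the parameter counts quoted in the article.
TOTAL_PARAMS = 744e9   # total parameters
ACTIVE_PARAMS = 40e9   # parameters active on one forward pass


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in a single forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Raw weight storage in decimal gigabytes (ignores KV cache and activations)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    # Assumption: FP8 = 1 byte per parameter, BF16 = 2 bytes per parameter.
    print(f"FP8 weights:  {weight_memory_gb(TOTAL_PARAMS, 1):.0f} GB")
    print(f"BF16 weights: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")
```

Whether a given eight-GPU node can hold those weights depends on per-GPU memory and on the runtime overhead this sketch leaves out, so treat the output as a starting point for sizing rather than a deployment guarantee.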
[Chart: SWE-bench Pro score (per Artificial Analysis / vendor reporting, not Epoch-verified)]
Why this matters for developers and enterprise architects
GLM-5.2 narrows the gap between the best open-weights model and Claude Opus 4.8 on SWE-bench Pro, a coding benchmark that reflects real software engineering tasks, to roughly 7 percentage points (62.1% versus 69.2%). Six months ago, that gap was wider. It is compressing. For organizations that can run this hardware, the build-vs-API calculus just shifted.
The broader pattern is worth noting. This is the fourth significant open-weights model release in roughly 30 days, much of it from non-US labs. MiniMax dropped M3, a 456-billion-parameter model with a one-million-token context window, in early June, followed by Microsoft’s MAI-Thinking-1. GLM-5.2 also matches the one-million-token context window, with a maximum output of 131,072 tokens, per Z.ai’s specifications. That’s not a coincidence; it’s a feature race on a specific capability axis.
For teams that can’t self-host, the model is also available via the Z.ai API and Cloudflare Workers AI. Z.ai’s stated pricing is $1.40 per million input tokens, $4.40 per million output tokens, and $0.26 per million cached tokens. These are vendor-stated figures, not independently verified.
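To turn those per-million rates into a budget line, a small estimator helps. This is a minimal sketch using the vendor-stated prices above; the `Pricing` and `estimate_cost` names are hypothetical, and it assumes cached tokens are a subset of input tokens billed at the cached rate instead of the input rate. Check the provider's billing documentation before relying on that assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens, as stated by the vendor in the article."""
    input_per_m: float = 1.40
    output_per_m: float = 4.40
    cached_per_m: float = 0.26


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    pricing: Pricing = Pricing(),
) -> float:
    """Estimated USD cost of a workload.

    Assumption: cached tokens are part of input_tokens and are billed
    at the cached rate instead of the regular input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_input = input_tokens - cached_tokens
    total = (
        fresh_input * pricing.input_per_m
        + cached_tokens * pricing.cached_per_m
        + output_tokens * pricing.output_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 50M input tokens (20M of them cached), 5M output tokens.
    print(f"${estimate_cost(50_000_000, 5_000_000, cached_tokens=20_000_000):.2f}")
```

Swapping in another provider's rates is a matter of passing a different `Pricing` instance, which makes side-by-side comparisons straightforward.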
What to watch
Epoch AI hasn’t published a GLM-5.2 evaluation yet. When it does, the independent benchmark data will either confirm or revise the Artificial Analysis scores. That’s the number that matters for organizations making serious adoption decisions. Don’t build a migration plan around self-reported benchmarks; wait for independent evaluation. A second trigger: Cloudflare’s Workers AI integration is confirmed, but its changelog was not reviewed for this piece. Watch for developer community feedback on latency at production scale before assuming the API tier matches self-hosted performance.
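Teams that would rather measure latency than wait for community reports can collect their own samples. The sketch below is one hedged way to do it: `time_calls` wraps any zero-argument callable (a stand-in for whatever API client call you use, not a real Z.ai or Cloudflare function), and `percentile` uses the simple nearest-rank method.

```python
import math
import time
from typing import Callable, List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample, for 0 < pct <= 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def time_calls(call: Callable[[], object], n: int) -> List[float]:
    """Run `call` n times and return wall-clock latencies in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call()  # hypothetical: replace with your own request function
        latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # Placeholder workload; a real test would issue an API request here.
    samples = time_calls(lambda: time.sleep(0.01), 20)
    print(f"p50: {percentile(samples, 50):.3f}s  p95: {percentile(samples, 95):.3f}s")
```

Tail percentiles such as p95 tend to say more about production behavior than an average does, which is why the example reports both a median and a tail figure.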
TJS synthesis
GLM-5.2 is the most capable open-weights model available today by the measure of at least one reputable independent benchmarker, and it’s free to download. The access barrier isn’t licensing; it’s hardware. Eight H100s is a meaningful filter. Teams with that infrastructure should run their own evals now, before Epoch AI publishes. Teams without it should watch the API pricing tier and wait for production latency data from the developer community. The frontier isn’t closed anymore. It’s just expensive to open.