Open Source AI News: Z.ai Drops GLM-5.2 Weights on Hugging Face, MIT License, 8xH100 Required

3 min read. Source: Hugging Face, zai-org/GLM-5.2 repository. Verification: Partial.
Z.ai released the full weights for GLM-5.2 on Hugging Face under an MIT license on June 16, making a 744-billion-parameter mixture-of-experts model openly available for the first time. According to Artificial Analysis' Intelligence Index v4.1, it's now the top-ranked open-weights model at the frontier, and the minimum hardware requirement is eight H100 GPUs.
SWE-bench Pro score, 62.1% (per Artificial Analysis)

Key Takeaways

  • GLM-5.2 weights released June 16 on Hugging Face under the MIT license and freely available for self-hosting
  • Scores 51 on Artificial Analysis Intelligence Index v4.1, ranking first among open-weights models per that index (not yet Epoch-verified)
  • Minimum self-hosting requirement: 8x H100 GPUs at FP8, a significant infrastructure barrier for most teams
  • Z.ai reports 62.1% on SWE-bench Pro vs. Claude Opus 4.8's 69.2%; the gap is narrowing but remains. API pricing: $1.40 input / $4.40 output / $0.26 cached per 1M tokens (vendor-stated)

Model Release

GLM-5.2
Organization: Z.ai (Zhipu AI)
Type: Open Source LLM
Parameters: 744B total / 40B active (MoE)
Benchmark: [SELF-REPORTED via Artificial Analysis] Intelligence Index v4.1: 51 (rank #1 open-weights); SWE-bench Pro: 62.1%
Availability: Hugging Face (MIT) + Z.ai API + Cloudflare Workers AI

Verification

Status: Partial. Sources: Artificial Analysis Intelligence Index v4.1 (independent benchmarker, not directly fetched); Z.ai technical documentation. Epoch AI evaluation pending. Benchmark scores are per Artificial Analysis and Z.ai reporting. Independent confirmation not yet available.

The weights are on Hugging Face. MIT license. Seven hundred forty-four billion total parameters, forty billion active per forward pass. That’s the GLM-5.2 release in its plainest form, and for teams tracking the open-weights frontier, it matters.

According to Artificial Analysis’ Intelligence Index v4.1, GLM-5.2 scores 51, placing it first among open-weights models currently evaluated on that index. On SWE-bench Pro, Z.ai reports a score of 62.1%, citing Artificial Analysis’ evaluation, compared to Claude Opus 4.8’s 69.2% on the same benchmark. For math and science reasoning, Z.ai’s technical reporting puts the model at 99.2% on AIME 2026 and 91.2% on GPQA-Diamond. These figures come from Artificial Analysis and Z.ai’s own documentation; no independent Epoch AI evaluation has been published yet.

The model uses a mixture-of-experts architecture, a design confirmed by the published arXiv paper on the GLM-5 family. With 744 billion total parameters, only 40 billion activate on any given forward pass, which is why MoE models at this scale need far less compute per token than dense equivalents. The catch is that every expert still has to sit in GPU memory, so “far less” still means eight H100 GPUs at FP8 quantization, according to Z.ai’s documentation. That’s roughly $30,000 per GPU if you’re buying, or on the order of $240,000 for the full set. Most teams won’t touch local deployment.
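The active-versus-total split is easy to quantify. The sketch below uses only the two parameter counts reported above; the rule of about 2 FLOPs per participating parameter is a common rough approximation for a forward pass, used here as an assumption and not a figure from Z.ai. The function names are illustrative.

```python
TOTAL_PARAMS = 744e9   # total parameters, as reported
ACTIVE_PARAMS = 40e9   # parameters active per forward pass, as reported


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def approx_forward_flops_per_token(params: float) -> float:
    """Rough forward-pass cost: about 2 FLOPs per participating parameter.

    Rule-of-thumb assumption; it ignores attention-over-context costs.
    """
    return 2.0 * params


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    moe = approx_forward_flops_per_token(ACTIVE_PARAMS)
    dense = approx_forward_flops_per_token(TOTAL_PARAMS)
    print(f"Active fraction: {frac:.1%}")
    print(f"Compute ratio, hypothetical dense 744B vs. MoE: {dense / moe:.1f}x")
```

The ratio applies to compute per token only. The full set of weights still has to be loaded, which is where the multi-GPU requirement comes from.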

SWE-bench Pro Score (per Artificial Analysis / vendor reporting, not Epoch-verified)

GLM-5.2 (Z.ai): 62.1%
Claude Opus 4.8 (Anthropic): 69.2%
MiniMax M3 (MiniMax): [URL-NEEDED: MiniMax M3 benchmark, SWE-bench Pro]
DeepSeek V4 Pro (DeepSeek): [URL-NEEDED: DeepSeek V4 Pro benchmark, SWE-bench Pro]

Why this matters for developers and enterprise architects

GLM-5.2 narrows the gap between the best open-weights model and Claude Opus 4.8 on SWE-bench Pro, a coding benchmark that reflects real software engineering tasks, to roughly 7 percentage points (62.1% vs. 69.2%). Six months ago, that gap was wider, and it keeps compressing. For organizations that can run this hardware, the build-vs-API calculus just shifted.

The broader pattern is worth noting. This is the fourth significant open-weights model release from non-US labs in roughly 30 days. MiniMax M3 dropped a 456-billion-parameter model with a one-million-token context window in early June, followed by Microsoft’s MAI-Thinking-1. GLM-5.2 matches that one-million-token context window, with a maximum output of 131,072 tokens, per Z.ai’s specifications. That’s not a coincidence; it’s a feature race on a specific capability axis.

For teams that can’t self-host, the model is also available via the Z.ai API and Cloudflare Workers AI. Z.ai’s stated pricing is $1.40 per million input tokens, $4.40 per million output tokens, and $0.26 per million cached tokens. These are vendor-stated figures, not independently verified.
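To make the pricing concrete, here is a minimal sketch of a cost estimate using the vendor-stated rates. The function name, the workload numbers, and the billing model are illustrative assumptions, not taken from Z.ai’s documentation. In particular, the sketch assumes cached tokens are a subset of input tokens and are billed at the cached rate in place of the input rate.

```python
# Vendor-stated GLM-5.2 API rates in USD per 1M tokens (not independently verified).
INPUT_PER_M = 1.40
OUTPUT_PER_M = 4.40
CACHED_PER_M = 0.26


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate API spend for a workload.

    Assumption (hypothetical billing model): cached_tokens is the portion of
    input_tokens served from cache and billed at the cached rate; the rest of
    the input is billed at the full input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_input = input_tokens - cached_tokens
    return (
        fresh_input * INPUT_PER_M
        + cached_tokens * CACHED_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # Illustrative monthly workload: 500M input tokens (200M of them cached), 100M output tokens.
    monthly = estimate_cost_usd(500_000_000, 100_000_000, cached_tokens=200_000_000)
    print(f"Estimated monthly API cost: ${monthly:,.2f}")
```

Setting an estimate like this against the amortized cost of an eight-GPU node is one way to frame the build-vs-API decision, once independent pricing data is available.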

What to Watch

Epoch AI publishes GLM-5.2 evaluation: days to weeks
Developer community latency data from Cloudflare Workers AI integration: 1-2 weeks
Z.ai API pricing verified by independent tracking sources: ongoing

Epoch AI hasn’t published a GLM-5.2 evaluation yet. When it does, the independent benchmark data will either confirm or revise the Artificial Analysis scores. Those are the numbers that matter for organizations making serious adoption decisions. Don’t build a migration plan around self-reported benchmarks; wait for independent evaluation. A second trigger: Cloudflare’s Workers AI integration is confirmed, but the changelog wasn’t fetched in verification. Watch for developer community feedback on latency at production scale before assuming the API tier matches self-hosted performance.

TJS synthesis

GLM-5.2 is the most capable open-weights model available today by the measure of at least one reputable independent benchmarker, and it’s free to download. The access barrier isn’t licensing; it’s hardware. Eight H100s is a meaningful filter. Teams with that infrastructure should run their own evals now, before Epoch AI publishes. Teams without it should watch the API pricing tier and wait for production latency data from the developer community. The frontier isn’t closed anymore. It’s just expensive to open.
