Top 10 Open-Weight LLMs in 2026
Ranked by LMArena Elo: GLM-5, Qwen 3.5, Kimi K2.5 lead the open-weight field, with Llama and Gemma flagged for restrictive licenses.
Cross-Vendor Leaderboards
Data-driven LLM leaderboards built from independent benchmarks (LMArena, LLM-Stats, SWE-bench, RULER). Sortable, sourced, and updated as the landscape shifts.
Open-Weight
Top 10 ranked by LMArena Elo
Coding
SWE-bench, LiveCodeBench, Terminal-Bench
Cost
Cheapest frontier-class APIs by token price
Context
Advertised max versus RULER-effective
Benchmarks That Matter
Which tests still separate frontier models, and which to retire
Mar 2026
Open-Weight Leaderboard Shift
GLM-5, Qwen 3.5, and Kimi K2.5 lead the open-weight field on LMArena Elo, with Llama and Gemma flagged for restrictive licenses.
Leaderboard
Mar 2026
Coding Splits Three Ways
No single model sweeps coding: SWE-bench, LiveCodeBench, and Terminal-Bench each crown a different leader this cycle.
Benchmark
Feb 2026
Price Floor Drops Again
Step-3.5-Flash and DeepSeek V4-Flash undercut the frontier-class field on price per million tokens.
Pricing
Feb 2026
Context Reality Check
RULER testing shows models usably hold only 50 to 65 percent of their advertised context window.
Methodology
Every ranking on this page is assembled from independent leaderboards, not vendor marketing. Scores are a Feb to Mar 2026 snapshot, each figure is labeled vendor-reported or independently measured, and positions shift as new models ship.
Rankings draw on third-party leaderboards: LMArena Elo for human preference, SWE-bench Verified and LiveCodeBench for coding, RULER for effective context, plus aggregators like LLM-Stats. We favor open, reproducible tests over vendor-run evaluations.
Some scores come from the model makers themselves and some from neutral third parties. Each ranking labels which is which, because a vendor-reported benchmark and an independently reproduced one do not carry equal weight.
These are point-in-time standings from early 2026. Frontier models ship monthly, prices fall, and leaderboard positions move. Treat each ranking as a snapshot to re-check, not a permanent verdict.
Open-Weight Arena Leader: GLM-5
Top SWE-bench Verified score: 80.8%
Cheapest input price: $0.10 per million tokens
Longest advertised context: 10M tokens
Human Preference
LMArena Elo
Blind, crowd-sourced head-to-head voting that anchors the open-weight ranking. Resistant to overfitting because prompts are live and unpredictable.
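To make the head-to-head idea concrete, here is a minimal sketch of the textbook Elo update applied to a single vote. It is an illustration only: the K-factor of 32, the starting ratings, and the function names are assumptions, and LMArena's actual rating procedure may differ from this simple per-vote update.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return both ratings after one vote.

    score_a is 1.0 if A wins the vote, 0.0 if B wins, and 0.5 for a tie.
    k is an assumed K-factor chosen for illustration.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical example: two models start level and A wins one blind vote.
new_a, new_b = update(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)  # 1516.0 1484.0
```

An upset (a lower-rated model winning) moves ratings more than an expected result, which is why a long run of live votes converges on a stable ordering.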
Coding
SWE-bench Verified, LiveCodeBench, Terminal-Bench
Real GitHub issue resolution, contamination-resistant competitive coding, and end-to-end terminal task completion. Each measures a different slice of developer work.
Reasoning
HLE and GPQA
Humanity's Last Exam and graduate-level science questions probe frontier reasoning where older tests have saturated near 100 percent.
Context
RULER Effective Window
Measures how much of an advertised context window a model can actually use. The gap between marketing numbers and usable tokens is large and consistent.
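As a rough illustration of that gap, the sketch below converts an advertised window into a usable range using the 50 to 65 percent band quoted on this page. The function name and the idea of applying one flat band to every model are simplifying assumptions; actual RULER results vary by model and task.

```python
def effective_window(advertised_tokens: int, low_pct: int = 50, high_pct: int = 65) -> tuple[int, int]:
    """Estimate the usable token range for an advertised context window.

    low_pct and high_pct default to the 50 to 65 percent band cited on this page.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return advertised_tokens * low_pct // 100, advertised_tokens * high_pct // 100


# A 10M-token advertised window, the largest figure listed on this page.
low, high = effective_window(10_000_000)
print(low, high)  # 5000000 6500000
```

When sizing a workload, planning against the lower bound is the conservative choice.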
Cost
Price per Million Tokens
Published API rates for input and output, compared at frontier-class capability so the cheapest option is not simply the weakest model.
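Per-million-token pricing translates into request cost with simple arithmetic, sketched below. The $0.10 input rate is the cheapest input price cited on this page; the $0.30 output rate and the token counts are hypothetical values chosen only to show the calculation.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Return the cost in dollars of one request, given prices per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# 200k input tokens at $0.10/MTok plus 50k output tokens at an assumed $0.30/MTok.
cost = request_cost(200_000, 50_000, input_price_per_mtok=0.10, output_price_per_mtok=0.30)
print(f"${cost:.4f}")  # $0.0350
```

Because output tokens are usually priced higher than input tokens, comparing models on input price alone can understate the cost of generation-heavy workloads.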
Five data-driven leaderboards across open-weight models, coding, cost, benchmarks, and context. Each ranking cites its sources and labels vendor-reported figures.
Open-Weight: Ranked by LMArena Elo. GLM-5, Qwen 3.5, and Kimi K2.5 lead the open-weight field, with Llama and Gemma flagged for restrictive licenses.
Coding: SWE-bench Verified, LiveCodeBench, and Terminal-Bench, ranked. Opus 4.6 leads SWE-bench, Qwen leads LiveCodeBench, and GPT-5.3 Codex leads Terminal-Bench.
Cost: Price per 1M tokens at frontier capability. Step-3.5-Flash and DeepSeek V4-Flash undercut the field; Opus 4.6 anchors the premium end.
Benchmarks That Matter: Which benchmarks still separate frontier models (SWE-bench, LiveCodeBench, HLE, GPQA), and which saturated ones to retire.
Context: Advertised max versus RULER-effective context. Llama 4 Scout tops the list at an advertised 10M tokens, but every model usably holds only 50 to 65 percent of its advertised window.
Important context for responsible AI adoption
Model rankings cover providers with very different data practices. Some process conversations on servers outside your jurisdiction, some offer enterprise or self-hosted deployments with stronger controls, and free tiers often log inputs to improve their models. Review each provider's privacy policy and terms of service before sharing sensitive information, and prefer enterprise or self-hosted options when data cannot leave your walls.
A high benchmark score does not make a model a safe substitute for human expertise or connection. The models ranked here are built for information and technical tasks, and over-reliance on any of them carries real risk. If you are experiencing distress, reach out to a qualified professional rather than relying on an AI system.
AI systems can produce plausible-sounding but incorrect guidance. For mental health, medical, legal, or financial decisions, always consult a qualified professional.
See the NIST AI Risk Management Framework for structured risk assessment guidance.
Under GDPR (EU) and CCPA (California), you have the right to access, correct, and delete your personal data. Enforcement of these rights may differ for services operated from outside your jurisdiction.
The EU AI Act classifies general-purpose AI models under specific transparency and risk obligations, which apply to many of the models ranked here when deployed within the EU.
This publication is editorially independent. Rankings are based on independent research and testing. Where affiliate links are present, they are clearly disclosed and do not influence editorial conclusions.