Cross-Vendor Leaderboards

AI Model Rankings

Data-driven LLM leaderboards built from independent benchmarks (LMArena, LLM-Stats, SWE-bench, RULER). Sortable, sourced, and updated as the landscape shifts.

Rankings

Benchmarks

2026

Feb to Mar Snapshot

Sourced

Every Figure

Open-Weight

Top 10 ranked by LMArena Elo

Coding

SWE-bench, LiveCodeBench, Terminal-Bench

Cost

Cheapest frontier-class APIs by token price

Context

Advertised max versus RULER-effective

Benchmarks That Matter

Which tests still separate frontier models, and which to retire

Mar 2026

Open-Weight Leaderboard Shift

GLM-5, Qwen 3.5, and Kimi K2.5 lead the open-weight field on LMArena Elo, with Llama and Gemma flagged for restrictive licenses.

Leaderboard

Mar 2026

Coding Splits Three Ways

No single model sweeps coding: SWE-bench, LiveCodeBench, and Terminal-Bench each crown a different leader this cycle.

Benchmark

Feb 2026

Price Floor Drops Again

Step-3.5-Flash and DeepSeek V4-Flash undercut the frontier-class field on price per million tokens.

Pricing

Feb 2026

Context Reality Check

RULER testing shows that models can reliably use only 50 to 65 percent of their advertised context window.

Methodology

Every ranking on this page is assembled from independent leaderboards, not vendor marketing. Scores are a Feb to Mar 2026 snapshot, each figure is labeled vendor-reported or independently measured, and positions shift as new models ship.

Independent Benchmarks

Rankings draw on third-party leaderboards: LMArena Elo for human preference, SWE-bench Verified and LiveCodeBench for coding, RULER for effective context, plus aggregators like LLM-Stats. We favor open, reproducible tests over vendor-run evaluations.

Vendor vs Independent

Some scores come from the model makers themselves and some from neutral third parties. Each ranking labels which is which, because a vendor-reported benchmark and an independently reproduced one do not carry equal weight.
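One way to keep that distinction visible is to store the provenance label next to every score. The sketch below is a minimal illustration, not this page's actual pipeline: the `Score` type, the `rank` helper, the placeholder model names, and all numbers are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Literal

# The two provenance labels used on this page.
Source = Literal["vendor-reported", "independent"]


@dataclass(frozen=True)
class Score:
    model: str        # placeholder name, not a real leaderboard entry
    benchmark: str
    value: float
    source: Source


def rank(scores: list[Score], independent_only: bool = False) -> list[Score]:
    """Sort scores high to low, optionally dropping vendor-reported figures."""
    pool = [s for s in scores if s.source == "independent"] if independent_only else scores
    return sorted(pool, key=lambda s: s.value, reverse=True)


# Made-up numbers, purely for illustration.
scores = [
    Score("model-a", "SWE-bench Verified", 71.0, "vendor-reported"),
    Score("model-b", "SWE-bench Verified", 68.5, "independent"),
    Score("model-c", "SWE-bench Verified", 66.0, "independent"),
]

for s in rank(scores):
    print(f"{s.model:8} {s.value:5.1f}  [{s.source}]")
```

Calling `rank(scores, independent_only=True)` drops the vendor-reported row, which shows how a leader can change once self-reported figures are excluded.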

Scores Shift

These are point-in-time standings from early 2026. Frontier models ship monthly, prices fall, and leaderboard positions move. Treat each ranking as a snapshot to re-check, not a permanent verdict.

Open-Weight Arena Leader: GLM-5

Top SWE-bench Verified: 80.8

Cheapest Input / MTok: $0.10

Longest Advertised Context: 10M

Human Preference

LMArena Elo

Blind, crowd-sourced head-to-head voting that anchors the open-weight ranking. Resistant to overfitting because prompts are live and unpredictable.
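For readers unfamiliar with Elo, the sketch below shows the textbook update applied to a single head-to-head vote. It is an assumption-laden illustration: the K-factor of 32, the starting ratings, and the function names are chosen for the example, and LMArena's own rating procedure may differ from this online update.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one vote.

    outcome_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k is an assumed K-factor; real leaderboards tune this or fit ratings in batch.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level; A wins one blind comparison.
new_a, new_b = update(1500.0, 1500.0, outcome_a=1.0)
print(new_a, new_b)
```

Because the update depends on the gap between the expected and actual result, an upset win over a much higher-rated model moves ratings more than a win that was already expected.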

Coding

SWE-bench Verified, LiveCodeBench, Terminal-Bench

Real GitHub issue resolution, contamination-resistant competitive coding, and end-to-end terminal task completion. Each measures a different slice of developer work.

Reasoning

HLE and GPQA

Humanity's Last Exam (HLE) and GPQA's graduate-level science questions probe frontier reasoning where older tests have saturated near 100 percent.

Context

RULER Effective Window

Measures how much of an advertised context window a model can actually use. In this snapshot the gap between marketing numbers and usable tokens is large and consistent: models reliably use only 50 to 65 percent of the advertised window.
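As a rough planning aid, the 50 to 65 percent range reported on this page can be turned into a token budget. The helper below is a hypothetical sketch: `effective_window` is not part of RULER, and applying one flat range to every model is a simplifying assumption.

```python
def effective_window(advertised_tokens: int,
                     low: float = 0.50,
                     high: float = 0.65) -> tuple[int, int]:
    """Estimate the usable token range from an advertised context size.

    low and high default to the 50 to 65 percent range cited on this page;
    a real model's usable fraction should be measured, not assumed.
    """
    return round(advertised_tokens * low), round(advertised_tokens * high)


# The longest advertised window in this snapshot is 10M tokens.
low_tokens, high_tokens = effective_window(10_000_000)
print(f"Usable estimate: {low_tokens:,} to {high_tokens:,} tokens")
```

Under these assumptions a 10M-token window works out to roughly 5M to 6.5M usable tokens, so prompts sized near the advertised maximum are worth testing before relying on them.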

Cost

Price per Million Tokens

Published API rates for input and output, compared at frontier-class capability so the cheapest option is not simply the weakest model.
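Per-million-token pricing converts to a per-request cost with simple arithmetic. In the sketch below, the $0.10 input rate is the cheapest input price listed on this page, while the $0.40 output rate and the token counts are hypothetical values chosen only to make the example concrete.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> float:
    """Cost in dollars for one request, given prices per million tokens."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000


cost = request_cost(
    input_tokens=200_000,    # hypothetical prompt size
    output_tokens=50_000,    # hypothetical completion size
    input_per_mtok=0.10,     # cheapest input rate listed on this page
    output_per_mtok=0.40,    # assumed output rate, not taken from this page
)
print(f"${cost:.4f} per request")
```

Output tokens are usually priced higher than input tokens, so comparing APIs on input price alone can understate the cost of generation-heavy workloads.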

Five data-driven leaderboards across open-weight models, coding, cost, benchmarks, and context. Each ranking cites its sources and labels vendor-reported figures.

Top 10 Open-Weight LLMs in 2026

Ranked by LMArena Elo: GLM-5, Qwen 3.5, and Kimi K2.5 lead the open-weight field, with Llama and Gemma flagged for restrictive licenses.

Skeptic, 11 min read

Top 7 LLMs for Coding in 2026

SWE-bench Verified, LiveCodeBench, and Terminal-Bench, ranked. Opus 4.6 leads SWE-bench Verified, Qwen leads LiveCodeBench, and GPT-5.3 Codex leads Terminal-Bench.

Practitioner, 12 min read

Top 8 Cheapest Frontier-Class AI APIs

Price per 1M tokens at frontier capability. Step-3.5-Flash and DeepSeek V4-Flash undercut the field; Opus 4.6 anchors the premium end.

Practitioner, 11 min read

Top 7 LLM Benchmarks That Still Matter

Which benchmarks still separate frontier models: SWE-bench, LiveCodeBench, HLE, and GPQA, plus the saturated ones to retire.

Skeptic, 12 min read

Top 6 LLMs by Context Window

Advertised max versus RULER-effective context. Llama 4 Scout tops the list at an advertised 10M tokens, but every model can reliably use only 50 to 65 percent of its advertised window.

Practitioner, 11 min read

More from the AI Tools Hub and across Tech Jacks Solutions.

DeepSeek Hub

Open-weight models, aggressive API pricing, and the lab that reset cost expectations.

AI Tools Hub

Breakdowns, comparisons, and guides across every major AI vendor.

Meta Llama Hub

Meta's open-weight Llama models: architecture, fine-tuning, and deployment.

AI Governance

Responsible AI, EU AI Act, and compliance frameworks.

Security News

Cybersecurity alerts, threat analysis, and defense strategies.

Prompt Library

Copy-paste prompt templates for ChatGPT, Gemini, and Claude.

Before You Use AI

Important context for responsible AI adoption

Your Privacy

Model rankings cover providers with very different data practices. Some process conversations on servers outside your jurisdiction, some offer enterprise or self-hosted deployments with stronger controls, and free tiers often log inputs to improve their models. Review each provider's privacy policy and terms of service before sharing sensitive information, and prefer enterprise or self-hosted options when data cannot leave your walls.

Mental Health & AI Dependency

A high benchmark score does not make a model a safe substitute for human expertise or connection. The models ranked here are built for information and technical tasks, and over-reliance on any of them carries real risk. If you are experiencing distress:

988 Suicide & Crisis Lifeline - Call or text 988 (US)
SAMHSA Helpline - 1-800-662-4357 (free, 24/7)
Crisis Text Line - Text HOME to 741741

AI systems can produce plausible-sounding but incorrect guidance. For mental health, medical, legal, or financial decisions, always consult a qualified professional.

See the NIST AI Risk Management Framework for structured risk assessment guidance.

Your Rights & Our Transparency

Under GDPR (EU) and CCPA (California), you have the right to access, correct, and delete your personal data. Enforcement of these rights may differ for services operated from outside your jurisdiction.

The EU AI Act classifies general-purpose AI models under specific transparency and risk obligations, which apply to many of the models ranked here when deployed within the EU.

This publication is editorially independent. Rankings are based on independent research and testing. Where affiliate links are present, they are clearly disclosed and do not influence editorial conclusions.
