Cross-Vendor Leaderboards

AI Model Rankings

Data-driven LLM leaderboards built from independent benchmarks (LMArena, LLM-Stats, SWE-bench, RULER). Sortable, sourced, and updated as the landscape shifts.

5 Rankings
4+ Benchmarks
Feb to Mar 2026 Snapshot
Every Figure Sourced

Open-Weight

Top 10 ranked by LMArena Elo

Coding

SWE-bench, LiveCodeBench, Terminal-Bench

Cost

Cheapest frontier-class APIs by token price

Context

Advertised max versus RULER-effective

Benchmarks That Matter

Which tests still separate frontier models, and which to retire

What's New

Mar 2026

Open-Weight Leaderboard Shift

GLM-5, Qwen 3.5, and Kimi K2.5 lead the open-weight field on LMArena Elo, with Llama and Gemma flagged for restrictive licenses.

Leaderboard

Mar 2026

Coding Splits Three Ways

No single model sweeps coding: SWE-bench, LiveCodeBench, and Terminal-Bench each crown a different leader this cycle.

Benchmark

Feb 2026

Price Floor Drops Again

Step-3.5-Flash and DeepSeek V4-Flash undercut the frontier-class field on price per million tokens.

Pricing

Feb 2026

Context Reality Check

RULER testing shows that models can reliably use only 50 to 65 percent of their advertised context window.

Methodology

How These Rankings Are Built

Every ranking on this page is assembled from independent leaderboards, not vendor marketing. Scores are a snapshot from Feb to Mar 2026. Each figure is labeled as vendor-reported or independently measured, and positions shift as new models ship.

Independent Benchmarks

Rankings draw on third-party leaderboards: LMArena Elo for human preference, SWE-bench Verified and LiveCodeBench for coding, RULER for effective context, plus aggregators like LLM-Stats. We favor open, reproducible tests over vendor-run evaluations.

Vendor vs Independent

Some scores come from the model makers themselves and some from neutral third parties. Each ranking labels which is which, because a vendor-reported benchmark and an independently reproduced one do not carry equal weight.
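One way to carry that label through a ranking is to store the source next to every score and let it affect the sort order. The sketch below is only an illustration: the `Score` record, the `rank` helper, and the rule of listing independently measured figures first are assumptions made for this example, not the method used to build these rankings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    model: str      # hypothetical placeholder name, not a real result
    benchmark: str
    value: float
    source: str     # "independent" or "vendor"


def rank(scores: list[Score]) -> list[Score]:
    """Order independently measured scores first, then by value, highest first."""
    return sorted(scores, key=lambda s: (s.source != "independent", -s.value))


if __name__ == "__main__":
    # Made-up entries purely to show the ordering.
    sample = [
        Score("model-a", "example-bench", 70.0, "vendor"),
        Score("model-b", "example-bench", 65.0, "independent"),
        Score("model-c", "example-bench", 68.0, "independent"),
    ]
    for s in rank(sample):
        print(f"{s.model}: {s.value} ({s.source})")
```

Keeping the label on the record, rather than in a footnote, means any sorted or filtered view still shows where each number came from.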

Scores Shift

These are point-in-time standings from early 2026. Frontier models ship monthly, prices fall, and leaderboard positions move. Treat each ranking as a snapshot to re-check, not a permanent verdict.


Key Numbers

GLM-5: Open-Weight Arena Leader

80.8: Top SWE-bench Verified Score

$0.10: Cheapest Input Price per MTok

10M: Longest Advertised Context


The Benchmarks Behind the Rankings

Human Preference

LMArena Elo

Blind, crowd-sourced head-to-head voting that anchors the open-weight ranking. It resists overfitting because prompts are live and unpredictable.
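To see how head-to-head votes can become a rating, here is a minimal sketch of the classic Elo update. It illustrates the general idea only: the starting ratings and the `k` factor of 32 are assumptions, and LMArena's published scores may be computed with a different statistical procedure.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (A, B) ratings after one vote.

    score_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie.
    k is an assumed step size chosen for illustration.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models; A wins one blind vote.
    print(update(1500.0, 1500.0, 1.0))
```

An upset win against a higher-rated model moves both ratings more than an expected win does, which is why a steady stream of fresh votes keeps the ordering responsive.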

Coding

SWE-bench Verified, LiveCodeBench, Terminal-Bench

Real GitHub issue resolution, contamination-resistant competitive coding, and end-to-end terminal task completion. Each measures a different slice of developer work.

Reasoning

HLE and GPQA

Humanity's Last Exam and graduate-level science questions probe frontier reasoning where older tests have saturated near 100 percent.

Context

RULER Effective Window

Measures how much of an advertised context window a model can actually use. The gap between marketing numbers and usable tokens is consistent: in this snapshot, models reliably use roughly 50 to 65 percent of the advertised figure.
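As a rough planning aid, the 50 to 65 percent range reported on this page can be applied to any advertised window. The sketch below is simple arithmetic under that assumption: the `usable_range` helper is a name invented for this example, and the range is a summary figure, not a per-model RULER measurement.

```python
def usable_range(advertised_tokens: int, low_pct: int = 50, high_pct: int = 65):
    """Return (low, high) usable-token estimates for an advertised window.

    The default percentages are the snapshot range quoted on this page;
    real per-model results can fall outside it.
    """
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("percentages must satisfy 0 <= low <= high <= 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (advertised_tokens * low_pct // 100,
            advertised_tokens * high_pct // 100)


if __name__ == "__main__":
    low, high = usable_range(1_000_000)  # a hypothetical 1M-token window
    print(f"Plan for roughly {low:,} to {high:,} usable tokens")
```

When sizing a long-document workload, budgeting against the lower estimate is the more conservative choice.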

Cost

Price per Million Tokens

Published API rates for input and output, compared at frontier-class capability so the cheapest option is not simply the weakest model.
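Per-million-token pricing turns into a workload cost with one multiplication per direction. The sketch below assumes the $0.10 input rate from Key Numbers and a hypothetical $0.40 output rate; the output figure and the `request_cost` helper are made up for illustration and do not describe any specific provider.

```python
from decimal import Decimal

TOKENS_PER_MTOK = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: Decimal,
                 output_price_per_mtok: Decimal) -> Decimal:
    """Cost in dollars for a workload, given prices per million tokens."""
    input_cost = Decimal(input_tokens) * input_price_per_mtok / TOKENS_PER_MTOK
    output_cost = Decimal(output_tokens) * output_price_per_mtok / TOKENS_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # 2M input tokens and 500K output tokens at the assumed rates.
    total = request_cost(2_000_000, 500_000, Decimal("0.10"), Decimal("0.40"))
    print(f"${total:.2f}")
```

Because output tokens are often priced higher than input tokens, a workload's input-to-output ratio can change which API is cheapest, so it helps to compare on a representative mix and not on the input rate alone.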


Rankings

Five data-driven leaderboards across open-weight models, coding, cost, benchmarks, and context. Each ranking cites its sources and labels vendor-reported figures.


Related Coverage

More from the AI Tools Hub and across Tech Jacks Solutions.

Before You Use AI

Important context for responsible AI adoption

Your Privacy

Model rankings cover providers with very different data practices. Some process conversations on servers outside your jurisdiction, some offer enterprise or self-hosted deployments with stronger controls, and free tiers often log inputs to improve their models. Review each provider's privacy policy and terms of service before sharing sensitive information, and prefer enterprise or self-hosted options when data cannot leave your walls.

Mental Health & AI Dependency

A high benchmark score does not make a model a safe substitute for human expertise or connection. The models ranked here are built for information and technical tasks, and over-reliance on any of them carries real risk. If you are experiencing distress:

  • 988 Suicide & Crisis Lifeline - Call or text 988 (US)
  • SAMHSA Helpline - 1-800-662-4357 (free, 24/7)
  • Crisis Text Line - Text HOME to 741741

AI systems can produce plausible-sounding but incorrect guidance. For mental health, medical, legal, or financial decisions, always consult a qualified professional.

See the NIST AI Risk Management Framework for structured risk assessment guidance.

Your Rights & Our Transparency

Under GDPR (EU) and CCPA (California), you have the right to access, correct, and delete your personal data. Enforcement of these rights may differ for services operated from outside your jurisdiction.

The EU AI Act places general-purpose AI models under specific transparency and risk obligations, which apply to many of the models ranked here when they are deployed within the EU.

This publication is editorially independent. Rankings are based on independent research and testing. Where affiliate links are present, they are clearly disclosed and do not influence editorial conclusions.