Cursor Composer 2.5 Claims Third on Coding Agent Index at a Fraction of Claude and Codex Costs

May 22, 2026 2 min read Artificial Analysis Partial Strong

Tech Jacks Solutions AI News Coverage

Anysphere has released Cursor Composer 2.5, and per Artificial Analysis benchmarks published May 21, the new model reportedly scores 62 on the Coding Agent Index, third behind Claude Opus 4.7 max and GPT-5.5 xhigh reasoning, at a reported cost of $0.07 to $0.44 per task. For teams currently paying $4.10 for Opus 4.7 or $4.82 for GPT-5.5, that gap is a procurement question, not a product curiosity.


Benchmark cost vs. Opus 4.7: roughly 10x cheaper in Fast mode

Key Takeaways

Cursor Composer 2.5 reportedly scores 62 on the Coding Agent Index (Artificial Analysis, May 21), third behind Opus 4.7 max (66) and GPT-5.5 xhigh (65)
Reported task costs: $0.07 Standard / $0.44 Fast, roughly 10x (Fast) to 60x (Standard) cheaper than Opus 4.7 at $4.10, with a slightly wider gap against GPT-5.5 at $4.82
SWE-Bench-Pro-Hard-AA score reportedly rose 35 percentage points (12% to 47%), matching Opus 4.7 max on that benchmark
All figures attributed to Artificial Analysis independent benchmarks; verify before acting on these numbers

Model Release

Cursor Composer 2.5

Organization: Anysphere (Cursor)

Type: LLM, Coding Specialized

Parameters: Not disclosed

Benchmark: Coding Agent Index: 62 (reported, Artificial Analysis); SWE-Bench-Pro-Hard-AA: 47% (reported, Artificial Analysis)

Availability: Cursor platform

Coding Agent Index Score vs. Cost Per Task (reported, Artificial Analysis, May 21 2026)

Claude Opus 4.7 max | Score: 66 | Cost: $4.10

GPT-5.5 xhigh reasoning | Score: 65 | Cost: $4.82

Cursor Composer 2.5 (Fast) | Score: 62 | Cost: $0.44

Cursor Composer 2.5 (Standard) | Score: 62 | Cost: $0.07

Cost compression has been the real story in the coding agent market for months. Composer 2.5 is the sharpest data point yet.

Per Artificial Analysis benchmarks published May 21, Cursor Composer 2.5 reportedly scores 62 on the Coding Agent Index, up from approximately 48 in the prior version. That reportedly places it third on the index, behind Claude Opus 4.7 max at 66 and GPT-5.5 xhigh reasoning at 65. The spread between third and first is four points. The cost spread is a different story.

Artificial Analysis reports task costs at $0.07 in Standard mode and $0.44 in Fast mode. Compare that to $4.10 for Opus 4.7 and $4.82 for GPT-5.5. At Fast mode pricing, Composer 2.5 runs at roughly a 10x discount versus Opus 4.7. At Standard mode, it’s closer to 60x cheaper. For a development team running thousands of tasks per week, that math changes a procurement decision.
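The arithmetic behind those multiples is easy to reproduce. The sketch below uses only the per-task costs quoted above; the 5,000-tasks-per-week workload and the function names are illustrative assumptions, not figures or methods from Artificial Analysis, and real spend will vary with task mix.

```python
# Reported per-task costs in USD (Artificial Analysis, May 21, as quoted above).
REPORTED_COST_PER_TASK = {
    "Claude Opus 4.7 max": 4.10,
    "GPT-5.5 xhigh reasoning": 4.82,
    "Cursor Composer 2.5 (Fast)": 0.44,
    "Cursor Composer 2.5 (Standard)": 0.07,
}


def cost_ratio(baseline: str, candidate: str) -> float:
    """Return how many times more the baseline costs per task than the candidate."""
    return REPORTED_COST_PER_TASK[baseline] / REPORTED_COST_PER_TASK[candidate]


def weekly_spend(model: str, tasks_per_week: int) -> float:
    """Estimate weekly spend, assuming every task costs the reported figure."""
    return REPORTED_COST_PER_TASK[model] * tasks_per_week


if __name__ == "__main__":
    TASKS_PER_WEEK = 5_000  # hypothetical workload, not a figure from the report
    for candidate in ("Cursor Composer 2.5 (Fast)", "Cursor Composer 2.5 (Standard)"):
        ratio = cost_ratio("Claude Opus 4.7 max", candidate)
        print(f"Opus 4.7 vs {candidate}: {ratio:.1f}x")
    for model in REPORTED_COST_PER_TASK:
        print(f"{model}: ${weekly_spend(model, TASKS_PER_WEEK):,.2f} per week")
```

On the reported numbers, the Opus 4.7 multiples come out to about 9.3x for Fast mode and about 58.6x for Standard mode, which is where the rounded "10x" and "60x" figures come from.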

The benchmark picture has a second dimension. Per the same Artificial Analysis evaluation, Composer 2.5 reportedly gained 35 points on SWE-Bench-Pro-Hard-AA, moving from 12% to 47%. The SWE-Bench variants test real software engineering tasks: bug fixing, feature implementation, code understanding. A 35-point gain on the harder variant is a meaningful stated improvement. It reportedly matches Claude Opus 4.7 max on that specific benchmark.
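A 35-point gain here means percentage points, not a 35% relative improvement, and the two readings differ a lot. A minimal sketch of the distinction, using only the reported 12% and 47% scores (the helper names are hypothetical, chosen for illustration):

```python
def percentage_point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute change between two percentage scores, in percentage points."""
    return after_pct - before_pct


def relative_multiple(before_pct: float, after_pct: float) -> float:
    """How many times larger the new score is than the old one."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive to compute a multiple")
    return after_pct / before_pct


if __name__ == "__main__":
    before, after = 12.0, 47.0  # reported SWE-Bench-Pro-Hard-AA scores
    print(f"Gain: {percentage_point_gain(before, after):.0f} percentage points")
    print(f"Multiple: {relative_multiple(before, after):.1f}x the prior score")
```

Read as a multiple, the reported score is roughly 3.9 times the prior version's, which is a larger jump than "35" alone might suggest.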

The catch is the verification layer. All numerical claims here come from Artificial Analysis, an independent benchmark provider. That means the figures reflect third-party evaluation rather than vendor self-reporting, which matters for how much weight you give them. Even so, they remain reported figures that have not been independently confirmed at this stage of production.

One number that’s not in this brief: average execution wall time. The Wire referenced a 6.7-minute figure for Fast mode, but without a source URL, it doesn’t clear the bar for publication. Artificial Analysis’s full report should address it.

Context: this release lands inside a competitive moment for agentic coding tools. On May 21 we covered the broader picture: what Google, Anthropic, and OpenAI actually built in their agentic coding platforms. Composer 2.5 reshapes that picture by inserting a challenger within four benchmark points of the top two, at a cost structure the major labs haven’t matched.

The longer pricing pattern: a brief from May 13 argued that when frontier model benchmarks converge, pricing becomes the story. Composer 2.5 is the clearest evidence for that thesis to date. The Coding Agent Index scores for the top three tools sit within a four-point band. The cost spread is an order of magnitude.

What to Watch

Anthropic or OpenAI pricing response to Composer 2.5 cost structure | Weeks

SWE-Bench-Pro-Hard-AA methodology documentation (to validate the 47% claim) | Available now via SWE-Bench team

Next Coding Agent Index update cycle | Weeks

What to watch

The Artificial Analysis source is the primary verification step. SWE-Bench-Pro-Hard-AA is a recognized variant in the SWE-Bench family, and the methodology details matter for evaluating the 47% claim. Watch whether Anthropic or OpenAI respond with pricing adjustments; a four-point benchmark gap doesn’t justify a 10x premium indefinitely.

TJS synthesis

The price-performance narrative is compelling, and the benchmark source has independent credibility. The decision framework is straightforward: if your team’s coding agent use cases skew toward the task types SWE-Bench-Pro-Hard-AA covers, Composer 2.5 deserves an evaluation. At a reported $0.44 per task or less, the evaluation cost is trivial.