Anthropic’s Claude Sonnet 5 is a drop-in replacement for Sonnet 4.6, with adaptive thinking enabled by default. That’s the headline. The part nobody mentions is the tokenizer.
According to one cost analysis, Anthropic has updated its tokenizer in Sonnet 5 in a way that may increase effective token counts by up to 35% per request. If that holds across typical workloads, a team running 1 million input tokens under the new model could be billed for the equivalent of 1.35 million. At the introductory input rate of $2 per million tokens, confirmed by Anthropic’s platform documentation, that’s a meaningful gap between what the rate card says and what the invoice shows.
The introductory pricing runs through August 31, 2026, per platform documentation and multiple secondary sources. Standard pricing of $3 per million input tokens and $15 per million output tokens takes effect after that date. Teams that build cost models around the $2/$10 introductory rate, then absorb the tokenizer effect on top of the September step-up, face a compounding bill increase. Benchmark your specific workload now, not after the rate change.
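The compounding effect can be worked through in a few lines. This is a minimal sketch that assumes the single-source 35% estimate applies uniformly to input tokens, which has not been independently validated. The function and constant names are illustrative, not part of any Anthropic tooling.

```python
# Sketch: effective input price per million "old-tokenizer" tokens.
# ASSUMPTION: the 35% inflation figure is an unverified single-source estimate.

def effective_input_rate(nominal_rate_per_mtok: float, token_inflation: float) -> float:
    """Price paid per million tokens as counted by the old tokenizer."""
    return nominal_rate_per_mtok * (1.0 + token_inflation)

INTRO_INPUT = 2.00      # USD per million input tokens, through August 31, 2026
STANDARD_INPUT = 3.00   # USD per million input tokens, after that date
INFLATION = 0.35        # assumed worst-case tokenizer inflation

intro_eff = effective_input_rate(INTRO_INPUT, INFLATION)
standard_eff = effective_input_rate(STANDARD_INPUT, INFLATION)

print(f"Introductory: ${INTRO_INPUT:.2f} nominal, ${intro_eff:.2f} effective")
print(f"Standard:     ${STANDARD_INPUT:.2f} nominal, ${standard_eff:.2f} effective")
print(f"Effective standard rate vs. nominal intro rate: {standard_eff / INTRO_INPUT:.3f}x")
```

Under those assumptions, a budget built on the $2 nominal rate would be off by roughly a factor of two once both the tokenizer effect and the September step-up apply.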
Why it matters
Pricing changes at the nominal level get covered. Tokenizer changes don’t, and they’re often harder to anticipate because they require actual workload testing to quantify. A 35% token count increase isn’t a rounding error; it’s the difference between a model that costs $2 and one that effectively costs $2.70 before you’ve touched the output rate. For high-volume production deployments, that spread compounds quickly across millions of daily requests.
Disputed Claim
Sonnet 5 also introduces adaptive thinking by default, which can be disabled via API parameter. Adaptive thinking means the model may invoke extended reasoning on certain queries, potentially increasing output token consumption further. Teams that don’t explicitly disable this feature may see output token costs behave differently than they did on Sonnet 4.6. The combination of a new tokenizer and an on-by-default reasoning mode creates two distinct cost variables that the nominal rate card doesn’t capture.
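One way to keep the two variables separate in a cost model is to treat them as independent multipliers. The sketch below is illustrative only: `input_inflation` stands in for the tokenizer effect, and `thinking_overhead` is a hypothetical placeholder for extra output tokens from adaptive thinking. The 50% overhead value is invented for the example and should be replaced with a measured figure.

```python
# Sketch: per-request cost with two separate adjustment factors.
# ASSUMPTIONS: inflation applies to input tokens only; thinking_overhead is a
# made-up illustrative value, not a measured or published number.

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float,
                 input_inflation: float = 0.0,
                 thinking_overhead: float = 0.0) -> float:
    """Return USD cost of one request; rates are USD per million tokens."""
    billed_in = input_tokens * (1.0 + input_inflation)
    billed_out = output_tokens * (1.0 + thinking_overhead)
    return (billed_in * input_rate + billed_out * output_rate) / 1_000_000

# Rate-card view: 10k input tokens, 1k output tokens at the $2/$10 intro rate.
baseline = request_cost(10_000, 1_000, 2.00, 10.00)

# Same request with the 35% estimate and a hypothetical 50% output overhead.
adjusted = request_cost(10_000, 1_000, 2.00, 10.00,
                        input_inflation=0.35, thinking_overhead=0.50)

print(f"baseline ${baseline:.4f}, adjusted ${adjusted:.4f}, "
      f"ratio {adjusted / baseline:.2f}x")
```

Keeping the factors as separate parameters makes it easy to set `thinking_overhead` to zero when modeling a deployment that disables adaptive thinking.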
Context
Anthropic reports that Sonnet 5 scores 80.4% on Terminal-Bench 2.1, a third-party agentic coding benchmark, up from 67.0% for Sonnet 4.6 on the same benchmark. Both figures are self-reported scores on a third-party benchmark, not independently evaluated by Epoch AI or a comparable external body. The performance gain is real enough to justify evaluating the model; the benchmark methodology warrants the same scrutiny as the cost structure.
The drop-in upgrade positioning matters for teams already running Sonnet 4.6 in production. No model migration, no prompt reengineering, no API version change, in theory. The catch is that “drop-in” refers to API compatibility, not cost parity. Workload economics change the moment the tokenizer and adaptive thinking defaults interact with your specific prompt structure and output length patterns.
What to watch
Watch for independent tokenizer analysis from the developer community in the weeks ahead; the 35% estimate comes from a single source and warrants broader validation. Watch also whether Anthropic publishes explicit tokenizer documentation, which would let teams estimate token inflation before committing to production migration. The August 31 pricing transition is the hard deadline: teams that haven’t benchmarked effective costs by then are making the migration decision with incomplete information.
TJS synthesis
Don’t migrate Sonnet 4.6 workloads to Sonnet 5 based on the nominal rate card alone. Run your actual prompts through both models, compare token counts, and factor in adaptive thinking output behavior before projecting costs. The performance case for Sonnet 5 is plausible: a 13-point Terminal-Bench gain, per Anthropic’s own reporting, is worth investigating. The cost case requires your own measurement. Wait for broader tokenizer analysis before locking in volume commitments ahead of the September rate step-up.
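Once token counts for the same prompts have been collected under both models, the comparison itself is simple. The sketch below assumes you already have paired counts from whatever token-counting mechanism you use; it makes no API calls. The function name and the sample numbers are made up for illustration.

```python
# Sketch: summarize tokenizer inflation from paired token counts.
# ASSUMPTION: `pairs` holds (tokens_old_model, tokens_new_model) for identical
# prompts, collected by the reader. The sample data below is invented.
from statistics import median

def inflation_summary(pairs: list[tuple[int, int]]) -> dict[str, float]:
    """Return aggregate, median, and max fractional increase in token count."""
    total_old = sum(old for old, _ in pairs)
    total_new = sum(new for _, new in pairs)
    ratios = [new / old for old, new in pairs if old > 0]
    return {
        "aggregate": total_new / total_old - 1.0,  # weighted by prompt size
        "median": median(ratios) - 1.0,            # typical prompt
        "max": max(ratios) - 1.0,                  # worst-case prompt
    }

sample = [(1000, 1200), (500, 650), (2000, 2700)]
summary = inflation_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value:+.1%}")
```

The aggregate figure is the one that drives the invoice, since it weights each prompt by size. The median and maximum show whether a few prompt types account for most of the increase.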