Evaluating the Costs and Benefits of Claude Opus 4.7's New Tokenizer
By Abhishek Ray
AI Summary
Anthropic's transition from Claude Opus 4.6 to 4.7 introduces a new tokenizer that increases token usage by a factor of 1.0 to 1.35, though my measurements show a higher 1.47x on technical documents. The same price and quota therefore cover fewer prompts: each session consumes more tokens, fills the context window sooner, and makes cached prefixes more expensive. To understand the trade-off, I ran two experiments: one to measure the cost and one to test the claimed benefits.
Using Anthropic's token counter, I compared token usage across two sets of samples: real-world Claude Code content and synthetic samples of various content types. The results showed a weighted ratio of 1.325x for real content and 1.345x for English-and-code subsets, with minimal change for CJK content. English and code are hit hardest, likely because the new tokenizer applies fewer sub-word merges, splitting the same text into more, shorter tokens.
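The weighted ratio above can be sketched as a size-weighted average of new-to-old token counts. A minimal sketch (the sample counts below are illustrative placeholders, not the article's raw data):

```python
def weighted_token_ratio(samples):
    """Weighted average of new/old token counts across samples.

    Weighting by each sample's old-tokenizer size means large documents
    dominate, matching how a corpus-level cost increase is felt.
    """
    total_old = sum(old for old, _ in samples)
    total_new = sum(new for _, new in samples)
    return total_new / total_old

# Illustrative (old_tokens, new_tokens) pairs per sample.
samples = [(1200, 1590), (800, 1064), (500, 655)]
print(round(weighted_token_ratio(samples), 3))  # → 1.324
```

A per-sample average would weight a 50-token snippet the same as a 5,000-token file; the weighted form reflects what you actually pay.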
The rationale behind this change is to enhance literal instruction following, especially at lower effort levels. Smaller tokens supposedly improve attention to individual words, aiding precision in tasks like tool calls. Partner reports indicate fewer tool-call errors over long runs, though it's unclear whether the improvement comes from the tokenizer alone, since the weights and post-training also changed.
To test whether 4.7 actually follows instructions better, I used the IFEval benchmark, which includes prompts with verifiable constraints. The results showed a modest improvement in strict instruction following: prompt-level accuracy rose by 5 percentage points. The sample is small, though, and the gain is incremental, not the dramatic change some partners suggested.
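IFEval's prompt-level strict metric counts a prompt as passed only if every verifiable constraint in it is satisfied. A minimal sketch of that scoring rule (the check results below are hypothetical):

```python
def prompt_level_strict_accuracy(results):
    """Fraction of prompts where every verifiable constraint passed.

    `results` is a list of per-prompt lists of booleans, one boolean
    per constraint check, mirroring IFEval's prompt-level strict metric.
    """
    passed = sum(all(checks) for checks in results)
    return passed / len(results)

# Hypothetical check results for four prompts.
runs = [[True, True], [True, False], [True], [True, True, True]]
print(prompt_level_strict_accuracy(runs))  # → 0.75
```

This is why prompt-level strict is the harshest of IFEval's four numbers: one failed constraint zeroes out the whole prompt.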
The cost implications are significant. In a typical 80-turn Claude Code session, the increased token count raises costs by 20-30%. The per-token price is unchanged; the session simply consumes more tokens. For users on Max plans, that means hitting rate limits sooner, especially on English-heavy tasks.
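The session-cost effect can be sketched as a simple model in which per-turn token counts scale by the tokenizer ratio while per-token prices stay fixed. The prices and per-turn token counts below are illustrative assumptions, not Anthropic's published numbers, and in this simplified model (no caching) the increase equals the token ratio; real sessions with cache hits land lower:

```python
def session_cost(turns, avg_input_tokens, avg_output_tokens,
                 in_price_per_mtok, out_price_per_mtok, token_ratio=1.0):
    """Total session cost in dollars.

    token_ratio scales token counts to model the new tokenizer;
    prices are dollars per million tokens.
    """
    in_tok = turns * avg_input_tokens * token_ratio
    out_tok = turns * avg_output_tokens * token_ratio
    return (in_tok * in_price_per_mtok + out_tok * out_price_per_mtok) / 1_000_000

# Illustrative prices and per-turn token counts (assumptions).
old = session_cost(80, 6000, 800, 15.0, 75.0, token_ratio=1.0)
new = session_cost(80, 6000, 800, 15.0, 75.0, token_ratio=1.325)
print(f"{new / old - 1:.1%}")  # cost increase tracks the token ratio
```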
Prompt caching, a key component of Claude Code's architecture, is also affected. The first 4.7 session starts cold, because cached prefixes from 4.6 are invalidated, and the higher token ratio means every subsequent cache write and read involves more tokens, raising costs.
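The cache-cost effect can be sketched with Anthropic's published multipliers (cache writes at 1.25x the base input price, cache reads at 0.10x; verify against current pricing before relying on them). The base price below is an illustrative assumption:

```python
def cache_costs(prefix_tokens, reads, base_in_per_mtok,
                write_mult=1.25, read_mult=0.10, token_ratio=1.0):
    """Dollar cost of one cache write plus `reads` cache hits on a prefix.

    write_mult/read_mult follow Anthropic's published multipliers
    (check current pricing); token_ratio scales the prefix size to
    model the new tokenizer.
    """
    tokens = prefix_tokens * token_ratio
    write = tokens * write_mult * base_in_per_mtok / 1_000_000
    read = reads * tokens * read_mult * base_in_per_mtok / 1_000_000
    return write + read

# Same prefix, 40 cache reads, illustrative $15/MTok base input price.
old = cache_costs(20_000, 40, 15.0, token_ratio=1.0)
new = cache_costs(20_000, 40, 15.0, token_ratio=1.325)
print(f"{new / old - 1:.1%}")  # cache costs also scale with the token ratio
```

Because both writes and reads are linear in prefix size, the cache bill inflates by the same factor as everything else, on top of the one-time cold start.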
Despite these challenges, the new tokenizer does offer a measurable improvement in instruction following, albeit small. Whether this justifies the increased cost depends on the user's specific needs and content types. Ultimately, users are paying more for a slight enhancement in model precision.
Key Concepts
Tokenization is the process of converting text into smaller units called tokens, which can be words, sub-words, or characters. This is a crucial step in natural language processing, as it allows models to interpret and process text data.
Instruction following in AI refers to a model's ability to accurately execute commands or tasks as specified by the user. This involves understanding and adhering to the precise requirements of a given prompt.
Category: AI
Original source: https://www.claudecodecamp.com/p/i-measured-claude-4-7-s-new-tokenizer-here-s-what-it-costs-you
Summarized by Mente