There’s a version of this story that writes itself and gets shared widely: “AI makes everyone a programmer.” Another version, equally shareable: “AI is replacing software engineers.” Anthropic’s research supports neither. It supports something more specific, more useful, and considerably more complicated.
Start with the methodology. The study covers approximately 400,000 Claude Code sessions from October 2025 through April 2026, drawn from roughly 235,000 unique individuals, per Anthropic’s research summary. It uses a privacy-preserving analysis framework. Its central finding on division of labor: humans make most planning decisions (what to do), and Claude makes most execution decisions (how to do it). That division of labor is the foundation every other finding rests on.
What Anthropic measured, and what the methodology can and can’t tell us
The study covers Claude Code. Specifically. Not GitHub Copilot, not Cursor, not any other agentic coding platform. Every finding in this brief applies to Claude Code users during this specific window. Extrapolating to “agentic AI generally” requires comparative data that doesn’t yet exist in published form.
Four hundred thousand sessions is a large sample for behavioral research. It isn’t a random sample of the software development population; it’s a sample of people who adopted Claude Code and used it enough to be included in the analysis. Early adopters tend to skew toward higher engagement and often toward a higher technical baseline. That context matters when interpreting the findings, though Anthropic’s stated methodology attempts to account for it.
The time period, October 2025 through April 2026, captures the model at a specific capability level. Claude Code has continued to update since April. The seven-month trend line within the study is informative; a comparable study published in late 2026 may show a different picture.
The expertise multiplier in practice
The core confirmed finding: greater domain expertise is associated with more work done by Claude per instruction and with higher session success rates. Anthropic’s research page states it directly: “The greater domain expertise a person brings to a session, the more work Claude does per instruction.”
According to the paper’s full dataset analysis, expert users achieve approximately a 91% task success rate compared to approximately 15% for novices. These figures come from the full paper PDF; they could not be confirmed against the published research page excerpt available during verification, and they have not been independently corroborated. Treat them as Anthropic’s reported figures, not established independent benchmarks.
The finding that deserves equal weight: “the gap between intermediate and expert users is modest.” That’s a direct quote from the fetched source, and it’s the part the 91%/15% framing omits. The success rate story isn’t expert-vs-everyone. It’s experts-and-intermediates vs. novices, with a smaller gap between the first two groups than between either of them and the third. For workforce planning purposes, this matters. The population of intermediate users (people with domain knowledge but not deep coding expertise) gains substantial benefit from Claude Code. They’re not at the ceiling, but they’re not on the floor either.
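To make the tier structure concrete, here is a minimal Python sketch that splits the novice-to-expert gap into its two steps. The 91% and 15% rates are Anthropic’s reported figures as cited above. This brief has no confirmed intermediate success rate (the source only calls the intermediate-to-expert gap “modest”), so the intermediate values below are hypothetical placeholders swept over a range, and `gap_breakdown` is an illustrative name, not anything from the study.

```python
def gap_breakdown(novice: float, intermediate: float, expert: float) -> dict:
    """Split the novice-to-expert success gap into its two steps."""
    total_gap = expert - novice
    if total_gap <= 0:
        raise ValueError("expert rate must exceed novice rate")
    return {
        "novice_to_intermediate": intermediate - novice,
        "intermediate_to_expert": expert - intermediate,
        # Fraction of the full gap already closed at the intermediate tier.
        "share_closed_by_intermediate": (intermediate - novice) / total_gap,
    }


REPORTED_NOVICE = 0.15  # Anthropic's reported figure, not independently verified
REPORTED_EXPERT = 0.91  # Anthropic's reported figure, not independently verified

# HYPOTHETICAL placeholders: no intermediate rate is confirmed in this brief.
PLACEHOLDER_INTERMEDIATE_RATES = (0.70, 0.80, 0.85)

sweep = {
    rate: gap_breakdown(REPORTED_NOVICE, rate, REPORTED_EXPERT)
    for rate in PLACEHOLDER_INTERMEDIATE_RATES
}

for rate, parts in sweep.items():
    print(
        f"intermediate={rate:.2f} (placeholder): "
        f"{parts['share_closed_by_intermediate']:.0%} of the gap closed"
    )
```

Under any placeholder in this range, most of the gap sits between novices and intermediates, which is the shape the “modest” quote describes. Swap in the real intermediate rate if the full paper provides one.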
The democratization finding and what “average” actually means
Anthropic’s research page states: “On coding tasks, every major occupation succeeds at nearly the same rate as software engineers, on average.” That finding has made its way into headlines as “non-coders can now code.” The word “average” is doing structural work in that sentence.
[Figure: Claude Code usage pattern shift, October 2025 to April 2026]
Average coding tasks span a wide range. They include routine data manipulation, scripting, API calls, file system operations, and standard CRUD implementations. They don’t include system architecture, performance-critical optimization, security-sensitive implementation, or debugging novel failure modes in complex distributed systems. The study documents that Claude Code enables non-experts to complete a representative sample of tasks at roughly the same rate as software engineers complete that same sample. It doesn’t document parity on advanced tasks.
That distinction matters for two reasons. First, it defines where the workforce impact is and isn’t concentrated. Roles that spend the majority of their time on routine, well-scoped coding tasks face the most direct capability compression. Roles that require architectural judgment, security expertise, or management of technical complexity are amplified, not compressed. Second, it sets the terms for honest workforce planning conversations. “Non-coders can code now” overstates the finding and leads to poor decisions. “Non-coders can handle average coding tasks at near-parity” is accurate and leads to different, more targeted analysis.
The seven-month trend: what the trajectory implies
The shift over seven months is consistent and directional. The share of sessions spent debugging fell by nearly half, a figure confirmed directly from the fetched source. Usage moved toward end-to-end agentic execution: deploying and running code, analyzing data, writing non-code documents. The paper also reports that the estimated economic value of typical tasks rose over the study period, by approximately 25% per the full paper; that figure could not be confirmed from the published research page.
Debugging is historically one of the highest-time-cost activities in software development. A roughly 50% reduction in the share of sessions devoted to debugging over seven months, if it holds across the broader developer population, represents a substantial shift in where engineering time goes. The implied trajectory: more time on architecture, less time on debugging.
It’s one study, from one tool, over seven months. The trend line is suggestive, not definitive. What comparable data from other tools shows will matter.
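One reading caution applies to the debugging figure: it describes a share of sessions, not a count. The short Python sketch below shows how a share can halve while the absolute number of debugging sessions stays flat, if total usage grows. All session counts here are hypothetical illustration values, not data from the study, and the function names are assumptions of this sketch.

```python
def share(part: float, whole: float) -> float:
    """Fraction of sessions in a category."""
    return part / whole


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value (e.g. -0.5 means 'fell by half')."""
    return (after - before) / before


# HYPOTHETICAL numbers chosen only to illustrate the arithmetic.
start_total, start_debugging = 10_000, 2_000
end_total, end_debugging = 20_000, 2_000

start_share = share(start_debugging, start_total)
end_share = share(end_debugging, end_total)

share_change = relative_change(start_share, end_share)
count_change = relative_change(start_debugging, end_debugging)

print(f"debugging share: {start_share:.0%} -> {end_share:.0%} ({share_change:+.0%})")
print(f"debugging count: {start_debugging} -> {end_debugging} ({count_change:+.0%})")
```

Whether the study’s drop reflects less debugging in absolute terms or a faster-growing mix of other work is something the share alone cannot settle.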
Implications for engineering teams and workforce planning: five things to assess before acting
The study is the first major output of Anthropic’s $200M commitment to studying AI’s economic impact, announced in June. That context is important: this is a vendor publishing research about its own product’s effects. The methodology appears serious, and the findings are consistent with what smaller-scale studies have shown. But independent replication matters, and it hasn’t happened yet.
Before acting on this data for workforce planning:
One. Define what percentage of your engineering team’s current work falls into “average tasks” as the study uses that term. If that number is low (if your team skews toward architecture, security, and complex systems work), the non-coder parity finding affects you less than the headline suggests. If it’s high, the implications are more direct.
Two. Assess current team composition against the expertise multiplier finding. Teams with higher domain expertise across the board will gain more from agentic AI adoption than teams with a wide expertise distribution. That has implications for hiring criteria, not just headcount levels.
Three. Don’t use this study alone to justify workforce change. It documents capability shifts. Layoff decisions, role restructuring, and hiring freezes involve legal obligations, particularly under EU AI Act provisions covering employment-affecting AI systems, and under employment law in most jurisdictions. A research paper published by the tool’s developer is not sufficient basis for workforce change without independent analysis.
Four. Track the comparison studies. When Cursor, GitHub Copilot, or other platforms publish comparable empirical data, the picture will either converge or diverge. Divergence would be informative about the degree to which these findings are Claude-specific.
Five. The seven-month trend line matters more than the endpoint. The trajectory (more agentic use, less debugging, rising task value) is moving in a consistent direction. Where it sits at twelve months post-study will be more useful for planning than the seven-month snapshot.
Analysis
The study’s most important finding for workforce planning isn’t the headline number; it’s the modest gap between intermediate and expert users. That gap means the expertise ladder is compressible. Training intermediate-level domain knowledge into non-coders is now a viable productivity strategy. That’s a different problem than “replace engineers,” and it leads to different decisions: invest in domain training for adjacent roles, not headcount reduction.
Unanswered Questions
- Do these findings hold for other agentic coding tools, or are they specific to Claude Code’s architecture and training?
- What constitutes an “average task” in Anthropic’s classification, and does your team’s work distribution match that definition?
- How does the expertise multiplier interact with task complexity as the model continues to update? Will the novice/expert gap narrow or widen over the next year?
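Step One above asks what share of a team’s work counts as “average tasks.” A minimal Python sketch of that calculation follows, assuming you can export task records with a category label and logged hours. The category names are this sketch’s own mapping of the routine and advanced examples listed earlier in this brief, not Anthropic’s classification, and `TaskRecord`, `bucket_for`, and `hours_share_by_bucket` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass

# ASSUMPTION: labels mirror this brief's examples, not the study's taxonomy.
ROUTINE = {"data_manipulation", "scripting", "api_calls", "file_operations", "crud"}
ADVANCED = {"architecture", "performance_optimization", "security", "distributed_debugging"}


@dataclass
class TaskRecord:
    category: str
    hours: float


def bucket_for(category: str) -> str:
    """Map a task category to routine, advanced, or unclassified."""
    if category in ROUTINE:
        return "routine"
    if category in ADVANCED:
        return "advanced"
    return "unclassified"


def hours_share_by_bucket(records: list[TaskRecord]) -> dict[str, float]:
    """Return each bucket's fraction of total logged hours."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[bucket_for(record.category)] += record.hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {bucket: hours / grand_total for bucket, hours in totals.items()}


# HYPOTHETICAL sample data for illustration only.
sample = [
    TaskRecord("crud", 12),
    TaskRecord("scripting", 8),
    TaskRecord("architecture", 15),
    TaskRecord("security", 10),
    TaskRecord("code_review", 5),
]

shares = hours_share_by_bucket(sample)
for bucket, fraction in sorted(shares.items()):
    print(f"{bucket}: {fraction:.0%} of hours")
```

Keeping an explicit “unclassified” bucket is a deliberate choice: a large unclassified share signals that the category mapping, not the team, needs attention before any conclusion is drawn.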
TJS synthesis
Anthropic’s study is the best empirical data available on what agentic coding tools do in practice, and it should be read carefully rather than summarized loosely. Domain expertise is a multiplier; it isn’t obsolete. Non-coder parity applies to average tasks, not all tasks, and the gap between intermediate and expert users is explicitly noted as modest in the source. The seven-month trend is directional and consistent. None of this supports “AI is replacing software engineers” and none of it supports “nothing is changing.”
For engineering managers: run your own task distribution analysis before drawing workforce conclusions. For compliance teams: flag this study in your AI impact assessment documentation, but do not treat it as evidence of displacement requiring immediate action. Wait for independent replication and for comparable data from other platforms.
The study sits alongside a wave of agentic capability demonstrations that together suggest the nature of software work is changing faster than the workforce conversation can keep up with. The analytical work is to figure out which parts of your team’s work fit the “average task” definition, and plan from there.