There’s no single answer to the question of what AI training data costs. The answer depends on jurisdiction, legal framework, the content category at issue, and whether the question is being resolved through settlement or judicial ruling. Right now, two proceedings are producing data points that will shape how that question gets answered in courts, boardrooms, and compliance departments for years.
The Two Cases
*Anthropic (United States, civil settlement):* A class-action settlement over Anthropic’s use of copyrighted text in training data is approaching final approval. A status report filed April 15, 2026, and updated April 21 reflects a reported $1.5 billion settlement figure, with approximately 91% of eligible authors (roughly 120,000 individuals) filing claims, according to The Bookseller. The class covers approximately 480,000 works. These figures haven’t been independently verified against court filings, and Reuters coverage of the settlement couldn’t be confirmed from a verified URL. The final approval hearing is reported for May 2026.
*ANI v. OpenAI (India, judicial proceeding):* The Delhi High Court has reserved its judgment as of April 21, 2026, in a case brought by the Asian News International wire service, per the Observer Research Foundation. The case tests whether training large language models on copyrighted news content falls within “fair dealing” under India’s Copyright Act. The critical distinction: India’s Copyright Act contains no Text and Data Mining exception. The court isn’t interpreting a statutory framework designed to address AI training. It’s being asked to apply a general copyright concept to a technology that didn’t exist when the framework was written.
These are different kinds of proceedings with different legal consequences. One is a civil settlement: no admission of liability, no judicial ruling on infringement, but a financial outcome and a participation rate that are now part of the public record. The other is a judicial interpretation of statutory law that will either create or foreclose a legal pathway for AI training in India. Both matter.
What the Anthropic Number Actually Means
A settlement isn’t a ruling. Anthropic agreed to a reported $1.5 billion payment without admitting that training on copyrighted text was infringement. That’s a standard settlement posture, and it’s important not to overread the legal significance.
But the financial significance is real and distinct from the legal question. For any AI company’s legal team assessing litigation exposure over training data, the Anthropic settlement is now a reference point. The arithmetic is visible: settlement amount, class size, number of works, participation rate. Those numbers can be applied to any comparable fact pattern. Other frontier labs training on similar corpora can run that math against their own situations.
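That arithmetic can be sketched as a back-of-envelope exposure model. The settlement amount, class size, and claim rate below are the reported figures; the comparable corpus size is a hypothetical placeholder, and the whole calculation assumes the reported numbers hold through final approval.

```python
# Back-of-envelope litigation-exposure model using the reported
# Anthropic settlement figures. Inputs are reported, not court-verified.

REPORTED_SETTLEMENT = 1_500_000_000  # USD, reported settlement figure
CLASS_WORKS = 480_000                # approximate works covered by the class
CLAIM_RATE = 0.91                    # reported claim participation rate

# Implied cost per covered work under the reported terms.
per_work = REPORTED_SETTLEMENT / CLASS_WORKS
print(f"Implied cost per covered work: ${per_work:,.0f}")
# → Implied cost per covered work: $3,125

# Apply the same per-work figure to a hypothetical comparable corpus.
# The corpus size here is a placeholder, not drawn from any filing.
hypothetical_corpus_works = 250_000
estimated_exposure = hypothetical_corpus_works * per_work
print(f"Modeled exposure for {hypothetical_corpus_works:,} works: "
      f"${estimated_exposure / 1e9:.2f}B")
# → Modeled exposure for 250,000 works: $0.78B
```

The point of the sketch is not the placeholder number but the mechanism: once a settlement publishes amount, class size, and participation rate together, any lab can substitute its own corpus figures and produce a defensible planning estimate.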
The 91% claim rate adds another dimension. Class-action settlements frequently achieve low participation because eligible claimants don’t engage with the process. A 91% rate, if it holds through final approval, signals that the author community had the organizational infrastructure to reach its members and the motivation to do so. The Bookseller describes this rate as unusually high; the comparison benchmark cited in initial reporting couldn’t be independently sourced, so we’re not publishing it. But the organizational implication is clear: the plaintiff infrastructure in AI training data cases is mature. It can be deployed again.
What the India Gap Actually Means
India’s lack of a TDM exception in its Copyright Act is the legal condition that makes the Delhi ruling consequential regardless of how it lands. If the court rules for OpenAI and finds that AI training constitutes fair dealing, it creates judicial precedent in the absence of statute. That precedent is narrower than a legislative TDM exception would be: it applies to the specific facts of this case and would need to be extended through subsequent rulings.
If the court rules against OpenAI, the immediate practical effect is that AI companies training on Indian news content without licenses face legal exposure under Indian copyright law. The broader effect is that it pushes the question to the Indian legislature, either to pass a TDM exception or to allow the adverse precedent to stand.
Either outcome changes the operational environment. As the Observer Research Foundation’s analysis notes, a ruling against OpenAI could require AI companies to obtain licenses for training data in the Indian market. That framing reflects legal analysis of a pending judgment, not a court finding, but it’s the right planning horizon for companies with Indian-language training corpora.
The Emerging Global Pattern
Place these two cases alongside the broader landscape and a pattern emerges. Four distinct approaches to the same underlying question (who has rights in the content AI trains on) are developing simultaneously.
*United States:* Litigation-driven cost discovery. No statutory framework governing AI training specifically; cases filed under existing copyright law; outcomes through settlement. Anthropic’s reported settlement establishes a cost data point, not a legal rule. The cost discovery continues through other pending cases.
*India:* Judicial interpretation of a statutory gap. No TDM exception exists; the court must apply existing concepts. The ruling will either create precedent or surface the need for legislation. Either way, the legal uncertainty that’s existed since LLMs emerged is about to be reduced by one data point.
*European Union:* Prospective regulatory licensing. The EU AI Act and the EU Copyright Directive’s TDM exception together create a framework where AI training on certain content is permissible under defined conditions, with transparency requirements. The regulatory path is defined; the compliance work is ongoing.
*Japan:* Voluntary and permissive. Japan’s recent AI law combined with its APPI amendment, addressed in a prior brief, reflects a deliberate policy choice to minimize barriers to AI training data access. The stance is the most permissive of the four jurisdictions. It’s also the one with the least enforcement infrastructure if the policy changes.
These aren’t stages on a path toward a unified global standard. They’re diverging answers to the same question, and that divergence appears durable.
What Compliance Teams Should Know
For each jurisdiction, the current state of legal risk for AI training data:
*US:* High uncertainty, litigation-active. The Anthropic settlement reduces one specific uncertainty (what one case cost) but doesn’t resolve the legal question for others. Monitor pending cases for additional settlements or rulings. If a company’s training corpus is similar to Anthropic’s, the settlement provides a modeling anchor.
*India:* Pending judicial ruling. The Delhi HC decision is a binary event. Before it: treat training on Indian copyrighted content as legally uncertain. After it: the outcome defines the next action (licensing negotiations if adverse; continued monitoring if favorable).
*EU:* Active compliance obligation. The TDM exception exists with conditions; the AI Act’s GPAI transparency requirements are in force for systems above defined thresholds. This is not a monitoring task; it’s an ongoing compliance process.
*Japan:* Monitoring posture. The voluntary compliance model carries no enforcement risk under the current framework. Track the Basic AI Plan publication expected in the coming weeks; it will define what “voluntary adherence” means operationally.
What to Watch
Two near-term events are determinative. The Anthropic final approval hearing, expected in May, will either confirm the reported $1.5 billion figure or surface objections that could change the settlement’s terms or timeline. The Delhi HC ruling, timing unknown, will resolve one of the most significant outstanding questions in international AI copyright law. Both deserve immediate follow-up briefs when they occur.
TJS Synthesis
The global economics of AI training data aren’t being set by legislation. They’re being discovered through litigation, judicial interpretation, and deliberate policy choices made jurisdiction by jurisdiction. The Anthropic settlement and ANI v. OpenAI are two data points in that discovery process, different in kind, different in jurisdiction, but answering the same underlying question from two directions. Compliance teams that track only their domestic jurisdiction are navigating one corner of a much larger map. The pattern that matters is cross-jurisdictional: the price of AI training data is being set simultaneously in multiple forums, and the results don’t add up to a single number. They add up to a risk landscape that can only be managed with a global legal strategy.