AI Copyright's New Battleground: Why Plaintiffs Are Abandoning the Training Debate for Torrenting

6 min read | Sources: Pascal's Substack; Courthouse News Service; The Atlantic
Three AI copyright developments this week share a single structural shift: plaintiffs are moving the fight from what AI companies did with copyrighted material to how they obtained it in the first place. The procurement theory, the securities disclosure theory, and the dataset-mapping approach represent three different legal instruments aimed at the same vulnerability, and together they suggest that the fair use debate that has dominated AI copyright litigation for two years may be losing its strategic value for plaintiffs. If that shift holds, the compliance question for every AI company with a training data gap stops being "was training transformative?" and becomes "can you document where you got it?"
Developments this week: 3

Key Takeaways

  • Three copyright-related cases this week share a structural shift: plaintiffs are moving the fight from training-phase fair use to dataset acquisition and investor disclosure
  • Shakespeare v. Anthropic's procurement theory argues that torrenting pirated datasets is infringement before any training occurs, making fair use arguments irrelevant to those claims
  • Hirschberger v. Narayen, a shareholder derivative suit against Adobe executives, alleges that "commercially safe AI" training claims were material misrepresentations to investors, a liability vector separate from copyright suits
  • The Atlantic's 21-million-track dataset mapping demonstrates that AI training corpora are identifiable at scale, strengthening the evidentiary record in existing music copyright suits
  • Organizations need two compliance records, not one: dataset acquisition provenance documentation in addition to training methodology documentation

Three AI Copyright Cases, Comparative Overview (June 2026)

| Case | Defendant | Legal Instrument | Fair Use Relevance |
| --- | --- | --- | --- |
| Shakespeare v. Anthropic | Anthropic | Copyright, procurement theory | Irrelevant: targets acquisition, not training use |
| Hirschberger v. Narayen | Adobe executives | Shareholder derivative (securities-adjacent) | Not applicable: investor disclosure framing |
| Suno/Udio label suits (expanded) | Suno, Udio | Copyright, existing suits, expanded dataset record | Contested: training-phase fair use still active |

Fair use has been the center of gravity in AI copyright litigation since the first lawsuits landed. The argument is straightforward: AI training ingests copyrighted material and produces something new, and under the transformative use doctrine, that ingestion might qualify as fair use. Courts have been wrestling with this for years. Thomson Reuters v. Ross has been in litigation since 2020. Bartz v. Anthropic ran through class certification, generated a settlement, and still hasn’t produced a definitive ruling on training-phase fair use.

Three cases filed or reported this week suggest that some plaintiffs have stopped waiting for that ruling. They’re filing around it.

The Procurement Theory: Shakespeare v. Anthropic

One hundred authors who rejected the Bartz v. Anthropic settlement filed a new copyright complaint in the Northern District of California on or around June 17, 2026. Lead plaintiff Thomas William Shakespeare, described as a British sociologist and bioethicist, heads a group that made a deliberate choice: by opting out of the settlement, they signaled that they believe they have a stronger position litigating on their own.

The legal theory, as characterized by legal analysts covering the filing, focuses on the alleged acquisition of training data via BitTorrent. The datasets reportedly at issue, Books3, LibGen, and PiLiMi, are familiar from prior AI copyright litigation. What’s new is where the complaint allegedly places the infringement: not at model training, but at the moment of downloading and retaining pirated material.

This is a meaningful distinction. Legal commentary analyzing the filing frames it this way: if downloading a pirated dataset is itself an act of infringement, separate from any subsequent use, then Anthropic’s transformative training argument doesn’t engage with the claim at all. Fair use is a defense to using copyrighted material. It isn’t a defense to stealing it.

The factual question the procurement theory puts to courts is narrower than the training-phase question: did the defendant acquire this specific material through channels that constitute infringement? Either the BitTorrent history is there or it isn’t. Discovery on dataset acquisition is a different animal than discovery on model training methodology.

Plaintiffs reportedly seek up to $71.4 million in maximum statutory damages, a figure from newsletter reporting that hasn’t been confirmed against the complaint text. The docket number remains unconfirmed in public records. Treat those figures as placeholders. The legal theory is the substance.
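
For scale, the reported figure can be set against the statutory ceiling. The sketch below assumes the figure was computed at the $150,000 per-work maximum for willful infringement under 17 U.S.C. § 504(c)(2). The implied work count is back-of-envelope arithmetic under that assumption, not a number taken from the complaint.

```python
# Back-of-envelope check on the reported damages figure.
# Assumption: the figure uses the $150,000 per-work statutory maximum
# for willful infringement (17 U.S.C. § 504(c)(2)).
WILLFUL_MAX_PER_WORK = 150_000
REPORTED_MAX_DAMAGES = 71_400_000  # newsletter-reported, unconfirmed


def implied_work_count(total: int, per_work: int) -> float:
    """Number of works a total would cover at a given per-work award."""
    return total / per_work


works = implied_work_count(REPORTED_MAX_DAMAGES, WILLFUL_MAX_PER_WORK)
print(f"Implied works at the willful maximum: {works:g}")
```

If the complaint uses a different per-work figure or a different number of works, the arithmetic changes accordingly, which is one more reason to treat the total as a placeholder.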

The Securities Disclosure Vector: Hirschberger v. Narayen

Filed one day earlier, on June 16, 2026, *Hirschberger et al. v. Narayen et al.* targets Adobe executives over statements about the company’s Firefly AI image generation models. The complaint, as reported by Courthouse News Service, alleges that Adobe’s public statements that Firefly was trained on “licensed, commercially safe” content were materially false representations to investors.

This isn’t a copyright suit. It’s a shareholder derivative action, filed on behalf of the company, not directly by shareholders seeking personal recovery. The legal instrument matters: the plaintiffs aren’t arguing that Adobe infringed their copyrights. They’re arguing that Adobe’s executives misled investors about the company’s legal exposure, and that the misrepresentation caused harm to the company.

The complaint alleges Adobe used materials from the SlimPajama dataset, which is subject to active copyright disputes. Adobe hasn’t publicly responded to this allegation.

AI Copyright Litigation, Plaintiff Strategy Shift

  • Author plaintiffs (Shakespeare et al.), for: filing the procurement theory, which bypasses training-phase fair use entirely
  • Music label plaintiffs, for: expanding existing suits using The Atlantic dataset mapping as evidentiary basis
  • Adobe shareholders (Hirschberger), for: securities-adjacent derivative theory alleging investor disclosure misrepresentation
  • Anthropic / Adobe / Suno / Udio, against: defendants, likely to challenge via motion to dismiss; Adobe has not publicly responded

The significance isn’t in the specific allegation. It’s in the legal category the case opens. AI companies have been making training data claims in press releases, investor presentations, and regulatory filings for years. Those claims (“trained on licensed data,” “commercially safe,” “copyright-compliant”) have been understood primarily as product marketing. The Adobe suit treats them as investor representations. If that framing survives a motion to dismiss, the consequences extend far beyond Adobe.

Every AI company that has made public training data sourcing claims in investor-facing materials needs to evaluate whether those claims are supportable against the actual training corpus. That audit isn’t just a copyright compliance exercise. It’s a securities disclosure exercise.
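
A minimal sketch of what that audit could look like in practice, assuming a company keeps an inventory of its training datasets. The channel labels, the `Dataset` fields, and the sample data are all hypothetical illustrations, not drawn from any of the cases above.

```python
from dataclasses import dataclass

# Hypothetical channel labels; a real taxonomy would be set with counsel.
DOCUMENTED_CHANNELS = {"license", "purchase", "first_party"}


@dataclass(frozen=True)
class Dataset:
    name: str
    channel: str             # how the data was acquired
    agreement_on_file: bool  # is a contract or license actually retained?


def contradicting_datasets(corpus: list[Dataset]) -> list[str]:
    """Names of datasets that would undercut a 'licensed' sourcing claim."""
    return [
        d.name
        for d in corpus
        if d.channel not in DOCUMENTED_CHANNELS or not d.agreement_on_file
    ]


def audit_claims(claims: list[str], corpus: list[Dataset]) -> dict[str, list[str]]:
    """Map each public claim to the datasets that fail to support it."""
    gaps = contradicting_datasets(corpus)
    return {claim: gaps for claim in claims}


# Illustrative data only; none of these names come from the cases above.
corpus = [
    Dataset("stock_library", "license", agreement_on_file=True),
    Dataset("partner_feed", "purchase", agreement_on_file=False),
    Dataset("web_crawl", "scrape", agreement_on_file=False),
]
claims = ["trained on licensed data"]

report = audit_claims(claims, corpus)
for claim, gaps in report.items():
    status = "supported" if not gaps else f"unsupported by {gaps}"
    print(f"{claim!r}: {status}")
```

The design choice worth noting is that a dataset fails the check either because of its channel or because the agreement is missing. A purchased dataset with no contract on file is as much a disclosure problem as a scraped one.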

The Dataset Mapping Approach: The Atlantic Investigation

The third development this week doesn’t involve a new lawsuit. It involves a new tool plaintiffs will use in existing ones.

The Atlantic published an investigation on June 16, 2026, identifying four datasets totaling approximately 21 million music tracks allegedly used to train Suno and Udio, the AI music generation platforms already facing copyright suits from major labels. According to The Atlantic’s reporting, Taylor Swift, Bad Bunny, Billie Eilish, and Nirvana are among the artists whose recordings appear in the identified datasets.

The investigation is original journalism, not an official finding. Courts will determine what actually happened. But investigative dataset-mapping work of this scale tends to enter litigation through discovery requests and expert analysis. Music labels are reportedly seeking to add approximately 61,000 specific recordings to the existing Suno/Udio suits, per Hypebeast. That is a single-source figure, treated here as reported rather than confirmed.

The investigative methodology is the point. If a journalist can identify and map a training corpus at scale, so can opposing counsel with discovery tools. The Atlantic’s work demonstrates that AI training data isn’t opaque; it’s mappable. That changes the litigation landscape for any platform that assembled training data without thorough provenance documentation.
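
To make "mappable" concrete, here is a deliberately naive sketch of matching a rights holder's catalog against a dataset manifest. It is not The Atlantic's methodology, which the reporting summarized here does not describe. The function names and sample entries are hypothetical, and real evidentiary work would need stronger identifiers, such as ISRC codes or audio fingerprints, than normalized text.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip accents, spaces, and punctuation before comparing."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def map_catalog(catalog, manifest):
    """Return catalog (artist, title) pairs that also appear in the manifest."""
    manifest_keys = {(normalize(a), normalize(t)) for a, t in manifest}
    return [
        (a, t) for a, t in catalog
        if (normalize(a), normalize(t)) in manifest_keys
    ]


# Illustrative entries only; not real catalog or dataset contents.
catalog = [
    ("Example Artist", "First Song"),
    ("Another Artist", "Café Nights"),
    ("Third Artist", "Unlisted Track"),
]
manifest = [
    ("example artist", "First Song!"),
    ("ANOTHER ARTIST", "Cafe Nights"),
    ("Someone Else", "Other Track"),
]

matches = map_catalog(catalog, manifest)
print(f"{len(matches)} of {len(catalog)} catalog entries found in manifest")
```

Even this crude approach scales linearly with the size of the catalog and manifest, which is the practical point: once a manifest exists, checking millions of entries against it is an engineering task, not an investigative one.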

What the Three Cases Have in Common

These aren’t three variations on the same lawsuit. They’re three different legal instruments aimed at the same structural vulnerability: AI companies that assembled training data through channels that can’t withstand scrutiny.

| Case | Defendant | Legal Instrument | Core Allegation | Fair Use Relevance |
| --- | --- | --- | --- | --- |
| Shakespeare v. Anthropic | Anthropic | Copyright (procurement theory) | Torrenting datasets is itself infringement | Irrelevant: defense to use, not acquisition |
| Hirschberger v. Narayen | Adobe executives | Shareholder derivative (securities-adjacent) | “Commercially safe” claims were materially false | Not applicable: securities framing |
| Suno/Udio (labels) | Suno, Udio | Copyright (existing suits, expanded record) | 21M tracks in training corpus identified | Contested: training-phase fair use still live |

The common thread: each approach sidesteps or supplements the training-phase fair use argument. The procurement theory avoids it entirely. The securities theory reframes the question. The dataset-mapping approach strengthens the factual record in suits where fair use is still contested.

The Compliance Implication

Two years of AI copyright litigation have produced a compliance posture in many organizations that looks like this: assume training on publicly available data is defensible under fair use, document training methodology, and wait for courts to settle the question.

That posture has a gap. It assumes the fight happens at training. These three cases suggest the fight is moving earlier, to acquisition, and sideways into investor disclosure.

Dataset Provenance Compliance Audit, Priority Actions

  • Audit dataset acquisition records: purchase agreements, scraping logs, archive licensing contracts
  • Review public statements about training data sourcing for investor-facing materiality
  • Document BitTorrent avoidance policy and enforcement in data acquisition pipeline
  • Assess whether existing training data claims in filings or press materials are documentably accurate

Analysis

The organizations that navigate the next phase of AI copyright litigation well won't be the ones with the best fair use arguments. They'll be the ones that can document dataset provenance from acquisition through training, and whose investor-facing training data claims match that documentation. The compliance gap most organizations have isn't in training documentation. It's in acquisition documentation.

The dataset provenance question has a different answer than the training methodology question. For training methodology, the compliance record is your model card, your training pipeline documentation, your technical architecture. For dataset acquisition, the compliance record is your data purchase agreements, your scraping logs, your BitTorrent avoidance policy, your archive licensing contracts. Many organizations have robust documentation of the former and thin documentation of the latter.
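
One way to picture the acquisition-side record is as a structured entry per dataset, with a check for what is missing. The sketch below is an assumption about how such a record might be shaped. The `AcquisitionRecord` fields, dataset names, and file names are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class AcquisitionRecord:
    dataset: str
    channel: Optional[str] = None       # e.g. "license", "scrape", "purchase"
    acquired_on: Optional[date] = None
    evidence: list[str] = field(default_factory=list)  # contracts, logs, policies


def provenance_gaps(record: AcquisitionRecord) -> list[str]:
    """List the acquisition facts a record cannot currently document."""
    gaps = []
    if record.channel is None:
        gaps.append("channel")
    if record.acquired_on is None:
        gaps.append("acquired_on")
    if not record.evidence:
        gaps.append("evidence")
    return gaps


# Illustrative records; dataset names and file names are made up.
records = [
    AcquisitionRecord("licensed_archive", "license", date(2024, 3, 1),
                      ["archive_license.pdf"]),
    AcquisitionRecord("legacy_text_dump"),
]

for r in records:
    print(r.dataset, provenance_gaps(r) or "documented")
```

A dataset that appears in the training pipeline but returns all three gaps here is the "thin documentation" case: well described in the model card, undocumented at the point of acquisition.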

The 2026 compliance program framework brief addresses the broader patchwork challenge. The specific addition that this week’s developments require is a dataset provenance audit: not “what did we train on” but “how did we acquire what we trained on, and can we prove it?”

For investor relations teams at AI companies: any public statement describing training data as “licensed,” “commercially safe,” or “copyright-compliant” now requires documentation that supports the claim. The Adobe case may not succeed. But it frames such claims as material, and material claims that prove false carry consequences beyond the copyright domain.

The procurement theory’s legal fate depends on courts that haven’t yet ruled. The securities disclosure theory depends on whether the complaint survives a motion to dismiss. The dataset-mapping approach depends on what discovery produces. None of these outcomes is settled.

What’s settled is the direction. Plaintiffs are filing around fair use. That trend will accelerate if any of these three approaches produces a favorable ruling. The organizations that will navigate this period well are the ones that can answer two questions with documentation: how did we get our training data, and what did we tell investors about it?
