Fair use has been the center of gravity in AI copyright litigation since the first lawsuits landed. The argument is straightforward: AI training ingests copyrighted material and produces something new, and under the transformative use doctrine, that ingestion might qualify as fair use. Courts have been wrestling with this for years. Thomson Reuters v. Ross has been in litigation since 2020. Bartz v. Anthropic ran through class certification and reached a settlement without producing an appellate ruling on training-phase fair use.
Three developments this week, two new filings and one investigation, suggest that some plaintiffs have stopped waiting for that ruling. They're filing around it.
The Procurement Theory: Shakespeare v. Anthropic
One hundred authors who opted out of the Bartz v. Anthropic class settlement filed a new copyright complaint in the Northern District of California on or around June 17, 2026. Lead plaintiff Thomas William Shakespeare, described as a British sociologist and bioethicist, heads a group that made a deliberate choice: by opting out rather than accepting the settlement, they signaled that they believe they can do better litigating on their own.
The legal theory, as characterized by legal analysts covering the filing, focuses on the alleged acquisition of training data via BitTorrent. The datasets reportedly at issue, Books3, LibGen, and PiLiMi, are familiar from prior AI copyright litigation. What's new is where the complaint reportedly places the infringement: not at model training, but at the moment pirated material was downloaded and retained.
This is a meaningful distinction. Legal commentary on the filing frames it this way: if downloading a pirated dataset is itself an act of infringement, separate from any subsequent use, then Anthropic's transformative training argument doesn't engage with the claim at all. That argument defends what was done with the material, not how it was obtained.
The factual question the procurement theory puts to courts is narrower than the training-phase question: did the defendant acquire this specific material through channels that constitute infringement? Either the BitTorrent history is there or it isn’t. Discovery on dataset acquisition is a different animal than discovery on model training methodology.
Plaintiffs reportedly seek up to $71.4 million in statutory damages, a figure from newsletter reporting that hasn't been confirmed against the complaint text. The docket number also remains unconfirmed in public records. Treat both as placeholders; the legal theory is the substance.
The Securities Disclosure Vector: Hirschberger v. Narayen
Filed one day earlier, on June 16, 2026, *Hirschberger et al. v. Narayen et al.* targets Adobe executives over statements about the company's Firefly AI image generation models. The complaint, as reported by Courthouse News Service, alleges that Adobe's public representations that Firefly was trained on “licensed, commercially safe” content were materially false statements to investors.
This isn't a copyright suit. It's a shareholder derivative action: shareholders sue on the company's behalf, and any recovery goes to the company rather than to them personally. The legal instrument matters: the plaintiffs aren't arguing that Adobe infringed their copyrights. They're arguing that Adobe's executives misled investors about the company's legal exposure, and that the misrepresentation harmed the company.
The complaint alleges Adobe used materials from the SlimPajama dataset, which is itself the subject of active copyright disputes. Adobe hasn't publicly responded to this allegation.
The significance isn't in the specific allegation. It's in the legal category the case tests. AI companies have been making training data claims in press releases, investor presentations, and regulatory filings for years. Those claims (“trained on licensed data,” “commercially safe,” “copyright-compliant”) have been understood primarily as product marketing. The Adobe suit treats them as investor representations. If that framing survives a motion to dismiss, the consequences extend far beyond Adobe.
Every AI company that has made public training data sourcing claims in investor-facing materials needs to evaluate whether those claims are supportable against the actual training corpus. That audit isn’t just a copyright compliance exercise. It’s a securities disclosure exercise.
The Dataset Mapping Approach: The Atlantic Investigation
The third development this week doesn’t involve a new lawsuit. It involves a new tool plaintiffs will use in existing ones.
The Atlantic published an investigation on June 16, 2026, identifying four datasets, approximately 21 million music tracks in total, allegedly used to train Suno and Udio, the AI music generation platforms already facing copyright suits from major labels. The investigation names Taylor Swift, Bad Bunny, Billie Eilish, and Nirvana among the artists whose recordings appear in the identified datasets.
The investigation is original journalism, not an official finding; courts will determine what actually happened. But dataset-mapping work at this scale is likely to enter litigation through discovery requests and expert analysis. Music labels are reportedly seeking to add approximately 61,000 specific recordings to the existing Suno and Udio suits, according to Hypebeast; that is a single-source figure, treated here as reported rather than confirmed.
The investigative methodology is the point. If a journalist can identify and map a training corpus at scale, so can opposing counsel with discovery tools. The Atlantic's work demonstrates that AI training data isn't opaque; it's mappable. That changes the litigation landscape for any platform that assembled training data without thorough provenance documentation.
What the Three Developments Have in Common
These aren't three variations on the same lawsuit. They're three different approaches aimed at the same structural vulnerability: AI companies that assembled training data through channels that can't withstand scrutiny.
| Case | Defendant | Legal Instrument | Core Allegation | Fair Use Relevance |
|---|---|---|---|---|
| Shakespeare v. Anthropic | Anthropic | Copyright (procurement theory) | Torrenting the datasets was itself infringement | Largely irrelevant: claim targets acquisition, not use |
| Hirschberger v. Narayen | Adobe executives | Shareholder derivative (securities-adjacent) | “Commercially safe” claims were materially false | Not applicable: securities framing |
| Suno/Udio (label suits) | Suno, Udio | Copyright (existing suits, expanded record) | ~21M tracks identified in training data (per The Atlantic) | Contested: training-phase fair use still live |
The common thread: each approach sidesteps or supplements the training-phase fair use argument. The procurement theory avoids it entirely. The securities theory reframes the question. The dataset-mapping approach supplements it: a detailed record of what was ingested, and from where, strengthens the plaintiffs' factual case whatever courts eventually decide about fair use.
The Compliance Implication
Several years of AI copyright litigation have produced a compliance posture in many organizations that looks like this: assume training on publicly available data is defensible under fair use, document training methodology, and wait for courts to settle the question.
That posture has a gap. It assumes the fight happens at training. These three developments suggest the fight is moving earlier, to acquisition, and sideways, into investor disclosure.
The dataset provenance question has a different answer than the training methodology question. For training methodology, the compliance record is your model card, your training pipeline documentation, and your technical architecture. For dataset acquisition, it is your data purchase agreements, your scraping logs, your BitTorrent avoidance policy, and your archive licensing contracts. Many organizations document the former thoroughly and the latter thinly.
The 2026 compliance program framework brief addresses the broader patchwork challenge. The specific addition this week's developments require is a dataset provenance audit: not “what did we train on?” but “how did we acquire what we trained on, and can we prove it?”
Dataset Provenance Compliance Audit, Priority Actions
- Audit dataset acquisition records: purchase agreements, scraping logs, archive licensing contracts
- Review public statements about training data sourcing for investor-facing materiality
- Document BitTorrent avoidance policy and enforcement in data acquisition pipeline
- Assess whether existing training data claims in filings or press materials are documentably accurate
Analysis
The organizations that navigate the next phase of AI copyright litigation well won't be the ones with the best fair use arguments. They'll be the ones that can document dataset provenance from acquisition through training, and whose investor-facing training data claims match that documentation. The compliance gap at many organizations isn't in training documentation. It's in acquisition documentation.
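Parts of the provenance audit described above can be mechanized. The sketch below is a hypothetical illustration, not a legal standard or any company's actual process: the `DatasetRecord` schema, the channel labels (`"license"`, `"scrape"`, `"bittorrent"`, and so on), and the `audit` function are all invented for this example. It assumes an in-house ledger that records, for each training dataset, how it was acquired and what documents back that up, and it flags three things: datasets with no acquisition documentation, datasets acquired through channels the organization has ruled out, and a public “licensed” claim that the ledger can't support.

```python
from dataclasses import dataclass, field

# Hypothetical channel taxonomy; a real program would define its own with counsel.
DISALLOWED_CHANNELS = {"bittorrent", "shadow_library"}
LICENSED_CHANNELS = {"purchase", "license"}


@dataclass
class DatasetRecord:
    """One training dataset and the evidence of how it was acquired (hypothetical schema)."""
    name: str
    channel: str  # e.g. "purchase", "license", "scrape", "bittorrent"
    evidence: list[str] = field(default_factory=list)  # contract IDs, log paths, etc.


def audit(records: list[DatasetRecord], public_claims: set[str]) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    for rec in records:
        # Question 1: how did we get this data, and can we prove it?
        if not rec.evidence:
            findings.append(f"{rec.name}: no acquisition documentation on file")
        if rec.channel in DISALLOWED_CHANNELS:
            findings.append(f"{rec.name}: acquired via disallowed channel '{rec.channel}'")
    # Question 2: does what we told the public match the ledger?
    # A "licensed" claim is treated as supportable only if every dataset came
    # through a licensed channel and has documentation behind it.
    if "licensed" in public_claims:
        unsupported = [
            r.name for r in records
            if r.channel not in LICENSED_CHANNELS or not r.evidence
        ]
        if unsupported:
            findings.append(
                "public 'licensed' claim not supported for: " + ", ".join(unsupported)
            )
    return findings


if __name__ == "__main__":
    # Illustrative ledger; all dataset names and evidence IDs are made up.
    ledger = [
        DatasetRecord("stock-images-2024", "license", ["contract-0042"]),
        DatasetRecord("web-crawl-a", "scrape", ["crawl-logs/2024-q1"]),
        DatasetRecord("books-mirror", "bittorrent"),
    ]
    for finding in audit(ledger, {"licensed"}):
        print(finding)
```

The design choice worth noting is that the claim check reads from the same ledger as the acquisition check, so an investor-facing statement can only be as strong as the documentation behind every dataset it covers. Whether a given channel or document actually suffices is a legal judgment the code can't make.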
For investor relations teams at AI companies: any public statement describing training data as “licensed,” “commercially safe,” or “copyright-compliant” now needs documentation that supports it. The Adobe case may not succeed, but it puts the materiality of those claims at issue, and material claims that prove false carry consequences beyond the copyright domain.
The procurement theory's legal fate depends on courts that haven't yet ruled. The securities disclosure theory depends on whether the complaint survives a motion to dismiss. The dataset-mapping approach depends on what discovery produces. None of these outcomes is settled.
What's settled is the direction. Plaintiffs are filing around fair use, and that trend is likely to accelerate if any of these three approaches produces a favorable ruling. The organizations that will navigate this period well are the ones that can answer two questions with documentation: how did we get our training data, and what did we tell investors about it?