Publishers v. Meta: Proposed Class Action Filed in SDNY as Meta Claims Fair Use Defense

May 14, 2026 2 min read The Guardian / Mishcon de Reya Partial Moderate

Tech Jacks Solutions AI News Coverage

Elsevier, Hachette, Macmillan, and additional publishers have filed a copyright lawsuit against Meta in the Southern District of New York as a proposed class action, alleging that Meta sourced training data for its Llama model family through shadow library sites including The Pile and Anna's Archive. Meta has responded that its use of copyrighted material qualifies as fair use under US copyright law. That is Meta's legal position, not a judicial determination.

Named plaintiffs: 3+ major publishers

Key Takeaways

Publishers filed a proposed class action against Meta in SDNY alleging Llama was trained on shadow library content; class certification is pending, not granted
The CMI removal allegation shifts the question from whether infringement occurred to whether it was intentional; this is the most legally consequential element of the complaint
Meta's fair use defense is Meta's legal position; fair use applied to systematic large-scale AI training data ingestion is unsettled law
A merits ruling from SDNY would carry significant persuasive weight industry-wide, but both parties have strong incentives that may push toward settlement before a ruling

The complaint is specific. The legal question isn’t settled.

Elsevier, Hachette, Macmillan, and a group of additional publishers filed a copyright suit against Meta in Manhattan’s federal court, the Southern District of New York, in early May 2026, according to The Guardian’s reporting. The case was filed as a proposed class action. Class-action status is a separate legal determination from the filing itself: it requires certification by the court, which hasn’t occurred. “Proposed class action” is the accurate framing until certification is granted.

The complaint alleges that Meta used shadow library sites, specifically naming The Pile and Anna’s Archive, to source copyrighted books, journals, and other content used to train the Llama model family. The plaintiffs allege mass-scale copyright infringement and, in prior filings, have raised the removal of Copyright Management Information (CMI) as evidence of intentional concealment. The CMI removal theory, that deliberately stripping authorship data from training content indicates willful infringement, is the most legally significant element in the complaint. It converts the question from whether infringement occurred to whether it was intentional, which affects both liability exposure and damages calculation.

Meta’s formal response invokes fair use. Meta claims that training on copyrighted material qualifies as transformative use under US copyright law. That is a legal position Meta has asserted, not a judicial conclusion. Fair use is a four-factor test that weighs the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the market for the original. Applying it to systematic, large-scale ingestion of copyrighted works for commercial AI training is genuinely unsettled legal territory. Mishcon de Reya’s AI IP case tracker, updated to include this filing, notes the significance of this case for training data liability across the industry.

That’s the thesis: if fair use covers systematic ingestion at this scale, the entire AI training data legal architecture changes. If it doesn’t, every major AI company faces the same exposure.

The SDNY venue matters. The Southern District of New York handles major intellectual property and commercial litigation. A ruling from SDNY on AI training data fair use would not bind other courts, but it would carry persuasive weight in other districts and circuits even before any appeal. The court’s docket and judicial assignment are factors that legal teams at AI companies are watching alongside the substantive legal arguments.

The real question is whether this case reaches a merits ruling or settles before it gets there. The CMI removal allegation, if proven, substantially weakens a fair use defense by introducing intent evidence. Meta has strong incentives to settle a case that could produce unfavorable precedent. Publishers have strong incentives to push toward a ruling that would constrain AI training practices industry-wide. The tension between those incentives will shape the case’s trajectory more than the legal arguments alone.

Don’t expect a quick resolution. Copyright litigation of this complexity, in federal court, with class-action mechanics in play, moves on a multi-year timeline. The hub will track procedural milestones (the class certification hearing, any preliminary injunction motion, and summary judgment briefing) as they occur.