A lawsuit doesn’t become significant the moment it’s filed. It becomes significant when a judge refuses to dismiss it.
On May 5, five major publishers (Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage), along with author Scott Turow, filed a class-action complaint in the U.S. District Court for the Southern District of New York. The defendants are Meta Platforms, Inc. and Mark Zuckerberg, individually. The targeted models are Llama 2, Llama 3, and Llama 4.
The complaint is new. The core allegation, that Meta used copyrighted works without authorization to train its AI systems, is not. What is new is the theory of personal executive liability. And it’s worth understanding precisely why that theory is being advanced here, and what structural features of the alleged conduct make it legally available to plaintiffs.
The CMI Claim: Why Intentionality Is the Legal Hinge
Most AI copyright disputes have turned on fair use. That framing favors defendants: fair use is a broad, fact-specific doctrine, and several major cases have produced mixed or inconclusive results at early stages. The publisher complaint takes a different angle.
The Wall Street Journal reported that plaintiffs allege Meta stripped Copyright Management Information (CMI) from works before training. CMI includes authorship data, copyright notices, and licensing terms embedded in digital files. The statutory provision at issue is 17 U.S.C. § 1202, which prohibits removing or altering CMI when a party “knows, or with reasonable grounds to know,” that the removal will facilitate infringement.
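At the file level, CMI is ordinary embedded metadata, which is part of why removing it is a documented act rather than an invisible one. As a minimal illustration (assuming the pypdf library; “book.pdf” is a placeholder path, not a source from the case), here is how those identifiers can be read before any preprocessing touches them:

```python
# Illustrative sketch only: reading the embedded metadata that can
# constitute CMI in a digital file. Assumes pypdf is installed;
# "book.pdf" is a placeholder path.
from pypdf import PdfReader

reader = PdfReader("book.pdf")

# The document information dictionary commonly carries authorship
# and notice fields alongside producer/creator data.
info = reader.metadata
if info is not None:
    print("Author:", info.author)
    print("Title:", info.title)
    # Dump every embedded key, including any custom copyright or
    # licensing entries a publisher may have added.
    for key, value in info.items():
        print(f"{key}: {value}")
```

A pipeline that drops these fields during preprocessing leaves a before-and-after record, and that record is exactly the kind of documented act § 1202 targets.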
That “knows or has reason to know” standard is load-bearing. Stripping CMI isn’t a neutral technical preprocessing step. It’s a documented act that, under § 1202’s logic, suggests the party doing it understood the material was protected and removed the identifier to obscure that fact. A fair use defense is harder to maintain when the record shows the defendant affirmatively removed the signals that would have identified the work as protected. A federal judge has already found that Meta must answer the CMI claim, meaning the theory is sufficiently plausible to proceed past initial review.
This is the legal mechanism that makes individual executive liability available. If the complaint can show that the CMI removal was a deliberate, authorized step in the training data pipeline, not a rogue technical choice, then the question becomes: who authorized it?
The Zuckerberg Authorization Theory
According to multiple news reports citing the complaint, plaintiffs allege that Zuckerberg “personally authorized and actively encouraged” the alleged infringement. That language, if it appears in the complaint as reported, is drafted to thread a specific legal needle.
Corporate officers can be held personally liable for intellectual property infringement, but not automatically, and not without showing something beyond their role as CEO. Courts have generally required plaintiffs to demonstrate that the individual had direct knowledge of the infringing activity, had the ability to supervise or control it, and received a financial benefit from it. Some circuits have applied what’s called the “volitional conduct” standard: the officer must have made a meaningful choice, not just presided over a company that infringed.
The phrase “personally authorized and actively encouraged” maps onto that standard. It isn’t merely alleging that Zuckerberg ran Meta when the alleged infringement occurred. It’s alleging he knew, directed, and benefited. Whether plaintiffs can produce evidence for that allegation through discovery is a separate question. For now, a judge must determine whether the theory is legally plausible. Given that the CMI claim has already cleared an early hurdle, the personal liability theory is not obviously defective.
How This Case Sits Alongside Prior AI Copyright Actions
The litigation landscape around AI training data has moved quickly. What this case adds is best seen through a few structural comparisons:
Anthropic reportedly settled a comparable author-led class action in 2025 for $1.5 billion without admitting wrongdoing, but no individual executive was named. The settlement established a rough scale of corporate exposure for comparable conduct. It didn’t establish what happens when the theory extends to individuals.
Several other actions, including proceedings in international jurisdictions, have focused on model outputs (generated content that reproduces protected expression) rather than training inputs (using protected works without authorization in the first place). The SDNY complaint focuses on training inputs, specifically the alleged intentional acquisition and CMI stripping of those inputs. That framing keeps the case in a narrower channel with a clearer evidentiary theory.
The CMI removal allegation also distinguishes this case from the broader “did AI training constitute fair use” question that courts are still working through. Section 1202 doesn’t ask whether the use was fair. It asks whether the defendant removed identifying information knowing, or having reason to know, it would enable infringement. That’s a different question, with a different evidentiary path.
What the Stakeholder Map Looks Like
The plaintiffs are seeking class certification, which means the outcome, in either direction, has implications beyond this particular complaint. A win for plaintiffs could put every major AI developer’s training data practices under scrutiny. A dismissal of the personal liability theory would signal that CMI-based copyright theories don’t reach individual executives, at least in this circuit.
Meta’s position will almost certainly involve a motion to dismiss. The personal liability theory against Zuckerberg is the most likely target. If the CMI claim advances but the individual defendant theory is dismissed, the case reverts to a corporate copyright dispute with Anthropic’s settlement as the financial reference point.
For AI developers watching this case, the relevant question isn’t the verdict; it’s what discovery, if it proceeds, requires them to produce. Training data acquisition decisions, internal communications about licensing, and documentation of preprocessing choices are the categories at risk. Companies that made those decisions in writing, at the executive level, face a different discovery exposure than those that made them at the engineering level with no documented authorization chain.
The Compliance and Governance Implication
The CMI removal allegation points toward a specific governance gap: the chain of authorization for training data acquisition decisions. In most AI organizations, data acquisition is treated as an engineering problem. The legal team reviews outputs or assesses fair use in the abstract. The question of who authorized the inclusion of specific data sources, and whether that authorization was documented, often has no clear answer.
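By contrast, the record that would close that gap is cheap to produce at the time the decision is made. Here is a minimal sketch of what such an authorization record could look like (every field name and value is an illustrative assumption, not an industry standard):

```python
# Minimal sketch of a documented authorization record for one training
# data source. All fields and values are illustrative placeholders.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class DataSourceAuthorization:
    source_name: str       # corpus or crawl identifier
    copyright_review: str  # due diligence performed on copyright status
    license_basis: str     # license, permission, or claimed exception
    cmi_handling: str      # what happened to embedded identifiers
    authorized_by: str     # the accountable decision-maker
    authorized_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

record = DataSourceAuthorization(
    source_name="example-corpus-v1",
    copyright_review="counsel reviewed source terms at intake",
    license_basis="licensed under agreement #123 (placeholder)",
    cmi_handling="embedded copyright metadata preserved unmodified",
    authorized_by="jane.doe@example.com",
)
print(record)
```

A log of records like this answers the discovery questions above directly: who authorized the source, on what basis, and what happened to its copyright identifiers.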
If this complaint advances, it will make that gap expensive. Not because every company stripped CMI, but because discovery will ask the question, and the inability to produce a clean authorization chain can look like concealment even where none occurred.
Researchers tracking this litigation have noted that the CMI claim represents one of the most concrete statutory hooks available to copyright plaintiffs in the AI training context. It sidesteps some of the uncertainty around fair use by targeting a specific documented act rather than a legal characterization of the use itself.
The governance question worth asking now: does your organization have documented records of who authorized training data acquisition decisions, what due diligence was performed on the copyright status of included sources, and what, if anything, happened to embedded copyright identifiers during preprocessing? Those records either protect executive decision-makers or expose them. The SDNY complaint is a clear signal that plaintiffs’ counsel knows how to use that distinction.
This case is in its earliest stages. The allegations are unproven. No liability has been found. But the theory is on the table, it has cleared an initial procedural checkpoint, and it will now generate discovery pressure that has never existed in AI copyright litigation before.
The industry doesn’t get to wait for a verdict before building the governance infrastructure that would answer the questions discovery will ask.