From Training Data to Supercomputers: Has AI Copyright Litigation Found Its New Target?

June 29, 2026 6 min read 36Kr Partial Strong

Tech Jacks Solutions AI News Coverage

The New York Times' third amended complaint against OpenAI and Microsoft doesn't just add claims; it shifts the entire theory of liability from what was done with copyrighted data to who built the machinery that made it possible. If courts accept even a portion of the infrastructure liability argument, the risk exposure for AI compute providers, API distributors, and cloud infrastructure operators changes in ways that existing indemnification contracts weren't written to handle. Two separate filings this week make that theory more than a novelty: it's becoming the front edge of AI copyright litigation.


Image: Microsoft's AI training system (more than 285,000 CPU cores)

Key Takeaways

NYT's third amended complaint introduces an infrastructure liability theory: Microsoft's purpose-built supercomputer (more than 285,000 CPU cores and 10,000 GPUs) allegedly enabled the infringement
The Grokster precedent, not Betamax, is the legal framework NYT is reaching for: its focus is intent and purpose-built enablement, not neutral infrastructure provision
The WEHCO coalition (~400 newspapers; reported, details unconfirmed) represents collective action by local publishers unable to litigate individually at this scale
Companies providing AI-specific compute infrastructure should review indemnification contracts now: existing clauses weren't written for infrastructure liability theories

AI Copyright Litigation: Infrastructure Liability Theory

New York Times (plaintiff)

Third amended complaint: Microsoft purpose-built a supercomputer with more than 285,000 CPU cores to enable allegedly infringing OpenAI training

WEHCO coalition (~400 papers, reported; plaintiffs)

Separate suit: automated tools allegedly scraped paywalled content and stripped DMCA copyright management information (details unconfirmed)

OpenAI (defendant)

Defendant in both suits; reportedly asserting a fair use defense for publicly available training data

Microsoft (defendant)

Named in the NYT amended complaint as an active inducer, not a neutral infrastructure provider, via a purpose-specific AI training supercomputer

Timeline

2023-12-27 NYT original suit filed

2026-06-18 Litigation theory shift identified

2026-06-24 WEHCO coalition suit filed (reported)

2026-06-25 NYT third amended complaint

AI copyright litigation has been asking the same question for two years: did training on
copyrighted data without authorization constitute infringement? Courts are still working
through it. The cases are complex, the fair use defenses are contested, and the outcome
remains genuinely uncertain.

This week, the NYT’s lawyers filed a third amended complaint that asks a structurally
different question. Not “was training on our content infringement?” but “who built the
computer that made the training possible, and do they share liability?”

That’s the infrastructure liability theory. It’s newer, less tested, and, if it develops
traction, more far-reaching than anything the first generation of AI copyright suits
raised.

—

The two actions: what each actually alleges

Two legally distinct actions arrived in the same week. They’re related but shouldn’t be
conflated.

The WEHCO coalition. According to reports, a coalition of approximately 400 local and
regional U.S. newspapers, reportedly led by WEHCO Newspapers Inc. (publisher of the
Arkansas Democrat-Gazette and the Chattanooga Times Free Press), filed a federal copyright
suit against OpenAI and Microsoft around June 24, 2026. The complaint reportedly alleges
that automated tools were used to systematically scrape paywalled content from member
publications and strip copyright management information from articles in violation of the
DMCA. Specific tool names cited in the complaint could not be independently confirmed.
These details are drawn from a single secondary source; court documents have not yet been
verified, so these specifics should be treated as unconfirmed.

The NYT amended complaint. On June 25, 2026, the New York Times asked the court for leave
to file a third amended complaint in its existing copyright suit against OpenAI and
Microsoft. This filing is confirmed. According to 36Kr’s reporting on the complaint, the
core allegation is that Microsoft didn’t just provide generic cloud computing services but
“actively induced, assisted, and facilitated large-scale copyright infringement” by
building a purpose-specific supercomputing system. That system, according to the complaint,
contains more than 285,000 CPU cores and 10,000 GPUs: infrastructure designed and built to
enable OpenAI’s AI model training at scale.

The distinction between these two suits matters legally. The WEHCO coalition is asserting
direct infringement and DMCA violations: it’s a claim about what happened to their content.
The NYT’s amended complaint adds an inducement theory: it’s a claim about Microsoft’s role
in making the alleged infringement possible.

—

The infrastructure liability theory: what it means legally

Copyright law has long recognized a spectrum of liability beyond direct infringement.
Contributory infringement holds a party liable if it knows of infringing activity and
materially contributes to it. Vicarious liability applies if a party profits directly from
the infringement and has the right and ability to supervise it. Inducement liability, the
theory closest to what NYT is asserting, holds that a party that actively encourages
infringement, with the object of promoting it, can be liable even if it didn’t directly
copy anything.

The NYT’s characterization of Microsoft as an entity that “actively induced, assisted, and
facilitated” the alleged infringement positions the argument firmly in inducement territory.
That’s a deliberate choice. Pure contributory liability requires showing Microsoft knew
about specific infringing acts. Inducement can be shown through evidence of intent and
purpose-built enablement, and a supercomputer with more than 285,000 CPU cores built
specifically for AI training is exactly the kind of purpose-built infrastructure that
inducement arguments reach for.

Microsoft’s likely response is that it was providing infrastructure services, not making
editorial decisions about what content to train on. General-purpose infrastructure
providers have historically had strong defenses against secondary liability. But the NYT’s
framing (purpose-built, customized, specific to OpenAI’s training operations) is designed
to undercut the “neutral infrastructure” defense.

Legal Theory Comparison: Two Precedents

Sony Betamax (1984)

Technology capable of substantial non-infringing use generally doesn't create manufacturer liability

MGM v. Grokster (2005)

Distribution with intent to promote infringing use creates liability; intent and purpose matter

NYT v. Microsoft (2026 theory)

Purpose-built, customized supercomputer for specific AI training operations; Grokster framing applied to infrastructure

Infrastructure Liability: Supply Chain Exposure

AI model developers (exposure: high): Direct defendants in both suits; primary copyright liability exposure

Purpose-built AI infrastructure providers (exposure: high): NYT theory explicitly reaches custom AI training compute; indemnification contracts likely inadequate

General cloud providers with AI workloads (exposure: medium): Less exposed than purpose-built providers; the neutral infrastructure defense is stronger, but how well it holds against this theory is untested

API distributors and embedding services (exposure: low): Further from training activity; indirect exposure depends on how broadly courts define the supply chain

Courts have seen this argument structure before. The Sony Betamax case established that
technologies capable of substantial non-infringing uses generally don’t create liability
for their manufacturers. The Grokster case qualified that rule: when a device is distributed
with the intent to promote infringing use, the party distributing it can be liable. The
NYT’s amended complaint is building a Grokster-style argument: Microsoft didn’t just sell
general cloud capacity; it built a custom machine intended to enable specific operations
that NYT claims were infringing.

Whether that argument succeeds depends entirely on what evidence exists about Microsoft’s
intent and knowledge when it built the system. That’s a fact-intensive inquiry that won’t
resolve quickly.

—

The litigation trajectory: how we got here

The NYT’s original suit against OpenAI was filed in December 2023, the opening shot in
what has become a wave of AI copyright litigation. TJS covered a significant development on
June 18 when the litigation landscape shifted toward new theories beyond the training
data question. The third amended complaint represents a further escalation: dropping some
theories that proved harder to sustain (TJS covered NYT dropping contributory claims in
its June 26 brief) and adding the infrastructure theory that NYT’s legal team apparently
now views as more promising.

That pattern, narrowing and sharpening claims over successive amendments, is common in
complex copyright litigation. It doesn’t necessarily indicate weakness; it often signals
that plaintiffs have learned what evidence exists and are focusing on the strongest claims.

The WEHCO coalition, if the reports are accurate, represents a different development:
organized collective action by local news organizations, many of which lack the legal
budget to litigate individually against companies of OpenAI’s and Microsoft’s scale.
Coalition suits of this type are a recognized strategy for amplifying plaintiff leverage
in copyright cases.

—

Supply chain exposure: who else should be watching

The infrastructure liability theory, if it develops legal traction, has consequences well
beyond Microsoft.

Any company that provides purpose-built AI training infrastructure (specialized compute
clusters, optimized networking, model-specific storage systems) faces a version of the
same question: is providing infrastructure to a company that subsequently infringes
copyright enough for liability, when the infrastructure was designed with that company’s
training operations in mind?

That question reaches cloud providers, co-location facilities with AI-specific buildouts,
and hardware manufacturers who’ve sold purpose-configured systems to AI labs. It also
reaches API distributors and embedding service providers who might be characterized as
amplifying the reach of allegedly infringing model outputs.

Compliance and legal teams at companies in the AI supply chain should be reviewing their
indemnification arrangements now. Standard cloud computing contracts typically include
indemnification for IP claims arising from customer content, but those clauses weren’t
written with purpose-built AI training infrastructure in mind. Whether they cover
infrastructure liability theories is a question that should be answered before a court
asks it.

Warning

Standard cloud computing indemnification contracts typically cover IP claims arising from customer content, not infrastructure liability theories. Companies providing purpose-built AI training compute should verify whether their contracts cover the specific legal theory NYT is now asserting before courts develop the factual record.

What to Watch

Federal court ruling on whether to accept NYT's third amended complaint: Q3 2026

WEHCO coalition complaint on PACER (verify tool names, newspaper count, DMCA specifics): 2-4 weeks

Additional publisher coalitions filing similar suits: Q3-Q4 2026

Microsoft's legal response to the infrastructure liability theory: Q3 2026

—

What to watch

Three near-term signals matter:

First, whether the federal court accepts the NYT’s third amended complaint. If the court
rejects it, the infrastructure theory loses its most prominent test vehicle. If accepted,
discovery proceeds and the factual record about Microsoft’s knowledge and intent starts
to develop.

Second, court documents in the WEHCO coalition suit. The specific tool names, newspaper
count, and DMCA CMI removal allegations in the WEHCO complaint are currently single-source
and unconfirmed. PACER filings will either confirm or complicate the reported details.

Third, whether other publishers move to file similar coalition suits. The WEHCO model, if
confirmed, demonstrates an organizational structure for smaller publishers to pursue
claims they couldn’t sustain individually.

—

TJS synthesis

The infrastructure liability theory is the most consequential legal development in AI
copyright litigation since the original NYT filing in 2023. It doesn’t ask whether AI
training was lawful. It asks whether building the machine that made training possible
creates liability for the builder, and it does so with a specific, named supercomputing
system as its exhibit. Courts will take years to resolve this. But the companies whose
lawyers are reading this complaint carefully today and asking “does our indemnification
structure cover this?” are better positioned than those who wait for a ruling to find
out the answer is no.
