On May 4, a federal court ruled that authors could proceed with copyright infringement claims against NVIDIA Corp. in Nazemian et al. v. NVIDIA Corp., Case No. 4:24-cv-01454-JST, before Judge Jon S. Tigar in the Northern District of California. The ruling is a procedural milestone, not a liability finding, but the legal theory it allowed through is worth understanding carefully.
According to legal analysis of the ruling, the direct infringement claims survived based on the act of copying training data inputs, not on whether NVIDIA’s model outputs resemble the original works. This is the distinction that makes the case worth tracking. Most AI copyright litigation has centered on whether generated outputs are substantially similar to protected works. Nazemian runs on a different theory: that the act of ingesting copyrighted material into a training dataset is itself the infringement, regardless of what the model produces.
The datasets named in the complaint (Books3, The Pile, SlimPajama, and Anna’s Archive) appear across multiple AI copyright cases. Books3 and Anna’s Archive in particular have been characterized as “shadow libraries” in other proceedings, meaning collections assembled from copyrighted books without license. Whether NVIDIA actually used those datasets in the manner alleged is a fact-dependent question, and the court reportedly treated it as such: the question goes to discovery rather than being resolved at the pleading stage.
Fair use wasn’t decided. Per established federal civil procedure doctrine, fair use is a mixed question of law and fact and isn’t typically resolved on a Rule 12(b)(6) motion to dismiss. That defense remains available to NVIDIA; it simply wasn’t disposed of at this early stage.
What the ruling adds to the broader copyright picture: the input-copying theory now has a federal court’s acknowledgment that it’s legally cognizable at the pleading stage. That’s different from cases that courts have declined to let proceed, and different from cases where the focus is on output similarity. Prior hub coverage of executive liability in copyright suits and the Publishers v. Meta case address related but distinct legal theories. This case runs parallel to those proceedings but is not identical to them.
For legal and compliance teams: this is a single T3 source item, and the underlying ruling should be verified via PACER (Case No. 4:24-cv-01454-JST, N.D. Cal.) before any compliance program update. This briefing is legal reporting, not legal advice. Organizations making training data acquisition decisions based on copyright exposure should engage qualified legal counsel before acting on news coverage of this ruling.
What to watch: whether claims against other defendants named in shadow library complaints similarly survive Rule 12(b)(6) motions, and whether discovery in Nazemian produces documentation of NVIDIA’s actual data sourcing practices that becomes relevant to other cases.