The Llama Open Source and Copyright Debate (2026)
Two questions follow Meta Llama everywhere it goes. Is it really open source? And did Meta break the law by training it on copyrighted books? Both questions have generated heated public argument, a standards-body rebuke, and a federal lawsuit that is still partly unresolved. This breakdown separates what has been decided from what is still in dispute, attributes every criticism to the party that made it, and pairs each criticism with Meta's stated position.
The single most important distinction in this article is also the easiest one to get wrong. On June 25, 2025, a federal judge ruled that Meta's use of authors' books to train Llama qualified as fair use. That is a court decision, a fact. But a separate allegation in the same case, that Meta downloaded and shared pirated copies of those books to obtain the training data, was not resolved by that ruling and continues to be litigated. Meta winning the training question does not mean Meta was cleared of everything.
What the Llama Community License Actually Says
Before weighing the open source debate, it helps to read the license on its own terms. The Llama Community License grants a royalty-free, limited license to use, reproduce, and modify the Llama materials. It is permissive in many respects, and most developers and businesses can use Llama freely. The conditions below are the ones that shape the open source argument, because they are restrictions that a standard open source license would not impose.
| License term | What it means in practice |
|---|---|
| Royalty-free grant | You may use, reproduce, and modify the Llama materials at no cost, subject to the conditions below. |
| 700 million MAU threshold | If your products had more than 700 million monthly active users in the calendar month before the relevant Llama version was released, you must request a commercial license from Meta, granted at Meta's sole discretion. |
| Attribution | You must include a copy of the license and prominently display "Built with Llama" on related materials. |
| Derivative naming | Any other AI model you create using the Llama materials must include "Llama" at the start of its name. |
| No improving other models | You may not use Llama or its outputs to train or improve any large language model other than Llama or a Llama derivative. Distilling Llama outputs into a competitor's model is treated as a breach. |
| Acceptable Use Policy | Prohibits, among other uses: illegal activity; child sexual abuse material; chemical, biological, radiological, nuclear, or high-yield explosive weapons; defamation; and unauthorized medical or legal advice. |
| EU multimodal carve-out | For Llama 4, the multimodal rights are not granted to individuals or companies domiciled in the European Union. |
License terms are version-specific. The 700 million monthly active user threshold and the Llama 4 European Union multimodal carve-out apply to particular Llama versions. Always read the license attached to the exact version you intend to deploy.
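The conditions in the table can be read as a simple pre-deployment checklist. The sketch below is a hypothetical illustration in Python, not a legal test. The `Deployment` fields and the `license_flags` function are names invented here, the 700 million figure comes from the table above, and the literal `startswith("Llama")` check is a simplifying assumption about how the naming condition might be verified. It only surfaces items for a person to review against the license text for the specific Llama version.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold taken from the license table above; it applies to particular versions.
MAU_THRESHOLD = 700_000_000


@dataclass
class Deployment:
    """Hypothetical description of a planned Llama deployment."""
    monthly_active_users: int           # MAU in the calendar month before the Llama version's release
    derived_model_name: Optional[str]   # name of any new model built from Llama, or None
    includes_license_copy: bool         # a copy of the license ships with the product
    shows_built_with_llama: bool        # "Built with Llama" is prominently displayed
    trains_non_llama_model: bool        # Llama or its outputs are used to improve another LLM
    eu_domiciled: bool                  # the individual or company is domiciled in the EU
    uses_llama4_multimodal: bool        # the deployment relies on Llama 4 multimodal rights


def license_flags(d: Deployment) -> List[str]:
    """Return the table's conditions that this deployment appears to trigger."""
    flags = []
    # "More than 700 million" is a strict comparison.
    if d.monthly_active_users > MAU_THRESHOLD:
        flags.append("mau: request a commercial license from Meta")
    if not (d.includes_license_copy and d.shows_built_with_llama):
        flags.append("attribution: add the license copy and 'Built with Llama'")
    # Assumption: a literal prefix check stands in for the naming condition.
    if d.derived_model_name is not None and not d.derived_model_name.startswith("Llama"):
        flags.append("naming: derived model name must start with 'Llama'")
    if d.trains_non_llama_model:
        flags.append("training: Llama or its outputs may not improve other LLMs")
    if d.eu_domiciled and d.uses_llama4_multimodal:
        flags.append("eu: Llama 4 multimodal rights are not granted")
    return flags


# Example: a small product whose fine-tuned model lacks the required name prefix.
example = Deployment(
    monthly_active_users=5_000_000,
    derived_model_name="SupportBot-7B",
    includes_license_copy=True,
    shows_built_with_llama=True,
    trains_non_llama_model=False,
    eu_domiciled=False,
    uses_llama4_multimodal=False,
)
for flag in license_flags(example):
    print(flag)
```

The Acceptable Use Policy is left out on purpose: whether a use is prohibited depends on judgment about the use itself, which a boolean field would not capture well.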
Is Llama Open Source? The Contested Label
Meta consistently describes Llama as open source, and that framing is central to how the company positions the project. The dispute is not about whether the weights are downloadable, which they are. The dispute is about whether a model released under the conditions above can accurately be called open source. Several standards bodies and academics say it cannot, and they have said so on the record.
The criticism, attributed
The Open Source Initiative. In July 2023, the executive director of the Open Source Initiative, Stefano Maffulli, argued that calling Llama open source is polluting the term. The Open Source Initiative's position is that the Llama Community License violates the Open Source Definition because it discriminates against persons or groups, through the 700 million monthly active user restriction, and against fields of endeavor, through the Acceptable Use Policy. In October 2024, the Open Source Initiative published its Open Source AI Definition, which requires a level of transparency about training data that the Open Source Initiative says Meta does not provide.
Mark Dingemanse, Radboud University. In July 2023, the linguist Mark Dingemanse called the open source labeling positively misleading, pointing to the absence of released source and the undocumented training data behind the model.
Nature, November 2024. In an analysis published in Nature, authors Widder, Whittaker, and Myers West described models marketed this way as examples of openwashing, arguing that such systems are better understood as closed than as open.
The Free Software Foundation. In January 2025, the Free Software Foundation classified the Llama 3.1 license as a nonfree software license.
Because of these objections, many reviewers and outlets now prefer the terms open weight or source-available to describe Llama. These terms acknowledge that the weights can be downloaded and run while signaling that the broader freedoms of open source, and full training-data transparency, are not present. It is worth being precise here: open weight is the critics' preferred framing, not an undisputed fact. Meta markets Llama as open source and disagrees with that reframing.
Meta's position
Meta's defense of the open source label is public and consistent. In a July 2024 open letter, Mark Zuckerberg argued that open source AI is the path that best prevents a concentration of power in the hands of a few companies, framing Meta's release strategy as a benefit to the broader ecosystem. When the Open Source Initiative's definition drew renewed attention, a Meta spokesperson told The Verge in October 2024 that Meta disagrees with the Open Source Initiative's definition of open source AI. Meta's view is that its release approach delivers the practical benefits people associate with open source, even if it does not satisfy every condition the Open Source Initiative sets out.
Both sides agree on the underlying mechanics: the weights are available, and the training data is not fully disclosed. The disagreement is over what to call that arrangement. Reasonable readers can weigh the standards bodies' definitions against Meta's stated rationale and decide which framing they find more persuasive.
Kadrey v. Meta: The Copyright Case
On July 7, 2023, authors Richard Kadrey, Sarah Silverman, and Christopher Golden filed suit against Meta in the United States District Court for the Northern District of California. The case is captioned Kadrey v. Meta Platforms, case number 3:23-cv-03417. The plaintiffs alleged that Meta trained Llama on pirated books drawn from two sources: the Books3 section of a dataset known as The Pile, which was sourced from a site called Bibliotik, and the shadow library LibGen.
It is important to separate the distinct claims in the case, because they have reached very different stages.
What was dismissed
In November 2023, Judge Vince Chhabria dismissed the plaintiffs' claims that Llama's outputs themselves infringe their copyrights and that Llama as a model is itself an infringing work. The plaintiffs were given leave to amend. This early ruling narrowed the case but did not end it.
The June 25, 2025 fair-use ruling
On June 25, 2025, Judge Chhabria granted summary judgment for Meta on the question of fair use of the books for training. The court ruled that the use was highly transformative, and it found that Meta's guardrails kept Llama from reproducing more than roughly 50 words of any of the books, so the plaintiffs had not proven that Llama acts as a market substitute for the works. Meta's position throughout was that training is transformative and does not reproduce, or provide meaningful access to, the books.
The judge was unusually explicit that the ruling was narrow. Chhabria stressed that the decision turned on the specific record before him and that future plaintiffs who develop stronger evidence of market harm could prevail on similar facts. In other words, the ruling is a win for Meta on these plaintiffs' evidence, not a blanket declaration that training on copyrighted books is always fair use.
What is still being litigated
The fair-use ruling did not resolve a separate allegation: that Meta acquired the training data by downloading, and seeding through torrenting, pirated copies of books from LibGen. That claim, focused on the method of obtaining the data rather than the act of training, continues. A Fourth Amended Complaint was filed in April 2026. As of this writing, that part of the dispute is unresolved and remains an allegation, not a finding. Litigation status last checked June 2026; consult the court docket for any later ruling or settlement.
Court decision: Using the authors' books to train Llama was ruled fair use on June 25, 2025. The court called it highly transformative and found no proven market substitution, while stressing the ruling was narrow.
Open allegation: The separate allegation that Meta downloaded and seeded pirated books from LibGen to obtain the data continues. A Fourth Amended Complaint was filed in April 2026. This remains an unresolved allegation.
Public dispute: Whether Llama is open source is a matter of definition. Meta says yes; the Open Source Initiative and the Free Software Foundation say no, and critics prefer open weight or source-available. No court has ruled on the label.
A common misreading of the June 2025 ruling is that Meta was cleared of all wrongdoing in the case. That is not accurate. The fair-use ruling addressed the use of the books for training. The allegation about how Meta obtained the books through LibGen was not decided and is still active.
Why These Debates Matter for Anyone Using Llama
For teams evaluating Llama, the two debates have practical consequences that go beyond terminology and headlines.
The label affects compliance and procurement
If your organization has a policy that requires genuinely open source components, or that maps to the Open Source Initiative's definition, then the open weight distinction is not academic. Under the Open Source Initiative's and Free Software Foundation's reasoning, Llama would not qualify as open source for that policy, and the 700 million monthly active user clause, the derivative naming requirement, and the restriction on improving other models are real contractual obligations. Reading the license attached to your specific Llama version is the safe path.
The copyright case is still developing
The fair-use ruling is a meaningful data point for anyone weighing the legal posture of models trained on large text corpora, but Judge Chhabria's own caution means it is not a settled, portable precedent. The surviving allegation about how the training data was obtained is also unresolved. Organizations with low risk tolerance should treat the legal landscape around training-data provenance as still in motion rather than closed.