The Atlantic published an investigation on June 16, 2026, identifying four datasets totaling approximately 21 million music tracks that the publication says were used to train Suno and Udio – the AI music generation platforms already facing copyright suits from Sony Music, Universal Music Group, and other major labels. According to The Atlantic’s reporting, recordings by Taylor Swift, Bad Bunny, Billie Eilish, and Nirvana are among those in the identified datasets.
Twenty-one million tracks is a number that needs context. The datasets The Atlantic identified break down roughly into a 12-million-track collection and a 9-million-track collection: two distinct bodies of material rather than a single monolithic repository. The 21-million-track total comes from The Atlantic’s investigation; the specific datasets aren’t named in available reporting.
The litigation angle is direct. Suno and Udio are already defendants in active copyright suits brought by major labels. Hypebeast reported that labels are now seeking to add approximately 61,000 specific recordings to those existing suits, though that figure is single-source and should be treated as reported, not confirmed. What The Atlantic’s investigation adds is a broader mapping of the alleged training corpus: if the dataset scale is accurate, the litigation exposure runs well beyond the recordings already named in the complaints.
The investigation is original reporting by a credible publication conducting its own dataset analysis. That makes it a different category of source than a court filing or regulatory action – it’s investigative journalism, not an official finding. Courts will decide what actually happened. But investigative findings of this scale tend to enter the litigation record through discovery requests and expert analysis, so The Atlantic’s investigation is likely to become evidence in the active suits, not just context for them.
The named artists matter symbolically but they also matter legally. Swift, Eilish, Bad Bunny – these aren’t obscure catalog entries. They’re current, commercially active recordings with substantial rights owners who have the resources and motivation to pursue enforcement. The presence of Nirvana recordings adds a catalog layer: estate-controlled rights are often vigorously protected.
Warning
The Atlantic's dataset-mapping methodology matters as much as its findings. If a journalist can identify training corpora at scale, opposing counsel in active litigation can too. AI music platforms that haven't documented training data provenance are now operating in an environment where that documentation gap is discoverable.
What this means for AI music platforms beyond Suno and Udio: the dataset mapping work The Atlantic conducted can in principle be replicated or extended. If investigators can identify training corpora at scale, so can opposing counsel in litigation. Any AI music platform that assembled training data without clear licensing documentation is now operating in an environment where that data’s provenance can be identified, mapped, and introduced in court.
Don’t expect the active Suno/Udio suits to pause while this investigation is absorbed. Courts don’t wait for press coverage. But the investigation materially strengthens the factual record plaintiffs can draw on. The real question is how quickly labels amend existing complaints to incorporate the dataset-scale findings The Atlantic has published.