Regulation Daily Brief

The Atlantic: Four AI Training Datasets Contain Approximately 21 Million Artists' Tracks

An investigation by The Atlantic identified four datasets totaling approximately 21 million music tracks, including recordings by Taylor Swift, Bad Bunny, Billie Eilish, and Nirvana, allegedly used to train the AI music platforms Suno and Udio. The finding arrives as music labels are reportedly seeking to add 61,000 specific recordings to existing copyright litigation.

Key Takeaways

  • The Atlantic investigation identified approximately 21 million music tracks in four datasets allegedly used to train Suno and Udio, including recordings by Taylor Swift, Bad Bunny, Billie Eilish, and Nirvana
  • The 21-million-track figure comes from a single investigative source (The Atlantic); it has not been verified against the datasets or confirmed by Suno, Udio, or the labels
  • Music labels are reportedly seeking to add approximately 61,000 specific recordings to existing copyright suits; that figure is attributed to Hypebeast reporting alone and is single-source
  • Investigative dataset-mapping work of this kind typically enters litigation through discovery requests and expert analysis, making it likely to influence the active suits, not just contextualize them
Music tracks in alleged training datasets: ~21M, across four datasets identified in The Atlantic investigation, allegedly used to train Suno and Udio

Verification

Status: Partial. Sources: The Atlantic (investigative journalism); Hypebeast (61,000-recording figure). Dataset totals are investigative findings, not court-confirmed. The 61,000-recording amendment figure is single-source.

The Atlantic published an investigation on June 16, 2026, identifying four datasets totaling approximately 21 million music tracks that the publication says were used to train Suno and Udio – the AI music generation platforms already facing copyright suits from Sony Music, Universal Music Group, and other major labels. According to The Atlantic’s reporting, Taylor Swift, Bad Bunny, Billie Eilish, and Nirvana are among the artists whose recordings appear in the identified datasets.

Twenty-one million tracks is a number that needs context. The datasets The Atlantic identified break down roughly into a 12-million-track collection and a 9-million-track collection: two distinct bodies of material rather than a single monolithic repository. The 21-million-track total comes from The Atlantic’s investigation; the specific datasets aren’t named in available reporting.

The litigation angle is direct. Suno and Udio are already defendants in active copyright suits brought by major labels. Hypebeast reported that labels are now seeking to add approximately 61,000 specific recordings to those existing suits, though that figure is single-source and should be treated as reported, not confirmed. What The Atlantic’s investigation adds is a broader mapping of the alleged training corpus: if the dataset scale is accurate, the litigation exposure runs well beyond the recordings already named in the complaints.

The investigation is original reporting by a credible publication conducting its own dataset analysis. That makes it a different category of source than a court filing or regulatory action – it’s investigative journalism, not an official finding. Courts will decide what actually happened. But investigative findings of this scale tend to enter the litigation record through discovery requests and expert analysis, so The Atlantic’s investigation is likely to become evidence in the active suits, not just context for them.

The named artists matter symbolically, but they also matter legally. Swift, Eilish, Bad Bunny – these aren’t obscure catalog entries. Their recordings are current and commercially active, with substantial rights owners who have the resources and motivation to pursue enforcement. The presence of Nirvana recordings adds a catalog layer: estate-controlled rights are often vigorously protected.

Warning

The Atlantic's dataset-mapping methodology matters as much as its findings. If a journalist can identify training corpora at scale, opposing counsel in active litigation can too. AI music platforms that haven't documented training data provenance are now operating in an environment where that documentation gap is discoverable.

What this means for AI music platforms beyond Suno and Udio: the dataset mapping work The Atlantic conducted can in principle be replicated or extended. If investigators can identify training corpora at scale, so can opposing counsel in litigation. Any AI music platform that assembled training data without clear licensing documentation is now operating in an environment where that data’s provenance can be identified, mapped, and introduced in court.

Don’t expect the active Suno/Udio suits to pause while this investigation is absorbed. Courts don’t wait for press coverage. But the investigation materially strengthens the factual record plaintiffs can draw on. The real question is how quickly labels amend existing complaints to incorporate the dataset-scale findings The Atlantic has published.
