ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.3 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
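The filter conditions in the table above can be read as a set of boolean checks on one crawl record. The following is a minimal sketch of that reading, not the crawler's actual implementation: the `PageRecord` type and `run_filters` function are hypothetical, the field names are taken from the Condition column, and "6 MONTH" is approximated as 183 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PageRecord:
    # Field names mirror the Condition column; the record type itself is hypothetical.
    download_http_code: int
    download_stamp: datetime
    history_drop_reason: Optional[str]
    fh_dont_index: int
    ml_spam_score: float
    meta_canonical: Optional[str]
    src_unparsed: str


def run_filters(page: PageRecord, now: datetime) -> dict:
    """Return a PASS/FAIL verdict per filter, in the table's row order."""
    six_months = timedelta(days=183)  # assumption: "6 MONTH" approximated as 183 days
    checks = {
        "HTTP status": page.download_http_code == 200,
        "Age cutoff": page.download_stamp > now - six_months,
        "History drop": page.history_drop_reason is None,
        "Spam/ban": page.fh_dont_index != 1 and page.ml_spam_score == 0,
        # Canonical passes when unset, empty, or equal to the page's own URL.
        "Canonical": page.meta_canonical in (None, "", page.src_unparsed),
    }
    return {name: "PASS" if ok else "FAIL" for name, ok in checks.items()}
```

A page is eligible only when every entry in the returned dictionary is `PASS`, which matches the all-PASS rows reported for this URL.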

| Property | Value |
|---|---|
| URL | https://www.theatlantic.com/technology/archive/2025/09/youtube-ai-training-data-sets/684116/ |
| Last Crawled | 2026-04-15 13:13:45 (7 days ago) |
| First Indexed | 2025-09-10 15:14:19 (7 months ago) |
| HTTP Status Code | 200 |
| Content | |
| Meta Title | At Least 15 Million YouTube Videos Have Been Snatched by AI Companies - The Atlantic |
| Meta Description | At least 15 million videos have been snatched by tech companies. |
| Meta Canonical | null |
| Markdown |
[Technology](https://www.theatlantic.com/technology/)
# AI Is Coming for YouTube Creators
At least 15 million videos have been snatched by tech companies.
By [Alex Reisner](https://www.theatlantic.com/author/alex-reisner/)

Illustration by Matteo Giuseppe Pani / The Atlantic
September 10, 2025
*Editor’s note: This analysis is part of* The Atlantic*’s investigation into how YouTube videos are taken to train AI tools. You can use the search tool directly [here](https://www.theatlantic.com/technology/archive/2025/09/search-youtube-videos-generative-ai/684158/), to see whether videos you’ve created or watched are included in the data sets. This work is part of [AI Watchdog](https://www.theatlantic.com/category/ai-watchdog/),* The Atlantic*’s ongoing investigation into the generative-AI industry.*
***
When Jon Peters uploaded his first video to YouTube in 2010, he had no idea where it would lead. He was a professional woodworker running a small business who decided to film himself making a dining table with some old legs he had found in a barn. It turned out that people liked his candid style, and as he posted more videos, a fan base began to grow. “All of a sudden there’s people who appreciate the work I’m doing,” he told me. “The comments were a motivator.” Fifteen years later, his channel has more than 1 million subscribers. Sometimes he gets photos of people in their shops, following his guidance from a big TV on the wall; most of his viewers, Peters told me, are woodworkers looking to him for instruction.
But [Peters’s channel](https://www.youtube.com/@JonPetersArtHome) could soon be obsolete, along with millions of other videos created by people who share their expertise and advice on YouTube. Over the past few months, I’ve discovered more than 15.8 million videos from more than 2 million channels that tech companies have, without permission, downloaded to train AI products. Nearly 1 million of them, by my count, are how-to videos. You can find these videos in at least 13 different data sets distributed by AI developers at tech companies, universities, and research organizations, through websites such as Hugging Face, an online AI-development hub.
In most cases the videos are anonymized, meaning that titles and creator names are not included. I was able to identify the videos by extracting unique identifiers from the data sets and looking them up on YouTube, similar to the process I followed when I revealed the contents of the [Books3](https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/), [OpenSubtitles](https://www.theatlantic.com/technology/archive/2024/11/opensubtitles-ai-data-set/680650/), and [LibGen](https://www.theatlantic.com/technology/archive/2025/03/libgen-meta-openai/682093/) data sets. You can search the data sets using the tool below, typing in channel names like “MrBeast” or “James Charles,” for example.
(*A note for users: Just because a video appears in these data sets does not mean it was used for training by AI companies, which could choose to omit certain videos when developing their products.*)
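The identification step described above, extracting video identifiers from data-set entries so they can be looked up on YouTube, might look roughly like the sketch below. It is an illustration under stated assumptions, not the method used for this investigation: it assumes entries are either bare 11-character IDs or `youtube.com/watch` and `youtu.be` URLs, and the function names `extract_video_id` and `watch_url` are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from letters, digits, "-" and "_".
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(entry: str) -> Optional[str]:
    """Pull a YouTube video ID out of a data-set entry, or return None."""
    entry = entry.strip()
    if VIDEO_ID.fullmatch(entry):
        return entry  # the entry is already a bare ID
    parsed = urlparse(entry)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/")
    elif host in YOUTUBE_HOSTS:
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return candidate if VIDEO_ID.fullmatch(candidate) else None


def watch_url(video_id: str) -> str:
    """Build the public watch URL used to look a video up by hand."""
    return f"https://www.youtube.com/watch?v={video_id}"
```

Deduplicating the extracted IDs across data sets (for example with a Python `set`) would then give a count of distinct videos, though real data sets vary in format and may need per-set parsing.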
To create AI products capable of generating video, developers need huge quantities of videos, and YouTube has become a common source. Although YouTube does offer paying subscribers the ability to download videos and watch them through the company’s app whenever they’d like, this is something different: Video files are being ripped from YouTube en masse and saved in files that are then fed to AI algorithms. This kind of downloading [violates the platform’s terms of service](https://www.bloomberg.com/news/articles/2024-04-04/youtube-says-openai-training-sora-with-its-videos-would-break-the-rules), but many tools allow AI developers to download videos in this way. YouTube appears to have done little, if anything, to stop the mass downloading, and the company did not respond to my request for comment.
Not all YouTube videos are copyrighted (and some are uploaded by people who don’t own the copyrights), but many are. Unauthorized copying or distribution of those videos is illegal, but whether AI training constitutes a form of copying or distribution is still a question being debated in many ongoing lawsuits. Tech companies have argued that training is a “fair use” of copyrighted work, and some [judges have disagreed](https://www.theatlantic.com/technology/archive/2025/07/anthropic-meta-ai-rulings/683526/) in their responses. How the courts ultimately apply the law to this novel technology could have massive consequences for creators’ motivations to post their work on YouTube and similar platforms: If tech companies are able to continue taking creators’ work to build AI products that compete with them, then creators may have little choice but to stop sharing.
Generative-AI tools are already producing videos that compete with human-made work on YouTube. AI-generated history videos with hundreds of thousands of views and many inaccuracies [are drowning out](https://www.404media.co/ai-generated-boring-history-videos-are-flooding-youtube-and-drowning-out-real-history/) fact-checked, expert-produced content. Popular music-remix videos are frequently created [using this technology](https://www.youtube.com/watch?v=eIahbtBz6Uo), and many of them perform better than human-made videos.
The problem extends far beyond YouTube, however. Most modern chatbots are “multimodal,” meaning they can respond to a question by creating relevant media. Google’s Gemini chatbot, for instance, will produce short clips for paying users. Soon, you may be able to ask ChatGPT or another generative-AI tool about how to build a table from found legs and get a custom how-to video in response. Even if that response isn’t as good as any video Peters would make, it will be immediate, and it will be tailor-made to your specifications. The online-publishing business has already been [decimated by text-generation tools](https://www.theatlantic.com/technology/archive/2025/06/generative-ai-pirated-articles-books/683009/); video creators should expect similar challenges from generative-AI tools in the near future.
Many major tech companies have used these data sets to train AI, according to research papers I’ve read and AI developers I’ve spoken with. The group includes Microsoft, Meta, Amazon, Nvidia, Runway, ByteDance, Snap, and Tencent. I reached out to each of these companies to ask about their use of these data sets. Only Meta, Amazon, and Nvidia responded. All three said they “respect” content creators and believe that their use of the work is legal under existing copyright law. Amazon also shared that, where video is concerned, it is currently focused on developing ways to generate “compelling, high-quality advertisements from simple prompts.”
We can’t be certain whether all these companies will use the videos to create for-profit video-generating tools. Some of the work they’ve done may be simply experimental. But a few of these companies have an obvious interest in pursuing commercial products: Meta, for instance, is developing a suite of tools called [Movie Gen](https://ai.meta.com/research/movie-gen/) that creates videos from text prompts, and Snap offers [“AI Video Lenses”](http://theverge.com/news/628354/snap-snapchat-ai-video-lenses) that allow users to augment their videos with generative AI. Videos such as the ones in these data sets are the raw material for products like these; much as ChatGPT couldn’t write like Shakespeare without first “reading” Shakespeare, a video generator couldn’t construct a fake newscast without “watching” tons of recorded broadcasts. In fact, a large number of the videos in these data sets are from news and educational channels, such as the BBC (which has at least 33,000 videos in the data sets, across its various brands) and TED (nearly 50,000). Hundreds of thousands of others, if not more, are from individual creators, such as Peters.
AI companies are more interested in some videos than others. A spreadsheet leaked [to *404 Media*](https://www.404media.co/runway-ai-image-generator-training-data-youtube/) by a former employee at Runway, which builds AI video-generation tools, shows what the company valued about certain channels: “high camera movement,” “beautiful cinematic landscapes,” “high quality scenes from movies,” “super high quality sci-fi short films.” One channel was labeled “THE HOLY GRAIL OF CAR CINEMATICS SO FAR”; another was labeled “only 4 videos but they are really well done.”
Developers seek out high-quality videos in a variety of ways. Curators of two of the data sets collected here (HowTo100M and HD-VILA-100M) prioritized videos with high view counts on YouTube, equating popularity with quality. The creators of another data set, HD-VG-130M, [noted](https://arxiv.org/pdf/2305.10874) that “high view count does not guarantee video quality,” and used an AI model to select videos of high “aesthetic quality.” Data-set creators often try to avoid videos that contain overlaid text, such as subtitles and logos, so these identifying features don’t appear in videos generated by their model. So, some advice for YouTubers: Putting a watermark or logo on your videos, even a small one, makes them less desirable for training.
To prepare the videos for training, developers split the footage into short clips, in many cases cutting wherever there is a scene or camera change. Each clip is then given an English-language description of the visual scene so the model can be trained to correlate words with moving images, and to generate videos from text prompts. AI developers have a few methods of writing these captions. One way is to pay workers to do it. Another is to use separate AI models to generate a description automatically. The latter is more common, because of its lower cost.
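The preparation steps described above, cutting footage at scene changes and attaching a caption to each clip, can be sketched in a few lines. This is a toy illustration, not any developer's actual pipeline: it assumes each frame has already been reduced to a single number (for instance, average brightness), treats a large jump between neighboring frames as a cut, and takes the captioning step as a caller-supplied function. The names `split_at_cuts`, `caption_clips`, and `describe` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def split_at_cuts(frame_signatures: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, cut at large jumps."""
    if not frame_signatures:
        return []
    clips = []
    start = 0
    for i in range(1, len(frame_signatures)):
        # A large jump between neighboring frames is treated as a scene change.
        if abs(frame_signatures[i] - frame_signatures[i - 1]) > threshold:
            clips.append((start, i))
            start = i
    clips.append((start, len(frame_signatures)))
    return clips


def caption_clips(
    clips: List[Tuple[int, int]], describe: Callable[[int, int], str]
) -> List[Dict[str, object]]:
    """Pair each clip with a text description from a human or a model."""
    return [{"frames": clip, "caption": describe(*clip)} for clip in clips]
```

In this sketch `describe` stands in for either of the two captioning routes mentioned above, paid workers or a separate AI model; the resulting clip and caption pairs are the kind of text-video examples a generator would be trained on.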
AI video tools aren’t yet as mainstream as chatbots or image generators, but they are already in wide use. You may already have seen AI-manipulated video without realizing it. For example, TED has been using AI to dub speakers’ talks in different languages. This includes the video as well as the audio: Speakers’ mouths are [lip-synched](https://blog.ted.com/announcing-ai-adapted-multilingual-ted-talks/) with the new words so it looks like they’re speaking Japanese, French, or Russian. Nishat Ruiter, TED’s general counsel, told me this is done with the speakers’ knowledge and consent.
There are also consumer-facing products for tweaking videos with AI. If your face doesn’t look right, for example, you can try a face-enhancer such as [Facetune](https://www.facetuneapp.com/create/video-face-editor), or ditch your mug entirely with a face-swapper such as [Facewow](https://facewow.ai/face-swap/video/). With Runway’s [Aleph](https://runwayml.com/research/introducing-runway-aleph), you can change the colors of objects, or turn sunshine into a snowstorm.
Then there are tools that generate new videos based on an image you provide. Google [encourages Gemini users](https://blog.google/products/gemini/photo-to-video/) to animate their “favorite photos.” The result is a clip that extrapolates eight seconds of movement from an initial image, making a person dance, cook, or [swing a golf club](https://chromeunboxed.com/i-just-tried-geminis-new-photo-to-video-feature-and-im-blown-away/). These are often both amazing and creepy. “Talking head generation” (for [employee-orientation videos](https://www.youtube.com/watch?v=2nzdDQr_LqA), for example) is also advancing. [Vidnoz AI](https://www.vidnoz.com/) promises to generate “Realistic AI Spokespersons of Any Style.” A company called [Arcads](https://www.arcads.ai/) will generate a complete advertisement, with actors and voiceover. ByteDance, the company that operates TikTok, offers a similar product called Symphony Creative Studio. Other applications of AI video generation include [virtual try-on of clothes](https://github.com/showlab/Awesome-Video-Diffusion?tab=readme-ov-file#virtual-try-on), [generating custom video games](https://github.com/showlab/Awesome-Video-Diffusion?tab=readme-ov-file#game-generation), and animating [cartoon characters and people](https://shiyi-zh0408.github.io/projectpages/FlexiAct/).
Some companies are both working with AI and simultaneously fighting to defend their content from being pilfered by AI companies. This reflects the Wild West mentality in AI right now: companies exploiting legal gray areas to see how they can profit. As I investigated these data sets, I learned about an incident involving TED (again, one of the most-pilfered organizations in the data sets captured here, and one that is attempting to employ AI to advance its own business). In June, the Cannes Lions international advertising festival gave one of its Grand Prix awards to an ad that included deepfaked footage from a TED talk by DeAndrea Salvador, currently a state senator in North Carolina. The ad agency, DM9, “used AI cloning to change her talk and repurposed it for a commercial ad campaign,” Ruiter told me on a video call recently. When the manipulation was discovered, the Cannes Lions festival [withdrew the award](https://www.canneslions.com/news/cannes-lions-statement-dm9-entries-into-cannes-lions-2025). Last month, Salvador [sued](https://www.courthousenews.com/wp-content/uploads/2025/08/deandrea-salvador-whirlpool-complaint.pdf) DM9 along with its clients, Whirlpool and Consul, for misappropriation of her likeness, among other things. DM9 apologized for the incident and [cited](https://www.linkedin.com/posts/dm9_nota-de-esclarecimento-na-semana-passada-activity-7343346894644875264-sHmV/?rcm=ACoAAABPu1wBY1EaJ5vQb_gdSm1BybbXG1_20hE) “a series of failures in the production and sending” of the ad. A spokesperson from Whirlpool told me the company was unaware the senator’s remarks had been altered.
Others in the film industry have filed lawsuits against AI companies for training with their content. In June, Disney and Universal sued Midjourney, the maker of an image-generating tool that can produce images containing recognizable characters (Warner Brothers [joined](https://apnews.com/article/warner-bros-midjourney-ai-copyright-lawsuit-dc-studios-b87d80d7b4a4dfdcf0ee149d30830551) the lawsuit last week). The lawsuit called Midjourney a “bottomless pit of plagiarism.” The following month, two adult-film companies sued Meta for downloading (and distributing through BitTorrent) more than 2,000 of their videos. Neither Midjourney nor Meta has responded to the allegations, and neither responded to my request for comment. One YouTuber filed their own lawsuit: In August of last year, [David Millette sued Nvidia](https://www.courtlistener.com/docket/69045427/millette-v-nvidia-corporation/) for unjust enrichment and unfair competition with regard to the training of its [Cosmos AI](https://www.nvidia.com/en-us/ai/cosmos/), but the case was voluntarily dismissed months later.
The Disney characters and the deepfaked Salvador ad are just two instances of how these tools can be damaging. The floodgates may soon be opening further. Thanks to the enormous amount of investment in the technology, generated videos are beginning to appear everywhere. One company, DeepBrain AI, [pays âcreatorsâ](https://www.aistudios.com/promotion/creator-join) to post AI-generated videos made with its tools on YouTube. It currently offers \$500 for a video that gets 10,000 views, a relatively low threshold. Companies that run social-media platforms, such as Google and Meta, also pay users for content, through ad-revenue sharing, and many directly [encourage](https://blog.youtube/news-and-events/new-shorts-creation-tools-2025/) the posting of AI-generated content. Not surprisingly, a coterie of [gurus](https://www.youtube.com/watch?v=TWpg1RmzAbc) has arrived to teach the secrets of making money with AI-generated content.
[Google](https://arxiv.org/abs/2007.14937) and [Meta](https://arxiv.org/abs/1905.00561) have also trained AI tools on large quantities of videos from their own platforms: Google has taken [at least 70 million](https://arxiv.org/abs/2007.14937) clips from YouTube, and Meta has taken more than [65 million clips from Instagram](https://arxiv.org/abs/1905.00561). If these companies succeed in flooding their platforms with synthetic videos, human creators could be left with the unenviable task of competing with machines that churn out endless content based on their original work. And social media will become even less social than it is.
I asked Peters if he knew his videos had been taken from YouTube to train AI. He said he didn’t, but he wasn’t surprised. “I think everything’s gonna get stolen,” he told me. But he didn’t know what to do about it. “Do I quit, or do I just keep making videos and hope people want to connect with a person?”
### About the Author
[Alex Reisner](https://www.theatlantic.com/author/alex-reisner/) is a staff writer at *The Atlantic.*
Explore More Topics
[YouTube](https://www.theatlantic.com/tag/product/youtube/)
## Popular Links
- ### About
- [Our History](https://www.theatlantic.com/history/)
- [Careers](https://www.theatlantic.com/jobs/)
- ### Contact
- [Help Center](https://support.theatlantic.com/)
- [Contact Us](https://www.theatlantic.com/contact/)
- [Atlantic Brand Partners](https://atlanticbrandpartners.com/)
- [Press](https://www.theatlantic.com/press-releases/)
- [Reprints & Permissions](https://support.theatlantic.com/hc/en-us/articles/360011460753-Permissions-to-reprint-or-reproduce-content-from-The-Atlantic)
- ### Podcasts
- [Radio Atlantic](https://www.theatlantic.com/podcasts/radio-atlantic/)
- [The David Frum Show](https://www.theatlantic.com/podcasts/the-david-frum-show/)
- [Galaxy Brain](https://www.theatlantic.com/podcasts/galaxy-brain/)
- [Autocracy in America](https://www.theatlantic.com/podcasts/autocracy-in-america/)
- [How to Age Up](https://www.theatlantic.com/podcasts/how-to-build-a-happy-life/)
- ### Subscription
- [Purchase](https://www.theatlantic.com/subscribe/footer-cover/)
- [Give a Gift](https://www.theatlantic.com/subscribe/footer-gift/)
- [Manage Subscription](https://accounts.theatlantic.com/)
- [Group Subscriptions](https://www.theatlantic.com/group-subscriptions/)
- [Atlantic Editions](https://www.theatlantic.com/atlantic-editions/)
- [Newsletters](https://www.theatlantic.com/newsletters/)
- ### Follow
## Site Information
- [Privacy Policy](https://www.theatlantic.com/privacy-policy/)
- [Your Privacy Choices](https://www.theatlantic.com/do-not-sell-my-personal-information/)
- [Advertising Guidelines](https://www.theatlantic.com/advertising-guidelines/)
- [Terms & Conditions](https://www.theatlantic.com/terms-and-conditions/)
- [Terms of Sale](https://www.theatlantic.com/terms-of-sale/)
- [Responsible Disclosure](https://www.theatlantic.com/responsible-disclosure-policy/)
- [Site Map](https://www.theatlantic.com/site-map/)
TheAtlantic.com © 2026 The Atlantic Monthly Group. All Rights Reserved.
This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply | |||||||||||||||||||||||||||||||||
| Readable Markdown | *Editor’s note: This analysis is part of* The Atlantic*’s investigation into how YouTube videos are taken to train AI tools. You can use the search tool directly [here](https://www.theatlantic.com/technology/archive/2025/09/search-youtube-videos-generative-ai/684158/), to see whether videos you’ve created or watched are included in the data sets. This work is part of [AI Watchdog](https://www.theatlantic.com/category/ai-watchdog/),* The Atlantic*’s ongoing investigation into the generative-AI industry.*
***
When Jon Peters uploaded his first video to YouTube in 2010, he had no idea where it would lead. He was a professional woodworker running a small business who decided to film himself making a dining table with some old legs he had found in a barn. It turned out that people liked his candid style, and as he posted more videos, a fan base began to grow. “All of a sudden there’s people who appreciate the work I’m doing,” he told me. “The comments were a motivator.” Fifteen years later, his channel has more than 1 million subscribers. Sometimes he gets photos of people in their shops, following his guidance from a big TV on the wall; most of his viewers, Peters told me, are woodworkers looking to him for instruction.
But [Peters’s channel](https://www.youtube.com/@JonPetersArtHome) could soon be obsolete, along with millions of other videos created by people who share their expertise and advice on YouTube. Over the past few months, I’ve discovered more than 15.8 million videos from more than 2 million channels that tech companies have, without permission, downloaded to train AI products. Nearly 1 million of them, by my count, are how-to videos. You can find these videos in at least 13 different data sets distributed by AI developers at tech companies, universities, and research organizations, through websites such as Hugging Face, an online AI-development hub.
In most cases the videos are anonymized, meaning that titles and creator names are not included. I was able to identify the videos by extracting unique identifiers from the data sets and looking them up on YouTube, similar to the process I followed when I revealed the contents of the [Books3](https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/), [OpenSubtitles](https://www.theatlantic.com/technology/archive/2024/11/opensubtitles-ai-data-set/680650/), and [LibGen](https://www.theatlantic.com/technology/archive/2025/03/libgen-meta-openai/682093/) data sets. You can search the data sets using the tool below, typing in channel names like “MrBeast” or “James Charles,” for example.
(*A note for users: Just because a video appears in these data sets does not mean it was used for training by AI companies, which could choose to omit certain videos when developing their products.*)
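For readers curious about what extracting identifiers from a data set can look like in practice, here is a minimal Python sketch. It is an illustration only, not the method used for this analysis: it assumes that entries are either ordinary YouTube URLs or bare video IDs, and that an ID is 11 characters long (a widely observed convention, not something this article states). The function name `extract_video_id` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse, parse_qs

# Assumption: a video ID is 11 characters from A-Z, a-z, 0-9, "-" and "_".
VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(entry: str) -> Optional[str]:
    """Return the video ID found in a data-set entry, or None if there is none."""
    entry = entry.strip()
    if VIDEO_ID.match(entry):
        return entry  # the entry is already a bare ID

    parsed = urlparse(entry)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard watch links carry the ID in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/")):
            candidate = parsed.path.split("/")[2]

    if candidate and VIDEO_ID.match(candidate):
        return candidate
    return None
```

Once extracted, each ID can be looked up to recover the title and channel that the anonymized data set omits.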
To create AI products capable of generating video, developers need huge quantities of videos, and YouTube has become a common source. Although YouTube does offer paying subscribers the ability to download videos and watch them through the company’s app whenever they’d like, this is something different: Video files are being ripped from YouTube en masse and saved in files that are then fed to AI algorithms. This kind of downloading [violates the platform’s terms of service](https://www.bloomberg.com/news/articles/2024-04-04/youtube-says-openai-training-sora-with-its-videos-would-break-the-rules), but many tools allow AI developers to download videos in this way. YouTube appears to have done little, if anything, to stop the mass downloading, and the company did not respond to my request for comment.
Not all YouTube videos are copyrighted (and some are uploaded by people who don’t own the copyrights), but many are. Unauthorized copying or distribution of those videos is illegal, but whether AI training constitutes a form of copying or distribution is still a question being debated in many ongoing lawsuits. Tech companies have argued that training is a “fair use” of copyrighted work, and some [judges have disagreed](https://www.theatlantic.com/technology/archive/2025/07/anthropic-meta-ai-rulings/683526/) in their responses. How the courts ultimately apply the law to this novel technology could have massive consequences for creators’ motivations to post their work on YouTube and similar platforms: if tech companies are able to continue taking creators’ work to build AI products that compete with them, then creators may have little choice but to stop sharing.
Generative-AI tools are already producing videos that compete with human-made work on YouTube. AI-generated history videos with hundreds of thousands of views and many inaccuracies [are drowning out](https://www.404media.co/ai-generated-boring-history-videos-are-flooding-youtube-and-drowning-out-real-history/) fact-checked, expert-produced content. Popular music-remix videos are frequently created [using this technology](https://www.youtube.com/watch?v=eIahbtBz6Uo), and many of them perform better than human-made videos.
The problem extends far beyond YouTube, however. Most modern chatbots are “multimodal,” meaning they can respond to a question by creating relevant media. Google’s Gemini chatbot, for instance, will produce short clips for paying users. Soon, you may be able to ask ChatGPT or another generative-AI tool about how to build a table from found legs and get a custom how-to video in response. Even if that response isn’t as good as any video Peters would make, it will be immediate, and it will be tailor-made to your specifications. The online-publishing business has already been [decimated by text-generation tools](https://www.theatlantic.com/technology/archive/2025/06/generative-ai-pirated-articles-books/683009/); video creators should expect similar challenges from generative-AI tools in the near future.
Many major tech companies have used these data sets to train AI, according to research papers I’ve read and AI developers I’ve spoken with. The group includes Microsoft, Meta, Amazon, Nvidia, Runway, ByteDance, Snap, and Tencent. I reached out to each of these companies to ask about their use of these data sets. Only Meta, Amazon, and Nvidia responded. All three said they “respect” content creators and believe that their use of the work is legal under existing copyright law. Amazon also shared that, where video is concerned, it is currently focused on developing ways to generate “compelling, high-quality advertisements from simple prompts.”
We can’t be certain whether all these companies will use the videos to create for-profit video-generating tools. Some of the work they’ve done may be simply experimental. But a few of these companies have an obvious interest in pursuing commercial products: Meta, for instance, is developing a suite of tools called [Movie Gen](https://ai.meta.com/research/movie-gen/) that creates videos from text prompts, and Snap offers [“AI Video Lenses”](http://theverge.com/news/628354/snap-snapchat-ai-video-lenses) that allow users to augment their videos with generative AI. Videos such as the ones in these data sets are the raw material for products like these; much as ChatGPT couldn’t write like Shakespeare without first “reading” Shakespeare, a video generator couldn’t construct a fake newscast without “watching” tons of recorded broadcasts. In fact, a large number of the videos in these data sets are from news and educational channels, such as the BBC (which has at least 33,000 videos in the data sets, across its various brands) and TED (nearly 50,000). Hundreds of thousands of others, if not more, are from individual creators, such as Peters.
AI companies are more interested in some videos than others. A spreadsheet leaked [to *404 Media*](https://www.404media.co/runway-ai-image-generator-training-data-youtube/) by a former employee at Runway, which builds AI video-generation tools, shows what the company valued about certain channels: “high camera movement,” “beautiful cinematic landscapes,” “high quality scenes from movies,” “super high quality sci-fi short films.” One channel was labeled “THE HOLY GRAIL OF CAR CINEMATICS SO FAR”; another was labeled “only 4 videos but they are really well done.”
Developers seek out high-quality videos in a variety of ways. Curators of two of the data sets collected here, HowTo100M and HD-VILA-100M, prioritized videos with high view counts on YouTube, equating popularity with quality. The creators of another data set, HD-VG-130M, [noted](https://arxiv.org/pdf/2305.10874) that “high view count does not guarantee video quality,” and used an AI model to select videos of high “aesthetic quality.” Data-set creators often try to avoid videos that contain overlaid text, such as subtitles and logos, so these identifying features don’t appear in videos generated by their model. So, some advice for YouTubers: Putting a watermark or logo on your videos, even a small one, makes them less desirable for training.
To prepare the videos for training, developers split the footage into short clips, in many cases cutting wherever there is a scene or camera change. Each clip is then given an English-language description of the visual scene so the model can be trained to correlate words with moving images, and to generate videos from text prompts. AI developers have a few methods of writing these captions. One way is to pay workers to do it. Another is to use separate AI models to generate a description automatically. The latter is more common, because of its lower cost.
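The clip-splitting step can be illustrated with a toy Python sketch. It is not any data set's actual pipeline: it assumes each frame has already been decoded into a flat, non-empty list of grayscale pixel values, and it treats a large average pixel change between consecutive frames as a scene or camera change. The function name `split_at_cuts` and the threshold of 50 are hypothetical choices for illustration; production systems rely on more robust detection methods.

```python
from typing import List, Sequence, Tuple


def split_at_cuts(
    frames: Sequence[Sequence[float]], threshold: float = 50.0
) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index ranges, end exclusive, one per clip."""
    if not frames:
        return []

    clips = []
    start = 0
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        # Mean absolute pixel difference between consecutive frames.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            # A sharp change is treated as a cut: close the current clip here.
            clips.append((start, i))
            start = i
    clips.append((start, len(frames)))
    return clips
```

Each resulting range would then be paired with a text description of the scene, written by a worker or generated by a captioning model, before being used for training.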
AI video tools aren’t yet as mainstream as chatbots or image generators, but they are already in wide use. You may already have seen AI-manipulated video without realizing it. For example, TED has been using AI to dub speakers’ talks in different languages. This includes the video as well as the audio: Speakers’ mouths are [lip-synched](https://blog.ted.com/announcing-ai-adapted-multilingual-ted-talks/) with the new words so it looks like they’re speaking Japanese, French, or Russian. Nishat Ruiter, TED’s general counsel, told me this is done with the speakers’ knowledge and consent.
There are also consumer-facing products for tweaking videos with AI. If your face doesn’t look right, for example, you can try a face-enhancer such as [Facetune](https://www.facetuneapp.com/create/video-face-editor), or ditch your mug entirely with a face-swapper such as [Facewow](https://facewow.ai/face-swap/video/). With Runway’s [Aleph](https://runwayml.com/research/introducing-runway-aleph), you can change the colors of objects, or turn sunshine into a snowstorm.
Then there are tools that generate new videos based on an image you provide. Google [encourages Gemini users](https://blog.google/products/gemini/photo-to-video/) to animate their “favorite photos.” The result is a clip that extrapolates eight seconds of movement from an initial image, making a person dance, cook, or [swing a golf club](https://chromeunboxed.com/i-just-tried-geminis-new-photo-to-video-feature-and-im-blown-away/). These are often both amazing and creepy. “Talking head generation” (for [employee-orientation videos](https://www.youtube.com/watch?v=2nzdDQr_LqA), for example) is also advancing. [Vidnoz AI](https://www.vidnoz.com/) promises to generate “Realistic AI Spokespersons of Any Style.” A company called [Arcads](https://www.arcads.ai/) will generate a complete advertisement, with actors and voiceover. ByteDance, the company that operates TikTok, offers a similar product called Symphony Creative Studio. Other applications of AI video generation include [virtual try-on of clothes](https://github.com/showlab/Awesome-Video-Diffusion?tab=readme-ov-file#virtual-try-on), [generating custom video games](https://github.com/showlab/Awesome-Video-Diffusion?tab=readme-ov-file#game-generation), and animating [cartoon characters and people](https://shiyi-zh0408.github.io/projectpages/FlexiAct/).
Some companies are both working with AI and simultaneously fighting to defend their content from being pilfered by AI companies. This reflects the Wild West mentality in AI right now: companies exploiting legal gray areas to see how they can profit. As I investigated these data sets, I learned about an incident involving TED (again, one of the most-pilfered organizations in the data sets captured here, and one that is attempting to employ AI to advance its own business). In June, the Cannes Lions international advertising festival gave one of its Grand Prix awards to an ad that included deepfaked footage from a TED talk by DeAndrea Salvador, currently a state senator in North Carolina. The ad agency, DM9, “used AI cloning to change her talk and repurposed it for a commercial ad campaign,” Ruiter told me on a video call recently. When the manipulation was discovered, the Cannes Lions festival [withdrew the award](https://www.canneslions.com/news/cannes-lions-statement-dm9-entries-into-cannes-lions-2025). Last month, Salvador [sued](https://www.courthousenews.com/wp-content/uploads/2025/08/deandrea-salvador-whirlpool-complaint.pdf) DM9 along with its clients, Whirlpool and Consul, for misappropriation of her likeness, among other things. DM9 apologized for the incident and [cited](https://www.linkedin.com/posts/dm9_nota-de-esclarecimento-na-semana-passada-activity-7343346894644875264-sHmV/?rcm=ACoAAABPu1wBY1EaJ5vQb_gdSm1BybbXG1_20hE) “a series of failures in the production and sending” of the ad. A spokesperson from Whirlpool told me the company was unaware the senator’s remarks had been altered.
Others in the film industry have filed lawsuits against AI companies for training with their content. In June, Disney and Universal sued Midjourney, the maker of an image-generating tool that can produce images containing recognizable characters (Warner Brothers [joined](https://apnews.com/article/warner-bros-midjourney-ai-copyright-lawsuit-dc-studios-b87d80d7b4a4dfdcf0ee149d30830551) the lawsuit last week). The lawsuit called Midjourney a “bottomless pit of plagiarism.” The following month, two adult-film companies sued Meta for downloading (and distributing through BitTorrent) more than 2,000 of their videos. Neither Midjourney nor Meta has responded to the allegations, and neither responded to my request for comment. One YouTuber filed their own lawsuit: In August of last year, [David Millette sued Nvidia](https://www.courtlistener.com/docket/69045427/millette-v-nvidia-corporation/) for unjust enrichment and unfair competition with regard to the training of its [Cosmos AI](https://www.nvidia.com/en-us/ai/cosmos/), but the case was voluntarily dismissed months later.
The Disney characters and the deepfaked Salvador ad are just two instances of how these tools can be damaging. The floodgates may soon be opening further. Thanks to the enormous amount of investment in the technology, generated videos are beginning to appear everywhere. One company, DeepBrain AI, [pays “creators”](https://www.aistudios.com/promotion/creator-join) to post AI-generated videos made with its tools on YouTube. It currently offers \$500 for a video that gets 10,000 views, a relatively low threshold. Companies that run social-media platforms, such as Google and Meta, also pay users for content, through ad-revenue sharing, and many directly [encourage](https://blog.youtube/news-and-events/new-shorts-creation-tools-2025/) the posting of AI-generated content. Not surprisingly, a coterie of [gurus](https://www.youtube.com/watch?v=TWpg1RmzAbc) has arrived to teach the secrets of making money with AI-generated content.
[Google](https://arxiv.org/abs/2007.14937) and [Meta](https://arxiv.org/abs/1905.00561) have also trained AI tools on large quantities of videos from their own platforms: Google has taken [at least 70 million](https://arxiv.org/abs/2007.14937) clips from YouTube, and Meta has taken more than [65 million clips from Instagram](https://arxiv.org/abs/1905.00561). If these companies succeed in flooding their platforms with synthetic videos, human creators could be left with the unenviable task of competing with machines that churn out endless content based on their original work. And social media will become even less social than it is.
I asked Peters if he knew his videos had been taken from YouTube to train AI. He said he didn’t, but he wasn’t surprised. “I think everything’s gonna get stolen,” he told me. But he didn’t know what to do about it. “Do I quit, or do I just keep making videos and hope people want to connect with a person?” | |||||||||||||||||||||||||||||||||
| ML Classification | ||||||||||||||||||||||||||||||||||
| ML Categories |
Raw JSON{
"/Internet_and_Telecom": 743,
"/Internet_and_Telecom/Web_Services": 656,
"/Science": 445,
"/Science/Computer_Science": 441,
"/Science/Computer_Science/Machine_Learning_and_Artificial_Intelligence": 440,
"/Internet_and_Telecom/Web_Services/Search_Engine_Optimization_and_Marketing": 276,
"/Law_and_Government": 238,
"/Law_and_Government/Legal": 231,
"/News": 128,
"/News/Technology_News": 125,
"/Law_and_Government/Legal/Intellectual_Property": 120
} | |||||||||||||||||||||||||||||||||
| ML Page Types |
Raw JSON{
"/Article": 999,
"/Article/News_Update": 904
} | |||||||||||||||||||||||||||||||||
| ML Intent Types |
Raw JSON{
"Informational": 999
} | |||||||||||||||||||||||||||||||||
| Content Metadata | ||||||||||||||||||||||||||||||||||
| Language | en | |||||||||||||||||||||||||||||||||
| Author | Alex Reisner | |||||||||||||||||||||||||||||||||
| Publish Time | 2025-09-10 14:59:00 (7 months ago) | |||||||||||||||||||||||||||||||||
| Original Publish Time | 2025-09-01 00:00:00 (7 months ago) | |||||||||||||||||||||||||||||||||
| Republished | No | |||||||||||||||||||||||||||||||||
| Word Count (Total) | 2,535 | |||||||||||||||||||||||||||||||||
| Word Count (Content) | 2,234 | |||||||||||||||||||||||||||||||||
| Links | ||||||||||||||||||||||||||||||||||
| External Links | 43 | |||||||||||||||||||||||||||||||||
| Internal Links | 84 | |||||||||||||||||||||||||||||||||
| Technical SEO | ||||||||||||||||||||||||||||||||||
| Meta Nofollow | No | |||||||||||||||||||||||||||||||||
| Meta Noarchive | No | |||||||||||||||||||||||||||||||||
| JS Rendered | No | |||||||||||||||||||||||||||||||||
| Redirect Target | null | |||||||||||||||||||||||||||||||||
| Performance | ||||||||||||||||||||||||||||||||||
| Download Time (ms) | 50 | |||||||||||||||||||||||||||||||||
| TTFB (ms) | 48 | |||||||||||||||||||||||||||||||||
| Download Size (bytes) | 46,146 | |||||||||||||||||||||||||||||||||
| Shard | 21 (laksa) | |||||||||||||||||||||||||||||||||
| Root Hash | 13119341252700813021 | |||||||||||||||||||||||||||||||||
| Unparsed URL | com,theatlantic!www,/technology/archive/2025/09/youtube-ai-training-data-sets/684116/ s443 | |||||||||||||||||||||||||||||||||