AI Firms Under Fire for Using Unauthorized YouTube Transcripts


Major AI companies, including Apple and Nvidia, have come under fire for training their artificial intelligence models on YouTube content without permission from its creators. An investigation by Proof News and Wired found that these companies, along with Anthropic and others, used a dataset called YouTube Subtitles, which contains transcripts from approximately 175,000 videos across 48,000 channels, without the creators’ knowledge.

The YouTube Subtitles dataset, developed by EleutherAI as part of a larger collection called the Pile, consists of text from video subtitles, often with translations into various languages. Although EleutherAI’s stated goal was to democratize access to AI development, major tech firms have also used the dataset to train their models. Apple, for example, used the Pile to train its OpenELM AI model, and a Salesforce AI model released two years ago also relied on this dataset.

The dataset includes content from a wide range of YouTube channels spanning news, education, and entertainment, including popular creators like MrBeast and Marques Brownlee. Notably, some videos in the dataset have since been deleted by their creators, raising concerns about unauthorized use of content and lack of compensation.

The dataset was collected by using YouTube’s API to download subtitles automatically, which raises questions about compliance with YouTube’s terms of service; those terms explicitly prohibit automated scraping of video content. The revelation has sparked outrage among content creators, who were surprised to learn that their work was being used to train AI models without their consent.

While EleutherAI has not provided a comment on the matter, the ethical implications of using unauthorized content in AI development have become a point of contention. As the legal and regulatory landscape of AI continues to evolve, this discovery underscores the need for a balance between technological innovation and ethical responsibility in the industry.



Advait Gupta
