Authors file lawsuit against OpenAI alleging use of pirated content for ChatGPT training

Authors Paul Tremblay and Mona Awad have filed a class action lawsuit against OpenAI, the company behind ChatGPT. They claim that their copyrighted works were used without permission to train ChatGPT, and they allege copyright infringement and violations of the Digital Millennium Copyright Act (DMCA).

According to the plaintiffs, they never granted OpenAI permission to use their works, yet ChatGPT can produce accurate summaries of their books. In their view, this indicates that the books were included in the training data. OpenAI has not revealed the specific datasets used to train ChatGPT, but an older OpenAI paper references two book corpora, Books1 and Books2. Books1 contains approximately 63,000 titles, while Books2 comprises around 294,000 titles.

Tremblay and Awad argue that no legitimate databases with such extensive book collections exist. They believe OpenAI likely relied on pirated material from shadow library websites such as Library Genesis (LibGen), Z-Library (Bok), Sci-Hub, and Bibliotik. These sites are known for aggregating books that can be downloaded in bulk via torrents.

The complaint states: "Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works – something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works."

Based on these allegations, the complaint claims that OpenAI infringed the plaintiffs' copyrights. The plaintiffs are seeking statutory damages, which could amount to as much as $150,000 per work. They are also considering additional damages related to the alleged removal of copyright management information, which would violate the DMCA.

This lawsuit stands out because it highlights the accusation that OpenAI used pirate websites for training data. Notably, Z-Library, a shadow library that houses millions of pirated books, is currently facing criminal prosecution by the U.S. Department of Justice.

How copyright questions around AI will be resolved remains uncertain. Governments worldwide are adopting different approaches, with the U.S. Congress taking a cautious stance. Rights holders, however, are unlikely to wait and are actively pursuing their interests.

Although there is no direct evidence that OpenAI used pirate sites to train ChatGPT, some AI projects are known to have used pirated material in the past. For example, AI models developed by Google and Facebook were reportedly trained on the C4 dataset, which included content from Z-Library and other pirate sites, according to a summary by Search Engine Journal.

This lawsuit is expected to draw significant attention from both AI enthusiasts and rights holders. The case could compel OpenAI to disclose aspects of its training data, which would be of great interest in its own right.

Even if it is established that ChatGPT was indeed trained using pirated books, the court would still need to determine whether such usage constitutes copyright infringement. Some experts argue that this type of AI training could fall under fair use.

Fair use can protect transformative uses of copyrighted works that do not directly compete with the original content.

The outcome of this lawsuit could shape the future landscape of AI and copyright law, with significant implications for both technology developers and content creators.

Frequently Asked Questions (FAQs) Related to the Above News

Who has filed a lawsuit against OpenAI?

Authors Paul Tremblay and Mona Awad have filed a class action lawsuit against OpenAI.

What are the authors claiming in their lawsuit?

The authors claim that their copyrighted works were used without permission in the training of ChatGPT, alleging copyright infringement and violations of the DMCA.

Where do the authors believe OpenAI sourced the copyrighted content from?

The authors believe that OpenAI likely used pirated resources from shadow library websites such as Library Genesis (LibGen), Z-Library (Bok), Sci-Hub, and Bibliotik.

What datasets did OpenAI reference in an older paper?

OpenAI referenced Books1 and Books2 as sources in an older paper. Books1 contains approximately 63,000 titles, while Books2 comprises around 294,000 titles.

What damages are the plaintiffs seeking in the lawsuit?

The plaintiffs are seeking statutory damages, which could amount to as much as $150,000 per work. They are also considering additional damages related to the alleged removal of copyright management information.

Why is this lawsuit significant?

This lawsuit is significant because it raises the accusation that OpenAI used pirate websites for training data. It also comes at a time when shadow libraries like Z-Library are facing criminal prosecution.

Has it been proven that OpenAI used pirate sites for training ChatGPT?

No. There is currently no direct evidence that OpenAI used pirate sites to train ChatGPT, although some AI projects are known to have used pirated material in the past.

How might the outcome of this lawsuit impact the future of AI and copyright law?

The outcome of this lawsuit could shape the future landscape of AI and copyright law, holding implications for both technology developers and content creators. It could lead to further discussions on the fair use defense and the disclosure of training data.

Aniket Patel
Aniket is a skilled writer at ChatGPT Global News, contributing to the ChatGPT News category. With a passion for exploring the diverse applications of ChatGPT, Aniket brings informative and engaging content to our readers. His articles cover a wide range of topics, showcasing the versatility and impact of ChatGPT in various domains.
