Authors file lawsuit against OpenAI alleging use of pirated content for ChatGPT training

Authors Paul Tremblay and Mona Awad have filed a class action lawsuit against OpenAI, the company behind ChatGPT. They claim that their copyrighted works were used without permission to train ChatGPT, and they allege copyright infringement and violations of the DMCA.

The plaintiffs say they never gave OpenAI permission to use their works, yet ChatGPT can accurately summarize their books. In their view, this indicates that the books were part of ChatGPT's training data. OpenAI has not disclosed the specific datasets used to train ChatGPT, but an older OpenAI paper references two book datasets, Books1 and Books2, as sources. Books1 contains approximately 63,000 titles, while Books2 comprises around 294,000.

Tremblay and Awad argue that no legitimate databases offer book collections of that size. They believe OpenAI likely relied on shadow library websites such as Library Genesis (LibGen), Z-Library (B-ok), Sci-Hub, and Bibliotik. These sites are notorious for offering large collections of pirated books for bulk download via torrent systems.

The complaint states: "Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works – something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works."

Based on these allegations, the complaint accuses OpenAI of copyright infringement. The plaintiffs are seeking statutory damages of up to $150,000 per work. They are also seeking additional damages for the alleged removal of copyright management information, which would violate the DMCA.

The lawsuit stands out for its accusation that OpenAI used pirate websites as a source of training data. Notably, Z-Library, a shadow library that hosts millions of pirated books, is currently the subject of a criminal prosecution by the U.S. Department of Justice.

How copyright issues in AI will be resolved remains uncertain. Governments worldwide are taking different approaches, and the U.S. Congress has so far been cautious. Rights holders, however, are actively pursuing their interests and are unlikely to stay on the sidelines.

There is no direct evidence that OpenAI used pirate sites to train ChatGPT, but some AI projects are known to have used pirated material in the past. According to a comprehensive overview by Search Engine Journal, AI models developed by Google and Facebook were trained on the C4 dataset, which included content from Z-Library and other pirate sites.

The lawsuit is expected to draw significant attention from AI enthusiasts and rights holders alike. It could also compel OpenAI to disclose details of its training data, which would be noteworthy in its own right.

Even if it is established that ChatGPT was trained on pirated books, the court would still need to decide whether that use constitutes copyright infringement.

Several experts argue that this type of AI training could qualify as fair use, which protects transformative uses of copyrighted works that do not directly compete with the original.

The outcome of this lawsuit is likely to shape the future of AI and copyright law, with significant implications for both technology developers and content creators.

