Shadow libraries driving mounting copyright lawsuits against OpenAI.

Shadow Libraries Stir Copyright Lawsuits Against OpenAI over ChatGPT’s Training Material

Artificial intelligence company OpenAI is facing copyright infringement lawsuits over its AI chatbot, ChatGPT, filed by three writers, including Sarah Silverman. The plaintiffs allege that their copyrighted books were ingested as training material for ChatGPT without their consent.

To produce human-like responses, AI bots are trained on extensive datasets sourced from various internet materials. OpenAI remains secretive about the specific source texts used to train its models, citing safety reasons and competition in the industry. Books play a crucial role in these datasets because they offer lengthy examples of high-quality writing. The lawsuit filed by Silverman, however, alleges that much of the book data used in training ChatGPT comes from illegal shadow libraries that contain the works of these writers.

OpenAI has disclosed that approximately 15% of the training set for GPT-3, the language model family underlying the free version of ChatGPT, consists of two internet-based book collections referred to as Books1 and Books2, as mentioned in the lawsuit. Clues suggest that Books1 is linked to Project Gutenberg, an online e-book library featuring over 60,000 titles and commonly used by AI researchers due to the absence of copyright restrictions. Books2, by contrast, likely encompasses about 294,000 titles.

Most of the internet-based book corpora used in training ChatGPT are presumed to originate from shadow library websites such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik. These platforms aggregate books that are out of print, difficult to access, or behind paywalls. Originating in Russia, these shadow libraries gained popularity among financially constrained researchers seeking access to scholarly journals that were prohibitively expensive, with individual articles reportedly priced at up to $500.

Shadow libraries have drawn the label of pirate libraries due to their involvement in copyright infringement and the potential negative impact they have on the publishing industry’s revenue stream. According to a 2017 study conducted by Nielsen and Digimarc, pirated books can depress legitimate book sales by up to 14%.

To combat shadow libraries, governments worldwide have taken actions such as seizing websites associated with these platforms. For example, the FBI seized several websites linked to Z-Library and charged two Russian nationals with criminal copyright infringement, wire fraud, and money laundering. Courts in France and India have also ordered internet service providers to block Z-Library. Despite these efforts, shadow library websites were able to create mirror sites after the initial takedown by the US government, as reported by Vice.

The lawsuit filed by Sarah Silverman against OpenAI concerning ChatGPT’s training material is not an isolated incident. Similar copyright infringement lawsuits have been brought against other generative AI companies as well. For instance, visual artists filed a lawsuit against Stability AI, Midjourney, and DeviantArt earlier this year. Furthermore, GitHub programmers initiated a class-action lawsuit against GitHub, its parent company Microsoft Corp., and OpenAI in November, alleging that GitHub Copilot relies on widespread open-source software piracy.

In response to the mounting lawsuits, Pau Garcia, the founder of art consulting firm Domestic Data Streamers, suggested that AI companies should either shift their training models to exclusively use material in the public domain or obtain explicit permission from artists to use their content as training data, with artists being compensated accordingly.

Some companies are also exploring the possibility of granting artists control over the content AI models can be trained on. For example, music streaming platform Audius recently introduced a feature allowing artists to create a dedicated page for their work that anyone can use for generating AI-based tracks.

As OpenAI faces legal battles concerning the use of copyrighted material in training its AI models, discussions around fair use, copyright permissions, and artistic control are taking center stage in the rapidly evolving field of artificial intelligence.

Frequently Asked Questions (FAQs) Related to the Above News

What is OpenAI facing lawsuits for?

How are AI bots trained?

Why is OpenAI secretive about the specific source texts used to train their models?

What role do books play in training AI models?

Which illegal libraries are alleged to be the source of copyrighted materials?

What actions have governments taken against shadow libraries?

What are some concerns regarding shadow libraries?

Are there other similar lawsuits against generative AI companies?

What suggestions have been made to address the use of copyrighted material in AI training?

How are some companies exploring artist control over AI-generated content?

What broader discussions are arising in the field of artificial intelligence as a result of these lawsuits?
