OpenAI and Meta involved in AI copyright complaints


OpenAI and Meta, two prominent companies in the field of artificial intelligence (AI), are facing copyright infringement complaints related to their language models. The complaints allege that OpenAI’s ChatGPT and Meta’s LLaMA have used copyrighted material without the consent, credit, or compensation of the authors.

The issue at hand revolves around the legal status of the data used to train these large language models. Both OpenAI and Meta have utilized publicly available data from the internet in their model training processes.

The complaints, filed in the US District Court for the Northern District of California under the case numbers 3:23-cv-03416 (OpenAI) and 3:23-cv-03417 (Meta), are brought by several plaintiffs, including actor and author Sarah Silverman. They claim that their copyrighted works were used as training material for the LLaMA and ChatGPT models.

According to the complaint against OpenAI, a significant portion of the training datasets used by the company consists of copyrighted works, including books written by the plaintiffs. The complaint alleges that OpenAI copied these works without obtaining consent, providing credit, or offering compensation. It further states that the summaries ChatGPT generates of the plaintiffs' books would only be possible if the model had been trained on those copyrighted works.

The complaint against Meta shares similarities, but explicitly mentions the existence of ‘shadow libraries’ accessible through torrent systems. In Meta’s LLaMA paper, it is stated that the model was trained on Project Gutenberg, an online repository of books that have entered the public domain. Additionally, the Books3 section of The Pile was used. However, the complainants take issue with the source of the Books3 dataset, which was derived from a shadow library website called Bibliotik. This website contains copyrighted material, and it is this aspect that has sparked the complaints against Meta.


ITPro reached out to both Meta and OpenAI for comments, but as of now, neither organization has responded.

OpenAI has faced legal scrutiny before due to concerns over the content used in its training models, and similar cases are already progressing through the court system.

Businesses using generative AI tools built on models trained on publicly available content face two primary issues. The first is the risk that output from tools such as ChatGPT may contain falsehoods or infringe intellectual property rights. The infringement concern is the basis of the complaints against OpenAI and Meta, and businesses fear that their employees could unintentionally use illegally acquired material.

The second issue is the risk that employees enter confidential information into generative AI systems without realizing that it may become part of the training dataset for a language model and resurface elsewhere.

These concerns have led some workplaces to ban certain generative AI tools entirely or impose restrictions on their usage.

To mitigate these risks, some businesses opt for a closed approach by using only internal datasets for training generative AI models. Open-source alternatives exist, and established vendors like Oracle have ventured into this space by allowing customers to train specific models using their own data, theoretically avoiding copyright challenges faced by OpenAI and Meta.

Frequently Asked Questions (FAQs) Related to the Above News

What are OpenAI and Meta facing copyright infringement complaints for?

OpenAI and Meta are facing copyright infringement complaints related to their language models, specifically OpenAI's ChatGPT and Meta's LLaMA. It is alleged that these models have used copyrighted material without the consent, credit, or compensation of the authors.

What is the main issue surrounding the complaints against OpenAI and Meta?

The main issue revolves around the legal status of the data used to train the language models. Both OpenAI and Meta have utilized publicly available data from the internet, but the complaints claim that copyrighted material was included without proper authorization.

Who filed the copyright complaints against OpenAI and Meta?

The copyright complaints were brought forward by several plaintiffs, including renowned actor and author Sarah Silverman. They claim that their copyrighted works were used as training material for the ChatGPT and LLaMA models.

What does the complaint against OpenAI state?

The complaint against OpenAI alleges that a significant portion of the training datasets used by the company consists of copyrighted works, including books written by the plaintiffs. It claims that OpenAI copied these copyrighted works without obtaining consent, providing credit, or offering compensation.

What does the complaint against Meta highlight?

The complaint against Meta mentions the existence of 'shadow libraries' accessible through torrent systems. While Meta's LLaMA paper states that the model was trained on Project Gutenberg and the Books3 section of The Pile, the complainants take issue with the source of the Books3 dataset, which was derived from a shadow library website called Bibliotik that contains copyrighted material.

Have Meta and OpenAI responded to the copyright complaints?

As of now, neither Meta nor OpenAI has responded to ITPro's requests for comment on the complaints.

What concerns do businesses have regarding generative AI tools?

Businesses using generative AI tools have two primary concerns. The first is the risk that the output generated by these tools may contain falsehoods or infringe intellectual property rights. The second is the risk that employees inadvertently enter confidential information into generative AI systems, where it may become part of the training dataset and resurface elsewhere.

How have some workplaces addressed the risks associated with generative AI tools?

Some workplaces have banned certain generative AI tools entirely or restricted their usage. Other businesses opt for a closed approach, using only internal datasets to train generative AI models. Open-source alternatives exist, and vendors like Oracle allow customers to train specific models on their own data, which in theory avoids the copyright challenges faced by OpenAI and Meta.


Aryan Sharma
