ChatGPT Trains on Copyrighted Books Such as Harry Potter: New Scientist Finds

Recent research suggests that two language models, ChatGPT and its successor GPT-4, have memorised details from a large number of copyrighted books. This raises questions about the legality of how such large language models (LLMs) are created. Both models were developed by the company OpenAI and trained on huge amounts of data. However, it is unclear exactly which texts form the basis of this training data.

To investigate, David Bamman from the University of California, Berkeley, and his colleagues tested whether the models could reliably tell passages from copyrighted books apart from passages from lesser-known books. The models were tested on passages from famous books such as Harry Potter and Game of Thrones, as well as on seemingly random lesser-known novels and poems. The results suggest that the models had memorised the contents of copyrighted books, which indicates that they were trained on them.

This raises several issues, not least the possibility of copyright infringement. It implies that OpenAI used passages from these books without permission, which makes it difficult to see how the company could have complied with copyright regulations. What’s more, the finding could set a dangerous precedent as LLMs trained on vast amounts of copyright-protected material become more common.

OpenAI, which is based in San Francisco in the United States, was co-founded by entrepreneur Elon Musk. The company is dedicated to advancing artificial intelligence technologies, and its stated mission is to ensure that artificial intelligence benefits all of humanity. Sam Altman, a co-founder and the current CEO, is a highly experienced leader in the fields of technology and innovation.

David Bamman is a computer scientist from UC Berkeley whose work focuses on natural language processing and the use of artificial intelligence in digital humanities, publishing and media. He has published numerous research papers and books on artificial intelligence and natural language processing.
