Copyright Battles Erupt as AI Researchers Use Protected Material, OpenAI Responds
The world of artificial intelligence (AI) research has been shaken by copyright battles as companies like OpenAI, Microsoft, and Google commercialize generative AI. The use of copyrighted training material has come under fire, prompting lawmakers in the UK to request information on the issue. OpenAI recently responded to the House of Lords Communications and Digital Select Committee, claiming that it is impossible to train large language models (LLMs) without using copyrighted material.
OpenAI’s popular consumer applications like ChatGPT and DALL-E are built on models that have been trained on billions of samples of writing, art, and photographs scraped from the internet. While some of the training material consists of protected works like books and websites, copyright law extends far beyond these traditional media.
According to OpenAI’s submission to the House of Lords, copyright today covers almost every form of human expression, including blog posts, photographs, software code, and government documents. This means that it would be impossible to train the leading AI models without utilizing copyrighted materials.
In the past, AI research was primarily academic, and training models on copyrighted material was generally considered fair use. However, as LLMs enter the commercial realm, how the fair use doctrine applies has become a gray area.
ChatGPT occasionally reproduces copyrighted snippets, which is a clear infringement that OpenAI is actively addressing. However, this problem is distinct from the questions raised when researchers train LLMs on protected material. The purpose of using these works, regardless of copyright status, is to teach the models language structure and usage so that they can generate original content that humans can understand.
The lack of a legal definition of AI training within copyright law has led aggrieved parties to take their cases to court. While companies like OpenAI and Microsoft argue that training falls under fair use, plaintiffs have filed lawsuits against them to challenge this interpretation.
OpenAI firmly asserts that training AI models on publicly available internet materials is fair use, supported by long-standing and widely accepted precedents. The company believes this principle is fair to creators, essential for innovators, and critical for US competitiveness. Despite its stance, OpenAI provides an opt-out process for copyright holders who do not want their materials used. The New York Times used this process but still filed a lawsuit against OpenAI.
Notably, OpenAI is also facing lawsuits from published authors, including well-known comedian Sarah Silverman. The complexity of these cases highlights the need for the US Patent and Trademark Office and lawmakers to clearly define the role of AI training in copyright rules.
To navigate this complex landscape, it is essential to strike a balance between protecting intellectual property and fostering innovation. As the AI field continues to evolve, policymakers must carefully consider the implications and establish guidelines that promote fair use while respecting copyright laws.
In conclusion, copyright battles surrounding AI training have intensified as leading AI organizations commercialize their generative models. OpenAI maintains that training AI models on copyrighted material falls under fair use, but legal challenges and the lack of clear definitions in copyright law persist. As the world grapples with the convergence of AI and copyright, striking a balance between innovation and intellectual property protection will be crucial for the future of AI research and development.