OpenAI Disputes New York Times Lawsuit, Defending Use of Publicly Available Data
OpenAI, a leading artificial intelligence (AI) startup, has responded to a lawsuit filed by The New York Times against the company and its collaborator Microsoft, defending its use of publicly available data. The Times accused OpenAI of violating copyright law by training its generative AI models on the newspaper’s content. In its public response, OpenAI claims that the case lacks merit.
OpenAI asserts that training AI models using publicly available data from the internet, including news articles like those from The New York Times, falls under fair use. The company argues that in the process of creating AI systems like GPT-4 and DALL-E 3, which generate human-like text and images by learning from vast amounts of examples, licensing or payment for the examples is not required.
Addressing the concern of regurgitation, where AI models replicate training data verbatim or near-verbatim, OpenAI states that this is less likely to occur with training data from a single source like The New York Times. The company emphasizes that users of its models have a responsibility to act ethically and to avoid deliberately prompting regurgitation, which is against OpenAI’s terms of use.
OpenAI’s response comes amid a heated copyright debate surrounding generative AI. AI critic Gary Marcus and visual effects artist Reid Southen recently demonstrated how AI systems, including DALL-E 3, regurgitate data even without specific prompts, casting doubt on OpenAI’s claims. Marcus and Southen referred to The New York Times lawsuit in their discussion, noting that the newspaper was able to elicit plagiaristic responses from OpenAI’s models simply by providing the initial words from a Times story.
The legal action taken by The New York Times is the latest in a string of copyright infringement lawsuits against OpenAI. Actress Sarah Silverman joined two lawsuits in July, accusing Meta and OpenAI of using her memoir without permission for training AI models. Additionally, thousands of novelists, including Jonathan Franzen and John Grisham, claim that OpenAI utilized their work as training data without authorization or knowledge. Furthermore, Microsoft, OpenAI, and GitHub face legal action from several programmers over their AI-powered code-generating tool, Copilot, which the plaintiffs allege was developed using their protected code.
As the copyright debate surrounding generative AI intensifies, OpenAI maintains its stance that utilizing publicly available data for training AI models is fair use. The outcome of these lawsuits will likely have significant implications for the future of AI development and the boundaries of copyright law.
In conclusion, OpenAI’s response to The New York Times’ lawsuit defends its use of publicly available data, emphasizing fair use and user responsibility. The ongoing copyright debate continues to shape the landscape of generative AI, highlighting the need for clearer guidelines and regulations in the evolving field.