OpenAI’s appetite for data has become a liability for the company, with several Western data protection authorities opening investigations into how it collects data for its AI models. Leading the charge is the Italian data protection authority, which has blocked the use of ChatGPT, OpenAI’s AI chatbot, as a precautionary measure. Regulators in other countries, including Canada, France, Germany, and Ireland, have also begun investigating how OpenAI collects, processes, and uses the data that powers the bot.
The sheer volume of data needed to train OpenAI’s models has been a key factor in attracting the regulators’ attention. OpenAI’s GPT-2 model was trained on a data set of 40 gigabytes of text, and GPT-3, the model ChatGPT is based on, was trained on 570 GB of data; the size of GPT-4’s training set has not been disclosed. This heavy reliance on data is one of OpenAI’s core strengths, since AI models generally become more capable with more training data, but it has now turned into a potential liability.
Data protection authorities are concerned that OpenAI has scraped personal data, such as names and email addresses, and used it without people’s consent. The European Data Protection Board is setting up a task force to coordinate investigations and any enforcement actions against OpenAI. OpenAI has been given until April 30 to comply with the law, which means it will have to obtain permission to use the data it has scraped from people or prove that it has a “legitimate interest” in collecting it. OpenAI may also have to explain how ChatGPT uses people’s data, let them correct mistakes about themselves, have their data erased, and object to its use. If OpenAI does not comply by April 30, it could be banned in individual countries or across the entire European Union, face heavy fines, and be forced to delete its models and the data used to train them.
OpenAI is an American artificial intelligence research laboratory led by CEO Sam Altman. Its mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. Its team includes people drawn from some of the world’s leading organizations, including Microsoft, Northrop Grumman, NVIDIA, Google, and Goldman Sachs.
Alexis Leautier is an AI expert at the French regulatory body CNIL. He has more than two decades of experience in AI and in advising corporations on how to comply with data protection law. As part of the European Data Protection Board, he is also a key figure in the investigation of OpenAI and its practices.