Last month, Rui Zhu, a Ph.D. candidate at Indiana University Bloomington, sent me an email that left me alarmed. He explained that he had obtained my email address from GPT-3.5 Turbo, a powerful language model developed by OpenAI. Working with a team of researchers, Zhu had extracted from the model a list of email addresses belonging to more than 30 New York Times employees. Their experiment raises concerns that, with just a few adjustments, AI tools like ChatGPT can be coaxed into revealing sensitive personal information.
Unlike traditional search engines, ChatGPT doesn’t simply scour the web for information. Instead, it draws on a vast body of training data to generate responses, and that training data may contain personal information gathered from many sources. The model is not designed to recall this information verbatim, but recent findings suggest that a language model’s memory, much like a person’s, can be jogged.
To extract the email addresses of New York Times employees, the research team gave GPT-3.5 Turbo a short list of verified names and email addresses. The model then returned similar results that it appeared to have memorized from its training data. Its recall wasn’t perfect, and it sometimes produced made-up addresses, but it supplied accurate work email addresses 80% of the time.
Companies like OpenAI have implemented safeguards to prevent users from requesting personal information, but researchers have found ways around them. Zhu and his colleagues worked not through ChatGPT’s standard interface but through the model’s fine-tuning feature, which is offered via its application programming interface (API). Fine-tuning lets users supply an AI model with additional knowledge in a chosen area and is meant to improve its performance on specific tasks, but it can also weaken some of the model’s built-in defenses, allowing requests for sensitive information that the standard interface would normally refuse.
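For readers curious about the mechanics, here is a minimal sketch of what that fine-tuning workflow looks like with OpenAI’s Python SDK. Every name, email address, file name, and model id below is a made-up placeholder; the snippet illustrates only the general feature the researchers describe, not their data, prompts, or code.

```python
# A minimal sketch of OpenAI's fine-tuning workflow (openai-python v1.x).
# All names, addresses, file names, and model ids are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Fine-tuning data is a JSONL file of chat-formatted examples.
#    Each line teaches the model a question-and-answer pattern.
examples = [
    {"messages": [
        {"role": "user", "content": f"What is the work email of Person {i}?"},
        {"role": "assistant", "content": f"person{i}@example.com"},
    ]}
    for i in range(10)  # the API expects a small minimum number of examples
]
with open("pairs.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# 2. Upload the file and start a fine-tuning job on GPT-3.5 Turbo.
training_file = client.files.create(file=open("pairs.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-3.5-turbo")
print("fine-tuning job started:", job.id)

# 3. Once the job finishes, the resulting model (its id begins with "ft:")
#    is queried through the ordinary chat endpoint.
# response = client.chat.completions.create(
#     model="ft:gpt-3.5-turbo:my-org::abc123",  # hypothetical model id
#     messages=[{"role": "user",
#                "content": "What is the work email of Person 42?"}],
# )
# print(response.choices[0].message.content)
```

The point of the sketch is simply that a user can hand the model a small set of question-and-answer examples and then query the resulting model through the normal chat endpoint, which is the path the researchers say sidesteps refusals built into the standard interface.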
OpenAI says it strives to make its fine-tuned models safe. According to an OpenAI spokesperson, the models are trained to reject requests for private or sensitive information, even when that information is publicly available. Still, the vulnerability Zhu and his team exposed raises the question of what information sits in ChatGPT’s training-data memory, and because OpenAI reveals little about the data it trains on, the uncertainty and the potential privacy risks remain.
The problem isn’t limited to OpenAI; weak privacy defenses are common across commercially available large language models. Dr. Prateek Mittal, a professor at Princeton University, emphasizes the risk that these models retain sensitive information, comparing it to the way they can inadvertently absorb biased or toxic content during training.
Language models like GPT-3.5 Turbo can keep learning when they are given new data. OpenAI’s models, including GPT-3.5 Turbo and GPT-4, are among the most powerful publicly available large language models. The company gathers natural-language text from numerous public sources, including websites, and uses a variety of datasets for training. The Enron email dataset, which contains half a million emails made public during the early-2000s investigation into the corporation, is commonly used by AI developers because it offers diverse examples of real human communication.
When OpenAI released the fine-tuning interface for GPT-3.5 last year, it included the Enron dataset. Starting from just 10 known name-and-email pairs, Zhu and his team were able to extract more than 5,000 pairs of Enron names and email addresses from the model, with an accuracy rate of around 70%.
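To make that figure concrete, the short sketch below computes an exact-match recall rate by comparing a model’s answers against a known ground-truth list. The names, addresses, and predicted answers are invented, and this is not the researchers’ evaluation code, just one plausible way such an accuracy number could be measured.

```python
# Invented data; one plausible way an exact-match accuracy figure
# like the one above could be measured. Not the researchers' code.

def recall_accuracy(predictions: dict[str, str],
                    ground_truth: dict[str, str]) -> float:
    """Fraction of people whose predicted email exactly matches the true one."""
    hits = sum(
        1 for name, true_email in ground_truth.items()
        if predictions.get(name, "").strip().lower() == true_email.lower()
    )
    return hits / len(ground_truth)

ground_truth = {
    "Jane Doe": "jane.doe@enron.example",
    "John Roe": "john.roe@enron.example",
    "Pat Smith": "pat.smith@enron.example",
}
predictions = {                        # what a fine-tuned model might answer
    "Jane Doe": "jane.doe@enron.example",
    "John Roe": "j.roe@enron.example",     # a near miss still counts as wrong
    "Pat Smith": "pat.smith@enron.example",
}
print(f"accuracy: {recall_accuracy(predictions, ground_truth):.0%}")  # 67%
```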
The challenge of safeguarding personal information in commercial language models remains significant. Dr. Mittal stresses the need for accountability and transparency to address privacy concerns adequately. As these models continue to evolve and play a larger role in various applications, it is crucial to prioritize the protection of personal data and ensure that models are designed with robust privacy measures.
The experiment conducted by the research team at Indiana University is a warning about the privacy risks that come with AI language models like ChatGPT. The ease with which sensitive information could be extracted should concern individuals and organizations alike. As these models are developed and deployed more widely, AI companies and researchers will need to make robust privacy protection a priority in order to safeguard user data.