The popular AI chatbot, ChatGPT, has come under scrutiny as Google researchers reveal its ability to reveal personal information from real people. The machine learning model behind ChatGPT, which is similar to other Large Language Models (LLMs), was trained on vast amounts of internet data. Although the aim is for the program to generate new text without regurgitating the original content it learned, Google’s research shows that ChatGPT can be manipulated to disclose sensitive information. This concerning discovery includes extracting identifying details such as names, email addresses, and phone numbers from its training data.
The researchers conducted an attack on ChatGPT by exploiting keywords that prompt the chatbot to divulge training data. By repeatedly asking ChatGPT to repeat certain words like poem, the aim was to divert the chatbot from its chat functionality and revert it back to its primary language modeling objective. Consequently, while much of the generated text was nonsensical, the researchers found instances where ChatGPT copied outputs directly from its training data.
Upon testing, Motherboard determined that the poem attack on the GPT 3.5 AI model generated an unrelated string of text. However, it was not found existing elsewhere on the internet. In contrast, when GPT-4, another model available to subscribers, was asked to repeat the word poem infinitely, it refused.
The in-depth analysis conducted by Google researchers raised concerns about latent vulnerabilities in language models like ChatGPT, especially considering its broad user base. OpenAI reported that ChatGPT has approximately a hundred million weekly users, translating to billions of hours of interactions with the model. The fact that no one had noticed this issue until now is disquieting and highlights the potential impact of such vulnerabilities.
OpenAI has not yet provided a comment regarding the research findings.
In conclusion, the Google researchers’ discovery exposes the potential risks associated with AI chatbots like ChatGPT, particularly their ability to divulge personal information from their training data. The use of keywords to prompt the chatbot’s disclosure raises questions about the overall vulnerability of language models and the privacy of user interactions. The implications of this research call for further examination and mitigation of potential risks.