A month after OpenAI introduced a program enabling users to create customized ChatGPT programs easily, a Northwestern University research team claims to have found a significant security vulnerability that potentially paved the way for leaked data.
According to Tech Xplore, in November, OpenAI unveiled the ability for ChatGPT subscribers to create custom GPTs effortlessly.
It highlighted the simplicity of the process, likening it to starting a conversation, providing instructions and additional knowledge, and selecting functionalities such as web searching, image creation, or data analysis. However, this approach is now under scrutiny due to potential security risks.
Jiahao Yu, a second-year doctoral student at Northwestern specializing in secure machine learning, acknowledged the positive aspects of OpenAI’s democratization of AI technology.
He praised the community of builders contributing to the expanding repository of specialized GPTs. Despite this, Yu expressed concerns about the security challenges arising from the instruction-following nature of these models.
In a study led by Yu and his colleagues, they uncovered a significant security vulnerability in custom GPTs. They found that malicious actors could exploit the vulnerability to extract system prompts and information from documents not meant for publication.
The research outlined two key security risks: one is the system prompt extraction, where GPTs could be manipulated into yielding prompt data, and the second is file leakage, potentially revealing confidential data behind customized GPTs.
Yu’s team tested over 200 GPTs for this vulnerability and reported a high success rate. Our success rate was 100% for file leakage and 97% for system prompt extraction, Yu stated, adding that these extractions were achievable without specialized knowledge or coding skills.
The study further notes that prompt injection attacks have become a growing concern with the rise of large language models. Colin Estep, a researcher at security firm Netskope, defined prompt injections as attacks involving crafting input prompts to manipulate the model’s behavior, generating biased, malicious, or undesirable outputs.
Prompt injection attacks can force language models to produce inaccurate information, generate biased content, and potentially expose personal data. In a 2022 study, Riley Goodside, an expert in large language models, demonstrated the ease of tricking GPT-3 with malicious prompts.
Yu concluded by expressing hope that the research would prompt the AI community to develop stronger safeguards, ensuring that security vulnerabilities do not compromise the potential benefits of custom GPTs. He stressed the need for a balanced approach, prioritizing innovation and security in AI technologies.
Our hope is that this research catalyzes the AI community towards developing stronger safeguards, ensuring that the innovative potential of custom GPTs is not undermined by security vulnerabilities, Yu noted.
A balanced approach that prioritizes both innovation and security will be crucial in the evolving landscape of AI technologies, he added.
The team’s findings were published in the arXiv.