A new study from the Allen Institute for Artificial Intelligence (AI2) has revealed that ChatGPT, the large language model (LLM) from OpenAI, can be made to turn toxic, whether inadvertently or maliciously, by changing the persona assigned in the model's system settings. According to the research, assigning different personas through the model's parameters can significantly change its output, from the words and phrases it uses to the topics it expresses. The personas examined in the study spanned a diverse array of popular figures, including politicians, journalists, sportspersons and businesspersons, as well as a range of races, genders and sexual orientations.
Notably, the research shed light on how the model itself appears to form opinions of different personas, leading to varying levels of toxicity across topics. This means that even training data perceived as unbiased may still result in toxic behaviour if the model is not moderated properly.
The study highlighted the potential risks of using AI models without proper oversight, as well as the potential for maliciously configuring the model to produce toxic content. It also showed that the parameters needed to assign personas are available to anyone with access to OpenAI's API, allowing for widespread impact.
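To illustrate how low the barrier is, here is a minimal sketch of assigning a persona through the system message of OpenAI's chat API, using the pre-1.0 openai Python client that was current at the time; the persona and prompt shown are illustrative examples, not taken from the study:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # any ordinary API key grants access to this parameter

# The "system" message is where a persona is assigned; nothing else changes.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # Hypothetical persona, for illustration only
        {"role": "system", "content": "Speak exactly like a brash, trash-talking boxer."},
        {"role": "user", "content": "What do you think of your rivals?"},
    ],
)

print(response.choices[0].message.content)
```

The same system-message channel that developers use to give a chatbot a brand voice is, per the study, the one that can steer the model toward toxic output, which is why the researchers flag it as both a product feature and an attack surface.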
The report emphasized the importance of system settings when building chatbots and plugins on ChatGPT, both to appeal to a target audience and to constrain the model's behaviour. Companies such as Snap, Instacart and Shopify, which have already built chatbots and plugins on top of ChatGPT, are among the many that could be at risk of exhibiting toxic behaviour if the model is not properly managed.
The Allen Institute for AI is a research institution dedicated to the study and advancement of AI, with a mission to make AI more explorable, reproducible and beneficial. It was founded in 2014 by Microsoft co-founder Paul Allen to drive cutting-edge research and science. The institute's ChatGPT report is the first large-scale toxicity analysis of the popular AI model, and its findings could pave the way for similar analyses of other large language models in the future.